Regarding DS<->DS TCP slowness (and DSi XL hang)
I have some findings about the DS Wifi layer (dswifi), which I'm using in my homebrew game to communicate DS<->DS over TCP. I have had problems with slow communication, and the communication hangs after a while. So far I have found that the hang was caused by using a DSi XL, and that the main cause of the slowness is that every TCP packet is always replied to, which floods the link with packets when two DS units talk to each other.
Here are my current notes:
- The communication hang seems to have been caused by testing with a DS Lite/DSi XL combination. With two DS Lites it works without any hangs, though it is still slow. I tested the DSi flashcart in the DS Lite and it worked, so the DSi XL is to blame.
- The game is supposed to send 16 bytes over TCP in both directions between the units every 3rd frame. Debug printouts of WSTAT_RXPACKETS/WSTAT_TXPACKETS reveal that a LOT (100+) of packets are sent and received between each of these 16-byte packets.
- The lag seems to get longer the longer the communication has been going on. Judging from the printouts at game startup, it begins with a few extra packets between the 16-byte packets, and the number of extra packets then grows to 100+.
- After a lot of testing and debug printing, the main source of the slowness turned out to be that TCP packets were always being replied to. This floods the link with TCP packets in DS<->DS communication. When a different TCP stack is one of the endpoints, I imagine this will not happen.
- On a side note, at first I thought things like Nagle and delayed ACK caused the slowness. While examining the inner workings of the TCP layer I discovered some limitations. I think the update timer should be triggered more often than every 50ms. There are settings like SGIP_TCP_TRANSMIT_DELAY, set to 25ms, which will not work as expected with the current timers. I imagine fiddling with these parameters in the future, but for that to make any difference a 5ms-10ms timer would be nice. I will do some more testing.
What made things work a lot better for me was a change at line 408 in sgIP_TCP.c. From:
if(shouldReply || delta1>=0) { // send a packet in reply, ha!
To:
if(shouldReply || delta1>0) { // send a packet in reply, ha!
With this change, direct replies to TCP packets are only sent when new payload data has been received. It actually reverts a change introduced in revision 1507, whose change comment was "Fixed bug causing lib to not re-ack data packets that were resent.". However, the correct solution to that problem cannot be to ACK every packet, since that causes packet floods when the other side behaves the same way. I think a solution that also fixes the re-ACK of resent data packets would be to ACK every packet that carries payload data, even if none of it is *new*, i.e. to check datalen before it is adjusted for already received data. I will do some testing with this approach.
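To make the three reply rules discussed here concrete, the following is a minimal model in C. It is not the sgIP code: the names `reply_policy`, `should_reply` and `count_packets` are made up for this illustration, and `new_len` only stands in for what the post calls delta1 (an assumption about its meaning). The model counts how many packets end up on the wire after one peer sends a single segment, when both peers follow the same rule and every reply is an empty ACK.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; this models the discussion, not sgIP itself. */
typedef enum {
    REPLY_ALWAYS,       /* models the `delta1 >= 0` condition           */
    REPLY_ON_NEW_DATA,  /* models the `delta1 > 0` condition            */
    REPLY_ON_PAYLOAD    /* models the proposal: raw payload length > 0  */
} reply_policy;

/* payload_len: bytes carried by the segment as received.
 * new_len:     bytes of that payload not seen before (0 for a resend). */
bool should_reply(reply_policy policy, size_t payload_len, size_t new_len)
{
    switch (policy) {
    case REPLY_ALWAYS:      return true;
    case REPLY_ON_NEW_DATA: return new_len > 0;
    case REPLY_ON_PAYLOAD:  return payload_len > 0;
    }
    return false;
}

/* Count packets on the wire after one peer sends a segment and both peers
 * apply the same policy. Stops at `cap` so the flood case terminates. */
unsigned count_packets(reply_policy policy, size_t payload_len,
                       size_t new_len, unsigned cap)
{
    unsigned sent = 1;                /* the initial segment */
    while (sent < cap && should_reply(policy, payload_len, new_len)) {
        payload_len = 0;              /* the reply is an empty ACK */
        new_len = 0;
        sent++;
    }
    return sent;
}
```

In this model a fresh 16-byte message under the `>=0` rule bounces until the cap is hit, the `>0` rule stops after 2 packets but leaves a resent segment (`new_len` of 0) unanswered, and the payload rule answers both cases with exactly one ACK.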
I thought I would share my findings even though I will continue to investigate my problems. The communication is still a bit too slow for my game. My exchange of 16 bytes in both directions still takes 3 packets in each direction. I believe it should be 2+2 packets: a data packet and a direct ACK reply in each direction. I also want to see if I can find some way of further reducing latency in the message handling of my game. It is very possible that I will switch to UDP to avoid all the TCP quirks, as many games do.
Regards,
Bengt
Re: Regarding DS<->DS TCP slowness (and DSi XL hang)
I am thinking you are going to be much better off using UDP for such high frequency low bandwidth transmissions.
Re: Regarding DS<->DS TCP slowness (and DSi XL hang)
Yeah, you are probably right. At first I simply wanted to get something up and running and selected TCP instead of UDP. I still imagine switching to UDP. But I think I will spend some time trying to optimize TCP communication in dswifi for my game. After all, it took some time to learn how things fit together, and I think others may benefit from this if I get lucky ...
Before the change above, latencies of 1s or more were standard. After the change I more or less get the 50ms I need on my local network. The latency was caused by 100+ empty packets playing ping-pong between my DS units. I believe it is possible to get some more performance, as I think this TCP layer does not do delayed ACKs and should therefore avoid a common delay problem:
http://www.stuartcheshire.org/papers/NagleDelayedAck/
BTW, I do not do ad hoc DS<->DS communication. The traffic goes through my router.
Re: Regarding DS<->DS TCP slowness (and DSi XL hang)
dswifi does not handle direct DS to DS communication. There are also a lot of issues with TCP in dswifi, and the author has been working on a new version for a while now (years maybe?). I am pretty sure Nagle is disabled in dswifi and that packets are sent as soon as possible.
Re: Regarding DS<->DS TCP slowness (and DSi XL hang)
Aight, then I probably won't spend that much time trying to get more out of TCP before I start using UDP instead. Thanks for the info.
Re: Regarding DS<->DS TCP slowness (and DSi XL hang)
Yeah, there are certainly still some serious issues in the sgIP TCP implementation, though it does generally work reasonably well. I've been meaning to go back and rewrite it for quite some time now.
I'll try to take a closer look at this segment and see if there is a serious problem that can be fixed. It's not necessarily as straightforward as you describe (though, maybe it is.)
-Stephen
http://blog.akkit.org/ - http://wiki.akkit.org/ - Creator of DSWifi library - Authority on ARM ASM - Memorizer of DS Hardware Information
Re: Regarding DS<->DS TCP slowness (and DSi XL hang)
Ok, thx!
If you wait a week or so I will probably have a patch with some small changes that work well for me, and you can review it and see if it makes sense. One thing I am curious about is whether the comment "Fixed bug causing lib to not re-ack data packets that were resent." on the original change is still valid. Is there perhaps some other timer-based mechanism that will make this work anyway? Anyhow, the method of ACKing all packets with payload (even if it is resent data) will probably work without causing packet floods. I will test this now.
I was wondering if the tcp-layer was written from scratch or if you adopted some other layer?
Re: Regarding DS<->DS TCP slowness (and DSi XL hang)
Well, it looks like replying to packets with payload works as well as my first change. This should also handle the re-ACK of resent data.
Some more comments:
- If I read the source correctly, SGIP_TCP_TRANSMIT_IMMTHRESH=40 prevents a packet from being sent directly until the send buffer contains at least 40 bytes (at least if all received data has already been ACKed). I verified this with my small messages: they were sent by the timer instead.
- Because the timer is only triggered every 50ms, SGIP_TCP_TRANSMIT_DELAY=25ms actually works as anything between 0 and 50ms. This further adds to my problem.
Something like setsockopt(TCP_NODELAY) would be nice to "force" small packets to be sent directly. However, lowering both SGIP_TCP_TRANSMIT_DELAY and the timer to 5ms would also work nicely for me. This limits the maximum delay before sending to ~1/3 of a frame. It would still concatenate consecutive small sends for 5ms before sending the packet. Would this give too much CPU overhead? Would 10ms be better?
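The effect of the coarse timer can be sketched with a few lines of C. This is only a model under an assumption that is not confirmed by the source: that the stack's notion of time advances only when the timer fires, in steps of the tick period, so a queued packet goes out on the first tick at which the configured delay has elapsed in stack time. The function name `effective_wait_ms` is hypothetical.

```c
/* Wall-clock wait (ms) before data queued at t_queue_ms is sent, assuming
 * the stack's clock only advances in steps of tick_ms (an assumption of
 * this sketch, not a statement about sgIP internals). */
unsigned effective_wait_ms(unsigned t_queue_ms, unsigned delay_ms,
                           unsigned tick_ms)
{
    if (tick_ms == 0)
        return delay_ms;              /* no quantization to model */

    /* Number of ticks until the delay has elapsed in stack time. */
    unsigned ticks_needed = (delay_ms + tick_ms - 1) / tick_ms;
    if (ticks_needed == 0)
        ticks_needed = 1;             /* nothing is sent before a tick */

    /* The stack still believes it is the time of the last tick. */
    unsigned last_tick = (t_queue_ms / tick_ms) * tick_ms;

    return last_tick + ticks_needed * tick_ms - t_queue_ms;
}
```

With a 25ms delay and a 50ms tick, data queued 1ms after a tick waits 49ms and data queued 49ms after a tick waits 1ms, which matches the "anything between 0 and 50ms" observation. With both values at 5ms the wait never exceeds 5ms, about a third of a 16.7ms frame.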
I have verified that setting SGIP_TCP_TRANSMIT_DELAY=5ms and calling Wifi_Update() every 5ms works better for my game. Almost there. I've also verified that increasing my message size to more than 40 bytes, so that packets are sent directly, does the trick as well, slightly better.
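The padding workaround could look roughly like the sketch below. The helper `pad_message` and the constants `IMM_THRESHOLD` and `PADDED_LEN` are hypothetical names for this illustration; 40 is the SGIP_TCP_TRANSMIT_IMMTHRESH value quoted above, and the frame is padded to 41 bytes so it is safely "more than 40" whichever way the comparison in the library goes.

```c
#include <stddef.h>
#include <string.h>

#define IMM_THRESHOLD 40                  /* value quoted in the post       */
#define PADDED_LEN    (IMM_THRESHOLD + 1) /* fixed frame size on the wire   */

/* Copy `len` bytes of `msg` into `out` and zero-pad up to PADDED_LEN.
 * Returns the number of bytes to pass to send(), or 0 if the message
 * does not fit the fixed frame or `out` is too small. */
size_t pad_message(const void *msg, size_t len,
                   unsigned char *out, size_t out_cap)
{
    if (len > PADDED_LEN || out_cap < PADDED_LEN)
        return 0;
    memcpy(out, msg, len);
    memset(out + len, 0, PADDED_LEN - len);
    return PADDED_LEN;
}
```

The receiver then has to read PADDED_LEN bytes per message and use only the first 16. The cost is 25 extra bytes per message, which is small next to the TCP/IP headers each packet already carries.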
I will do some more testing. I think there may be some limiting factor in my game now. It feels like it gets into lag "mode" every now and then. There is also the fact that 6 packets are used for each message exchange between the DS units; I will look into this as well.
Regards,
Bengt
Re: Regarding DS<->DS TCP slowness (and DSi XL hang)
Heh, yeah, it's worse than I remember. I thought there was a "Send immediately" flag implemented for some reason.
The system doesn't do terribly much in the update, so 5ms should still be practical, you can probably live with that. May mess up sensitive timing around vblank though, if you have any.
-Stephen
Re: Regarding DS<->DS TCP slowness (and DSi XL hang)
Thanks for your reply! I have some new comments.
The reason it still sends 3 packets in each direction for every exchange is that sgIP_TCP_ReceivePacket() sets shouldReply whenever data was ACKed by the incoming packet. This means that the first packet is the payload packet, which is directly ACKed by an empty reply packet (as intended), and then the sending side sends another empty packet with no payload data. This happens in both directions, giving 6 packets. I verified this by removing shouldReply=1, which gives 4 packets instead.
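The 6-versus-4 packet count can be reproduced with a small model of the reply rule described above. This is a sketch only: `packet` and `packets_for_one_message` are hypothetical names, and the model tracks nothing but whether a packet carries payload and whether it ACKs data the receiver had sent.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a TCP packet for this model. */
typedef struct {
    bool has_payload;     /* carries application data                  */
    bool acks_new_data;   /* acknowledges data the receiver had sent   */
} packet;

/* Count the packets caused by one message sent in one direction.
 * reply_when_acked models shouldReply being set when data was ACKed. */
unsigned packets_for_one_message(bool reply_when_acked)
{
    unsigned count = 1;                   /* the payload packet */
    packet p = { true, false };

    for (;;) {
        bool reply = p.has_payload ||
                     (reply_when_acked && p.acks_new_data);
        if (!reply)
            break;
        /* The reply is empty; it ACKs new data only when the packet it
         * answers carried payload. */
        p.acks_new_data = p.has_payload;
        p.has_payload = false;
        count++;
    }
    return count;
}
```

With `reply_when_acked` set, one message costs 3 packets (payload, ACK, empty answer to the ACK), so an exchange in both directions costs 6. Without it, one message costs 2 packets and the exchange costs 4.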
Looking at sgIP_TCP_SendPacket(), it appears to always send data from the beginning of the buffer. This means that if another packet is sent before the first has been ACKed, the same data is sent again, if I understand things correctly.
This makes me a bit uncertain about how the changes I've implemented will affect things when sending a lot of data: to a DS, to a PC and from a PC. If the timer rate is increased too much, it will probably cause the same data to be sent more often, I think. And the current method of forcing a packet to be sent for each packet received may result in more throughput. However, TCP implementations often delay-ACK every other packet, so it may be timer-driven as well. Complicated stuff...
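The worry about a faster timer resending the same data can be put into rough numbers with the model below. It is a worst-case sketch built on the reading of sgIP_TCP_SendPacket() given above, which is itself unconfirmed: every timer tick that fires before the ACK arrives retransmits the whole buffer from its start. The function name `bytes_on_wire` is hypothetical, and the model ignores any other send throttling the stack may apply.

```c
/* Worst-case payload bytes put on the wire for one buffered message when
 * every tick before the ACK arrives resends from the start of the buffer.
 * Ticks are assumed to fire at 0, tick_ms, 2*tick_ms, ... */
unsigned bytes_on_wire(unsigned buffered_bytes, unsigned ack_after_ms,
                       unsigned tick_ms)
{
    if (tick_ms == 0)
        return buffered_bytes;        /* nothing to model without a timer */

    /* Number of ticks strictly before the ACK arrives (at least one). */
    unsigned sends = (ack_after_ms + tick_ms - 1) / tick_ms;
    if (sends == 0)
        sends = 1;

    return buffered_bytes * sends;
}
```

Under these assumptions, a 16-byte message whose ACK takes 50ms to arrive is sent once with a 50ms timer, twice with a 25ms timer and ten times with a 5ms timer. For small messages that is cheap, but for bulk transfers the duplicated payload would eat into throughput.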
The current idea is to reply to every packet containing payload data, and also to reply directly whenever data was ACKed AND there is data available to send. I kind of chickened out and am currently using a 25ms timer to avoid changing the current values too much. Getting the best throughput probably requires tuning the timer interval, and my game isn't suited for that. When I pad my messages to >40 bytes this seems to work nicely in my game, the best performance yet.
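Written out as a predicate, the rule described here might look like the sketch below. The function `reply_now` and its parameters are hypothetical names for this illustration and do not correspond to identifiers in sgIP_TCP.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Decide whether to answer an incoming packet immediately.
 * payload_len:      payload bytes in the packet, counted before any
 *                   trimming of already received data
 * acked_our_data:   the packet acknowledged data we had sent
 * pending_tx_bytes: bytes waiting in our send buffer */
bool reply_now(size_t payload_len, bool acked_our_data,
               size_t pending_tx_bytes)
{
    if (payload_len > 0)
        return true;   /* always ACK payload, including resent data */

    /* An ACK alone never triggers an empty packet; it only lets queued
     * data go out right away. */
    return acked_our_data && pending_tx_bytes > 0;
}
```

An empty ACK with nothing queued yields no reply, so two peers using this rule cannot ping-pong empty packets, while a resent data packet is still re-ACKed.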
/Bengt