receive hangs; was TCP send() performance and eventual hangs

zigg
Posts: 28
Joined: Wed Jul 01, 2009 2:42 pm

Re: receive hangs; was TCP send() performance and eventual hangs

Post by zigg » Wed Jul 08, 2009 7:25 pm

Patch submitted: https://sourceforge.net/tracker/?func=d ... tid=668553

Thanks for all your help, everyone!

sgstair
Developer
Posts: 10
Joined: Fri Aug 12, 2005 5:13 am
Location: Camping on an Oxygen atom

Re: receive hangs; was TCP send() performance and eventual hangs

Post by sgstair » Sat Jul 11, 2009 10:25 am

Heh,
This thread is full of all sorts of crackpots :)

Please go back and reread the fifo code. interrupts are always disabled when callback functions are called, this was a very important design feature to allow stuff like this to work properly.

Whatever it is your patch has done, it certainly hasn't done what you think. I find myself seriously doubting it does anything at all; try it without the patch, you made other changes too. It should behave just as well.

As for the other comments; I will need a bit of time to respond, but I thought I should at least say that first :)

-Stephen
http://blog.akkit.org/ - http://wiki.akkit.org/ - Creator of DSWifi library - Authority on ARM ASM - Memorizer of DS Hardware Information

zigg
Posts: 28
Joined: Wed Jul 01, 2009 2:42 pm

Re: receive hangs; was TCP send() performance and eventual hangs

Post by zigg » Sat Jul 11, 2009 12:30 pm

With all due humility, because yes, I certainly am really green with this stuff:
sgstair wrote:Please go back and reread the fifo code. interrupts are always disabled when callback functions are called, this was a very important design feature to allow stuff like this to work properly.
This is what I'm looking at. Given that this code is setting REG_IME before calling the value32 handler, this sequence of events seems quite probable to me:

1. The fifo system, having turned REG_IME back on, calls the value32 handler (wifiValue32Handler), which calls Wifi_Sync, which calls Wifi_Update. Wifi_Update starts processing received packets, but then

2. IRQ_TIMER3 fires and calls Timer_50ms, which calls Wifi_Timer, which calls Wifi_Update. And so Wifi_Update has now been re-entered.
sgstair wrote:Whatever it is your patch has done, it certainly hasn't done what you think. I find myself seriously doubting it does anything at all; try it without the patch, you made other changes too. It should behave just as well.
I was very careful to make sure that it still worked at each stage of the game. In all three test runs, I recompiled my own code against both the stock libraries and my own patched ones: libnds on the first try, dswifi on the second and final tries. In the first two I masked IRQ_TIMER3 in some fashion; the last is as you see it.

Each time, the stock version hung within a handful of megabytes/minutes, and the patched version sent 70-150 megabytes and was only stopped by me deciding running it for 30-60 minutes was enough.

Each time I ran them head-to-head, my code was the same; the only variable was whether I was using stock or patched libraries. Yeah, I did make some changes to my own code along the way as well, but mostly I was trying to make sure I wasn't doing anything stupid. And that's still a remote possibility, but it certainly seems like the less likely option to my admittedly less-experienced eyes.
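
To make the sequence in steps 1 and 2 above concrete, here is a small host-side C model of it. It is only a sketch: every name in it (model_ime, model_update, run_scenario and so on) is made up for illustration and none of it is libnds or dswifi code. The "guarded" variant masks interrupts around the update, which is one possible way to prevent re-entry and not necessarily what the patch or any eventual fix does.

```c
#include <stdbool.h>

/* Toy model only: these names are made up and are NOT libnds/dswifi APIs. */
static bool model_ime = true;    /* stands in for REG_IME */
static bool timer_pending;       /* stands in for a raised IRQ_TIMER3 */
static bool fire_timer_once;     /* raise the timer during the first update */
static int depth, max_depth;     /* current / deepest nesting of the update */

static void model_update(bool guarded);

/* Deliver the pending timer interrupt, but only if interrupts are enabled. */
static void model_deliver_irq(bool guarded) {
    if (model_ime && timer_pending) {
        timer_pending = false;
        model_update(guarded);   /* Timer_50ms -> Wifi_Timer -> Wifi_Update */
    }
}

/* Stands in for Wifi_Update. With guarded == true, interrupts are masked
   for the duration of the update and restored afterwards. */
static void model_update(bool guarded) {
    bool saved_ime = model_ime;
    if (guarded) model_ime = false;

    depth++;
    if (depth > max_depth) max_depth = depth;

    if (fire_timer_once) {           /* the 50 ms timer expires mid-update */
        fire_timer_once = false;
        timer_pending = true;
    }
    model_deliver_irq(guarded);      /* nests here if interrupts are on */

    depth--;
    if (guarded) model_ime = saved_ime;
    model_deliver_irq(guarded);      /* a deferred interrupt runs now */
}

/* Returns the deepest nesting of model_update seen in one run. */
int run_scenario(bool guarded) {
    model_ime = true;
    timer_pending = false;
    fire_timer_once = true;
    depth = 0;
    max_depth = 0;
    model_update(guarded);
    return max_depth;
}
```

In this model run_scenario(false) reports a nesting depth of 2 (the update was re-entered), while run_scenario(true) reports 1 (the timer's update runs only after the first one has finished).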

zigg
Posts: 28
Joined: Wed Jul 01, 2009 2:42 pm

Re: receive hangs; was TCP send() performance and eventual hangs

Post by zigg » Sat Jul 11, 2009 12:49 pm

By the way, this patch doesn't purport to fix the TCP congestion problem (nor does it, in fact). It does seem, at least experimentally, that TCP doesn't eventually hang anymore, though I certainly haven't given it the hardcore testing I gave the UDP code.

elhobbs
Posts: 358
Joined: Thu Jul 02, 2009 1:19 pm

Re: receive hangs; was TCP send() performance and eventual hangs

Post by elhobbs » Sat Jul 11, 2009 5:55 pm

sgstair wrote:Heh,
This thread is full of all sorts of crackpots :)

Please go back and reread the fifo code. interrupts are always disabled when callback functions are called, this was a very important design feature to allow stuff like this to work properly.

Whatever it is your patch has done, it certainly hasn't done what you think. I find myself seriously doubting it does anything at all; try it without the patch, you made other changes too. It should behave just as well.

As for the other comments; I will need a bit of time to respond, but I thought I should at least say that first :)

-Stephen
yeah, you do not want to wait too long to call people "crackpots". it is best to do it first before you take a look at the issue. if you take a look at the issue first you may miss the opportunity :)

I am not really sure that this particular patch does resolve the actual issue. I think it may just make it less likely to happen. when I tried this patch on cquake it seemed to create a lot of lag. it ran longer - about 10 minutes vs about 30-60 seconds without the patch. and I will be the first to admit that the code in cquake is not the best. I am not a professional programmer. that being said it ran for a long time when compiled with the pre-fifo version of libnds. I did need to change a couple sections of code though to get it to compile. I had to swap the IPCFifo library calls that I was using for the libnds fifo calls.

sgstair
Developer
Posts: 10
Joined: Fri Aug 12, 2005 5:13 am
Location: Camping on an Oxygen atom

Re: receive hangs; was TCP send() performance and eventual hangs

Post by sgstair » Sat Jul 11, 2009 5:57 pm

Ok, for the record I'm stupid :)

What I said in the last post is in fact the opposite of the truth, I'm not sure how I misremembered it so badly.

Yes, I see the possibility exists for the wifi timer interrupt to interrupt the wifi update from the fifo, which isn't protected and is a very clear race condition- I've talked to WinterMute about the best way to fix this.

Sorry about that,
-Stephen
http://blog.akkit.org/ - http://wiki.akkit.org/ - Creator of DSWifi library - Authority on ARM ASM - Memorizer of DS Hardware Information

zigg
Posts: 28
Joined: Wed Jul 01, 2009 2:42 pm

Re: receive hangs; was TCP send() performance and eventual hangs

Post by zigg » Sun Jul 12, 2009 10:11 pm

sgstair, while I have your attention and to give myself another shot at crackpottery ;) did you have any ideas about the TCP problem? It's a different issue than the hang.

Check the first 5 posts: http://forums.devkitpro.org/viewtopic.php?f=23&t=1425

sgstair
Developer
Posts: 10
Joined: Fri Aug 12, 2005 5:13 am
Location: Camping on an Oxygen atom

Re: receive hangs; was TCP send() performance and eventual hangs

Post by sgstair » Sun Jul 12, 2009 10:38 pm

Haha, don't take me too literally there :)

I have a couple of ideas about the TCP slowdown, and have for quite some time. The lib isn't doing anything outright "bad" per se, just a lot of things that could be done better.

To be specific, I think it's the combination of the following elements causing the slowdown:
* DS Wifi chipset isn't particularly awesome (especially the antenna), and can lose packets, and may not behave correctly in all circumstances.
* DS Wifi library code based on incomplete understanding of the hardware, and does some things probably incorrectly (like: packet retransmit when it doesn't receive indication from the AP that the packet was received.)
* DS wifi TCP implementation flow control is very basic, and retransmission delay is LONG, and limited by resolution of 50ms timer (several design decisions here)

I've pieced the following incomplete picture of how this sort of cascade failure happens, from various packet captures. My memory isn't perfect (see previous posts :P) and I am going from memory here, but this might give you an idea of what's going wrong.

First, DS wifi library sends a new packet for a TCP stream under two circumstances: one, an ACK is received that pushes the window forward and more data exists to be sent (data is sent immediately), and two, if it hasn't received an ACK in a while and still has data to send, it will retransmit a full data packet.
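
As a rough sketch of the two send triggers just described (not the actual DSWifi code), the decision could be written like this. The type and function names and the retransmit_after_ms parameter are hypothetical; the post does not say how long "a while" is.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; this only mirrors the two cases described in prose. */
typedef enum { SEND_NONE, SEND_NEW_DATA, SEND_RETRANSMIT } send_action;

send_action next_send_action(bool ack_advanced_window,
                             size_t bytes_pending,
                             unsigned ms_since_last_ack,
                             unsigned retransmit_after_ms) {
    if (bytes_pending == 0)
        return SEND_NONE;            /* nothing left to send */
    if (ack_advanced_window)
        return SEND_NEW_DATA;        /* case one: send immediately */
    if (ms_since_last_ack >= retransmit_after_ms)
        return SEND_RETRANSMIT;      /* case two: no ACK in a while */
    return SEND_NONE;                /* otherwise keep waiting */
}
```

Note that nothing here sends on any other event, so a lost ACK stalls the stream until the timeout path fires.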

Second, DS Wifi library sets the hardware to retransmit if not acknowledged 15 times (if I remember right). This is to deal with packet loss, and for the most part works. Wifi HW sets some status values that aren't presently understood or checked. There may be some incorrect behavior as a result.

Third, DS library transmits packets with 1420 bytes of data, this produces packets with near maximum ethernet packet size. This makes them more unreliable than most packets. (Most TCP libs use packet sizes of closer to 500 bytes data, for this reason).

Now all of this works reasonably well in balance, but it seems you can get into a situation where the DS sends a packet and it's received, but the DS doesn't receive the wifi ACK from the AP. So the DS retransmits the same packet again. These are ~1500 byte packets at 2 Mbit/s, so each retransmit uses up a nontrivial ~6 ms of airtime.
Occasionally this chains: the AP has other stuff to send, and the DS may not always respect that (not entirely sure), so more packets are sent on top of each other. It's possible to lose TCP ACKs from the PC this way, and the DS then runs out of its 15 retransmits (~100 ms) and, depending, can wait another few multiples of the 50ms timer before trying to retransmit to start things up again.
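
The ~6 ms and ~100 ms figures above can be checked with some back-of-the-envelope arithmetic. This is a sketch that counts payload bits only; preamble, 802.11 headers, inter-frame gaps and ACK time are ignored, and the function names are made up for illustration. Rounding up to the 50 ms tick is one reading of where "~100 ms" comes from, not something the post states.

```c
#include <stdint.h>

/* Microseconds needed to put `bytes` on the air at `bits_per_sec`.
   Payload bits only: no preamble, headers, inter-frame gaps or ACK time. */
uint64_t airtime_us(uint64_t bytes, uint64_t bits_per_sec) {
    return (bytes * 8u * 1000000u) / bits_per_sec;
}

/* Total airtime if the same frame is sent `attempts` times back to back. */
uint64_t burst_airtime_us(uint64_t bytes, uint64_t bits_per_sec,
                          unsigned attempts) {
    return airtime_us(bytes, bits_per_sec) * attempts;
}

/* Round a duration up to the next multiple of a timer tick. */
uint64_t round_up_to_tick_us(uint64_t us, uint64_t tick_us) {
    return ((us + tick_us - 1) / tick_us) * tick_us;
}
```

A 1500 byte frame at 2 Mbit/s comes out at 6000 us, fifteen of them at 90000 us, and the first 50 ms tick at or after that point is 100000 us.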

Ok, so that's a bit of an incoherent description. Probably not entirely accurate. I intend to be looking at this all again soon when I start writing the new lib (which I've been promising for a while now), I have quite a few ideas on how to do all of these things better :)
But I'll be happy to answer other questions if you have them.

-Stephen
http://blog.akkit.org/ - http://wiki.akkit.org/ - Creator of DSWifi library - Authority on ARM ASM - Memorizer of DS Hardware Information

zigg
Posts: 28
Joined: Wed Jul 01, 2009 2:42 pm

Re: receive hangs; was TCP send() performance and eventual hangs

Post by zigg » Thu Jul 16, 2009 6:13 pm

sgstair wrote:Third, DS library transmits packets with 1420 bytes of data, this produces packets with near maximum ethernet packet size. This makes them more unreliable than most packets. (Most TCP libs use packet sizes of closer to 500 bytes data, for this reason).
Out of curiosity—have you ever tried a 500-odd byte MSS?

sgstair
Developer
Posts: 10
Joined: Fri Aug 12, 2005 5:13 am
Location: Camping on an Oxygen atom

Re: receive hangs; was TCP send() performance and eventual hangs

Post by sgstair » Thu Jul 16, 2009 6:43 pm

No. For the simple reason that the lib doesn't keep more than the MSS in transit at any given time. So doing so would effectively cap the transfer speed at 1/3 what it is, for latency-bound connections.
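
The "1/3" figure follows from simple arithmetic: if at most one MSS is unacknowledged at a time, a latency-bound connection moves roughly one MSS per round trip. A minimal sketch, with a hypothetical function name and an assumed round-trip time of 50 ms chosen only for illustration:

```c
#include <stdint.h>

/* Bytes per second when at most `bytes_in_flight` may be unacknowledged
   and each round trip takes `rtt_ms` milliseconds. */
uint32_t latency_bound_rate(uint32_t bytes_in_flight, uint32_t rtt_ms) {
    return (bytes_in_flight * 1000u) / rtt_ms;
}
```

With the assumed 50 ms round trip this gives 28400 bytes/s for a 1420 byte MSS and 10000 bytes/s for a 500 byte MSS. The ratio is 500/1420, about 0.35, whatever round-trip time is assumed.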

Like I said, many things need to be improved :)

-Stephen
http://blog.akkit.org/ - http://wiki.akkit.org/ - Creator of DSWifi library - Authority on ARM ASM - Memorizer of DS Hardware Information
