EmbeddedRelated.com
Forums
Memfault Beyond the Launch

Windows tcp Rx hanging

Started by Didi September 10, 2009
When I upload data from DPS (where my tcp implementation is) to windows (an ftp server takes the data there), the windows tcp hangs at times. It happens only at 100 Mbps. Something like 8 megabytes/s gets sustained for a few seconds with several retries, then at some point the windows tcp acks only part of the last segment it could take, with a window size of 0.
 My tcp retransmits that segment, but windows repeatedly (for many seconds, until I time out) only resends its last ack with the 0 window size.
 Clearly an error at the windows side - the window must have opened (and the rest of its tcp/ip system keeps on working), but then two windows systems talking to each other don't lock up in this manner - at least I have not seen it (never researched them in depth).
 What am I supposed to do? I suspect sending some probing segment of a different size will make the windows side stop repeating its last (probably queued) response. Generally I will manage to fix the issue, what I want to know is how widespread this issue is. Do other people observe it?
I am using window scaling, I let windows define the shift factor which IIRC is 3 (ftp DPS client, windows server, non-passive mode - the issue occurs on the data connection, obviously).
 It is not an ethernet buffering issue, my side never sees any "xoff" packets and windows says it has sent none - and they are enabled. Clearly the tcp buffer gets full and things halt; it may even be the ftp server (filezilla). I tried to dramatically increase the server buffer size (it is configurable) to no effect.

Thanks for any insight,

Dimiter

------------------------------------------------------
Dimiter Popoff               Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
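
A minimal sketch of the zero-window handling described in the post above, assuming a simplified sender that ignores sequence-number wraparound. The function names (`effective_window`, `next_transmission`) and the 1-byte probe size are illustrative assumptions, not taken from DPS or from Windows; the shift count of 3 is the value quoted in the post. It shows how the 16-bit window field is scaled, and that a zero window calls for a small probe at the oldest unacknowledged byte rather than a full-size segment.

```python
def effective_window(raw_window: int, shift: int) -> int:
    """Scale the 16-bit window field by the shift count its sender advertised."""
    if not 0 <= raw_window <= 0xFFFF:
        raise ValueError("the window field is 16 bits wide")
    if not 0 <= shift <= 14:
        raise ValueError("the window scale shift count is limited to 14")
    return raw_window << shift


def next_transmission(snd_una, snd_nxt, raw_window, shift, queued, mss=1460):
    """Return (seq, length, kind) for the next send, or None if there is nothing to do.

    snd_una: oldest unacknowledged sequence number
    snd_nxt: next sequence number to be sent
    queued:  bytes waiting to be sent beyond snd_nxt
    """
    window = effective_window(raw_window, shift)
    in_flight = snd_nxt - snd_una
    if window == 0:
        # Zero window: when the persist timer fires, send one byte starting
        # at snd_una so the peer answers with its current ack and window.
        if in_flight > 0 or queued > 0:
            return (snd_una, 1, "probe")
        return None
    usable = window - in_flight
    if usable > 0 and queued > 0:
        return (snd_nxt, min(usable, queued, mss), "data")
    return None
```

With the figures from the post (about 1200 of 1460 bytes acked, window 0), `next_transmission(1200, 1460, 0, 3, 5000)` yields a 1-byte probe at sequence 1200. How often to repeat the probe (the persist timer backoff) is left out of this sketch.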

On Sep 10, 10:58 am, Didi <d...@tgi-sci.com> wrote:

> When I upload data from DPS (where my tcp implementation is) to windows
> (an ftp server takes the data there), the windows tcp hangs at times.
Considering that ftp doesn't utilize tcp, I think you left something out of your explanation, such as what program is servicing the windows end of the tcp connection. I would suspect that it isn't keeping up with the data flow.
On Sep 10, 12:17 pm, Chris Stratton <cs07...@gmail.com> wrote:
> On Sep 10, 10:58 am, Didi <d...@tgi-sci.com> wrote:
>
> > When I upload data from DPS (where my tcp implementation is) to windows
> > (an ftp server takes the data there), the windows tcp hangs at times.
>
> Considering that ftp doesn't utilize tcp,
...open foot, insert mouth
You might have to adjust your system's MTU, MSS, or other networking
parameters. I'm not certain how to do this in Windows, or why you would
be having problems on a 100 Mbps Ethernet link.

Often this is the solution to such problems on dial up (PPP) or other
oddball kinds of links.

-- 
Paul Hovnanian     mailto:Paul@Hovnanian.com
------------------------------------------------------------------
Applying information technology is simply finding the right wrench
to pound in the correct screw.
On Sep 10, 11:25 pm, "Paul Hovnanian P.E." <p...@hovnanian.com> wrote:
> You might have to adjust your system's MTU, MSS, or other networking
> parameters. I'm not certain how to do this in Windows, or why you would
> be having problems on a 100 Mbps Ethernet link.

I did play with them - and they are nothing special anyway (e.g. 1460 bytes tcp segment size etc.), things are pretty much "normal".

And I must say it only happens with the ftp server, so it may be its fault after all (get clogged at some point and stays stuck with the socket buffer being full, hence the constant 0 window - while the rest of the network is OK and other tcp connections keep on working).

I'll try to locate and install some other ftp server and see what happens.

Dimiter

------------------------------------------------------
Dimiter Popoff               Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/

Original message: http://groups.google.com/group/comp.arch.embedded/msg/ae4306db65c18410?dmode=source
On Sep 10, 7:58 am, Didi <d...@tgi-sci.com> wrote:

> My tcp retransmits that segment, but windows repeatedly (for many seconds,
> until I time out) only resends its last ack with the 0 window size.
> Clearly an error at the windows side - the window must have opened (and the
> rest its tcp/ip system keep on working), but then two windows systems
> talking to each other don't lock up in this manner - at least I have not
> seen it (never researched them in depth).

Actually, are you sure your TCP is doing the right thing? This sounds like a classic case of misimplementing TCP retransmission. Is this your own TCP implementation or an unusual one?

You must retransmit *DATA*, not packets and not segments. Otherwise, TCP can deadlock (for example, if two ACKs drop at the same time, one in each direction).

DS
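
One way to read "retransmit data, not segments" is sketched below, assuming a byte-oriented send buffer and no sequence-number wraparound. The class `RetransmitQueue` and its methods are hypothetical names made up for this illustration, not part of any stack mentioned in the thread. The point is that an ACK landing in the middle of a previously sent segment still advances `snd_una`, and the next retransmission starts at that byte instead of at the old segment boundary.

```python
class RetransmitQueue:
    """Toy sender state that tracks unacknowledged bytes, not segments."""

    def __init__(self, initial_seq=0, mss=1460):
        self.snd_una = initial_seq   # oldest unacknowledged sequence number
        self.snd_nxt = initial_seq   # next sequence number to send
        self.mss = mss
        self.unacked = bytearray()   # bytes in [snd_una, snd_nxt)

    def send(self, data: bytes):
        """Record data as sent and return (seq, data) for the wire."""
        seq = self.snd_nxt
        self.unacked += data
        self.snd_nxt += len(data)
        return seq, data

    def on_ack(self, ack: int) -> int:
        """Accept any ACK covering new bytes, even mid-segment.

        Returns the number of newly acknowledged bytes.
        """
        if not self.snd_una < ack <= self.snd_nxt:
            return 0                 # duplicate or out-of-range ACK
        newly_acked = ack - self.snd_una
        del self.unacked[:newly_acked]
        self.snd_una = ack
        return newly_acked

    def retransmit(self):
        """Build a retransmission starting at snd_una, up to one MSS."""
        if not self.unacked:
            return None
        return self.snd_una, bytes(self.unacked[:self.mss])
```

For the case in the thread, sending 1460 bytes and then receiving an ACK for 1200 leaves 260 bytes queued, so `retransmit()` returns sequence 1200 with a 260-byte payload.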
On Sep 10, 11:07 pm, Dimiter Popoff <d...@tgi-sci.com> wrote:
> On Sep 10, 11:25 pm, "Paul Hovnanian P.E." <p...@hovnanian.com> wrote:
>
> > You might have to adjust your system's MTU, MSS, or other networking
> > parameters. I'm not certain how to do this in Windows, or why you would
> > be having problems on a 100 Mbps Ethernet link.
>
> I did play with them - and they are nothing special anyway (e.g. 1460 bytes
> tcp segment size etc.), things are pretty much "normal".

Well, clearly I did not play enough. Just tried to *upload* from a wintel system to the same filezilla server - and guess what, it got stuck in the very same manner..... Apparently all tests I did for wintel - wintel communication have involved only download from that server so far.

I guess I should thank that mosquito which bit me to wake me up so that I did that test.

Dimiter

------------------------------------------------------
Dimiter Popoff               Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
On Sep 11, 4:22 am, David Schwartz <dav...@webmaster.com> wrote:
> ... Is this your own TCP implementation or an unusual one?
It is mine.
> You must retransmit *DATA*, not packets and not segments.
I was thinking about that myself. There is nothing which says I need to change the size of the last segment I have retransmitted; and there is no sane reason to do that. In this case, it was a 1460 bytes long segment; the ack was coming for about 1200 bytes (the latter figure is imprecise, don't remember the exact one).

But what was making it clear it was the peer's fault was the fact that it kept repeating that same ack with a window size of 0 for 30 seconds - and the windows system was OK, the ftp server running there would even accept "ABOR" via the control connection, close the hanging data connection and recover.

Now that I saw this gets locked up the same way when uploading to that server from another tcp (windows'), things are clear. Filezilla (the server software) gets stuck with its buffer (local connection buffer, I guess) full because of some bug - mind you, this does not occur at 10 Mbps (or did I not hold it long enough... I think I did, though) - and poor thing keeps on acking what it can take (< 1 segment) and reporting a 0 window....

Anyway, none of my problem. I was eager to make sure I am going out of tcp porting mode without leaving any (known) issues behind, well, that's done.
> Otherwise, TCP can deadlock (for example, if two ACKs drop at the same
> time, one in each direction).

Well, segments with a 0 size do not get retransmitted, this is how tcp works, of course. Zero size segments are sent as acks only in reply to non-0 sized segments (apart from syn and fin segments, that is, syn is 0 and fin can be 0 sized).

Thanks for the insight,

Dimiter

------------------------------------------------------
Dimiter Popoff               Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/

Original message: http://groups.google.com/group/comp.arch.embedded/msg/4cbb87608c90732e?dmode=source
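
The rule about zero-size segments can be stated in terms of sequence space: SYN and FIN each occupy one sequence number even when they carry no payload, while a pure ACK occupies none. A small sketch follows; the helper names `sequence_length` and `needs_retransmission` are made up for illustration.

```python
def sequence_length(payload_len: int, syn: bool = False, fin: bool = False) -> int:
    """Sequence numbers a segment occupies: payload bytes plus one each for SYN and FIN."""
    return payload_len + int(syn) + int(fin)


def needs_retransmission(payload_len: int, syn: bool = False, fin: bool = False) -> bool:
    """Only segments that occupy sequence space are acknowledged, so only they are retransmitted."""
    return sequence_length(payload_len, syn, fin) > 0
```

A bare ACK (`needs_retransmission(0)`) is never retransmitted, whereas an empty SYN or FIN is, because the peer has to acknowledge the sequence number it occupies.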
On Sep 10, 6:42 pm, Dimiter Popoff <d...@tgi-sci.com> wrote:
> On Sep 11, 4:22 am, David Schwartz <dav...@webmaster.com> wrote:
>
> > ... Is this your own TCP implementation or an unusual one?
>
> It is mine.
Then I'll almost bet your retransmit algorithm is busted. Post a dump of the last few packets before the hang and the hang.
> > You must retransmit *DATA*, not packets and not segments.
> I was thinking about that myself. There is nothing which says
> I need to change the size of the last segment I have retransmitted;
> and there is no sane reason to do that.
You *cannot* retransmit segments. That will cause TCP to deadlock. You *must* retransmit data.
> In this case, it was a 1460 bytes long segment; the ack was coming for
> about 1200 bytes (the latter figure is imprecise, don't remember the
> exact one).
> But what was making it clear it was the peers fault was the
> fact that it kept repeating that same ack with a window size
> of 0 for 30 seconds - and the windows system was OK, the
> ftp server running there would even accept "ABOR" via the
> control connection, close the hanging data connection and
> recover.

Let me guess -- you kept sending it the same data it had already ACKed and it kept telling you it had already ACKed it. See the problem?

You are refusing to honor the peer's ACK until it ACKs more data. The peer is refusing to ACK more data until you accept what it has already ACKed. You are in the wrong.

Again, TCP *does* *not* have any segment retransmission or packet retransmission and that WILL NOT work. Really.

> Now that I saw this gets locked up the same way when uploading
> to that server from another tcp (windows'), things are clear.
> Filezilla (the server software) gets stuck with its buffer
> (local connection buffer, I guess) full because of some
> bug - mind you, this does not occur at 10 MbpS (or did I not
> hold it long enough... I think I did, though) - and poor thing
> keeps on acking what it can take (< 1 segment) and reporting
> a 0 window....
Why do you keep refusing to accept the peer's ACK? Why do you keep retransmitting data it has already ACKed, forcing it to ACK it again?
> Anyway, none of my problem. I was eager to make sure I am
> going out of tcp porting mode without leaving any (known)
> issues behind, well, that's done.

Nonsense, unless I'm misunderstanding you.

DS
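
The repeated ACK seen in this exchange is what the segment acceptability test in RFC 793 (carried over into RFC 9293) predicts for a receiver advertising a zero window: a segment carrying data is not acceptable, and the receiver answers with an ACK stating its current `RCV.NXT` and window. Below is a sketch of that test, assuming no sequence-number wraparound; the function name `segment_acceptable` and the sample numbers (taken loosely from the thread) are illustrative.

```python
def segment_acceptable(rcv_nxt: int, rcv_wnd: int, seg_seq: int, seg_len: int) -> bool:
    """Receiver-side acceptability test for an incoming segment.

    rcv_nxt: next sequence number the receiver expects
    rcv_wnd: receive window currently advertised
    seg_seq: sequence number of the first byte of the segment
    seg_len: sequence space occupied by the segment
    """
    if seg_len == 0:
        if rcv_wnd == 0:
            return seg_seq == rcv_nxt
        return rcv_nxt <= seg_seq < rcv_nxt + rcv_wnd
    if rcv_wnd == 0:
        # Data cannot be accepted while the window is closed.
        return False
    first_in_window = rcv_nxt <= seg_seq < rcv_nxt + rcv_wnd
    last = seg_seq + seg_len - 1
    last_in_window = rcv_nxt <= last < rcv_nxt + rcv_wnd
    return first_in_window or last_in_window
```

With the receiver at `rcv_nxt = 1200` and a zero window, resending the original 1460-byte segment from sequence 0 gives `segment_acceptable(1200, 0, 0, 1460) == False`, so the only response is the same ACK again. The test by itself does not show which side is at fault; a packet dump of the kind requested above would.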
On Thu, 10 Sep 2009 13:07:05 -0700 (PDT), Dimiter Popoff <dp@tgi-sci.com> wrote:

>On Sep 10, 11:25 pm, "Paul Hovnanian P.E." <p...@hovnanian.com> wrote:
>> You might have to adjust your system's MTU, MSS, or other networking
>> parameters. I'm not certain how to do this in Windows, or why you would
>> be having problems on a 100 Mbps Ethernet link.
>
>I did play with them - and they are nothing special anyway (e.g. 1460 bytes
>tcp segment size etc.), things are pretty much "normal".
>
> And I must say it only happens with the ftp server, so it may be its
>fault after all (get clogged at some point and stays stuck with the
>socket buffer being full, hence the constant 0 window - while the rest
>of the network is OK and other tcp connections keep on working).
>
> I'll try to locate and install some other ftp server and see what
>happens.
>
>Dimiter
>
There are also usually some settings on the card itself (the device). In the device settings dialogs, there is a page where either a hard 100Mb/s or "auto" is selected; "auto" is what the whole world typically uses. Hardly anyone hard-sets a link to 100Mb/s. Auto negotiation has usually been the rule.

Anyway, there are quite a few other settings there as well. These are not registry settings; they are for the card itself in device manager (there are also other ways to get to this device dialog). Try examining that. Perhaps you hard-set it to 100Mb/s when it was originally "auto". If so, that is one of your mistakes.
