On 2019-09-25 David Brown wrote in comp.arch.embedded:
> On 25/09/2019 06:55, upsidedown@downunder.com wrote:
>> On Tue, 24 Sep 2019 14:22:51 +0200, Stef
>> <stef33d@yahooI-N-V-A-L-I-D.com.invalid> wrote:
>>
>>>
>>> What seems weird to me is that every ACK is followed by a Dup ACK. Is this normal? I see that a lot in other TCP traffic as well, but not on every ACK.
>>
>> The difference between Ack and duplicate Ack is only 8-9 us. Strange.
>>
>
> It is, I think, due to lost packets. (Google "wireshark dup ack" for
> suggestions.)
Even stranger, when I just ping the board:
229 5.081046 192.168.125.128 192.168.125.133 ICMP 74 Echo (ping) request id=0x0001, seq=3492/41997, ttl=128 (no response found!)
230 5.081051 192.168.125.128 192.168.125.133 ICMP 74 Echo (ping) request id=0x0001, seq=3492/41997, ttl=128 (reply in 234)
234 5.081615 192.168.125.133 192.168.125.128 ICMP 74 Echo (ping) reply id=0x0001, seq=3492/41997, ttl=255 (request in 230)
A duplicate request goes out from W10 with only 5 us difference? The same
happens on the rest of the pings, one second apart.
Nothing to do with the eval kit with LwIP, if I ping my router:
380 7.010693 192.168.125.128 192.168.125.254 ICMP 74 Echo (ping) request id=0x0001, seq=3539/54029, ttl=128 (no response found!)
381 7.010700 192.168.125.128 192.168.125.254 ICMP 74 Echo (ping) request id=0x0001, seq=3539/54029, ttl=128 (reply in 382)
382 7.011670 192.168.125.254 192.168.125.128 ICMP 74 Echo (ping) reply id=0x0001, seq=3539/54029, ttl=255 (request in 381)
And after looking a little closer, it seems to happen on almost all
packets transmitted from this PC, but not on TLSv1.2 packets.
Something weird in W10 or Wireshark? Or the PC network interface?
--
Stef (remove caps, dashes and .invalid from e-mail address to reply by mail)
Intolerance is the last defense of the insecure.
Reply by Stef●September 25, 2019
On 2019-09-25 David Brown wrote in comp.arch.embedded:
...
> What are your other lwip config numbers? I have:
>
> #define MEM_SIZE (48*1024)
> #define MEMP_NUM_PBUF 32
> #define MEMP_NUM_UDP_PCB 6
> #define MEMP_NUM_TCP_PCB 128
> #define MEMP_NUM_TCP_PCB_LISTEN 8
> #define MEMP_NUM_TCP_SEG 16
> #define MEMP_NUM_SYS_TIMEOUT 10
> #define PBUF_POOL_SIZE 32
> #define PBUF_POOL_BUFSIZE 1518
In lwipopts.h:
#define MEM_SIZE (32 * 1024)
#define MEMP_NUM_SYS_TIMEOUT 300
#define PBUF_POOL_SIZE 64
Most were not in my lwipopts.h, so at default from opt.h:
#define MEMP_NUM_PBUF 16
#define MEMP_NUM_UDP_PCB 4
#define MEMP_NUM_TCP_PCB 5
#define MEMP_NUM_TCP_PCB_LISTEN 8
#define MEMP_NUM_TCP_SEG 16
#define PBUF_POOL_BUFSIZE LWIP_MEM_ALIGN_SIZE(TCP_MSS+40+PBUF_LINK_HLEN)
The last one expands to: ((1460+40+14)+3) & ~3 = 1516
> I can't say I have studied the usage in detail, so I might have much
> more than I need for some of these. But if any of your values are
> hugely lower than mine, check them to be sure. /Something/ is giving
> you memory allocation errors, and you need to find and fix that something.
Tried changing MEMP_NUM_TCP_PCB, MEMP_NUM_PBUF and
MEMP_NUM_SYS_TIMEOUT to your values, but no change in behaviour.
> It is also possible that the rest of program structure means you are
> getting slow handling of the packets - I can't guess anything about
> that, because I don't know your code at all.
It's just a loop calling the lwIP handlers (the standard stand-alone
echo example) and a function that checks a timer for when to send a
(dummy) packet. So really lightweight, just to test if I can get packets
out at the required rate.
I'm giving up on the TCP version for now and going with UDP, which
seems to work fine (with the same settings). I'll give TCP another try
some time.
--
Stef (remove caps, dashes and .invalid from e-mail address to reply by mail)
No hardware designer should be allowed to produce any piece of hardware
until three software guys have signed off for it.
-- Andy Tanenbaum
Reply by David Brown●September 25, 2019
On 25/09/2019 13:41, Stef wrote:
> On 2019-09-25 David Brown wrote in comp.arch.embedded:
>> On 24/09/2019 14:22, Stef wrote:
>>>>> On 9/23/2019 15:04, David Brown wrote:
> ...
>>> Then one port, 20 ms intervals;
>>> Runs for a while but ultimately ends up in repeated msDelay().
>>> Last statistics output (run time not always equal):
>>> - TCP
>>> xmit: 1088
>>> recv: 827
>>> fw: 0
>>> drop: 0
>>> chkerr: 0
>>> lenerr: 0
>>> memerr: 270
>>> rterr: 0
>>> proterr: 0
>>> opterr: 0
>>> err: 0
>>> cachehit: 0
>>>
>>> In the previous outputs you can see the gap between xmit and recv
>>> increasing as well as the memerr.
>>
>> If you are getting any memerr results, you have not given LWIP enough
>> resources. I don't know off-hand what buffer types or other resources
>> can lead to memerr counts, but you are low on something. These
>> statistics show that a quarter of your communications are failing to get
>> the buffers they need - not good at all.
>
> No, not good. But 64 PBUF_POOL_SIZE and 32k mem should be okay for the
> few packets that could be in-flight if the PC responds fast? I see no
> other obviously related tunable items in my lwipopts.h.
>
> But even from the start of transmission, the packets come out too slow.
> Sometimes 512 byte packets after a > 20ms interval, mostly 1024 byte
> packets at > 40 ms (mostly 50 ms). So this would quickly cause the buffers
> to fill up as I keep feeding them at 20 ms intervals.
>
> I'll proceed with the UDP approach for now. Looks a lot cleaner.
>
>
What are your other lwip config numbers? I have:
#define MEM_SIZE (48*1024)
#define MEMP_NUM_PBUF 32
#define MEMP_NUM_UDP_PCB 6
#define MEMP_NUM_TCP_PCB 128
#define MEMP_NUM_TCP_PCB_LISTEN 8
#define MEMP_NUM_TCP_SEG 16
#define MEMP_NUM_SYS_TIMEOUT 10
#define PBUF_POOL_SIZE 32
#define PBUF_POOL_BUFSIZE 1518
I can't say I have studied the usage in detail, so I might have much
more than I need for some of these. But if any of your values are
hugely lower than mine, check them to be sure. /Something/ is giving
you memory allocation errors, and you need to find and fix that something.
It is also possible that the rest of program structure means you are
getting slow handling of the packets - I can't guess anything about
that, because I don't know your code at all.
Reply by Stef●September 25, 2019
On 2019-09-25 David Brown wrote in comp.arch.embedded:
> On 24/09/2019 14:22, Stef wrote:
>>>> On 9/23/2019 15:04, David Brown wrote:
...
>> Then one port, 20 ms intervals;
>> Runs for a while but ultimately ends up in repeated msDelay().
>> Last statistics output (run time not always equal):
>> - TCP
>> xmit: 1088
>> recv: 827
>> fw: 0
>> drop: 0
>> chkerr: 0
>> lenerr: 0
>> memerr: 270
>> rterr: 0
>> proterr: 0
>> opterr: 0
>> err: 0
>> cachehit: 0
>>
>> In the previous outputs you can see the gap between xmit and recv
>> increasing as well as the memerr.
>
> If you are getting any memerr results, you have not given LWIP enough
> resources. I don't know off-hand what buffer types or other resources
> can lead to memerr counts, but you are low on something. These
> statistics show that a quarter of your communications are failing to get
> the buffers they need - not good at all.
No, not good. But 64 PBUF_POOL_SIZE and 32k mem should be okay for the
few packets that could be in-flight if the PC responds fast? I see no
other obviously related tunable items in my lwipopts.h.
But even from the start of transmission, the packets come out too slow.
Sometimes 512 byte packets after a > 20ms interval, mostly 1024 byte
packets at > 40 ms (mostly 50 ms). So this would quickly cause the buffers
to fill up as I keep feeding them at 20 ms intervals.
I'll proceed with the UDP approach for now. Looks a lot cleaner.
--
Stef (remove caps, dashes and .invalid from e-mail address to reply by mail)
"You're very sure of your facts," he said at last, "I
couldn't trust the thinking of a man who takes the Universe
- if there is one - for granted."
Reply by Stef●September 25, 2019
On 2019-09-25 Stef wrote in comp.arch.embedded:
...
> There was still an issue that at all tested speeds (10/20/40/100 ms) the
> embedded side goes into a hard fault handler after a while. Approx. 30
> seconds for 10/20/40 ms and 2 minutes for 100 ms. This was probably due
> to my sending the 4 packets for each interval back to back without calling
> any of the Ethernet handlers in between. If I call the handlers after each
> packet, the hangups disappear. So I may have to revisit the TCP version to
> check if that was a problem there as well. :-(
Forget that: the failing TCP test was running only one port at 20 ms, so
it did not have the back-to-back problem. The failure was also different:
a repeating wait loop, not a hard fault.
--
Stef (remove caps, dashes and .invalid from e-mail address to reply by mail)
I drink to make other people interesting.
-- George Jean Nathan
Reply by Stef●September 25, 2019
On 2019-09-25 upsidedown@downunder.com wrote in comp.arch.embedded:
> On Tue, 24 Sep 2019 14:22:51 +0200, Stef
><stef33d@yahooI-N-V-A-L-I-D.com.invalid> wrote:
>
>>
>>Then one port, 20 ms intervals;
>>Runs for a while but ultimately ends up in repeated msDelay().
>>Last statistics output (run time not always equal):
>>- TCP
>> xmit: 1088
>> recv: 827
>> fw: 0
>> drop: 0
>> chkerr: 0
>> lenerr: 0
>> memerr: 270
>> rterr: 0
>> proterr: 0
>> opterr: 0
>> err: 0
>> cachehit: 0
>>
>>In the previous outputs you can see the gap between xmit and recv
>>increasing as well as the memerr.
>>
>>Looks like the PC does just not respond fast enough, forcing LwIP to
>>keep the packet until an ACK is received?
>
> Make sure that you are not loading the PC excessively with all kinds
> of other activity.
The PC is not heavily loaded: debugger, terminal window, Wireshark, a
browser with internet radio, not much more.
> If you use Telnet (or other display program) as the TCP client, any
> screen updates will load the PC. At least minimize the Telnet
> window to reduce the PC loading. The same applies to Wireshark, just
> capture to disk. If possible use a separate PC for Telnet and
> Wireshark, but you need a hub, not a switch, so that Wireshark will
> see the traffic between the embedded device and Telnet PC.
>
>>
>>
>>Last 6 lines from the Wireshark packet window when the above stops:
>>
>>6600 78.330701 192.168.125.130 192.168.125.128 TCP 1078 10001 → 51826 [PSH, ACK] Seq=975873 Ack=1 Win=5840 Len=1024
>>6601 78.379841 192.168.125.128 192.168.125.130 TCP 54 51826 → 10001 [ACK] Seq=1 Ack=976897 Win=63216 Len=0
>>6602 78.379850 192.168.125.128 192.168.125.130 TCP 54 [TCP Dup ACK 6601#1] 51826 → 10001 [ACK] Seq=1 Ack=976897 Win=63216 Len=0
>>6607 78.413921 192.168.125.130 192.168.125.128 TCP 566 10001 → 51826 [PSH, ACK] Seq=976897 Ack=1 Win=5840 Len=512
>>6609 78.459611 192.168.125.128 192.168.125.130 TCP 54 51826 → 10001 [ACK] Seq=1 Ack=977409 Win=64240 Len=0
>>6610 78.459619 192.168.125.128 192.168.125.130 TCP 54 [TCP Dup ACK 6609#1] 51826 → 10001 [ACK] Seq=1 Ack=977409 Win=64240 Len=0
>
> Look at the WireShark sequence numbering (first column), why is there
> a jump between 6602 and 6607 ? Is this due to display filtering? Use
> capture filtering to reduce Wireshark loading.
Yes, display filtering is used.
> Look at timestamps (second column), the last digit is in microseconds.
> There seems to be a much larger gap than 20 ms between two embedded
> sends. The time between data and Ack is much larger than 20 ms.
Why the sends are more than 20 ms apart, I don't know. The sender loop
tries to send a packet every 20 ms. Must be due to the lagging ACKs
and limited buffer space at the sender?
> The Ack sequence number has fallen seriously behind data messages.
> Apparently the sender fails to buffer about 1000 data frames (500 KB)
> before being acknowledged, so clearly some messages are lost.
OK
> This suggests that the Telnet screen update slows down the traffic, so
> minimize Telnet screen.
I don't think the terminal (just a plain serial terminal, no telnet) is
slowing the PC down too much, see below.
>>
>>What seems weird to me is that every ACK is followed by a Dup ACK. Is this normal? I see that a lot in other TCP traffic as well, but not on every ACK.
>
> The difference between Ack and duplicate Ack is only 8-9 us. Strange.
Yes.
As discussed in other parts of this thread, UDP might be a better match
for this application, so I tried that.
A test with 4 ports sending 512 byte packets at 10 ms intervals was
successful (mostly). During this test 4 terminal screens were updating
and wireshark was running with display filtering. So the PC seems to
have no problem keeping up.
7888 23.208722 192.168.125.132 192.168.125.128 UDP 554 10001 → 10001 Len=512
7889 23.209536 192.168.125.132 192.168.125.128 UDP 554 10002 → 10002 Len=512
7890 23.210885 192.168.125.132 192.168.125.128 UDP 554 10003 → 10003 Len=512
7891 23.211184 192.168.125.132 192.168.125.128 UDP 554 10004 → 10004 Len=512
7892 23.219123 192.168.125.132 192.168.125.128 UDP 554 10001 → 10001 Len=512
7893 23.219593 192.168.125.132 192.168.125.128 UDP 554 10002 → 10002 Len=512
7894 23.220354 192.168.125.132 192.168.125.128 UDP 554 10003 → 10003 Len=512
7895 23.221256 192.168.125.132 192.168.125.128 UDP 554 10004 → 10004 Len=512
So it looks like UDP is indeed a better match for this application.
Have to implement some stuff in the application that TCP took care of.
Like opening a port and handling packet loss. Packet loss was already
an issue that needed to be handled by the final application. This is
because (as an other poster already mentioned) the data is just serial
data with no loss detection to begin with.
There was still an issue that at all tested speeds (10/20/40/100 ms) the
embedded side goes into a hard fault handler after a while. Approx. 30
seconds for 10/20/40 ms and 2 minutes for 100 ms. This was probably due
to my sending the 4 packets for each interval back to back without calling
any of the Ethernet handlers in between. If I call the handlers after each
packet, the hangups disappear. So I may have to revisit the TCP version to
check if that was a problem there as well. :-(
--
Stef (remove caps, dashes and .invalid from e-mail address to reply by mail)
I *knew* I had some reason for not logging you off... If I could just
remember what it was.
Reply by David Brown●September 25, 2019
On 25/09/2019 06:55, upsidedown@downunder.com wrote:
> On Tue, 24 Sep 2019 14:22:51 +0200, Stef
> <stef33d@yahooI-N-V-A-L-I-D.com.invalid> wrote:
>
>>
>> Then one port, 20 ms intervals;
>> Runs for a while but ultimately ends up in repeated msDelay().
>> Last statistics output (run time not always equal):
>> - TCP
>> xmit: 1088
>> recv: 827
>> fw: 0
>> drop: 0
>> chkerr: 0
>> lenerr: 0
>> memerr: 270
>> rterr: 0
>> proterr: 0
>> opterr: 0
>> err: 0
>> cachehit: 0
>>
>> In the previous outputs you can see the gap between xmit and recv
>> increasing as well as the memerr.
>>
>> Looks like the PC does just not respond fast enough, forcing LwIP to
>> keep the packet until an ACK is received?
>
> Make sure that you are not loading the PC excessively with all kinds
> of other activity.
>
> If you use Telnet (or other display program) as the TCP client, any
> screen updates will load the PC. At least minimize the Telnet
> window to reduce the PC loading. The same applies to Wireshark, just
> capture to disk. If possible use a separate PC for Telnet and
> Wireshark, but you need a hub, not a switch, so that Wireshark will
> see the traffic between the embedded device and Telnet PC.
>
Nonsense. A modern PC will handle this with barely a blip on its
processor usage graph. You /might/ see some limitations if you are
using a 2 GB "Intel compute stick" with a tiny Celeron and Windows 10.
But assuming the Win 10 has halfway sane specs, it is going to be
absolutely fine. Run Task Manager and watch the graphs of processor and
memory usage to confirm that.
>>
>>
>> Last 6 lines from the Wireshark packet window when the above stops:
>>
>> 6600 78.330701 192.168.125.130 192.168.125.128 TCP 1078 10001 → 51826 [PSH, ACK] Seq=975873 Ack=1 Win=5840 Len=1024
>> 6601 78.379841 192.168.125.128 192.168.125.130 TCP 54 51826 → 10001 [ACK] Seq=1 Ack=976897 Win=63216 Len=0
>> 6602 78.379850 192.168.125.128 192.168.125.130 TCP 54 [TCP Dup ACK 6601#1] 51826 → 10001 [ACK] Seq=1 Ack=976897 Win=63216 Len=0
>> 6607 78.413921 192.168.125.130 192.168.125.128 TCP 566 10001 → 51826 [PSH, ACK] Seq=976897 Ack=1 Win=5840 Len=512
>> 6609 78.459611 192.168.125.128 192.168.125.130 TCP 54 51826 → 10001 [ACK] Seq=1 Ack=977409 Win=64240 Len=0
>> 6610 78.459619 192.168.125.128 192.168.125.130 TCP 54 [TCP Dup ACK 6609#1] 51826 → 10001 [ACK] Seq=1 Ack=977409 Win=64240 Len=0
>
> Look at the WireShark sequence numbering (first column), why is there
> a jump between 6602 and 6607 ? Is this due to display filtering? Use
> capture filtering to reduce Wireshark loading.
>
If you are doing large captures, then capture filtering makes a
difference by reducing the quantity of data. For short captures, it will
make no measurable difference.
> Look at timestamps (second column), the last digit is in microseconds.
> There seems to be a much larger gap than 20 ms between two embedded
> sends. The time between data and Ack is much larger than 20 ms.
>
> The Ack sequence number has fallen seriously behind data messages.
> Apparently the sender fails to buffer about 1000 data frames (500 KB)
> before being acknowledged, so clearly some messages are lost.
>
> This suggests that the Telnet screen update slows down the traffic, so
> minimize Telnet screen.
No, it suggests things are failing at the LWIP side, especially when we
see the LWIP statistics full of memerr counts. A memerr occurs when
LWIP can't find a free buffer of some sort (there are several types
used) to handle a packet coming in or going out - it has no choice but
to drop that packet. This will lead to resends and delays.
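For tracking that down, a reasonable first step is to turn on lwIP's per-pool statistics and then raise whichever pool shows a non-zero err count; a lwipopts.h fragment along these lines (the numbers are illustrative starting points, not a tuned configuration):

```c
/* lwipopts.h -- illustrative values, not a tuned configuration */
#define LWIP_STATS          1
#define LWIP_STATS_DISPLAY  1    /* makes stats_display() available */
#define MEMP_STATS          1    /* per-pool used/max/err counters */
#define MEM_STATS           1    /* heap (mem_malloc) counters */

/* Then raise whichever pool reports err > 0, for example: */
#define MEMP_NUM_TCP_SEG    32          /* TCP segments queued for ACK */
#define PBUF_POOL_SIZE      64          /* RX pbufs */
#define MEM_SIZE            (48 * 1024) /* heap for TX pbuf payloads */
```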
>
>>
>> What seems weird to me is that every ACK is followed by a Dup ACK. Is this normal? I see that a lot in other TCP traffic as well, but not on every ACK.
>
> The difference between Ack and duplicate Ack is only 8-9 us. Strange.
>
It is, I think, due to lost packets. (Google "wireshark dup ack" for
suggestions.)
Reply by David Brown●September 25, 2019
On 24/09/2019 14:22, Stef wrote:
>>> On 9/23/2019 15:04, David Brown wrote:
> ...
>> There you go - something was badly wrong with the network. It had a
>> Windows 10 machine on it :-)
>
> That might be my problem as well. ;-)
I was half joking, but only half. Modern systems are very chatty on the
network - with newer OS's being worse than old ones. Even Linux desktop
systems produce a lot of traffic, though my experience is that Windows
is much worse. (I have not looked at Macs, and server systems are
usually a lot quieter.) Printers and other devices also chatter
continuously. Some of this is necessary low-level traffic, like ARP and
DHCP packets. Some is from the several dozen different "automatic
configuration" and "name service" protocols that everyone has to
implement, but almost no one uses. Some is from applications like
DropBox or Steam trying to find neighbours on the network.
This all means there is a ridiculous amount of traffic on typical
Ethernet networks, with 90% of it being basically useless, and a good
deal of it being broadcast to all nodes.
>
> Turned on LwIP statistics and did a first test with a single simulated
> serial port (data not actually coming in, just generated by processor
> itself).
>
> LwIP runs on an LPC4088 and 'serial port' is opened by Docklight running
> on a W10 PC.
>
> First, 512 byte packets at 1 second intervals, printing the stats every
> 5 seconds.
> - TCP
> xmit: 44
> recv: 45
> fw: 0
> drop: 0
> chkerr: 0
> lenerr: 0
> memerr: 0
> rterr: 0
> proterr: 0
> opterr: 0
> err: 0
> cachehit: 0
>
> No memerr and xmit and recv at a constant 1 difference.
>
>
> Then a test with one port, 10 ms intervals:
> - (Apparently) Immediate hangup, the stack keeps going into msDelay().
>
>
> Then one port, 20 ms intervals;
> Runs for a while but ultimately ends up in repeated msDelay().
> Last statistics output (run time not always equal):
> - TCP
> xmit: 1088
> recv: 827
> fw: 0
> drop: 0
> chkerr: 0
> lenerr: 0
> memerr: 270
> rterr: 0
> proterr: 0
> opterr: 0
> err: 0
> cachehit: 0
>
> In the previous outputs you can see the gap between xmit and recv
> increasing as well as the memerr.
If you are getting any memerr results, you have not given LWIP enough
resources. I don't know off-hand what buffer types or other resources
can lead to memerr counts, but you are low on something. These
statistics show that a quarter of your communications are failing to get
the buffers they need - not good at all.
>
> Looks like the PC does just not respond fast enough, forcing LwIP to
> keep the packet until an ACK is received?
PC's are ridiculously fast. Despite the network chatter, and despite
the mess of silliness Windows 10 machines are always running with their
absurd "start menu" full of adverts and other junk, they should be
handling packets in a fraction of a millisecond. When I did the testing
I mentioned before, the test program on the PC side (Linux rather than
Windows) was in Python, and not even particularly efficient Python - it
even had manual "sleep" calls. Handling was well under a millisecond
per transaction.
>
>
> Last 6 lines from the Wireshark packet window when the above stops:
>
> 6600 78.330701 192.168.125.130 192.168.125.128 TCP 1078 10001 → 51826 [PSH, ACK] Seq=975873 Ack=1 Win=5840 Len=1024
> 6601 78.379841 192.168.125.128 192.168.125.130 TCP 54 51826 → 10001 [ACK] Seq=1 Ack=976897 Win=63216 Len=0
> 6602 78.379850 192.168.125.128 192.168.125.130 TCP 54 [TCP Dup ACK 6601#1] 51826 → 10001 [ACK] Seq=1 Ack=976897 Win=63216 Len=0
> 6607 78.413921 192.168.125.130 192.168.125.128 TCP 566 10001 → 51826 [PSH, ACK] Seq=976897 Ack=1 Win=5840 Len=512
> 6609 78.459611 192.168.125.128 192.168.125.130 TCP 54 51826 → 10001 [ACK] Seq=1 Ack=977409 Win=64240 Len=0
> 6610 78.459619 192.168.125.128 192.168.125.130 TCP 54 [TCP Dup ACK 6609#1] 51826 → 10001 [ACK] Seq=1 Ack=977409 Win=64240 Len=0
>
> What seems weird to me is that every ACK is followed by a Dup ACK. Is this normal? I see that a lot in other TCP traffic as well, but not on every ACK.
>
>
I /think/ the dup ack's are the result of your board failing to handle
some packets, probably related to limited buffers (giving the memerr
counts).
Reply by ●September 25, 2019
On Tue, 24 Sep 2019 14:22:51 +0200, Stef
<stef33d@yahooI-N-V-A-L-I-D.com.invalid> wrote:
>
>Then one port, 20 ms intervals;
>Runs for a while but ultimately ends up in repeated msDelay().
>Last statistics output (run time not always equal):
>- TCP
> xmit: 1088
> recv: 827
> fw: 0
> drop: 0
> chkerr: 0
> lenerr: 0
> memerr: 270
> rterr: 0
> proterr: 0
> opterr: 0
> err: 0
> cachehit: 0
>
>In the previous outputs you can see the gap between xmit and recv
>increasing as well as the memerr.
>
>Looks like the PC does just not respond fast enough, forcing LwIP to
>keep the packet until an ACK is received?
Make sure that you are not loading the PC excessively with all kinds
of other activity.
If you use Telnet (or other display program) as the TCP client, any
screen updates will load the PC. At least minimize the Telnet
window to reduce the PC loading. The same applies to Wireshark, just
capture to disk. If possible use a separate PC for Telnet and
Wireshark, but you need a hub, not a switch, so that Wireshark will
see the traffic between the embedded device and Telnet PC.
Look at the WireShark sequence numbering (first column), why is there
a jump between 6602 and 6607 ? Is this due to display filtering? Use
capture filtering to reduce Wireshark loading.
Look at timestamps (second column), the last digit is in microseconds.
There seems to be a much larger gap than 20 ms between two embedded
sends. The time between data and Ack is much larger than 20 ms.
The Ack sequence number has fallen seriously behind data messages.
Apparently the sender fails to buffer about 1000 data frames (500 KB)
before being acknowledged, so clearly some messages are lost.
This suggests that the Telnet screen update slows down the traffic, so
minimize Telnet screen.
>
>What seems weird to me is that every ACK is followed by a Dup ACK. Is this normal? I see that a lot in other TCP traffic as well, but not on every ACK.
The difference between Ack and duplicate Ack is only 8-9 us. Strange.
Reply by Stef●September 24, 2019
>> On 9/23/2019 15:04, David Brown wrote:
...
> There you go - something was badly wrong with the network. It had a
> Windows 10 machine on it :-)
That might be my problem as well. ;-)
Turned on LwIP statistics and did a first test with a single simulated
serial port (data not actually coming in, just generated by processor
itself).
LwIP runs on an LPC4088 and 'serial port' is opened by Docklight running
on a W10 PC.
First, 512 byte packets at 1 second intervals, printing the stats every
5 seconds.
- TCP
xmit: 44
recv: 45
fw: 0
drop: 0
chkerr: 0
lenerr: 0
memerr: 0
rterr: 0
proterr: 0
opterr: 0
err: 0
cachehit: 0
No memerr and xmit and recv at a constant 1 difference.
Then a test with one port, 10 ms intervals:
- (Apparently) Immediate hangup, the stack keeps going into msDelay().
Then one port, 20 ms intervals;
Runs for a while but ultimately ends up in repeated msDelay().
Last statistics output (run time not always equal):
- TCP
xmit: 1088
recv: 827
fw: 0
drop: 0
chkerr: 0
lenerr: 0
memerr: 270
rterr: 0
proterr: 0
opterr: 0
err: 0
cachehit: 0
In the previous outputs you can see the gap between xmit and recv
increasing as well as the memerr.
Looks like the PC does just not respond fast enough, forcing LwIP to
keep the packet until an ACK is received?
Last 6 lines from the Wireshark packet window when the above stops:
6600 78.330701 192.168.125.130 192.168.125.128 TCP 1078 10001 → 51826 [PSH, ACK] Seq=975873 Ack=1 Win=5840 Len=1024
6601 78.379841 192.168.125.128 192.168.125.130 TCP 54 51826 → 10001 [ACK] Seq=1 Ack=976897 Win=63216 Len=0
6602 78.379850 192.168.125.128 192.168.125.130 TCP 54 [TCP Dup ACK 6601#1] 51826 → 10001 [ACK] Seq=1 Ack=976897 Win=63216 Len=0
6607 78.413921 192.168.125.130 192.168.125.128 TCP 566 10001 → 51826 [PSH, ACK] Seq=976897 Ack=1 Win=5840 Len=512
6609 78.459611 192.168.125.128 192.168.125.130 TCP 54 51826 → 10001 [ACK] Seq=1 Ack=977409 Win=64240 Len=0
6610 78.459619 192.168.125.128 192.168.125.130 TCP 54 [TCP Dup ACK 6609#1] 51826 → 10001 [ACK] Seq=1 Ack=977409 Win=64240 Len=0
What seems weird to me is that every ACK is followed by a Dup ACK. Is this normal? I see that a lot in other TCP traffic as well, but not on every ACK.
--
Stef (remove caps, dashes and .invalid from e-mail address to reply by mail)
There is no opinion so absurd that some philosopher will not express it.
-- Marcus Tullius Cicero, "Ad familiares"