EmbeddedRelated.com

Ethernet connection problem

Started by seecwriter December 3, 2010
The other day I found out that a TCP/IP connection with my RCM3000
was timing out after anywhere from a few minutes to nearly an hour.
This has worked for many years, so I was a bit surprised. I'm using
DC v9.25.

First, I used Wireshark to see what was happening. It showed that at
some point the Rabbit just quits responding. After a half dozen retries,
the PC throws a timeout error and aborts the connection.

Next, I enabled TCP_VERBOSE in the Rabbit and used the debugger. No
connection issues, it ran all day.

I then turned off verbose, and just ran the debugger. Still no
problems.

Reloaded firmware without debug, problem reappears.

So I started adding printf statements at various places to try to
track what was going on and recompiled without debug. This worked, in
that I can get debug strings out of port A and the problem still occurs.

I've narrowed down the issue to two TCP/IP API routines. After a
socket connection is established, after each tcp_tick() my app
executes the following if statement:

    if ( !sock_alive(mySock) ||
         (nBytes = sock_gets(mySock, buf, MSGLEN)) < 0 ) {
        /* Socket reported dead, or the read failed: close it. */
        sock_close(mySock);
        sock_state = SOCKET_CLOSING;
        break;
    }
    else if (nBytes) {
        ...
    }
    break;

After the PC throws its timeout error and closes the connection to
the Rabbit, my app continues to make it past that if statement. It
continues for several minutes, then the Rabbit reboots.

Why doesn't sock_alive() detect the closed connection? Should I be
doing something else to detect this condition?
Any thoughts on what might be happening?
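
One common way to catch a peer that has gone away without the stack reporting it is an application-level inactivity timeout. The following is a minimal sketch in portable C, not Dynamic C library code. The names `idle_timer_t`, `idle_start`, `idle_touch`, and `idle_expired` are hypothetical, and it assumes the app has a free-running millisecond tick count to pass in as `now_ms`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: remembers when data last arrived. */
typedef struct {
    uint32_t last_rx_ms;   /* tick count at the most recent receive */
    uint32_t timeout_ms;   /* how long silence is tolerated */
} idle_timer_t;

/* Start (or restart) the timer at the current tick count. */
void idle_start(idle_timer_t *t, uint32_t now_ms, uint32_t timeout_ms)
{
    assert(timeout_ms > 0);   /* a zero timeout would expire immediately */
    t->last_rx_ms = now_ms;
    t->timeout_ms = timeout_ms;
}

/* Call whenever a read returns data. */
void idle_touch(idle_timer_t *t, uint32_t now_ms)
{
    t->last_rx_ms = now_ms;
}

/* Returns 1 once the peer has been silent for at least timeout_ms.
   Unsigned subtraction keeps the comparison correct across tick rollover. */
int idle_expired(const idle_timer_t *t, uint32_t now_ms)
{
    return (uint32_t)(now_ms - t->last_rx_ms) >= t->timeout_ms;
}
```

In the loop from the post, this would mean calling `idle_touch()` in the `else if (nBytes)` branch and treating `idle_expired()` the same way as the `!sock_alive()` case. It only helps if the PC is expected to send something at regular intervals.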

I am using ASCII mode sockets. I'm thinking of changing to binary
mode, but that is a significant rewrite, so I'm trying to avoid
that if possible. Plus, I have no guarantee that binary mode will
work any better.

Steve

Steve,

It's been a while since I've worked with DC9, but I was never a fan of "the middle releases". IIRC, 9.21 is OK, but if you go beyond that you should jump all the way to 9.62. I think that Rabbit Tech Support even has a collection of library patches to correct bugs found since that release.

-Tom

On Dec 3, 2010, at 10:25 AM, seecwriter wrote:
> I'm using
> DC v9.25.

An update.

I rewrote my socket management procedure to use binary sockets, but
it made no difference.

But removing RabbitWeb from my app appears to solve the problem. This
is even though I'm not using RabbitWeb in my testing, just basic
TCP/IP to a PC.

If I move to DC v9.62 my app no longer fits on the 3000 core modules
unless I take some feature out. If I take RabbitWeb out, then it fits.
But I need to have both an ethernet and a webpage interface. As it is
I can't have both a webpage interface and SNMP. It's only one or the
other, plus ethernet.

But maybe for testing I can do something so I can test my app with DC
v9.62.

Steve

I'm not able to shrink my code enough to fit in an RCM3000 with DC
v9.62. Which is odd because I can get it to fit in an RCM3900. Don't
they both have the same amount of code space, 512k?

I tried using v9.21 but I can't get a clean compile. It throws all
kinds of errors in HTTP.LIB. It seems to have a problem with
RabbitWeb. I don't know what that's all about.

Steve

I have the same problem with DC v9.62 for the BL2000, but when I use DC
v8.61 the code fits.
George G

I've been rolling back the code, piece by piece, to get to the last
version that did not exhibit this problem, which was November, 2009.

The problem appears to be the define for MAX_TCP_SOCKET_BUFFERS.

#define MAX_TCP_SOCKET_BUFFERS NUMSOCKETS+2

If I remove the 2, the problem goes away. My app uses NUMSOCKETS for
its ethernet connections. I think the 2 extra buffers were added when
RabbitWeb was added, so that there would be enough sockets for both
RabbitWeb (HTTP) and ethernet (TCP/IP).

I don't see anywhere in the RabbitWeb docs or sample programs where
the number of sockets needed to be set. If that's true, I'm not sure
why the 2 was added.
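
One thing that may be worth ruling out: the define above is not parenthesized, so if any code uses the macro inside a larger expression (a multiplication, for example), operator precedence changes the result. Whether the DC 9.25 libraries actually use it that way has not been checked here. The sketch below is plain C with made-up values; `BUF_SIZE`, the value of `NUMSOCKETS`, the `_SAFE` macro, and the function names are all hypothetical.

```c
#include <assert.h>

#define NUMSOCKETS 4      /* made-up value for illustration */
#define BUF_SIZE   2048   /* hypothetical per-buffer size in bytes */

/* As written in the post: the expansion is not parenthesized. */
#define MAX_TCP_SOCKET_BUFFERS      NUMSOCKETS+2
/* Hypothetical safer spelling of the same count. */
#define MAX_TCP_SOCKET_BUFFERS_SAFE (NUMSOCKETS+2)

/* Expands to NUMSOCKETS + (2 * BUF_SIZE): only the 2 gets scaled. */
long pool_bytes_bare(void)
{
    return MAX_TCP_SOCKET_BUFFERS * BUF_SIZE;
}

/* Expands to (NUMSOCKETS + 2) * BUF_SIZE, which is what was meant. */
long pool_bytes_safe(void)
{
    return MAX_TCP_SOCKET_BUFFERS_SAFE * BUF_SIZE;
}

/* Optional startup check that the parenthesized form gives 6 buffers' worth. */
void check_pool_size(void)
{
    assert(pool_bytes_safe() == 6L * BUF_SIZE);
}
```

With these made-up numbers the bare form yields 4100 bytes and the parenthesized form 12288. If the real libraries only ever use the macro on its own (as an array dimension, say), the two spellings behave identically and the cause lies elsewhere.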

Steve