Gratuitous ARP

Started by "Ste...@yahoo.com [rabbit-semi]" January 9, 2018
I have an RCM3900 module that has a TCP connection with a host. The module is receiving TCP packets at about a 1-second rate. So everything is humming along, with the host sending TCP packets requesting status info and the module responding with the data. After some time running, minutes to hours, the module issues two gratuitous ARP packets about a second apart. Upon receiving the next TCP packet, the module responds by closing the connection.
Both gratuitous ARP packets have the same ARP data:
Opcode: request (1)
Sender MAC address: 00:90:c2:f7:fd:36
Sender IP address: 192.168.1.64
Target MAC address: 3f:c7:01:00:b4:e5
Target IP address: 192.168.1.64
The target MAC doesn't appear to be valid. According to what I read, an opcode of 1 (request) is used for both an ARP Probe and an ARP Announcement. For a probe, the sender IP and target MAC are supposed to be set to 0, and the target IP is set to the IP in question (to see if it's taken). For an announcement, the sender IP and target IP are set to the same IP and the target MAC is set to 0, to announce that this NIC is using this IP address. The packet above doesn't fit either format. Could this be indicating an IP conflict? There is no response to either message.
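The probe and announcement rules described above can be written as a small check. This is only a sketch of those rules as stated in the post; the function name and the label strings are made up for illustration, and real stacks may be more lenient about the target MAC field in a request.

```python
ZERO_MAC = "00:00:00:00:00:00"
ZERO_IP = "0.0.0.0"

def classify_arp_request(sender_ip, target_mac, target_ip):
    """Classify an ARP request (opcode 1) using the rules from the post.

    Hypothetical helper: the labels returned here are illustrative only.
    """
    target_mac = target_mac.lower()
    if sender_ip == ZERO_IP and target_mac == ZERO_MAC:
        # Probe: sender IP and target MAC are zero, target IP is the one being tested.
        return "probe"
    if sender_ip == target_ip and target_mac == ZERO_MAC:
        # Announcement: sender IP equals target IP, target MAC is zero.
        return "announcement"
    if sender_ip == target_ip:
        # Sender IP equals target IP, but the target MAC is not zero.
        return "gratuitous, nonstandard target MAC"
    return "ordinary request"

# The packet from the capture: same sender and target IP, nonzero target MAC.
captured = classify_arp_request("192.168.1.64", "3f:c7:01:00:b4:e5", "192.168.1.64")
```

Under these rules the captured packet falls into the third category: it looks like an announcement except for the target MAC.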
So you don’t see an ARP on the network that the Rabbit might be responding to. Does the target MAC vary, or is it always the same?

Are you using DHCP? If so, could this event be happening around the time of a renewal?

Does the TCP connection go to:
- the default gateway for the network
- some other local device
- a remote device (via the default gateway)

Can you enable ARP_VERBOSE for a build, and see what sorts of ARP-related messages it’s spitting out right before failing? That might give a clue as to what’s triggering the ARP request and the decision to drop the connection. (TCP_VERBOSE, IP_VERBOSE, or NET_VERBOSE might also provide context, but may also require too much space or slow the device down too much.)

Feel free to email me directly with detailed information to debug the issue. If there’s a bug in the DC 9.62 TCP/IP stack, I’d like to get it fixed and pushed up to the GitHub repository.

-Tom
> On Jan 9, 2018, at 10:41 AM, Steve Trigero s...@yahoo.com [rabbit-semi] wrote:
This is a module that is in the field, so I don't have physical access to it. The user, in Europe, notified me that the unit keeps disconnecting after running for a while. I asked for a Wireshark dump and that is what I have.
The module is static IP. The MAC address is always the same. I haven't seen the connection yet, but I will continue to search the data; it's quite large. I haven't figured out how to search for a "SYN" flag in Wireshark. And no, there is no ARP message that the module is responding to.
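On searching for the SYN flag: Wireshark's display filter `tcp.flags.syn == 1 && tcp.flags.ack == 0` matches the initial SYN of each connection attempt, and `tcp.flags.fin == 1 || tcp.flags.reset == 1` matches the packets that close one. If the capture is being processed with a script instead, the same test is a bit mask on the TCP flags byte. The sketch below assumes the flags byte has already been extracted from each packet; the function names are made up for illustration.

```python
# TCP flag bits as they appear in the low byte of the TCP flags field.
TCP_FLAGS = [
    ("FIN", 0x01),
    ("SYN", 0x02),
    ("RST", 0x04),
    ("PSH", 0x08),
    ("ACK", 0x10),
    ("URG", 0x20),
]

def describe_flags(flags):
    """Return the names of the flag bits set in a TCP flags byte."""
    return [name for name, bit in TCP_FLAGS if flags & bit]

def is_connection_attempt(flags):
    """True for an initial SYN (SYN set, ACK clear)."""
    return bool(flags & 0x02) and not (flags & 0x10)

def is_close(flags):
    """True if the segment carries FIN or RST."""
    return bool(flags & (0x01 | 0x04))
```

Filtering on the close packets first may be quicker than looking for the SYN, since each disconnect should be followed by a new connection attempt from the host.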
I had a problem with similar symptoms a few years ago. That problem was traced to the ARP cache being too small. I think the default cache size was 10, but there were more than 10 devices on the network. Even though the module had only one TCP connection that it was communicating on, every IP address it saw on the network would automatically get added to the cache. It didn't take long before the IP address of the host it had a connection with was pushed out of the cache. So the connection was dropped and a lot of "who has address" messages started being sent. I increased the cache to 20, which helped but didn't solve the problem, just delayed it. Then I increased it to 64 and haven't seen a problem since, until perhaps now. Looking at the data, it appears that there are fewer than 64 devices on the network. But I haven't scanned through all of it, so who knows?
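The cache behavior described above can be reproduced with a toy model. This is a sketch only: it assumes a fixed-size cache that adds every address it sees and evicts the oldest entry when full (FIFO), which may not match the replacement policy of the actual Rabbit stack. The addresses and the function name are hypothetical.

```python
from collections import OrderedDict

def host_survives(cache_size, other_devices, rounds=3):
    """Return True if the host's entry is still cached after the traffic.

    Assumed policy (not taken from the real stack): every address seen is
    added, and the oldest inserted entry is evicted when the cache is full.
    """
    host = "192.168.1.10"  # hypothetical address of the connected host
    cache = OrderedDict()
    cache[host] = "host-mac"
    for _ in range(rounds):
        for i in range(other_devices):
            ip = f"192.168.1.{100 + i}"  # hypothetical bystander devices
            if ip in cache:
                continue  # already cached; position is not refreshed
            if len(cache) >= cache_size:
                cache.popitem(last=False)  # evict the oldest entry
            cache[ip] = f"mac-{i}"
    return host in cache

small = host_survives(cache_size=10, other_devices=12)
medium = host_survives(cache_size=20, other_devices=25)
large = host_survives(cache_size=64, other_devices=25)
```

In this model the host entry is lost as soon as the number of other devices reaches the cache size, so `small` and `medium` come out False and `large` comes out True. If the capture really shows fewer than 64 distinct devices, cache overflow alone would not explain the new disconnects under this assumption.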

On Tuesday, January 9, 2018, 3:24:24 PM PST, Tom Collins t...@tomlogic.com [rabbit-semi] wrote:

 