Reply by Wojciech M. Zabolotny, May 3, 2012
Hi,

I'd like to share with you my latest development: an L3
protocol for transmitting data between a low-resource
FPGA and an embedded system.

It may allow you to use low-cost, low-resource FPGAs
together with cheap Ethernet switches and simple
Linux-based embedded systems (or just reflashed
Linux-based Ethernet routers working as both a switch
and an embedded system) to build data acquisition
systems that concentrate data from multiple channels.

Because Ethernet does not guarantee reliable delivery
of packets, we need an acknowledgment mechanism (as in
TCP).
However, if the transfer rate is R bytes/second and
the maximum acknowledgment latency is T, then we need
to buffer at least R*T bytes of unacknowledged data.
The problem is that at transfer rates of 100Mb/s or
1Gb/s the latency must be really small if we want to
use only the internal RAM of the FPGA for buffering.
E.g. for 100Mb/s (roughly 10MB/s) and 40kB of internal
RAM, we can accept a latency of only 4ms. For 1Gb/s
(roughly 100MB/s) it could be only 400µs.
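
The buffer-size arithmetic above can be checked with a
few lines of C. This is only an illustrative sketch: the
function names are hypothetical, and kB/MB are taken as
decimal (40kB = 40000 bytes, 10MB/s = 10000000 bytes/s),
matching the rough figures in the text.

```c
#include <stdint.h>

/* Minimum buffer size in bytes needed to hold all
 * unacknowledged data: rate R [bytes/s] times the
 * worst-case acknowledgment latency T [microseconds]. */
uint64_t min_buffer_bytes(uint64_t rate_bytes_per_s, uint64_t latency_us)
{
    return rate_bytes_per_s * latency_us / 1000000u;
}

/* Largest acknowledgment latency [microseconds] that a
 * buffer of the given size can tolerate at the given rate. */
uint64_t max_latency_us(uint64_t buffer_bytes, uint64_t rate_bytes_per_s)
{
    return buffer_bytes * 1000000u / rate_bytes_per_s;
}
```

For example, max_latency_us(40000, 10000000) gives 4000µs
(4ms), and max_latency_us(40000, 100000000) gives 400µs.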

To keep latency as low as possible, I decided to
implement the transmission directly as an L3 protocol.
As it is intended for use only in private networks,
I decided to use the unregistered Ethernet type 0xfade.
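
To illustrate what using EtherType 0xfade means on the
wire, here is a minimal C sketch that fills in a standard
14-byte Ethernet II header (destination MAC, source MAC,
EtherType in big-endian order). The payload format of the
protocol is not described in this post, so only the header
is shown; the function and macro names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_HDR_LEN       14
#define L3_FADE_ETHERTYPE 0xfade  /* unregistered EtherType used by the protocol */

/* Write the Ethernet II header at the start of `frame`:
 * 6 bytes destination MAC, 6 bytes source MAC, then the
 * EtherType in network (big-endian) byte order.
 * Returns the number of header bytes written. */
size_t build_eth_header(uint8_t *frame, const uint8_t dst[6], const uint8_t src[6])
{
    memcpy(frame, dst, 6);
    memcpy(frame + 6, src, 6);
    frame[12] = (uint8_t)(L3_FADE_ETHERTYPE >> 8);    /* 0xfa */
    frame[13] = (uint8_t)(L3_FADE_ETHERTYPE & 0xff);  /* 0xde */
    return ETH_HDR_LEN;
}
```

A receiver can then tell these frames apart from IP
traffic simply by checking that bytes 12 and 13 of the
frame are 0xfa and 0xde.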

The sources of both the FPGA part (designed for the Xilinx
SP601 board, but written with portability in mind)
and the Linux driver (compiled with kernel 3.3.3, but
it should also work with other 3.x kernels) are published
on the alt.sources group.
See the post: "L3 protocol for data transmission
from low resource FPGA to Linux embedded system".
The link to the Google archive is:
http://groups.google.com/group/alt.sources/browse_thread/thread/2c2511659f5869c5

I will also publish newer versions of my sources
on my website (still almost empty):
http://www.ise.pw.edu.pl/~wzab/fpga_l3_fade

Generally, the system is very simple.
The kernel module can receive data from a few
FPGA-based "slave" devices (their number is set by the
"max_slaves" parameter when loading the module).
Data received from each slave are written to a kernel
buffer, which should be mmapped by the user application.
The application is supposed to preprocess the data and
possibly send the results further through standard
TCP/IP connections.
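
On the user side, mapping such a buffer usually comes down
to open() followed by mmap(). The sketch below assumes
(this is not taken from the published driver) that the
device node can be opened read-only and that a read-only
MAP_SHARED mapping of a known length is sufficient; the
real driver may require other open flags, a different
length, or an ioctl to connect to the slave first. The
function name map_slave_buffer is hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `len` bytes of the buffer exposed by `path`
 * (e.g. "/dev/l3_fpga0") read-only into this process.
 * Returns NULL on failure. */
const void *map_slave_buffer(const char *path, size_t len)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

The application would then read samples directly from the
returned pointer and release the mapping with munmap()
when it is done.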

Of course, the system doesn't handle routing (due
to the low latency requirements).
The FPGA doesn't even need to handle the ARP protocol.
In the user application we simply need to register
a slave with a particular MAC address connected to a
particular network interface, and this slave will
be visible as /dev/l3_fpga0, /dev/l3_fpga1 and
so on.
My sources include a very simple application
which connects to the slave and verifies that the
slave sends consecutive integers.
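
The check performed by that test application can be
sketched as follows. This is a hypothetical version,
assuming the payload is a stream of 32-bit counters in host
byte order; the real data width, byte order and function
names in the published sources may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Check that the `n` words continue the sequence whose next
 * expected value is *expected. Returns -1 if all words match
 * (and advances *expected past the last one), otherwise the
 * index of the first word that breaks the sequence. */
long check_consecutive(const uint32_t *words, size_t n, uint32_t *expected)
{
    for (size_t i = 0; i < n; i++) {
        if (words[i] != *expected)
            return (long)i;
        (*expected)++;  /* unsigned, so it wraps around at 2^32 */
    }
    return -1;
}
```

Keeping the expected value outside the function lets the
check continue seamlessly across successive chunks read
from the buffer.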

In a system like this, a nontrivial problem is setting
the appropriate rate of packets sent from the
slave to the embedded system.
In my system it is achieved by introducing a certain
delay between the data packets.
This delay is adaptively adjusted during operation
of the system. The appropriate delay is found by
analysing the ratio of retransmitted packets to all
sent packets: Nretr/Nall.
If the data packets are sent too quickly, the
acknowledgment packets from the embedded system arrive
too late, and a packet is retransmitted before its
acknowledgment arrives.
The same may occur if the embedded system is overloaded
with packets from different slaves and drops some of them.
Therefore, if Nretr/Nall is too high, we have to
increase the delay. If Nretr/Nall is close to 0,
we may reduce the delay.
Such a simple algorithm works quite satisfactorily.
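
The adaptive delay rule can be sketched in C as below. The
thresholds (5% and 0.1%), the 25% back-off step, the
one-tick decrease and DELAY_MAX are all hypothetical values
chosen for illustration; the post does not say how the
original FPGA code adjusts the delay. Integer comparisons
are used instead of division, which would also map easily
to hardware.

```c
#include <stdint.h>

/* Hypothetical thresholds, as fractions of Nretr/Nall. */
#define HIGH_NUM  1u       /* slow down if Nretr/Nall > 1/20 (5 %)     */
#define HIGH_DEN  20u
#define LOW_NUM   1u       /* speed up if Nretr/Nall < 1/1000 (0.1 %)  */
#define LOW_DEN   1000u
#define DELAY_MAX 100000u  /* hypothetical upper bound, in clock ticks */

/* Return the new inter-packet delay, given the current delay
 * and the packet counts from the last observation window. */
uint32_t adjust_delay(uint32_t delay, uint32_t n_all, uint32_t n_retr)
{
    if (n_all == 0)
        return delay;                     /* no traffic: keep the delay */
    uint64_t retr = n_retr, all = n_all;  /* 64-bit to avoid overflow   */
    if (retr * HIGH_DEN > all * HIGH_NUM) {
        /* Too many retransmissions: increase the delay by ~25 % (at least 1). */
        uint32_t step = delay / 4 + 1;
        delay = (step >= DELAY_MAX || delay >= DELAY_MAX - step)
                    ? DELAY_MAX : delay + step;
    } else if (retr * LOW_DEN < all * LOW_NUM && delay > 0) {
        /* Almost no retransmissions: shorten the delay by one tick. */
        delay--;
    }
    return delay;
}
```

Increasing the delay quickly but decreasing it slowly is a
deliberate asymmetry in this sketch: it reacts fast to
congestion and probes for a higher rate only gradually,
loosely in the spirit of TCP's additive-increase/
multiplicative-decrease (applied here to the delay rather
than to the rate).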

The sources are published partly under the GPL license
(the Linux driver and user application) and partly
under the BSD license (my FPGA sources).
The FPGA sources also contain a very slightly
modified Ethernet MAC
http://opencores.org/project,ethernet_tri_mode
which is published under the LGPL.
Unfortunately, I was not able to publish all the
parts generated by the Xilinx tools, therefore I include
only the .xco files.

Please note that my sources are a first iteration.
They use some "quick and dirty" solutions.
I hope to prepare a more mature version and describe it
in a regular publication (of course I'll send information
about it, as I did with my sorter:
http://www.ise.pw.edu.pl/~wzab/fpga_heapsort, which
was also first published in alt.sources and
comp.arch.fpga).

I hope that my solution will be useful for somebody.
-- 
Regards,
Wojciech M. Zabolotny
wzab@ise.pw.edu.pl