embedded gigabit ethernet

Started by Steve at fivetrees August 15, 2005
I have a new project coming up that calls for a data acquisition board (wot 
I have to design) to deliver said data over ethernet at a minimum of 
6.25Mbytes/second (not including packetisation/checksum/TCP/protocol 
overheads). I confess to some trepidation.

Now, I'm an old hand at comms and microprocessory. However this is a couple 
of orders of bignitude larger, bandwidth-wise, than I'm used to. I've dealt 
with various CPU families, but am currently most at home with the H8 and 
H8S. Which may or may not be relevant - I'm expecting to have to switch to a 
different architecture. No worries.

I'm also expecting to use a gigabit ethernet device with as far as possible 
zero-copy DMA capabilities. Raw data will be acquired over a bus (possibly 
the raw CPU bus, but non-DMA) from several hardware sources with relatively 
little overhead.

I have two questions:
  - Which CPU family should I be looking at? Maybe ARM, or PowerPC?
  - Which TCP/IP stack should I be looking at?

All suggestions (preferably hygienic) gratefully considered.

Steve
http://www.fivetrees.com 


Steve at fivetrees wrote:
> I have a new project coming up that calls for a data acquisition board (wot
> I have to design) to deliver said data over ethernet at a minimum of
> 6.25Mbytes/second (not including packetisation/checksum/TCP/protocol
> overheads). I confess to some trepidation.
>
Bandwidth woes noted. Too much of just about anything good is impossible. I use the Axis ETRAX100LX, which is getting a little musty. Getting about 7 Mb/s with full TCP (p2p) is about max for this thing. Maybe a little bit more, approaching 8 Mb/s. It has 100 Mbit Ethernet with DMA and a 100 MHz core. I say it's getting a little musty because I think they should have boosted their core speed by now, but other than that, it has been a win. It runs Linux on Axis's own architecture.
> Now, I'm an old hand at comms and microprocessory. However this is a couple
> of orders of bignitude larger, bandwidth-wise, than I'm used to. I've dealt
> with various CPU families, but am currently most at home with the H8 and
> H8S. Which may or may not be relevant - I'm expecting to have to switch to a
> different architecture. No worries.
>
> I'm also expecting to use a gigabit ethernet device with as far as possible
> zero-copy DMA capabilities. Raw data will be acquired over a bus (possibly
> the raw CPU bus, but non-DMA) from several hardware sources with relatively
> little overhead.
>
> I have two questions:
>   - Which CPU family should I be looking at? Maybe ARM, or PowerPC?
>   - Which TCP/IP stack should I be looking at?
>
> All suggestions (preferably hygienic) gratefully considered.
>
> Steve
> http://www.fivetrees.com
>
>
Steve at fivetrees wrote:
> - Which CPU family should I be looking at? Maybe ARM, or PowerPC?
Look at Intel XScale chips. Some of them include gigabit Ethernet on chip (the PHY is not included). They have "microengines" capable of offloading the TCP/IP overhead from the main CPU.

-- 
@jhol

KK (Boogiteorian alkeet / Juice Leskinen Grand Slam)
Steve at fivetrees wrote:
> I have a new project coming up that calls for a data acquisition board (wot
> I have to design) to deliver said data over ethernet at a minimum of
> 6.25Mbytes/second (not including packetisation/checksum/TCP/protocol
> overheads). I confess to some trepidation.
Ouch. 50 Mbps? Sounds like some tight code. Plausible with 100Base-T, but Gig-E will afford a lot more headroom.

I'd suggest documenting a lot of assumptions (caveats) in your scope of work. Foremost, getting this level of performance out of TCP will prove to be non-trivial in terms of protocol tuning and buffer RAM. (You'll have to buffer for TCP retransmission, which will affect the efficiency of your copying to the Ethernet NIC.) The amount of protocol tuning and buffer RAM increases with the round-trip time between the two network devices, and is also heavily limited by the receiving device's TCP tuning.

Seriously consider if the application can tolerate any data loss, and whether UDP might fit the bill better. It could have a significant impact on your specs.

Cheers,
Richard
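To put rough numbers on Richard's point that buffer RAM grows with round-trip time, here is a minimal sketch in C of the unit conversion and the bandwidth-delay product (the amount of data sent but not yet acknowledged during one round trip). The function names are hypothetical, 6.25 Mbytes is taken as 6,250,000 bytes, and any RTT you feed in is an assumption about your own network, not a figure from this thread.

```c
#include <stdint.h>

/* Convert a payload rate in bytes per second to bits per second. */
static uint64_t bytes_to_bits_per_sec(uint64_t bytes_per_sec)
{
    return bytes_per_sec * 8u;
}

/*
 * Bandwidth-delay product in bytes: how much data is in flight (sent but
 * not yet acknowledged) over one round trip. A TCP sender has to keep at
 * least this much around in case it must retransmit.
 * rtt_us is the round-trip time in microseconds (an assumed input).
 */
static uint64_t bdp_bytes(uint64_t bytes_per_sec, uint32_t rtt_us)
{
    return (bytes_per_sec * rtt_us) / 1000000u;
}
```

For example, `bytes_to_bits_per_sec(6250000)` gives the 50 Mbps quoted above, and `bdp_bytes(6250000, 1000)` suggests about 6 KB of retransmit buffer for a 1 ms round trip; a 100 ms round trip would need a hundred times that.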
"Steve at fivetrees" wrote:

>I have a new project coming up that calls for a data acquisition board (wot
>I have to design) to deliver said data over ethernet at a minimum of
>6.25Mbytes/second (not including packetisation/checksum/TCP/protocol
>overheads). I confess to some trepidation.
>
>Now, I'm an old hand at comms and microprocessory. However this is a couple
>of orders of bignitude larger, bandwidth-wise, than I'm used to. I've dealt
>with various CPU families, but am currently most at home with the H8 and
>H8S. Which may or may not be relevant - I'm expecting to have to switch to a
>different architecture. No worries.
>
>I'm also expecting to use a gigabit ethernet device with as far as possible
>zero-copy DMA capabilities. Raw data will be acquired over a bus (possibly
>the raw CPU bus, but non-DMA) from several hardware sources with relatively
>little overhead.
>
>I have two questions:
>  - Which CPU family should I be looking at? Maybe ARM, or PowerPC?
>  - Which TCP/IP stack should I be looking at?
For raw speed, a DSP might be best. We're currently doing a streaming application for a customer using a TI 32C6204 and 100baseT, and it is really quick. We didn't use gigabit Ethernet as we thought it'd be a major increase in complication for a negligible increase in the overall throughput.

Last time I did any tests, the main bottleneck was actually in the data receiver; most PC-type systems are designed around peak-and-trough Internet traffic, not continuous data capture. The DSP should be able to transmit over 50 Mb/s continuously, but to handle this we're having to use UDP with streaming acknowledgments; essentially a souped-up TFTP, with backwards-compatibility for the occasional slow transfers.

We'll be benchmarking in a few weeks; I can let you know the results.

Jeremy Bentham
Iosoft Ltd.
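As an illustration of what "UDP with streaming acknowledgments" can look like on the sender side, here is a minimal sliding-window sketch in C. It is not Iosoft's protocol: the `stream_tx_*` names, the cumulative-ACK convention, and the window size are all assumptions made for the example, and the actual datagram send and receive calls are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sender-side window state for a block-numbered UDP stream. */
typedef struct {
    uint32_t base;    /* oldest block number not yet acknowledged */
    uint32_t next;    /* next block number to transmit */
    uint32_t window;  /* maximum number of blocks allowed in flight */
} stream_tx_t;

static void stream_tx_init(stream_tx_t *tx, uint32_t window)
{
    tx->base = 0;
    tx->next = 0;
    tx->window = window;
}

/* Blocks sent but not yet acknowledged (unsigned subtraction is wrap-safe). */
static uint32_t stream_tx_in_flight(const stream_tx_t *tx)
{
    return tx->next - tx->base;
}

/* True if another block may be sent without waiting for an ACK. */
static bool stream_tx_can_send(const stream_tx_t *tx)
{
    return stream_tx_in_flight(tx) < tx->window;
}

/* Reserve the next block number; the caller sends the datagram carrying it. */
static uint32_t stream_tx_send(stream_tx_t *tx)
{
    return tx->next++;
}

/*
 * Cumulative ACK: the receiver reports the next block number it expects.
 * All earlier blocks are released from the retransmit buffer. ACKs that
 * are stale or point beyond what was sent are ignored.
 */
static void stream_tx_ack(stream_tx_t *tx, uint32_t next_expected)
{
    uint32_t advanced = next_expected - tx->base;
    if (advanced <= stream_tx_in_flight(tx))
        tx->base = next_expected;
}
```

With `window` set to 1 this degenerates into lock-step acknowledgment of the kind plain TFTP uses, which is one way a scheme like this could stay compatible with slow transfers; larger windows keep the link busy while ACKs are in transit.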
Steve at fivetrees wrote:
> I have a new project coming up that calls for a data acquisition board (wot
> I have to design) to deliver said data over ethernet at a minimum of
> 6.25Mbytes/second (not including packetisation/checksum/TCP/protocol
> overheads). I confess to some trepidation.
>
[...] Please let me (us) know what you decide on. This is on the edge of experience for many if not all small systems developers. Thanks.
Bryan Hackney wrote:
> Steve at fivetrees wrote:
> > I have a new project coming up that calls for a data acquisition board (wot
> > I have to design) to deliver said data over ethernet at a minimum of
> > 6.25Mbytes/second (not including packetisation/checksum/TCP/protocol
> > overheads). I confess to some trepidation.
I, too, have been tasked with a similar project for work recently. I need to be able to stream data that will be produced at a bandwidth in excess of 200 Mb/s (from an A/D converter). This task is primarily a transmission task (minimal reception required; certainly nothing high-speed on the reception side). Essentially, just grab the data, packetize it into TCP packets, and dump it out ASAP. The other end of the pipe can essentially be thought of as a top-of-the-line PC with oodles of RAM/disk storage to support the data coming in.

While still evaluating possibilities, my current plan is to use a Virtex4 FX12 FPGA, which has an embedded PowerPC core and a tri-mode Ethernet MAC hard core integrated into the chip. Xilinx also has their Gigabit System Reference Design (can be found at: http://www.xilinx.com/esp/wired/optical/xlnx_net/gsrd.htm), but it is currently set up to support only a few eval boards (and is a bit much for what I am looking to do). However, Xilinx just released something much closer to my ideal solution in the form of an app-note called "Minimal Footprint Tri-Mode Ethernet MAC Processing Engine", which can be found here: http://direct.xilinx.com/bvdocs/appnotes/xapp807.pdf

This setup uses uIP, a scaled down TCP/IP stack by Adam Dunkels, integrated to run on an UltraController 2 (which is essentially a utilization of the PowerPC core in the Virtex4 that requires no external memory; 16K of program memory and 16K of data memory in the form of BRAM are utilized right on the FPGA fabric to hold code/data). Anyway, the app note shows an example webserver running on the ML403 eval board. I am hoping to play with this over the next few weeks to see what works and what doesn't work with this setup. I'll report back new stuff as I find it.
> Please let me (us) know what you decide on. This is on the edge of experience
> for many if not all small systems developers. Thanks.
Yeah...there definitely isn't a ton of stuff out there right now for references on gigE in embedded stuff. Hopefully this will change as time (and requirements) change.

Regards,
John Orlando
www.jrobot.net
>This setup uses uIP, a scaled down TCP/IP stack by Adam Dunkels,
>integrated to run on an UltraController 2 (which is essentially a
>utilization of the PowerPC core in the Virtex4 that requires no
>external memory; 16K of program memory and 16K of data memory in the
>form of BRAM are utilized right on the FPGA fabric to hold code/data).
>Anyway, the app note shows an example webserver running on the ML403
>eval board. I am hoping to play with this over the next few weeks to
>see what works and what doesn't work with this setup. I'll report back
>new stuff as I find it.
You should not compare a webserver reading mostly static data to your data acquisition task with a 200 Mb/s bandwidth.

Under TCP, you must be able to resend segments up to the so-called flight size. That is, every segment sent out that has not yet been acked by the peer can be lost on its way, and hence must be re-sent if the need arises. That's why TCP is reliable, after all. With a 200 Mb/s bandwidth the number of not-yet-acked segments will be impressive. uIP is a fine piece of code, but it was designed with restricted embedded resources in mind, and obviously this is in contrast to the very high bandwidth your application requires. So, unless you can "reproduce" your acquired data, there is no way around local buffering. How do you want to do this with only 16KB?

If you can afford to lose data, use UDP instead. If not, you will have to solve this problem, and I'm honestly interested to hear about your approach.

Markus
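Markus's 16KB question can be worked through with a little arithmetic. The sketch below, in C, estimates the longest round trip a retransmit buffer can cover at a given line rate, and how many full-size segments fit in it. The function names are hypothetical, the 200 Mb/s figure is the one John quoted for his A/D stream, and the 1460-byte segment size is an assumption (the usual TCP payload on standard Ethernet without options). It also generously assumes the whole 16 KB of data memory is available as buffer.

```c
#include <stdint.h>

/*
 * Longest round-trip time, in microseconds, that a retransmit buffer of
 * buffer_bytes can cover while sending continuously at bits_per_sec.
 * Beyond this, unacknowledged data would have to be overwritten.
 */
static uint64_t max_rtt_us(uint64_t buffer_bytes, uint64_t bits_per_sec)
{
    return (buffer_bytes * 8u * 1000000u) / bits_per_sec;
}

/* Number of whole segments of size mss that fit in the buffer. */
static uint32_t segments_in_buffer(uint32_t buffer_bytes, uint32_t mss)
{
    return buffer_bytes / mss;
}
```

Under these assumptions, `max_rtt_us(16384, 200000000)` comes out at roughly 655 microseconds and `segments_in_buffer(16384, 1460)` at 11 segments, so the receiver would have to acknowledge within well under a millisecond for the sender never to stall.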
Markus Zingg wrote: (and omitted attribution)
>
>> This setup uses uIP, a scaled down TCP/IP stack by Adam Dunkels,
>> integrated to run on an UltraController 2 (which is essentially a
>> utilization of the PowerPC core in the Virtex4 that requires no
>> external memory; 16K of program memory and 16K of data memory in the
>> form of BRAM are utilized right on the FPGA fabric to hold code/data).
>> Anyway, the app note shows an example webserver running on the ML403
>> eval board. I am hoping to play with this over the next few weeks to
>> see what works and what doesn't work with this setup. I'll report back
>> new stuff as I find it.
>
> You should not compare a webserver reading mostly static data to your
> data acquisition task with a 200 Mb/s bandwidth.
>
> Under TCP, you must be able to resend segments up to the so-called
> flight size. That is, every segment sent out that has not yet been acked
> by the peer can be lost on its way, and hence must be re-sent if the
> need arises. That's why TCP is reliable, after all. With a 200 Mb/s
> bandwidth the number of not-yet-acked segments will be impressive. uIP
> is a fine piece of code, but it was designed with restricted embedded
> resources in mind, and obviously this is in contrast to the very high
> bandwidth your application requires. So, unless you can "reproduce"
> your acquired data, there is no way around local buffering. How do you
> want to do this with only 16KB?
>
> If you can afford to lose data, use UDP instead. If not, you will
> have to solve this problem, and I'm honestly interested to hear about
> your approach.
I have no idea what throughput he needs, but the burst speed of 200 Mb/s will already be a load. It might be possible to use error correcting coding with the packets and UDP, assuming he has a dedicated link. He should also carefully characterize the burst length needed and the overall throughput rate. It may be that the raw error rate with a dedicated link is small enough to allow simplification.

-- 
"If you want to post a followup via groups.google.com, don't use
 the broken "Reply" link at the bottom of the article. Click on
 "show options" at the top of the article, then click on the
 "Reply" at the bottom of the article headers." - Keith Thompson
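One simple form of the error-correcting idea above is a single XOR parity packet per group of data packets, which lets the receiver rebuild any one lost packet in the group without a retransmission. The C sketch below is only an illustration under assumptions: the `fec_*` names and the flat buffer layout are hypothetical, all packets in a group are taken to have the same length, and the receiver is assumed to know which packet is missing (for instance from sequence numbers). It cannot recover two losses in the same group.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Build a parity packet as the byte-wise XOR of k data packets.
 * pkts holds the k packets back to back, each len bytes long.
 */
static void fec_make_parity(const uint8_t *pkts, size_t k, size_t len,
                            uint8_t *parity)
{
    memset(parity, 0, len);
    for (size_t i = 0; i < k; i++)
        for (size_t j = 0; j < len; j++)
            parity[j] ^= pkts[i * len + j];
}

/*
 * Rebuild packet number `lost` in place: XOR the parity packet with
 * every surviving data packet in the group.
 */
static void fec_recover(uint8_t *pkts, size_t k, size_t len, size_t lost,
                        const uint8_t *parity)
{
    uint8_t *dst = pkts + lost * len;
    memcpy(dst, parity, len);
    for (size_t i = 0; i < k; i++) {
        if (i == lost)
            continue;
        for (size_t j = 0; j < len; j++)
            dst[j] ^= pkts[i * len + j];
    }
}
```

The cost is one extra packet per group of k, so the group size trades bandwidth overhead against how bursty a loss pattern the link can survive; whether that is enough depends on the raw error rate of the dedicated link.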
"Bryan Hackney" <no@body.home> wrote in message 
news:BrHNe.134134$gL1.5284@tornado.texas.rr.com...
> Steve at fivetrees wrote:
>> I have a new project coming up that calls for a data acquisition board
>> (wot I have to design) to deliver said data over ethernet at a minimum of
>> 6.25Mbytes/second (not including packetisation/checksum/TCP/protocol
>> overheads). I confess to some trepidation.
>>
>
> [...]
>
> Please let me (us) know what you decide on. This is on the edge of experience
> for many if not all small systems developers. Thanks.
Certainly. Dame fortune has bought me some time - client needs a lower-bandwidth ethernet board first. I shall bide my time and research.

Steve
http://www.fivetrees.com