embedded gigabit ethernet

I have a new project coming up that calls for a data acquisition board (wot 
I have to design) to deliver said data over ethernet at a minimum of 
6.25Mbytes/second (not including packetisation/checksum/TCP/protocol 
overheads). I confess to some trepidation.

Now, I'm an old hand at comms and microprocessory. However this is a couple 
of orders of bignitude larger, bandwidth-wise, than I'm used to. I've dealt 
with various CPU families, but am currently most at home with the H8 and 
H8S. Which may or may not be relevant - I'm expecting to have to switch to a 
different architecture. No worries.

I'm also expecting to use a gigabit ethernet device with as far as possible 
zero-copy DMA capabilities. Raw data will be acquired over a bus (possibly 
the raw CPU bus, but non-DMA) from several hardware sources with relatively 
little overhead.

I have two questions:
  - Which CPU family should I be looking at? Maybe ARM, or PowerPC?
  - Which TCP/IP stack should I be looking at?

All suggestions (preferably hygienic) gratefully considered.

Steve
http://www.fivetrees.com

Reply by Bryan Hackney ● August 16, 2005

Steve at fivetrees wrote:
> I have a new project coming up that calls for a data acquisition board (wot 
> I have to design) to deliver said data over ethernet at a minimum of 
> 6.25Mbytes/second (not including packetisation/checksum/TCP/protocol 
> overheads). I confess to some trepidation.
> 

Bandwidth woes noted. Too much of just about anything good is impossible.

I use the Axis ETRAX100LX, which is getting a little musty. Getting about
7 Mb/s with full TCP (p2p) is about max for this thing. Maybe a little bit
more - approaching 8 Mb/s.

It has 100 Mbit ethernet with DMA. It is a 100 MHz core. I say it's getting
a little musty because I think they should have boosted their core speed
by now, but other than that, it has been a win.

It's Linux, and their own architecture.

> Now, I'm an old hand at comms and microprocessory. However this is a couple 
> of orders of bignitude larger, bandwidth-wise, than I'm used to. I've dealt 
> with various CPU families, but am currently most at home with the H8 and 
> H8S. Which may or may not be relevant - I'm expecting to have to switch to a 
> different architecture. No worries.
> 
> I'm also expecting to use a gigabit ethernet device with as far as possible 
> zero-copy DMA capabilities. Raw data will be acquired over a bus (possibly 
> the raw CPU bus, but non-DMA) from several hardware sources with relatively 
> little overhead.
> 
> I have two questions:
>   - Which CPU family should I be looking at? Maybe ARM, or PowerPC?
>   - Which TCP/IP stack should I be looking at?
> 
> All suggestions (preferably hygienic) gratefully considered.
> 
> Steve
> http://www.fivetrees.com 
> 
>

Reply by Jouko Holopainen ● August 16, 2005

Steve at fivetrees wrote:
>   - Which CPU family should I be looking at? Maybe ARM, or PowerPC?

Look at Intel XScale chips. Some of them include gigabit ethernet on
chip (the PHY is not included). They have "microengines" capable of
offloading the TCP/IP overhead from the main CPU.

-- 
  @jhol

KK (Boogiteorian alkeet / Juice Leskinen Grand Slam)

Reply by Richard H. ● August 16, 2005

Steve at fivetrees wrote:
> I have a new project coming up that calls for a data acquisition board (wot 
> I have to design) to deliver said data over ethernet at a minimum of 
> 6.25Mbytes/second (not including packetisation/checksum/TCP/protocol 
> overheads). I confess to some trepidation.

Ouch.  50Mbps?  Sounds like some tight code.  Plausible with 100Base-T, 
but Gig-E will afford a lot more headroom.  I'd suggest documenting a 
lot of assumptions (caveats) in your scope of work.
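
To put a number on the headroom question, here is a rough sketch of the
line rate needed once framing is counted. It assumes TCP over IPv4 with a
standard 1500-byte MTU, no IP or TCP options, and "Mbytes" meaning 10^6
bytes; the function name is made up for illustration.

```python
# Per-frame byte counts for TCP over IPv4 over Ethernet (assumed: 1500-byte MTU).
MTU = 1500
IP_HEADER = 20        # IPv4 header, no options
TCP_HEADER = 20       # TCP header, no options
ETH_HEADER = 14       # destination MAC, source MAC, EtherType
ETH_FCS = 4           # frame check sequence
PREAMBLE_SFD = 8      # preamble plus start-of-frame delimiter
INTERFRAME_GAP = 12   # minimum idle time between frames, in byte times


def wire_rate_mbps(payload_mbytes_per_s: float) -> float:
    """Line rate in Mbit/s needed to carry the given payload rate."""
    payload_per_frame = MTU - IP_HEADER - TCP_HEADER
    wire_per_frame = MTU + ETH_HEADER + ETH_FCS + PREAMBLE_SFD + INTERFRAME_GAP
    return payload_mbytes_per_s * 8 * wire_per_frame / payload_per_frame


if __name__ == "__main__":
    print(f"{wire_rate_mbps(6.25):.2f} Mbit/s on the wire")
```

With full-size frames the framing cost is only about 5%, so the raw wire
rate is not the hard part; ACK traffic, retransmissions and CPU time are
not counted here.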

Foremost, getting this level of performance out of TCP will prove to be 
non-trivial in terms of protocol tuning and buffer RAM.  (You'll have to 
buffer for TCP retransmission, which will affect the efficiency of your 
copying to the Ethernet NIC.)

The amount of protocol tuning and buffer RAM increases with the 
round-trip time between the two network devices, and is also heavily 
limited by the receiving device's TCP tuning.  Seriously consider if the 
application can tolerate any data loss, and whether UDP might fit the 
bill better.  It could have a significant impact on your specs.
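
The dependence on round-trip time can be sketched with the usual
bandwidth-delay product: the sender has to hold roughly rate x RTT bytes
until they are acknowledged. The RTT values below are illustrative
assumptions, not measurements from any real link.

```python
def retransmit_buffer_bytes(rate_bytes_per_s: float, rtt_s: float) -> float:
    """Approximate unacknowledged data in flight (bandwidth-delay product)."""
    return rate_bytes_per_s * rtt_s


if __name__ == "__main__":
    RATE = 6.25e6  # 6.25 Mbytes/s payload, as in the original post
    # Assumed RTTs: quiet LAN, busy LAN or campus network, long-haul link.
    for rtt_ms in (0.5, 5.0, 50.0):
        need = retransmit_buffer_bytes(RATE, rtt_ms / 1000.0)
        print(f"RTT {rtt_ms:5.1f} ms -> about {need / 1024:.0f} KiB in flight")
```

The same product bounds the receive window the far end must advertise,
which is why the receiver's tuning matters as much as the sender's.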

Cheers,
Richard

Reply by Jeremy Bentham ● August 16, 2005

"Steve at fivetrees" wrote:

>I have a new project coming up that calls for a data acquisition board (wot 
>I have to design) to deliver said data over ethernet at a minimum of 
>6.25Mbytes/second (not including packetisation/checksum/TCP/protocol 
>overheads). I confess to some trepidation.
>
>Now, I'm an old hand at comms and microprocessory. However this is a couple 
>of orders of bignitude larger, bandwidth-wise, than I'm used to. I've dealt 
>with various CPU families, but am currently most at home with the H8 and 
>H8S. Which may or may not be relevant - I'm expecting to have to switch to a 
>different architecture. No worries.
>
>I'm also expecting to use a gigabit ethernet device with as far as possible 
>zero-copy DMA capabilities. Raw data will be acquired over a bus (possibly 
>the raw CPU bus, but non-DMA) from several hardware sources with relatively 
>little overhead.
>
>I have two questions:
>  - Which CPU family should I be looking at? Maybe ARM, or PowerPC?
>  - Which TCP/IP stack should I be looking at?

For raw speed, a DSP might be best. We're currently doing a streaming
application for a customer using a TI 32C6204 and 100baseT, and it is
really quick.

We didn't use gigabit Ethernet as we thought it'd be a major increase
in complication for a negligible increase in the overall throughput.
Last time I did any tests, the main bottleneck was actually in the
data receiver; most PC-type systems are designed around
peak-and-trough Internet traffic, not continuous data capture.

The DSP should be able to transmit over 50 Mb/s continuously, but to
handle this we're having to use UDP with streaming acknowledgments;
essentially a souped-up TFTP, with backwards-compatibility for the
occasional slow transfers.
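
For readers unfamiliar with the idea, here is a minimal sketch of the
sender-side bookkeeping for a windowed, cumulatively acknowledged stream
over UDP. This is not the Iosoft protocol, whose details are not given
here; the class, its method names and the go-back-N retransmission policy
are all assumptions chosen for illustration, and the actual socket I/O is
left out.

```python
from collections import OrderedDict
from typing import List, Tuple


class StreamSender:
    """Keeps each sent block until a cumulative ACK covers it."""

    def __init__(self, window: int) -> None:
        self.window = window            # maximum number of blocks in flight
        self.next_seq = 0               # sequence number of the next new block
        self.unacked = OrderedDict()    # seq -> payload, oldest first

    def can_send(self) -> bool:
        return len(self.unacked) < self.window

    def send(self, payload: bytes) -> Tuple[int, bytes]:
        """Register a new block; the caller would put it in a UDP datagram."""
        if not self.can_send():
            raise RuntimeError("window full; wait for an ACK")
        seq = self.next_seq
        self.unacked[seq] = payload
        self.next_seq += 1
        return seq, payload

    def on_ack(self, ack_seq: int) -> None:
        """Cumulative ACK: every block up to and including ack_seq arrived."""
        for seq in [s for s in self.unacked if s <= ack_seq]:
            del self.unacked[seq]

    def on_timeout(self) -> List[Tuple[int, bytes]]:
        """Go-back-N: return every block still in flight for resending."""
        return list(self.unacked.items())


if __name__ == "__main__":
    sender = StreamSender(window=3)
    for i in range(3):
        sender.send(bytes([i]))
    sender.on_ack(1)                    # blocks 0 and 1 confirmed
    print(sender.on_timeout())          # only block 2 is still outstanding
```

Plain TFTP is the special case of a window of one block, which is what
makes it slow on anything but a very short link.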

We'll be benchmarking in a few weeks; I can let you know the results.

Jeremy Bentham
Iosoft Ltd.

Reply by Bryan Hackney ● August 20, 2005

Steve at fivetrees wrote:
> I have a new project coming up that calls for a data acquisition board (wot 
> I have to design) to deliver said data over ethernet at a minimum of 
> 6.25Mbytes/second (not including packetisation/checksum/TCP/protocol 
> overheads). I confess to some trepidation.
> 

[...]

Please let me (us) know what you decide on. This is on the edge of experience
for many if not all small systems developers. Thanks.

Reply by jro ● August 20, 2005

Bryan Hackney wrote:
> Steve at fivetrees wrote:
> > I have a new project coming up that calls for a data acquisition board (wot
> > I have to design) to deliver said data over ethernet at a minimum of
> > 6.25Mbytes/second (not including packetisation/checksum/TCP/protocol
> > overheads). I confess to some trepidation.

I, too, have been tasked with a similar project for work recently.  I
need to be able to stream data that will be produced at a bandwidth in
excess of 200 Mb/s (from an A/D converter).  This task is primarily a
transmission task (minimal reception required; certainly nothing
high-speed on the reception side).  Essentially, just grab the data,
packetize it up into a TCP packet, and dump it out ASAP.  The other end
of the pipe can essentially be thought of as a top of the line PC with
oodles of RAM/disk storage to support the data coming in.

While still evaluating possibilities, my current plan is to use a
Virtex4 FX12 FPGA, which has an embedded PowerPC core and a tri-mode
Ethernet MAC hard core integrated into the chip.  Xilinx also has their
Gigabit System Reference Design, which can be found at:

http://www.xilinx.com/esp/wired/optical/xlnx_net/gsrd.htm

but it is currently set up to support only a few eval boards (and is a
bit much for what I am looking to do).  However, Xilinx just released
something much closer to my ideal solution in the form of an app-note
called "Minimal Footprint Tri-Mode Ethernet MAC Processing Engine",
which can be found here:

http://direct.xilinx.com/bvdocs/appnotes/xapp807.pdf

This setup uses uIP, a scaled down TCP/IP stack by Adam Dunkels,
integrated to run on an UltraController 2 (which is essentially a
utilization of the PowerPC core in the Virtex4 that requires no
external memory; 16K of program memory and 16K of data memory in the
form of BRAM are utilized right on the FPGA fabric to hold code/data).
Anyway, the app note shows an example webserver running on the ML403
eval board.  I am hoping to play with this over the next few weeks to
see what works and what doesn't work with this setup.  I'll report back
new stuff as I find it.

> Please let me (us) know what you decide on. This is on the edge of experience
> for many if not all small systems developers. Thanks.

Yeah...there definitely isn't a ton of stuff out there right now for
references on gigE in embedded stuff.  Hopefully this will change as
time (and requirements) change.

Regards,
John Orlando
www.jrobot.net

Reply by Markus Zingg ● August 20, 2005

>This setup uses uIP, a scaled down TCP/IP stack by Adam Dunkels,
>integrated to run on an UltraController 2 (which is essentially a
>utilization of the PowerPC core in the Virtex4 that requires no
>external memory; 16K of program memory and 16K of data memory in the
>form of BRAM are utilized right on the FPGA fabric to hold code/data).
>Anyway, the app note shows an example webserver running on the ML403
>eval board.  I am hoping to play with this over the next few weeks to
>see what works and what doesn't work with this setup.  I'll report back
>new stuff as I find it.

You should not compare a webserver serving mostly static data to your
data acquisition task with its 200 Mb/s bandwidth.

Under TCP, you must be able to resend segments up to the so-called
flight size. That is, every segment sent out which has not yet been
acked by the peer can be lost on its way, and hence must be re-sent if
the need arises. That's why TCP is reliable, after all. With a 200 Mb/s
bandwidth the number of not-yet-acked segments will be impressive. uIP
is a fine piece of code, but it was designed with restricted embedded
resources in mind, and obviously this is in contrast to the very high
bandwidth your application requires. So, unless you can "reproduce"
your acquired data, there is no way around local buffering. How do you
want to do this with only 16KB?
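
A back-of-envelope sketch of that limit: a sender that can hold at most W
bytes of unacknowledged data cannot exceed W / RTT. The figures below take
the 200 Mb/s target from earlier in the thread and optimistically assume
the whole 16 KB of data memory is available as retransmission buffer,
which it would not be in practice; the function names are made up.

```python
def max_throughput_mbps(window_bytes: int, rtt_s: float) -> float:
    """Upper bound on throughput when at most window_bytes are unacked."""
    return window_bytes * 8 / rtt_s / 1e6


def max_rtt_ms(window_bytes: int, rate_mbps: float) -> float:
    """Largest round-trip time at which window_bytes sustains rate_mbps."""
    return window_bytes * 8 / (rate_mbps * 1e6) * 1000.0


if __name__ == "__main__":
    WINDOW = 16 * 1024  # the entire 16 KB data memory, an optimistic assumption
    for rtt_ms in (0.1, 1.0, 10.0):
        limit = max_throughput_mbps(WINDOW, rtt_ms / 1000.0)
        print(f"RTT {rtt_ms:4.1f} ms -> at most {limit:7.1f} Mbit/s")
    print(f"200 Mbit/s needs RTT <= {max_rtt_ms(WINDOW, 200.0):.3f} ms")
```

The round trip here includes however long the receiver takes to issue its
ACK, so a delayed-ACK timer on the far end counts against the budget too.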

If you can afford to lose data, use UDP instead. If not, you will
have to solve this problem, and I'm honestly interested to hear about
your approach.

Markus

Reply by CBFalconer ● August 20, 2005

Markus Zingg wrote: (and omitted attribution)
> 
>> This setup uses uIP, a scaled down TCP/IP stack by Adam Dunkels,
>> integrated to run on an UltraController 2 (which is essentially a
>> utilization of the PowerPC core in the Virtex4 that requires no
>> external memory; 16K of program memory and 16K of data memory in the
>> form of BRAM are utilized right on the FPGA fabric to hold code/data).
>> Anyway, the app note shows an example webserver running on the ML403
>> eval board.  I am hoping to play with this over the next few weeks to
>> see what works and what doesn't work with this setup.  I'll report back
>> new stuff as I find it.
> 
> You should not compare a webserver serving mostly static data to your
> data acquisition task with its 200 Mb/s bandwidth.
> 
> Under TCP, you must be able to resend segments up to the so-called
> flight size. That is, every segment sent out which has not yet been
> acked by the peer can be lost on its way, and hence must be re-sent if
> the need arises. That's why TCP is reliable, after all. With a 200 Mb/s
> bandwidth the number of not-yet-acked segments will be impressive. uIP
> is a fine piece of code, but it was designed with restricted embedded
> resources in mind, and obviously this is in contrast to the very high
> bandwidth your application requires. So, unless you can "reproduce"
> your acquired data, there is no way around local buffering. How do you
> want to do this with only 16KB?
> 
> If you can afford to lose data, use UDP instead. If not, you will
> have to solve this problem, and I'm honestly interested to hear about
> your approach.

I have no idea what throughput he needs, but the burst speed of
200 Mb/s will already be a load. It might be possible to use error
correcting coding with the packets and UDP, assuming he has a
dedicated link.  He should also carefully characterize the burst
length needed and the overall throughput rate.  It may be that the
raw error rate with a dedicated link is small enough to allow
simplification.
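
As one very simple instance of error-correcting coding over UDP, the
sketch below adds a single XOR parity packet to each group of equal-length
data packets, which lets the receiver rebuild any one lost packet in the
group without a retransmission. The scheme and the function names are
chosen for illustration only; a real design would pick its code to match
the measured loss pattern of the link.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(packets: List[bytes]) -> bytes:
    """Parity packet for a group: the XOR of all its data packets."""
    return reduce(xor_bytes, packets)


def recover(received: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild a group in which at most one packet (given as None) was lost."""
    missing = [i for i, p in enumerate(received) if p is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one loss per group")
    repaired = list(received)
    if missing:
        present = [p for p in received if p is not None]
        # XOR of the parity with every surviving packet yields the lost one.
        repaired[missing[0]] = reduce(xor_bytes, present, parity)
    return repaired


if __name__ == "__main__":
    group = [b"abcd", b"efgh", b"ijkl"]
    parity = make_parity(group)
    damaged = [group[0], None, group[2]]   # second packet dropped in transit
    print(recover(damaged, parity) == group)
```

The cost is one extra packet per group, so a group of N data packets
spends 1/(N+1) of the link on parity; two losses in the same group still
have to be handled some other way.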

-- 
"If you want to post a followup via groups.google.com, don't use
 the broken "Reply" link at the bottom of the article.  Click on 
 "show options" at the top of the article, then click on the 
 "Reply" at the bottom of the article headers." - Keith Thompson

Reply by Steve at fivetrees ● August 20, 2005

"Bryan Hackney" <no@body.home> wrote in message 
news:BrHNe.134134$gL1.5284@tornado.texas.rr.com...
> Steve at fivetrees wrote:
>> I have a new project coming up that calls for a data acquisition board 
>> (wot I have to design) to deliver said data over ethernet at a minimum of 
>> 6.25Mbytes/second (not including packetisation/checksum/TCP/protocol 
>> overheads). I confess to some trepidation.
>>
>
> [...]
>
> Please let me (us) know what you decide on. This is on the edge of 
> experience
> for many if not all small systems developers. Thanks.

Certainly.

Dame fortune has bought me some time - client needs a lower-bandwidth 
ethernet board first. I shall bide my time and research.

Steve
http://www.fivetrees.com
