
feeding a FIFO from PCI

Started by John Larkin April 12, 2008
On Sat, 12 Apr 2008 11:59:13 -0700 (PDT), Didi <dp@tgi-sci.com> wrote:

>John Larkin wrote:
>> ...
>> ... Our box will output frequency sweeps, arbitrary
>> waveforms, a couple of dozen voltages that can be changed/ramped per
>> user desires, and some discrete logic levels and triggers.
>>
>> One architecture would pack an Intel-cpu SBC and a custom board in a
>> 2U rack box. The SBC would talk gigabit ethernet to the customer's
>> system and PCI to our board.
>>
>> ...
>> ...
>>
>> 5. Other ideas?
>>
>
>Since the echo you are getting so far indicates the latency using an
>x86 may well be too high under Linux or whatever, I can suggest doing
>some tiny DPS thing for you - with PCI and Ethernet. The latency then
>is no issue; tcp/ip etc. comes with it, filesystem/disk etc. If you
>settle for 100 Mbps Ethernet, it is quite easy for me - I can reuse
>some of the MPC5200 designs I have. 1 Gbps will take some other part
>and more than 3-4 months, though. I am not sure I can beat the cost &
>time of someone writing the thing for you under Linux while I do the
>whole thing, but I am willing to try hard to do so; the time has come
>when I want to make all that stuff I have more popular than it is now.
We could put a PowerQUICC or a Blackfin on the board. But then we'd need DRAM for the sequence storage, or we'd have to interface the CPU's RAM to the FPGA some fast way, and we'd have to do the gigabit ethernet and the TCP/IP stack and all that. We can get that stuff, already done, with a 2 GHz dual-core CPU, for under $400.

The SBC has a lot of stuff already done. It will run Linux the day we open the box.

John
On Apr 12, 8:46 pm, hal-use...@ip-64-139-1-69.sjc.megapath.net (Hal Murray) wrote:
> >I would again suggest that you can *simplify* the project by putting
> >RAM on your board instead of a *real time* PCI interface.
>
> But he doesn't need any real time software. All he needs
> is a big buffer in memory. A DMA engine on the card will
> grab data from memory whenever the FIFO has room.
>
> >You can add a GB of DRAM to your board with just a very few
> >chips; I haven't looked at the available RAM chips lately but I
> >believe they are beyond the Gbit level. Or you can use a module and
> >plug in whatever size you need. You can lose the PCI interface by
> >using something serial, which can be done with a single MCU chip.
> >Check out the Luminary Micro parts with Ethernet including the PHY.
> >PCI is not really all that fast and getting any real speed out of it
> >will take a fair amount of programming effort.
>
> Yes, a PC is overkill for just grabbing the data and buffering it
> for the FPGA. There may well be better overall designs that don't
> use a PC or PCI.
>
> On the other hand, the PCI part of the board design is only ~40 or
> ~70 wires. I think the PCI logic is roughly as complicated as the
> DRAM interface logic. (handwave)
For me the issue is not the complexity of the hardware, because I think that is in the noise for this project. The issue is the complexity of interfacing an FPGA to a bank of memory and to an MCU which has either USB or Ethernet connectivity, compared to the complexity of interfacing an FPGA to a PCI bus and developing the software to support whatever transactions will be happening over the PCI bus. I guess if you have designed PCI bus DMA hardware and software before, then this is not a real issue. The experience I had was that the hardware for the FPGA and DRAM was done and working 100% on schedule. The software had significant complications and was the limiting factor in the project schedule.

When you say that you don't need "real time software", I am missing something. Once started, does the DMA run to completion by itself? Maybe I am not up to speed with current software techniques on the PC, but I thought even DMA required real time response to keep it queued up and running. As far as allocating a block of memory to buffer the data, I have no understanding of what it takes to allocate a buffer of half a GB or more of contiguous memory. But like I said, I am not so familiar with this approach.

I am, however, familiar with memory interfaces. They are well specified in maybe a dozen pages vs. the hundreds of pages for the PCI bus and the virtually unlimited amount of documentation (or lack thereof) for the operating system and writing drivers for DMA.

To me the issue is that even if the DRAM hardware is about the same complexity as the PCI bus hardware, it just seems like everything else is a lot less complex by offloading the memory buffer onto the board. The hard parts of this project are the real time issues. It just seems so much simpler to keep all of the real time aspects on the board *in 100% controllable hardware* and AWAY from the Intel CPU, the shared PCI bus, DMA controllers, and some rather arcane software.
>When you say that you don't need "real time software", I am missing
>something. Once started, does the DMA run to completion by itself?
>Maybe I am not up to speed with current software techniques on the PC,
>but I thought even DMA required real time response to keep it queued
>up and running. As far as allocating a block of memory to buffer the
>data, I have no understanding of what it takes to allocate a buffer of
>half a GB or more of contiguous memory. But like I said, I am not so
>familiar with this approach.
The basic idea is that you give the FPGA a pointer and length. It reads memory a cache block at a time as it needs it. When it's done, it sets a status bit and maybe generates an interrupt.

The only thing that's different between this design and a typical disk or network transfer is that this one will be much larger. You might have to give it a clump of pointer/length pairs, either stored in memory or on chip. You could give it each piece of the clump one at a time, but that gets you into the time constraints.

I haven't actually written the code (driver or FPGA) to do this. I've worked on projects that did similar things. It's possible I'm overlooking something critical.

Maybe allocating huge (as compared to big) chunks of memory is hard. I'm sure a good kernel wizard can do it one way or the other. If nothing else, you hack the very early part of the kernel to put some memory in its back pocket until you ask for it. Ugly, but effective.

--
These are my opinions, not necessarily my employer's.  I hate spam.
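To make the pointer/length "clump" concrete, here is a minimal C sketch of the descriptor-chain idea. The descriptor layout (struct dma_desc, DESC_LAST, the addresses) is entirely made up for illustration; the real layout would be whatever the DMA engine in the FPGA is designed to fetch.

/* Sketch of the "clump of pointer/length pairs" idea: a chain of DMA
 * descriptors the card walks on its own.  All field layouts and
 * addresses here are hypothetical. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

struct dma_desc {
    uint64_t phys_addr;   /* physical address of this piece of the shot program */
    uint32_t length;      /* bytes in this piece */
    uint32_t flags;       /* bit 0: last descriptor in the chain */
};

#define DESC_LAST 0x1u

/* Build a descriptor chain covering n buffers.  The FPGA fetches
 * descriptors one at a time and sets a status bit (and maybe raises an
 * interrupt) after servicing the one marked DESC_LAST. */
static void build_chain(struct dma_desc *chain,
                        const uint64_t *phys, const uint32_t *len, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        chain[i].phys_addr = phys[i];
        chain[i].length    = len[i];
        chain[i].flags     = (i == n - 1) ? DESC_LAST : 0;
    }
}

int main(void)
{
    /* Example: a 32 MB shot program split into two pieces (made-up addresses). */
    uint64_t phys[2] = { 0x10000000ull, 0x12000000ull };
    uint32_t len[2]  = { 16u << 20, 16u << 20 };
    struct dma_desc chain[2];

    build_chain(chain, phys, len, 2);
    printf("descriptor 0: %u bytes at 0x%llx\n",
           chain[0].length, (unsigned long long)chain[0].phys_addr);
    return 0;
}

The point of storing the chain in memory rather than feeding one pointer/length pair at a time is exactly the one above: once the card can walk the chain itself, the host has no per-buffer deadline to meet.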
John Larkin wrote:
> ....
> We could put a PowerQUICC or a Blackfin on the board. But then we'd
> need DRAM for the sequence storage, or we'd have to interface the
> CPU's RAM to the FPGA some fast way, and we'd have to do the gigabit
> ethernet and the TCP/IP stack and all that. We can get that stuff,
> already done, with a 2 GHz dual-core CPU, for under $400.
>
> The SBC has a lot of stuff already done. It will run Linux the day we
> open the box.
>
I know PCs are cheap. But you are after something more - which will take some programming and latency-spec meeting. I have no idea how viable the thing is and how much time & cash it will cost you.

If I were to do it on a 5200 or other similar part, I would likely do the actual Internet --> FIFO thing within a few days; how long it would take to get DPS running on the particular platform is harder to predict - if a 5200 is used, a week to a month, I would say.

Dimiter

------------------------------------------------------
Dimiter Popoff               Transgalactic Instruments

http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/

Original message: http://groups.google.com/group/comp.arch.embedded/msg/9a636587b48980f3?dmode=source
"Hal Murray" <hal-usenet@ip-64-139-1-69.sjc.megapath.net> wrote in message 
news:KNOdnTzMponprpzVnZ2dnUVZ_v3inZ2d@megapath.net...
> >>One architecture would pack an Intel-cpu SBC and a custom board in a
> >>2U rack box. The SBC would talk gigabit ethernet to the customer's
> >>system and PCI to our board.
>
> 2U is ugly in that you can't get full height PCI cards without
> using a riser kludge to turn the card sideways.
>
> I think PCI cards fit in 3U. There is a short size that fits in 2U.
> (or you can cross your fingers on the riser stuff.)
>
> >>Our board would have a PCI interface driving a biggish FIFO, say 8k
> >>deep by 48 bits wide, inside an FPGA. A simple state machine/latch/mux
> >>thing repacks the 32-bit pci transfers into the input of the 48-bit
> >>wide fifo. The output side of the FIFO would be driving a fairly
> >>simple state machine; each fifo word has an opcode field and a data
> >>field, with different opcodes feeding various devices connected to the
> >>physics... dds synthesizers, ttl outputs, whatever. The state machine
> >>that unloads the fifo would run at 128 MHz, but one opcode is WAIT, so
> >>we can slow down operations to match the realtime needs of the
> >>experiment and reduce the average fifo feed rate.
>
> >>OK, we finally get to a question: If we run some flavor of Linux on
> >>the SBC, what's a good strategy for keeping the fifo loaded? Assuming
> >>that we have the recipe for an entire experimental shot in program
> >>ram, some tens of megabytes maybe, we could...
>
> >>3. Best, if possible: set up a single DMA transfer to do the entire
> >>shot. That involves a dma controller that understands that the target
> >>is sometimes busy, and retries after getting bounced. I know the pci
> >>bus has hooks for split transfers, but I don't know if standard
> >>Intel-type dma controllers can work in this mode.
>
> I think that's what you want to do. It comes for free. I think
> it will all make sense if you read the PCI specs. Or maybe
> just the specs for the PCI interface block you are going to use.
>
> Ignoring pipeline problems, the host side of a DMA read request
> doesn't know how much data the device wants. It just gets
> an op-code that says read or read-cache-line. Once data
> starts flowing, either side can say I'm-done-now. If the device
> (still) wants more data, it starts over with bus arbitration.
> The host may say "done" to let another device have a turn
> or to cross a page boundary or ...
>
> The DMA section of the FPGA will run in chatter mode. When
> there is room in the FIFO for another cache block, it will
> ask for more data. When the FIFO is near-full, it will stop
> asking. You have to leave enough room in the FIFO to hold all
> the data in the pipeline.
>
> One quirk. The driver has to allocate a chunk of physically
> contiguous memory. That probably has to happen early in the
> boot-up time so you still have a chunk of contiguous memory
> to grab.
>
> --
> These are my opinions, not necessarily my employer's.  I hate spam.
Would this product work?
http://www.strategic-test.com/ultrafast2_pci-x_cards/uf2-7000.htm

I'll check in my files at work for other possible products I have seen before.

Thanks,
Jure Z.
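On the "chatter mode" rule quoted above (only request another cache block when the FIFO can absorb everything already in flight), the bookkeeping is only a few lines. This is a toy model: the 8k FIFO depth comes from the original post, while the words-delivered-per-block figure and the function name are arbitrary assumptions.

/* Toy model of the chatter-mode request rule: ask for another cache
 * block only if the FIFO can hold it plus everything already in flight.
 * BLOCK_WORDS is an assumed figure, not from any real design. */
#include <stdbool.h>
#include <stdio.h>

#define FIFO_DEPTH_WORDS 8192   /* 8k x 48-bit FIFO from the original post */
#define BLOCK_WORDS      16     /* FIFO words delivered per cache-block read (assumed) */

static bool may_request(unsigned fifo_fill, unsigned blocks_in_flight)
{
    unsigned committed = fifo_fill + (blocks_in_flight + 1) * BLOCK_WORDS;
    return committed <= FIFO_DEPTH_WORDS;
}

int main(void)
{
    printf("%d\n", may_request(8150, 4));   /* near-full: hold off */
    printf("%d\n", may_request(4096, 4));   /* plenty of room: keep asking */
    return 0;
}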
John Larkin <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote:

>Hi,
>
>I'm working on a proposal to design a box that will control a
>scientific gadget. Our box will output frequency sweeps, arbitrary
>waveforms, a couple of dozen voltages that can be changed/ramped per
>user desires, and some discrete logic levels and triggers.
>
>One architecture would pack an Intel-cpu SBC and a custom board in a
>2U rack box. The SBC would talk gigabit ethernet to the customer's
>system and PCI to our board.
>
>Something like this, maybe:
>
>http://us.kontron.com/index.php?id=226&cat=527&productid=1726
>
>Our board would have a PCI interface driving a biggish FIFO, say 8k
>deep by 48 bits wide, inside an FPGA. A simple state machine/latch/mux
>thing repacks the 32-bit pci transfers into the input of the 48-bit
>wide fifo. The output side of the FIFO would be driving a fairly
>simple state machine; each fifo word has an opcode field and a data
>field, with different opcodes feeding various devices connected to the
>physics... dds synthesizers, ttl outputs, whatever. The state machine
>that unloads the fifo would run at 128 MHz, but one opcode is WAIT, so
>we can slow down operations to match the realtime needs of the
>experiment and reduce the average fifo feed rate.
>
>OK, we finally get to a question: If we run some flavor of Linux on
>the SBC, what's a good strategy for keeping the fifo loaded? Assuming
>that we have the recipe for an entire experimental shot in program
>ram, some tens of megabytes maybe, we could...
>
>1. Have the fifo logic interrupt the cpu when the fifo is, say, half
>empty. The isr would compute how empty the fifo actually is at that
>instant and set up a short dma transfer to top it off.
>
>2. A task (or isr) would be run periodically, a thousand times per
>second might work, and it would be responsible for topping off the
>fifo, either dma or maybe just poking in the data in a loop.
>
>3. Best, if possible: set up a single DMA transfer to do the entire
>shot. That involves a dma controller that understands that the target
>is sometimes busy, and retries after getting bounced. I know the pci
>bus has hooks for split transfers, but I don't know if standard
>Intel-type dma controllers can work in this mode.
With PCI there is no DMA in the sense that DMA used to work; a lot of people get confused here. PCI is about pushing data into memory areas in a fast way. The idea is that you set up a transfer from one memory area to another and get told when the transfer is ready.
>4. If it's a dual-core cpu, is it hard (under Linux) to assign one cpu
>to just do the fifo transfers?
>
>5. Other ideas?
Yes. Make the card a PCI master. You can prepare a buffer, lock the buffer for PCI access, tell the card where to fetch the data, and off it goes. An interrupt when the buffer is nearly done, so the driver can prepare a new buffer, is all it takes to feed the next buffer into the card.

PCI is designed to do burst transfers. If you don't use burst transfers, the bandwidth will decrease dramatically; worse, the CPU will have to wait for each transfer to finish, which consumes huge amounts of CPU cycles.

--
Programming in Almere?
E-mail to nico@nctdevpuntnl (punt=.)
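Roughly what that bus-master ping-pong scheme looks like on the Linux side is sketched below. dma_alloc_coherent(), iowrite32(), and the interrupt-handler signature are the standard kernel APIs; everything else (the REG_* offsets, struct shot_dev, the buffer size, the function names) is invented for illustration, and error handling plus the refill task are omitted.

/* Sketch of a driver-side ping-pong buffer feeding a bus-master card.
 * Register offsets and the device structure are hypothetical. */
#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/interrupt.h>
#include <linux/dma-mapping.h>
#include <linux/gfp.h>
#include <linux/io.h>

#define BUF_BYTES    (1 << 20)       /* 1 MB per half, illustrative */
#define REG_SRC_ADDR 0x00            /* hypothetical card registers */
#define REG_LENGTH   0x04
#define REG_GO       0x08

struct shot_dev {
    void __iomem *bar;               /* mapped PCI BAR */
    void         *buf[2];            /* ping-pong buffers (kernel virtual) */
    dma_addr_t    bus[2];            /* same buffers as bus addresses */
    int           active;            /* which half the card is draining */
};

/* Hand one half to the card; the other half is free for refilling. */
static void kick_dma(struct shot_dev *d, int half, u32 len)
{
    iowrite32(lower_32_bits(d->bus[half]), d->bar + REG_SRC_ADDR);
    iowrite32(len, d->bar + REG_LENGTH);
    iowrite32(1, d->bar + REG_GO);
}

/* Card raises this when it is nearly done with the active half. */
static irqreturn_t shot_irq(int irq, void *dev_id)
{
    struct shot_dev *d = dev_id;

    d->active ^= 1;                  /* point the card at the other half */
    kick_dma(d, d->active, BUF_BYTES);
    /* ...wake a task here to refill the half that was just drained... */
    return IRQ_HANDLED;
}

static int shot_alloc_buffers(struct device *dev, struct shot_dev *d)
{
    int i;

    for (i = 0; i < 2; i++) {
        d->buf[i] = dma_alloc_coherent(dev, BUF_BYTES, &d->bus[i], GFP_KERNEL);
        if (!d->buf[i])
            return -ENOMEM;
    }
    return 0;
}

Whether the interrupt fires at "nearly done" or "done" decides how much slack the FIFO (or the second buffer) has to cover interrupt latency, which is the only soft-real-time constraint left in this scheme.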
John,

"John Larkin" <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote in message 
news:6t1204lakho2cj19kit5ub50r5p96jg7cg@4ax.com...
> We could buy an FPGA pci soft core (or use one of the public ones) or
> even just use a PLX chip to handshake the PCI transactions for the
> fpga.
FYI, I've used the old PLX9054 (before PCI Express took over the world), and it was a *very* nice chip. The board was, essentially, a frame grabber with 4GB of DRAM going through an FPGA containing its own "2D slicing" DMA engine (so that a camera looking at multiple logical "windows" could have each window appear as a contiguous stream of pixels), which fed the DMA engine in the PLX9054. From the end-user's perspective, what would happen would be:

1) The user would request a particular frame buffer, which would already have been set up such that on the "local bus" (the address/data bus connecting the PLX9054 and the FPGA) sequential addresses would grab the correct pixels. The user would want that frame buffer transferred into a contiguous buffer in their own user-mode memory space.

2) The device driver for the frame grabber would ask Windows for all the *physical* addresses of that user's frame buffer, since of course in many cases Windows had run off and used a large number of discontinuous physical memory pages to create the user's (virtual) contiguous buffer.

3) For the benefit of the PLX9054, the device driver builds a "scatter-gather" list in the PC's memory, where each list entry just contains information such as the number of bytes to transfer, the physical address to transfer to, the local bus address to transfer from, and whether or not this is the last entry in the list.

4) The device driver writes to the appropriate control registers in the PLX9054... and it does the rest! Poof! (An interrupt was generated when it finished.)

In other words, the PLX9054 would start walking through the scatter-gather list, automatically creating read requests on the local bus and write requests on the PCI bus as needed, keeping its own internal FIFOs full (it had some modest-sized ones... maybe 64 or 128 bytes? -- I've forgotten), and breaking the write requests into multiple pieces as needed to keep the PCI bus protocol happy. On quality motherboards, we got ~80 MB/s, which was considered pretty decent given the 33MHz/32-bit PCI bus architecture of the day.

It was really pretty impressive. The only caveat was that it couldn't transfer more than 16MB or thereabouts in one complete setup, so in software we just broke apart any larger transfers into multiple 16MB transfers (since transferring 16MB took about 200ms anyway, the additional overhead of us setting up the next transfer was negligible).

I imagine the sequence of steps above is quite similar in Linux. Although I've never written a Linux device driver, I've been told that they're actually simpler in many ways than Windows device drivers are. If you end up using Windows, it's absolutely worthwhile to drop the ~$3k or so to send the guy who's going to write the device driver to the week-long classes by, e.g., OSR to learn how to do so.

My main point here is that going with a chip such as those from PLX gives you one heck of a lot of power that would otherwise take a LOT of time and effort to implement yourself. Although for a high-volume project it probably makes sense to go with a soft PCI core for the FPGA, for low volumes I'm a big believer in using someone else's "all in one" IC.

---Joel
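For flavor, here is roughly what one of those scatter-gather entries and the list-building loop look like in C. The field names and widths (struct sg_entry, build_sg_list) only mirror the description above -- byte count, host physical address, local-bus address, last-entry flag. The actual PLX9054 descriptor format, including its end-of-chain and direction bits, is defined in the PLX data sheet, so treat this strictly as an illustration.

/* Illustrative scatter-gather list: one entry per (already pinned)
 * physical page of the user's buffer.  Not the real PLX9054 layout. */
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

struct sg_entry {
    uint32_t byte_count;     /* bytes to move for this entry */
    uint32_t pci_addr;       /* physical address in host memory */
    uint32_t local_addr;     /* address on the FPGA/PLX local bus */
    uint32_t last;           /* nonzero on the final entry */
};

/* Build one entry per physical page.  'page_phys' is what the OS hands
 * back after pinning and translating the user's virtual buffer. */
static size_t build_sg_list(struct sg_entry *list,
                            const uint32_t *page_phys, size_t npages,
                            uint32_t page_size, uint32_t local_start)
{
    uint32_t local = local_start;
    for (size_t i = 0; i < npages; i++) {
        list[i].byte_count = page_size;
        list[i].pci_addr   = page_phys[i];
        list[i].local_addr = local;
        list[i].last       = (i == npages - 1);
        local += page_size;
    }
    return npages;
}

int main(void)
{
    /* Three fake 4 KB pages standing in for a pinned user buffer. */
    uint32_t pages[3] = { 0x1a000000u, 0x1b000000u, 0x1c000000u };
    struct sg_entry list[3];

    build_sg_list(list, pages, 3, 4096, 0x0 /* local-bus start */);
    printf("last-entry flag on entry 2: %u\n", list[2].last);
    return 0;
}

The heavy lifting in step 2 above -- turning the user's virtual buffer into that list of physical pages -- is the part the OS has to help with; the chip only ever sees the finished list.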
>One architecture would pack an Intel-cpu SBC and a custom board in a
>2U rack box. The SBC would talk gigabit ethernet to the customer's
>system and PCI to our board.
>
>Something like this, maybe:
>
>http://us.kontron.com/index.php?id=226&cat=527&productid=1726
Several odds and ends...

There are several Linux distributions targeted at running without a hard disk. That avoids the heat, space, and the unreliability of a hard disk. Here is one. There are others.
http://www.linuxonastick.com/

Almost everything gets copied to ram at boot time. /etc is still on disk. Maybe a few others. If you want files preserved over booting you have to think about it.

There are Flash disk modules that plug into 40/44 pin IDE sockets. (no ribbon cables) Works well with above. The 40 pin versions need power, typically from an IDE connector. Google for >disk on module<.

Modern FPGAs don't get along with 5V PCI. You can save yourself a pile of kludgery if your target is 3V PCI. I think 66 MHz PCI is 3V. The board above is 5V.

If your box has room for an old/big CD (rather than the modern thin ones), you can get LCD modules that will fit in that slot. That lets you display the MAC address (for use with BOOTP) or key in an IP address to get your box off the ground. After that you can use ssh/web or whatever. No keyboard or display required at all. (They might be handy for debugging, but ssh generally works fine for me.)

--
These are my opinions, not necessarily my employer's.  I hate spam.
On Sun, 13 Apr 2008 16:25:55 -0700, "Joel Koltner"
<zapwireDASHgroups@yahoo.com> wrote:

>John,
>
>"John Larkin" <jjlarkin@highNOTlandTHIStechnologyPART.com> wrote in message
>news:6t1204lakho2cj19kit5ub50r5p96jg7cg@4ax.com...
>> We could buy an FPGA pci soft core (or use one of the public ones) or
>> even just use a PLX chip to handshake the PCI transactions for the
>> fpga.
>
>FYI, I've used the old PLX9054 (before PCI Express took over the world), and it
>was a *very* nice chip.
>
>[...]
>
>My main point here is that going with a chip such as those from PLX gives you
>one heck of a lot of power that would otherwise take a LOT of time and effort
>to implement yourself. Although for a high-volume project it probably makes
>sense to go with a soft PCI core for the FPGA, for low volumes I'm a big
>believer in using someone else's "all in one" IC.
>
>---Joel
Yup, I'm leaning towards using a PLX chip as the PCI interface. I didn't know they were that smart!

I suspect we can persuade Linux and our application to make the shot program (the opcodes we poke into the fpga FIFO) physically contiguous in real memory.

Thanks

John
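One long-standing way to get that contiguous block, in the spirit of Hal's "back pocket" suggestion earlier in the thread, is to reserve the memory at boot and map it from a driver. The memmap= boot parameter and the ioremap()/iounmap() calls are standard Linux facilities; the address, size, and module names below are arbitrary examples, and cacheability of such a mapping is something to check for the kernel version in use.

/* Reserve 256 MB of RAM at boot by adding to the kernel command line:
 *
 *     memmap=256M$0x20000000
 *
 * (the '$' often needs escaping in the bootloader config).  The kernel
 * then never allocates that region, and the driver maps it and hands
 * its physical address to the card.  Address and size are examples. */
#include <linux/module.h>
#include <linux/errno.h>
#include <linux/io.h>

#define SHOT_PHYS  0x20000000UL        /* must match the memmap= reservation */
#define SHOT_BYTES (256UL << 20)

static void __iomem *shot_virt;

static int __init shot_buf_init(void)
{
    shot_virt = ioremap(SHOT_PHYS, SHOT_BYTES);   /* uncached mapping */
    if (!shot_virt)
        return -ENOMEM;
    /* The application (via a char device the driver exposes) fills the
     * shot program in here, and the FPGA/PLX DMA engine is pointed at
     * SHOT_PHYS. */
    return 0;
}

static void __exit shot_buf_exit(void)
{
    iounmap(shot_virt);
}

module_init(shot_buf_init);
module_exit(shot_buf_exit);
MODULE_LICENSE("GPL");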
In article <r4h5045s5dlicuohipc5b9l1nrpkqjc7cc@4ax.com>, 
jjlarkin@highNOTlandTHIStechnologyPART.com says...
> On Sun, 13 Apr 2008 16:25:55 -0700, "Joel Koltner"
> <zapwireDASHgroups@yahoo.com> wrote:
>
> [...]
>
> >My main point here is that going with a chip such as those from PLX gives you
> >one heck of a lot of power that would otherwise take a LOT of time and effort
> >to implement yourself.
>
> Yup, I'm leaning towards using a PLX chip as the PCI interface. I
> didn't know they were that smart!
They'll save you a TON of work. PCI isn't easy, though PLX makes it (relatively) easy. I also highly recommend the MindShare books as reference.
> I suspect we can persuade Linux and our application to make the shot
> program (the opcodes we poke into the fpga FIFO) physically contiguous
> in real memory.
--
Keith