EmbeddedRelated.com
Forums

Question on PCI-express versus standard PCI performance

Started by Benjamin Couillard July 25, 2011
Hi everyone,

I'm working on a conversion project where we needed to convert a PCI
acquisition card to a PCI-express (x1) acquisition card. The design
is essentially the same, except that the new acquisition card is a
PCI-express endpoint instead of a standard-PCI endpoint. The project
is implemented on a Xilinx FPGA, but I don't think my issue is
Xilinx-specific.

The conversion has worked fine on all levels except one: the read
latency of PCI express is about 4 times higher than that of standard
PCI. For example, on the old product it takes about 0.9 us to perform
a 1-DWORD read; with the PCI-express product it takes about 3-4 us.
I've seen this read latency both in real life (with a real board) and
in VHDL simulation, so I don't think this is a driver issue. Have any
of you experienced similar performance issues?

Don't get me wrong, for me PCI-express is a major step ahead; the
write-burst and read-burst performance is way better than standard
PCI's. Perhaps that is the reason: since most PCI-express cards are
used mainly in burst transactions, the read latency does not really
matter, so some read latency was sacrificed in order to obtain better
burst performance.

Best regards
On Mon, 25 Jul 2011 13:23:12 -0700 (PDT), Benjamin Couillard
<benjamin.couillard@gmail.com> wrote:

One-lane PCIe 1.x should be able to turn a word read around in about
250 ns, assuming not too much else is going on. Of course, an
excessive number of switches (or slow switches), or slow hardware on
either end, are obvious possible issues. But PCIe is certainly much
faster than 3-4 us to read a word.
On Jul 25, 9:23 pm, Benjamin Couillard <benjamin.couill...@gmail.com>
wrote:
I have no actual experience of experimenting with this; however, I
have been interested in a latency-sensitive device that may
potentially use PCI-E, so I have been looking around for answers.

Have a look at this write-up comparing HyperTransport and PCI-E. The
authors claim around 250 nanoseconds (page 9) to read the first byte:
http://www.hypertransport.org/docs/wp/Low_Latency_Final.pdf

It would be interesting to hear what is causing you to see 3-4 us.
That would kill off my potential project, so I am hoping to be able
to match the results in the above paper. Could there be some
inaccuracy in your measurements? How do you measure the latency?

Rupert
When designing with PCI or PCIe you should really try to avoid reads
as much as possible.
What do you need it for anyway? In a multitasking operating system
you are going to have microseconds of jitter on the software side in
kernel mode, and tens of milliseconds in user mode, anyway. So I am
wondering: what is the scenario that benefits from sub-microsecond
latency for software reads?

Kolja
cronologic.de
Generally speaking, PCI Express is much more prone to latency than
conventional PCI, because packets have to be constructed, passed
through a structure of nodes, and checked at most levels. Data
checking, and onward transmission, isn't completed until the last
data arrives and the CRCs are checked.

If you do a "read", there is an outgoing packet and one coming back,
so it is doubly worse. It is better if you can do a DMA-like
operation, where data is sent from the data source and your system is
then interrupted to use the data in memory.

The latency will also vary from system to system because routing
structures differ between motherboards. The amount of other activity
going on will also affect latency, as different things contend for
the data pipes. Generally speaking, if you are trying to do anything
real-time, it is something of a nightmare if you are planning on
using the host motherboard's processor for control functions.

You can try to make the latency smaller by using smaller packet
sizes, and this sometimes helps. Ultimately, if there is a real-time
element to this, then putting the processing and/or control on your
card is probably best for performance and accuracy.

John Adair
Home of Raggedstone2. The Spartan-6 PCIe Development Board.


On Jul 25, 9:23 pm, Benjamin Couillard <benjamin.couill...@gmail.com>
wrote:
On Jul 26, 5:19 pm, John Adair <g...@enterpoint.co.uk> wrote:
> If you do a "read" this will have a packet outgoing and one coming
> back so doubly worse. If you can do a DMA like operation where data
> is sent from the data source and then interrupt your system to use
> the data in memory.
In the paper I posted a link to, I think the times are for an
interrupt or for DMA, not a software-initiated "read". Thanks for
explaining the difference.

Rupert
"Benjamin Couillard" <benjamin.couillard@gmail.com> wrote in message 
news:62427806-eeec-499b-a0f0-15ffafa0e3ab@w27g2000yqk.googlegroups.com...
Is it possible that time-stamping the data would disconnect you
somewhat from the latency problem? Usually data can't be processed
and presented in real time at those speeds anyway.