Hi everyone,

I'm working on a conversion project where we needed to convert a PCI acquisition card to a PCI-express (x1) acquisition card. The project is essentially the same, except that the new acquisition card is a PCI-express endpoint instead of a standard-PCI endpoint. The project is implemented on a Xilinx FPGA, but I don't think my issue is Xilinx specific.

The conversion has worked fine on all levels except one. The read latency of PCI express is about 4 times higher than standard PCI. For example, on the old product, it takes about 0.9 us to perform a 1-DWORD read. With the PCI-express product it takes about 3-4 us to perform a 1-DWORD read. I've seen this read latency both in real life (with a real board) and in VHDL simulation, so I don't think that this is a driver issue. Have any of you experienced similar performance issues?

Don't get me wrong, for me PCI-express is a major step ahead; the write burst and read burst performance is way better than standard PCI. Perhaps this is the reason: since most PCI-express cards are mostly used in burst transactions, the read latency does not really matter, so they sacrificed some read latency in order to obtain better performance.

Best regards
Question on PCI-express versus Standard PCI performance
Started by ●July 25, 2011
Reply by ●July 25, 2011
On Mon, 25 Jul 2011 13:23:12 -0700 (PDT), Benjamin Couillard <benjamin.couillard@gmail.com> wrote:

>Hi everyone,
>
>I'm working on a conversion project where we needed to convert a PCI
>acquisition card to a PCI-express (x1) acquisition card. The project
>is essentially the same except instead that the new acquisition card
>is a PCI-express endpoint instead of being a standard-PCI endpoint.
>The project is implemented on a Xilinx FPGA, but I don't think my
>issue is Xilinx specific.
>
>The conversion has worked fine on all levels except one. The read
>latency of PCI express is about 4 times higher than standard PCI. For
>example, on the old product, it takes about 0.9 us to perform a
>1-DWORD read. With the PCI-express product it takes about 3-4 us to
>perform a 1-DWORD read. I've seen this read latency both in real-life
>(with a real board) and in VHDL Simulation so I don't think that this
>is a driver issue. Do any of you have experienced similar performance
>issues?
>
>Don't get me wrong, for me PCI-express is a major step ahead, the
>write burst and read burst performance is way better than standard
>PCI.. Perhaps this is the reason, since most PCI-express cards are
>mostly used in burst transactions, the read latency does not really
>matter, therefore they sacrificed some read latency in order to obtain
>better performance.

One lane PCIe 1.x should be able to turn a word read around in about 250 ns, assuming not too much else is going on. Of course, an excessive number of switches (or slow switches) or slow hardware on either end are obviously possible issues. But PCIe is certainly much faster than 3-4 us to read a word.
Reply by ●July 26, 2011
On Jul 25, 9:23 pm, Benjamin Couillard <benjamin.couill...@gmail.com> wrote:

> The conversion has worked fine on all levels except one. The read
> latency of PCI express is about 4 times higher than standard PCI. For
> example, on the old product, it takes about 0.9 us to perform a
> 1-DWORD read. With the PCI-express product it takes about 3-4 us to
> perform a 1-DWORD read. I've seen this read latency both in real-life
> (with a real board) and in VHDL Simulation so I don't think that this
> is a driver issue. Do any of you have experienced similar performance
> issues?

I have no actual experience of experimenting with this; however, I have been interested in a latency-sensitive device that may potentially use PCI-E, so I have been looking around for answers.

Have a look at this write-up of a comparison of HyperTransport and PCI-E. The authors claim around 250 nanoseconds (page 9) to read the first byte:

http://www.hypertransport.org/docs/wp/Low_Latency_Final.pdf

It would be interesting to hear what is causing you to see 3-4 us. That would kill off my potential project, so I am hoping to be able to match the results in the above paper. Could there be some inaccuracy in your measurements? How do you measure the latency?

Rupert
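On the measurement question, one common approach is to time a large batch of back-to-back reads and look at the minimum and median rather than a single sample. The following is only a minimal sketch in Python. It assumes a hypothetical `read_dword` callable standing in for the real 1-DWORD register read; the name and the placeholder `fake_read_dword` are illustrative and not part of any driver API. On real hardware the read would be an access to a memory-mapped BAR, and the timing loop would more likely be written in C so that interpreter overhead does not swamp the result.

```python
import statistics
import time


def measure_read_latency(read_dword, iterations=1000):
    """Time `iterations` back-to-back calls of read_dword, in nanoseconds.

    read_dword is a hypothetical stand-in for the real 1-DWORD register
    read; it is not part of any driver API.
    """
    samples = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        read_dword()
        samples.append(time.perf_counter_ns() - start)
    return {
        "count": len(samples),
        "min_ns": min(samples),
        "median_ns": statistics.median(samples),
        "max_ns": max(samples),
    }


def fake_read_dword():
    # Placeholder: returns a constant instead of touching hardware.
    return 0xDEADBEEF


if __name__ == "__main__":
    print(measure_read_latency(fake_read_dword))
```

Reporting the minimum alongside the median helps separate the bus round trip from scheduling noise on the host: the minimum approximates the best-case hardware path, while the spread shows how much the software side adds.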
Reply by ●July 26, 2011
When designing with PCI or PCIe you should really try to avoid reads as much as possible.

What do you need it for anyway? In a multitasking operating system you are going to have microseconds of jitter on the software side in kernel mode and tens of milliseconds in user mode anyway. So I am wondering what the scenario is that benefits from sub-microsecond latency for software reads?

Kolja
cronologic.de
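To put the advice about avoiding reads in perspective, the latencies quoted in the original post can be turned into a rough ceiling on polled throughput. This is a back-of-the-envelope sketch, assuming each read blocks the CPU until the completion returns and that every read fetches exactly one DWORD; the function names are illustrative only.

```python
def reads_per_second(latency_us):
    """Upper bound on blocking reads per second at a given latency."""
    return 1.0 / (latency_us * 1e-6)


def throughput_mb_per_s(latency_us, bytes_per_read=4):
    """Data rate if each blocking read returns one DWORD (4 bytes)."""
    return reads_per_second(latency_us) * bytes_per_read / 1e6


if __name__ == "__main__":
    # Latencies quoted in the thread: 0.9 us (PCI), 3 to 4 us (PCIe).
    for label, latency_us in [("PCI, 0.9 us", 0.9),
                              ("PCIe, 3 us", 3.0),
                              ("PCIe, 4 us", 4.0)]:
        print(f"{label}: {reads_per_second(latency_us):,.0f} reads/s, "
              f"{throughput_mb_per_s(latency_us):.2f} MB/s")
```

Under these assumptions, single-DWORD polling tops out at a few megabytes per second on either bus, which is why burst transfers initiated by the card are usually preferred for moving acquisition data.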
Reply by ●July 26, 2011
Generally speaking, PCI Express is much more prone to latency than conventional PCI because packets have to be constructed, passed through a structure of nodes, and checked at most levels. Data checking isn't completed, and onward transmission doesn't happen, until the last data arrives and CRCs are checked.

If you do a "read", this will have a packet outgoing and one coming back, so doubly worse. If you can, do a DMA-like operation where data is sent from the data source, and then interrupt your system to use the data in memory.

The latency will also vary from system to system because routing structures differ between motherboards. The amount of other things going on will also affect latency as different things contend for the data pipes.

Generally speaking, if you are trying to do anything real time it is something of a nightmare if you are planning on using the host motherboard processor for control functions. You can try to make the latency smaller by using smaller packet sizes, and this sometimes helps. Ultimately, if there is a real-time element to this, then putting the processing and/or control on your card is probably best for performance and accuracy.

John Adair
Home of Raggedstone2. The Spartan-6 PCIe Development Board.

On Jul 25, 9:23 pm, Benjamin Couillard <benjamin.couill...@gmail.com> wrote:

> Hi everyone,
>
> I'm working on a conversion project where we needed to convert a PCI
> acquisition card to a PCI-express (x1) acquisition card. The project
> is essentially the same except instead that the new acquisition card
> is a PCI-express endpoint instead of being a standard-PCI endpoint.
> The project is implemented on a Xilinx FPGA, but I don't think my
> issue is Xilinx specific.
>
> The conversion has worked fine on all levels except one. The read
> latency of PCI express is about 4 times higher than standard PCI. For
> example, on the old product, it takes about 0.9 us to perform a
> 1-DWORD read. With the PCI-express product it takes about 3-4 us to
> perform a 1-DWORD read. I've seen this read latency both in real-life
> (with a real board) and in VHDL Simulation so I don't think that this
> is a driver issue. Do any of you have experienced similar performance
> issues?
>
> Don't get me wrong, for me PCI-express is a major step ahead, the
> write burst and read burst performance is way better than standard
> PCI.. Perhaps this is the reason, since most PCI-express cards are
> mostly used in burst transactions, the read latency does not really
> matter, therefore they sacrificed some read latency in order to obtain
> better performance.
>
> Best regards
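The "one packet out, one packet back" cost of a read can be estimated on the wire alone. The sketch below is a rough calculation, not a measurement. It assumes a x1 PCIe 1.x link (2.5 GT/s with 8b/10b encoding), 3-DWORD headers with 32-bit addressing, no ECRC, and a direct connection with no switches; it ignores ACK DLLPs, flow-control updates, and all processing time in the root complex and the endpoint.

```python
# Assumed link parameters for a x1 PCIe 1.x link.
LINE_RATE_GT_S = 2.5          # giga-transfers per second on one lane
BITS_PER_BYTE_ON_WIRE = 10    # 8b/10b encoding: 10 line bits per data byte
NS_PER_BYTE = BITS_PER_BYTE_ON_WIRE / LINE_RATE_GT_S

# Per-TLP framing: STP (1) + sequence number (2) + LCRC (4) + END (1).
FRAMING_BYTES = 8
HEADER_3DW_BYTES = 12         # 32-bit addressing, no ECRC (assumption)


def tlp_wire_time_ns(payload_bytes):
    """Serialization time of one TLP on the assumed x1 link."""
    total_bytes = FRAMING_BYTES + HEADER_3DW_BYTES + payload_bytes
    return total_bytes * NS_PER_BYTE


# A 1-DWORD read is a memory read request (no payload) followed by a
# completion carrying 4 bytes of data.
request_ns = tlp_wire_time_ns(0)
completion_ns = tlp_wire_time_ns(4)
round_trip_wire_ns = request_ns + completion_ns

if __name__ == "__main__":
    print(f"request:    {request_ns:.0f} ns")
    print(f"completion: {completion_ns:.0f} ns")
    print(f"total wire: {round_trip_wire_ns:.0f} ns")
```

Under these assumptions the two packets take 80 ns and 96 ns to serialize, 176 ns in total. That is a large share of a 250 ns figure but far short of 3-4 us, which suggests that most of the observed latency would have to come from the endpoint logic, the root complex, or intermediate hops rather than from the link itself.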
Reply by ●July 26, 2011
On Jul 26, 5:19 pm, John Adair <g...@enterpoint.co.uk> wrote:

> If you do a "read" this will have a packet outgoing and one coming
> back so doubly worse. If you can do a DMA like operation where data is
> sent from the data source and then interrupt your system to use the
> data in memory.

In the paper I posted a link to, I think the times are for an interrupt, or for DMA, not a software-initiated "read". Thanks for explaining the difference.

Rupert
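The difference between a software-initiated read and the DMA-like scheme can be illustrated with a small host-side model. This is a simulation only, under the assumption that the card pushes samples into a ring buffer in host memory and the host drains it after an interrupt; the class and method names are hypothetical and do not correspond to any real driver interface.

```python
class DmaRing:
    """Host-memory ring buffer that a device would fill by DMA writes."""

    def __init__(self, size):
        self.buf = [0] * size
        self.size = size
        self.write_idx = 0   # advanced by the (simulated) device
        self.read_idx = 0    # advanced by the host

    def device_write(self, value):
        # Stand-in for a posted write from the card into host memory.
        nxt = (self.write_idx + 1) % self.size
        if nxt == self.read_idx:
            raise OverflowError("ring full: host is not keeping up")
        self.buf[self.write_idx] = value
        self.write_idx = nxt

    def host_drain(self):
        # The host touches only local memory here; no read crosses the bus.
        out = []
        while self.read_idx != self.write_idx:
            out.append(self.buf[self.read_idx])
            self.read_idx = (self.read_idx + 1) % self.size
        return out


if __name__ == "__main__":
    ring = DmaRing(8)
    for sample in (10, 20, 30):
        ring.device_write(sample)
    print(ring.host_drain())
```

In this model every bus transaction is a write from the card, so the host never waits for a request and completion round trip; the cost is that the host sees data only after it has landed in memory.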
Reply by ●July 26, 2011
"Benjamin Couillard" <benjamin.couillard@gmail.com> wrote in message news:62427806-eeec-499b-a0f0-15ffafa0e3ab@w27g2000yqk.googlegroups.com...

> Hi everyone,
>
> I'm working on a conversion project where we needed to convert a PCI
> acquisition card to a PCI-express (x1) acquisition card. The project
> is essentially the same except instead that the new acquisition card
> is a PCI-express endpoint instead of being a standard-PCI endpoint.
> The project is implemented on a Xilinx FPGA, but I don't think my
> issue is Xilinx specific.
>
> The conversion has worked fine on all levels except one. The read
> latency of PCI express is about 4 times higher than standard PCI. For
> example, on the old product, it takes about 0.9 us to perform a
> 1-DWORD read. With the PCI-express product it takes about 3-4 us to
> perform a 1-DWORD read. I've seen this read latency both in real-life
> (with a real board) and in VHDL Simulation so I don't think that this
> is a driver issue. Do any of you have experienced similar performance
> issues?

Is it possible that time-stamping the data would disconnect you somewhat from the latency problem? Usually data can't be processed and presented in real time at those speeds anyway.