PCI byte enables in read cycles

Started by A.D. September 11, 2007
Hi all,
I'm implementing a PCI interface in an FPGA, but I'm stuck
trying to figure out what happens with byte enables in
burst read cycles. The specifications are not clear on this
point; they say that if you perform a burst read, the byte
enables are *usually* all active on all cycles. Even the figures
always show this situation, which hides the real timing and
behaviour of the BEs. I mean: do the BEs in general refer
to the data present on the bus in the next clock cycle? (BE
signals are asserted one clock in advance during read
cycles...) Or does this happen only on the first data cycle?

To be more clear, I'll try to draw what I mean... :-)

This is what specs show for a burst read cycle:

AD <addr>-----<=0=><=1=><=2=>...
BE <cmd=><=be0==============>...

But if the BEs need to change from one data phase to the
next, will we have something like this:

AD <addr>-----<=0=><=1=><=2=>...
BE <cmd=><be0><be1><be2><be3>...

or this:

AD <addr>-----<=0=><=1=><=2=>...
BE < cmd><be0=====><be1><be2>...


Can anybody enlighten me? :-)
Thank you in advance for any answer, and excuse
me for this very specific question and for the
cross post.

Antonio




"A.D." <isd_mod@libero.ix> wrote in message 
news:q3BFi.111012$U01.922629@twister1.libero.it...
> [...]
While PCI Express spells out the use of byte enables for the start and end of burst transactions, PCI bursts tend to be linear from the starting word address. Read Multiple Line bursts are terminated when there's no more data to feed because of cache-line or other memory boundary restrictions. A Read Multiple Line does not request a precise number of words or bytes, so byte enables don't make much sense.

If you need to read from an odd boundary, do the byte gating yourself and don't expect the PCI interface to deliver everything you think you need for the partial word that starts you off. Is your need above and beyond typical use?
A.D. wrote:

> Specifications are not clear about this
> point, saying that if you perform a burst read, byte
> enables are *usually* all active on all cycles. Even figures
> always show this situation. This hides the real timing or
> behaviour of BEs. I mean: are in general BEs referred
> to the data present on the bus in next clock cycle? (BEs
> signals are asserted one clock in advance during read
> cycles...). Or this happens only on the first data cycle?
From PCI System Architecture, Chapter 8, pg 131...

"PCI permits burst transactions where the byte enables change from one data phase to the next. Furthermore, the initiator may use any byte enable setting, consisting of contiguous or non-contiguous byte enables. During a read transaction, the initiator will typically assert all of the byte enables during each data phase (because burst reads are typically reading a stream of dwords or quadwords), but it may use any combination."

As for timing, the BEs must *NOT* change during a data phase. So although the BEs are asserted before the 1st phase, they don't change until the start of the 2nd phase (which may be while the phase 1 data is still being driven, and before the phase 2 data is valid).

Regards,

-- 
Mark McDougall, Engineer
Virtual Logic Pty Ltd, <http://www.vl.com.au>
21-25 King St, Rockdale, 2216
Ph: +612-9599-3255 Fax: +612-9599-3266
Mark McDougall wrote:

> As for timing, the BEs must *NOT* change during a data phase. So although
> the BEs are asserted before the 1st phase, they don't change until the
> start of the 2nd phase (which may be while the phase 1 data is still being
> driven, and before the phase 2 data is valid).
Actually, I correct myself: they're not asserted *BEFORE* the 1st phase, but during the 1st phase...

Regards,

-- 
Mark McDougall
Mark McDougall <markm@vl.com.au> wrote in message
46e73310$0$14156$5a62ac22@per-qv1-newsreader-01.iinet.net.au...
> As for timing, the BEs must *NOT* change during a data phase.
> So although the BEs are asserted before the 1st phase, they
> don't change until the start of the 2nd phase (which may be
> while the phase 1 data is still being driven, and before the
> phase 2 data is valid).
Ok, so will the new BEs for the second data phase be put on the bus starting from clock 3? (I'm considering figure 3.5 of the PCI 2.2 spec, which is commonly used and reproduced to show the read cycle.)

Antonio
A.D. wrote:

> Ok, so will new BEs for the second data phase be put on the
> bus starting from clock 3? (I'm considering figure 3.5 of PCI
> 2.2 specs, that is commonly used and reproduced to show
> the read cycle).
No! They must not change until the data has been transferred for the current phase. From section 3.3.1...

"The C/BE# lines contain the byte enable information for data phase N+1 on the clock following the completion of data phase N."

Since phase 1 ends on clock 4, the BE for phase 2 is valid on clock 5. Also see fig 3-6...

Regards,

-- 
Mark McDougall
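
The clock numbering above can be checked with a small behavioural model. This is only a sketch in Python, not RTL and not taken from the spec: the function name `be_windows` and the example trace (transfers completing on clocks 4, 6 and 8 after an address phase on clock 2) are assumptions chosen to match the "phase 1 ends on clock 4" case discussed here. It applies the quoted rule: a data phase completes on a clock where both IRDY# and TRDY# are asserted, and the byte enables for the next phase are on C/BE# from the following clock.

```python
def be_windows(address_clock, ready):
    """Return (phase, first_clock_be_valid, completion_clock) tuples.

    ready maps a clock number to (irdy, trdy), where True means the
    (active-low) signal is asserted on that clock. Hypothetical helper.
    """
    windows = []
    start = address_clock + 1      # BEs for phase 1 follow the address phase
    phase = 1
    for clock in sorted(ready):
        irdy, trdy = ready[clock]
        if irdy and trdy:          # data transferred: this phase completes
            windows.append((phase, start, clock))
            start = clock + 1      # BEs for phase N+1 valid on the next clock
            phase += 1
    return windows


# Assumed trace: address phase on clock 2, wait states on clocks 3, 5 and 7.
trace = {
    3: (True, False),
    4: (True, True),
    5: (True, False),
    6: (True, True),
    7: (False, True),
    8: (True, True),
}

for phase, first, last in be_windows(2, trace):
    print(f"phase {phase}: BEs held from clock {first} through clock {last}")
```

With this trace the model reports that the phase 2 byte enables first appear on clock 5, which is the answer given above; a target that registers C/BE# on each clock edge would therefore see them one clock before the earliest possible phase 2 transfer.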
On 2007-09-11, A.D. <isd_mod@libero.ix> wrote:
> Hi all,
> I'm implementing a PCI interface in FPGA, but I'm stuck
> trying to figure out what happens with byte enables in
> burst read cycles.
If there are no side-effects for reading the bytes, you can just return them all. Not many modern memory architectures are going to run faster by reading anything smaller than a 32-bit word, anyway.

You might want to look at the PCI-X docs. My vague recollection is that the pipelining of BE (and a few other signals) changed in PCI-X and there might be something that explains the change so as to enlighten you as to the operation of "old" PCI.

-- 
Ben Jackson AD7GD
<ben@ben.com>
http://www.ben.com/
Mark McDougall <markm@vl.com.au> wrote in message
46e792c1$0$14150$5a62ac22@per-qv1-newsreader-01.iinet.net.au...
> No! They must not change until the data has been
> transferred for the current phase.
> [...]
Ok, this is what I understood at first. But suppose you want to use the BEs for simple gating of the byte lanes: you either have to do it combinationally, or you have to insert a wait state to latch the BE values and then put the gated data on the bus on the next clock. Am I wrong? Both options are quite bad! In the first case you could get a long propagation delay; in the second you waste 50% of the clock cycles!

Regards,
Antonio
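
For reference, the byte-lane gating being discussed is just a per-lane mask. The sketch below is a Python illustration of the function only (it says nothing about timing, which is the real concern in hardware). It relies on C/BE#[3:0] being active low, with bit i covering byte lane i (AD[8i+7:8i]). The name `gate_byte_lanes` and the choice to return zeros on disabled lanes are assumptions made for the example.

```python
def gate_byte_lanes(dword, cbe_n):
    """Mask a 32-bit read value according to active-low byte enables.

    cbe_n is the 4-bit C/BE#[3:0] value: a 0 bit enables that byte lane.
    Disabled lanes are returned as zero (an arbitrary choice for this sketch).
    """
    mask = 0
    for lane in range(4):
        if not (cbe_n >> lane) & 1:        # 0 means the lane is enabled
            mask |= 0xFF << (8 * lane)
    return dword & mask


print(hex(gate_byte_lanes(0x11223344, 0b0000)))  # all lanes enabled
print(hex(gate_byte_lanes(0x11223344, 0b1100)))  # lanes 0 and 1 only
print(hex(gate_byte_lanes(0x11223344, 0b0101)))  # non-contiguous: lanes 1 and 3
```

In an FPGA this is a handful of AND gates per lane, so the combinational cost is in the path from the C/BE# pads to the AD outputs rather than in the logic itself.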
A.D. wrote:
> Mark McDougall <markm@vl.com.au> wrote in message
> 46e792c1$0$14150$5a62ac22@per-qv1-newsreader-01.iinet.net.au...
>> No! They must not change until the data has been
>> transferred for the current phase.
>> [...]
>
> Ok, this is what I understood at first. But suppose you
> want to use BEs for a simple gating of the byte lanes:
> you have to do it in a combinational way, or you have
> to insert a wait state to latch BEs values and then put
> gated data on the bus on the next clock. Am I wrong?
> Both these things are quite bad! In the first case you
> could obtain long propagation delay, in the second
> case you waste 50% of clock cycles!
>
> Regards,
> Antonio
Perhaps you're looking at this the wrong way.

Software defines the valid data for each of the bus cycles [when it sets up the data to be transferred] and the hardware takes care of transferring said data.

PCI is *already* a hog of bandwidth for its transaction system - a few more cycles with invalid data on D[x:n] won't really matter.

Cheers

PeteS
"A.D." <isd_mod@libero.ix> wrote in message 
news:q3BFi.111012$U01.922629@twister1.libero.it...
> [...]
Are you trying to design a target interface or a master interface?

Assuming that you are designing a target interface (your device returns the read data), then there are two possibilities:

- the act of reading a byte can change the state of your device (read side-effects)
- the act of reading a byte does NOT change the state of your device (no read side-effects)

In the latter case (no read side-effects) you can IGNORE the byte enables and return all bytes for every data phase (this is the typical case).

An example of a device that has read side-effects would be a device that has a data FIFO rather than a RAM. The act of reading the FIFO removes an entry (thus you have read side-effects). BUT this is a poor example for burst transactions, so I doubt that this is applicable to you.

But there is one more thing that you should consider. How is the burst read transaction being generated? Generally, reads from a CPU (i.e. by a device driver) do NOT generate burst read transactions (they generate single DWORD reads), even for memory-mapped devices. In other words, drivers generally cannot generate Memory Read Line or Memory Read Multiple transactions UNLESS they use platform-specific (chipset-specific) features AND the device is mapped into a PREFETCHABLE memory address range.

On the other hand... if you are building a master interface (a device that initiates read transactions) AND you expect the target device to respect random byte enables in each data phase of a read burst transaction, then you should consider the characteristics of the target device you are addressing. Typically, your device will be reading system memory. In this case the memory will ignore the read byte enables and return all bytes (because it does NOT have read side-effects). If your master device is initiating peer-to-peer transactions with a device other than system memory, then you should study the characteristics of that device to understand its read behavior.

And finally, remember that target devices can simply DISCONNECT after each data phase and turn a burst transaction into a series of single data-phase transactions. So, even if the master has gone to the lengths of trying to implement byte enables in read burst transactions, I expect most targets will simply use disconnect to decompose the burst into a series of single data-phase reads, so you are UNLIKELY to get any benefit.

Bottom line... you are probably worrying about implementing complexity that either doesn't matter, or won't give you the results that you desire.

TC OR
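
The side-effect distinction above can be illustrated with two toy targets. This is a hedged Python sketch, not a model of any real device: the class names `RamTarget` and `FifoTarget`, and the policy that the FIFO pops an entry only when at least one byte lane is enabled, are assumptions made for illustration. The byte enables are the active-low C/BE#[3:0] value.

```python
from collections import deque


class RamTarget:
    """No read side-effects: byte enables can be ignored on reads."""

    def __init__(self, words):
        self.words = list(words)

    def read(self, index, cbe_n):
        return self.words[index]       # always return the full dword


class FifoTarget:
    """Read side-effects: a read removes an entry, so BEs matter."""

    def __init__(self, entries):
        self.fifo = deque(entries)

    def read(self, cbe_n):
        if cbe_n & 0xF == 0xF:         # no lane enabled: must not pop
            return 0
        return self.fifo.popleft()     # at least one lane enabled: consume


ram = RamTarget([0xAABBCCDD])
fifo = FifoTarget([1, 2])
print(hex(ram.read(0, 0b1110)))        # full dword despite one enabled lane
print(fifo.read(0b1111), len(fifo.fifo))
print(fifo.read(0b0000), len(fifo.fifo))
```

The RAM-style target never looks at `cbe_n`, which is why the byte-enable timing question mostly disappears for prefetchable memory; only the FIFO-style target has to decode the enables before committing to the read.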