
16-bit SPI trying to read from 22-clock cycle ADC

Started by Bill September 8, 2009
On Sep 9, 5:30 am, Stef <stef...@yahooI-N-V-A-L-I-D.com.invalid> wrote:

> You missed the point: The 'N' units are in a single 'transaction', with
> only a single activation of the CS. There may be some (programmable)
> delays between the bytes within the transaction, but that should not
> bother your peripheral.
If the ADC is using the SPI clock to drive the next conversion, I could imagine it might actually be bothered, in the sense of being sensitive to this rather extreme "jitter", though hopefully it is designed so that it is not.
On Sep 9, 4:17 pm, Bill <a...@a.a> wrote:
> This is good, but I think that it could be better. Difficult to
> explain, but I'll try:
>
> Imagine my external ADC (with SPI interface) is sampling the analog
> input at 100 ksa/s (the TI ADS8320 that I mentioned allows that). So,
> 10 us between samples. Not much. Each sample needs 22 clock cycles
> inside each assertion of CS=0, so each sample needs one DMA block
> transfer (with for instance two 11-bit word transfers inside). Each
> DMA block transfer needs CPU intervention.
You might be interested to compare the Blackfin's SPORT DMA capability... there the DMA can be programmed to transfer words separated in time without CPU intervention. Obviously no fixed-hardware solution (other than a built-in gate array ;-) is going to have the flexibility for all needs, but this sounds like it might be along the lines of what you are looking for... so at least some silicon designers seem to be thinking along your lines.
In comp.arch.embedded,
Bill <a@a.a> wrote:
> This is good, but I think that it could be better. Difficult to
> explain, but I'll try:
>
> Imagine my external ADC (with SPI interface) is sampling the analog
> input at 100 ksa/s (the TI ADS8320 that I mentioned allows that). So,
> 10 us between samples. Not much. Each sample needs 22 clock cycles
> inside each assertion of CS=0, so each sample needs one DMA block
> transfer (with for instance two 11-bit word transfers inside). Each
> DMA block transfer needs CPU intervention. So, I need CPU intervention
> every 10 us. That's a short time. Only 480 cycles of my 48 MHz SAM7.
> Since (as far as I know) a DMA block transfer cannot be triggered directly
> by a timer overflow or underflow, an interrupt service routine
> (triggered by a 10 us timer underflow) must be executed every so
> often, so that the CPU can manually trigger the DMA block transfer and
> collect the data. Adding up the overhead of the interrupt context
> switching and the instructions needed to move data from and to the
> block buffers, to re-trigger the block transfer, and all this in C++,
> I think that all that may consume a "significant" portion of those 480
> cycles. And the CPU is supposed to do something with that data, and
> some other things. I see that hog as a killer, or at least as a pity.
>
> If the SPI in the MCU allowed 22-bit (word) transfers, and the DMA
> allowed triggering the next word transfer (inside a block transfer)
> when a certain timer underflows, then the DMA blocks wouldn't need to
> be so small. Each analog sample could travel in one single SPI word
> transfer, and one DMA block could be planned to carry for instance
> 1000 word transfers. That would be one DMA block every 10 ms. The
> buffer (FIFO) memory would be larger, but the CPU intervention needed
> would be much lower. There would be the same number of useful cycles,
> but much fewer wasted cycles. There wouldn't need to exist an
> interrupt service routine executed every 10 us, which is a killer.
> That would be a good SPI and a good DMA, in my opinion, and the extra
> cost in silicon is negligible, compared to the added benefit. Why
> don't most MCUs allow that? Even cheap MCUs could include that. An MCU
> with the price of a SAM7 should include that, in my opinion.
Processing bigger blocks does save some interrupt overhead, but you still
need to handle all the data, so the advantage may not be that big. And 480
cycles is still a decent amount to do some work. ;-)

But with a bit of creativity, you can get bigger blocks from your DMA. If
you use "variable peripheral select" you can create a buffer with your ADC
transfers and dummy transfers to another chip select in between, to get the
required CS switching pattern. If you then set up the SPI clock correctly,
you can let the PDC perform the 100 kHz interval transfers, up to 64k
'bytes' (including the dummies).

Another option is to use the SSC (as mentioned by others); it can transfer
up to 32 bits per transfer. Transfers can be started on an external event on
the RF pin. If you tie a TC output to the RF input, you can use the TC
waveform mode to initiate the transfers.

But I agree, being able to program some interval timer (maybe a TC) and use
that directly to initiate transfers to peripherals would be nice to have.
But as long as it's not there, see what is there and try to get the most out
of that. And if you are not tied to the SAM7, check whether other CPUs have
features that suit your wishes.

--
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail)

A platitude is simply a truth repeated till people get tired of hearing it.
                -- Stanley Baldwin
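To make the "variable peripheral select" suggestion above more concrete, here is a rough, untested C sketch of how the PDC transmit buffer could be laid out. It assumes the ADS8320 sits on NPCS0, that a spare chip select (NPCS1 here) is used for the dummy padding transfers, that the SPI has already been configured for variable peripheral select with 11-bit transfers on the ADC's chip select, and that the word sizes, clock rate and inter-transfer delays are tuned so that one group of three entries spans 10 us. The field positions follow the SAM7 SPI_TDR layout in variable mode; the macro and function names are illustrative, not from any existing driver.

```c
#include <stdint.h>

/* In variable peripheral select mode the PDC feeds 32-bit words to SPI_TDR:
 * bits 15:0 = transmit data, bits 19:16 = PCS (one-cold chip-select code),
 * bit 24 = LASTXFER.  With PCSDEC = 0, NPCS0 -> 0xE and NPCS1 -> 0xD.      */
#define PCS_ADC    (0xEu << 16)   /* assumed: ADS8320 on NPCS0              */
#define PCS_DUMMY  (0xDu << 16)   /* assumed: spare chip select for padding */
#define LASTXFER   (1u  << 24)

#define SAMPLES_PER_BLOCK  1000u  /* one PDC block every 10 ms at 100 ksa/s;
                                     shrink this if RAM is tight            */
#define WORDS_PER_SAMPLE   3u     /* 2 x 11-bit words to the ADC + 1 dummy  */

static uint32_t spi_txbuf[SAMPLES_PER_BLOCK * WORDS_PER_SAMPLE];

/* Build one block: per sample, two back-to-back words to the ADC chip
 * select (CS stays low for the 22 clocks, as established earlier in the
 * thread), then one dummy word to the other chip select so the ADC's CS is
 * released until the next sample.                                          */
static void build_tx_block(void)
{
    for (uint32_t i = 0; i < SAMPLES_PER_BLOCK; i++) {
        uint32_t *w = &spi_txbuf[i * WORDS_PER_SAMPLE];
        w[0] = PCS_ADC   | 0x0000;   /* ADS8320 has no data input, so the   */
        w[1] = PCS_ADC   | 0x0000;   /* MOSI contents are don't-care        */
        w[2] = PCS_DUMMY | 0x0000;   /* padding transfer, data unused       */
    }
    /* Let the SPI release all chip selects at the end of the block.        */
    spi_txbuf[SAMPLES_PER_BLOCK * WORDS_PER_SAMPLE - 1] |= LASTXFER;
}
```

The PDC receive buffer would need the same number of entries; the words clocked in during the dummy transfers are simply discarded when the block is processed.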
On Wed, 09 Sep 2009 22:17:51 +0200, Bill <a@a.a> wrote:

>Well, I was wrong in at least one thing: I thought that, with CSAAT=0,
>CS would be deasserted (high) between consecutive "word transfers"
>within one "block transfer", but it is not. It was clear to me from the
>beginning (from diagrams and text) that there was a way to keep CS=0
>between word transfers, but I thought that it implied CSAAT=1, and it
>is not true. CS is 0 between consecutive word transfers (of the same
>block transfer) regardless of the value of CSAAT.
>
>So, yes, I can leave CSAAT=0 permanently, there is no CPU intervention
>needed (other than at the beginning and at the end of each block
>transfer), and I can use DMA, with two 11-bit word transfers per block
>transfer.
>
>[...]
>
>If the SPI in the MCU allowed 22-bit (word) transfers, and the DMA
>allowed triggering the next word transfer (inside a block transfer)
>when a certain timer underflows, then the DMA blocks wouldn't need to
>be so small. Each analog sample could travel in one single SPI word
>transfer, and one DMA block could be planned to carry for instance
>1000 word transfers. That would be one DMA block every 10 ms. The
>buffer (FIFO) memory would be larger, but the CPU intervention needed
>would be much lower. There would be the same number of useful cycles,
>but much fewer wasted cycles. There wouldn't need to exist an
>interrupt service routine executed every 10 us, which is a killer.
>That would be a good SPI and a good DMA, in my opinion, and the extra
>cost in silicon is negligible, compared to the added benefit. Why
>don't most MCUs allow that? Even cheap MCUs could include that. An MCU
>with the price of a SAM7 should include that, in my opinion.
It's not a matter of silicon area, but of which SPI devices they want to cover. SPI is a thousand twisty little passages, all different. How are they going to service them all? The bottom line is that they put in enough to put the bullet point on the front page of the datasheet. If you want custom I/O, do it in an FPGA.
"Bill" <a@a.a> wrote in message
news:760ga55svavucfak34u0jcopco6o9q2gvi@4ax.com...
> Imagine my external ADC (with SPI interface) is sampling the analog
> input at 100 ksa/s (the TI ADS8320 that I mentioned allows that). So,
> 10 us between samples. Not much. Each sample needs 22 clock cycles
> inside each assertion of CS=0, so each sample needs one DMA block
> transfer (with for instance two 11-bit word transfers inside). Each
> DMA block transfer needs CPU intervention. So, I need CPU intervention
> every 10 us. That's a short time. Only 480 cycles of my 48 MHz SAM7.
> Since (as far as I know) a DMA block transfer cannot be triggered directly
> by a timer overflow or underflow, an interrupt service routine
> (triggered by a 10 us timer underflow) must be executed every so
> often, so that the CPU can manually trigger the DMA block transfer and
> collect the data. Adding up the overhead of the interrupt context
> switching and the instructions needed to move data from and to the
> block buffers, to re-trigger the block transfer, and all this in C++,
> I think that all that may consume a "significant" portion of those 480
> cycles. And the CPU is supposed to do something with that data, and
> some other things. I see that hog as a killer, or at least as a pity.
In those instances, I always revert to assembly language for the interrupt part. Handle things as quickly as you can, and a context switch suddenly isn't half that bad. No more than pushing and popping a few registers. Hell, I even did a Fast interrupt handler in C on a Motorola DSP56K. The interrupt occurred every 200 ns.... (no typo). Just... be smart...

Meindert
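In that spirit, here is a minimal, untested sketch of what a "do almost nothing" 10 us timer ISR could look like on the SAM7 if Bill stays with one small PDC block per sample: it only clears the timer flag, swaps a ping-pong receive buffer and restarts the SPI PDC, leaving all processing to the main loop. The handler name and buffer scheme are made up for illustration; the register and header names are assumed to follow Atmel's AT91 library for the SAM7S, and the AIC vectoring and end-of-interrupt handling are assumed to live in the surrounding startup code.

```c
#include <AT91SAM7S256.h>   /* assumed header name from Atmel's AT91 library */

#define WORDS_PER_SAMPLE  2u        /* two 11-bit word transfers per sample  */

static unsigned short rx_buf[2][WORDS_PER_SAMPLE];   /* ping-pong buffers    */
static const unsigned short tx_dummy[WORDS_PER_SAMPLE] = { 0, 0 };
static volatile unsigned int rx_bank;

/* Called from the TC0 interrupt every 10 us.  The body is a handful of
 * loads and stores; processing the previously filled bank is left to the
 * main loop.                                                                */
void adc_timer_isr(void)
{
    volatile unsigned int status = AT91C_BASE_TC0->TC_SR;  /* reading TC_SR  */
    (void)status;                                          /* clears the flag */

    rx_bank ^= 1u;                  /* main loop reads the other bank        */

    /* Point the PDC at the buffers; writing a nonzero transmit count (with
     * the PDC already enabled) starts the 22 clocks for this sample.        */
    AT91C_BASE_SPI->SPI_RPR = (unsigned int)rx_buf[rx_bank];
    AT91C_BASE_SPI->SPI_RCR = WORDS_PER_SAMPLE;
    AT91C_BASE_SPI->SPI_TPR = (unsigned int)tx_dummy;
    AT91C_BASE_SPI->SPI_TCR = WORDS_PER_SAMPLE;
}
```

Written this tightly, or in assembly as suggested above, the per-interrupt cost is dominated by the context switch rather than by the handler body.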
>In those instances, I always revert to assembly language for the interrupt
>part. Handle things as quickly as you can, and a context switch suddenly
>isn't half that bad. No more than pushing and popping a few registers. Hell,
>I even did a Fast interrupt handler in C on a Motorola DSP56K. The interrupt
>occurred every 200 ns.... (no typo). Just... be smart...
What was the clock frequency of your DSP56K?
"Bill" <a@a.a> wrote in message
news:dprha5p433n19mof4o2k1nqi281bf3kenr@4ax.com...
> >In those instances, I always revert to assembly language for the
> >interrupt part. Handle things as quickly as you can, and a context
> >switch suddenly isn't half that bad. No more than pushing and popping
> >a few registers. Hell, I even did a Fast interrupt handler in C on a
> >Motorola DSP56K. The interrupt occurred every 200 ns.... (no typo).
> >Just... be smart...
>
> What was the clock frequency of your DSP56K?

80 MHz.

Meindert
Bill wrote:
> Well, I was wrong in at least one thing: I thought that, with CSAAT=0,
> CS would be deasserted (high) between consecutive "word transfers"
> within one "block transfer", but it is not. [...]
>
> Imagine my external ADC (with SPI interface) is sampling the analog
> input at 100 ksa/s (the TI ADS8320 that I mentioned allows that). So,
> 10 us between samples. Not much. Each sample needs 22 clock cycles
> inside each assertion of CS=0, so each sample needs one DMA block
> transfer (with for instance two 11-bit word transfers inside). [...]
>
> If the SPI in the MCU allowed 22-bit (word) transfers, and the DMA
> allowed triggering the next word transfer (inside a block transfer)
> when a certain timer underflows, then the DMA blocks wouldn't need to
> be so small. [...]
An idea:

Run a timer which is connected to the SSC input clock and the ADC clock.
It also clocks another timer in PWM mode, generating the ADC chip select.

The ADC will see 22 active and 10 passive bits, and the SSC will see 32 bits.

(Did not test this)

Best Regards
Ulf Samuelsson
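To show what falls out of this scheme, here is a small, untested C sketch of how a captured 32-bit SSC frame might be unpacked. It assumes the frame starts on the falling edge of the ADC chip select, that the ADS8320 spends about five clocks sampling and then shifts out a null bit followed by the 16 data bits MSB first, and that the SSC places the first captured bit in bit 31 of the word, so the result lands in bits 25..10. The shift amount and the function name are assumptions to verify against the datasheet timing diagram or with a scope.

```c
#include <stdint.h>

/* Assumed frame layout (to be verified): bits 31..27 = sampling clocks,
 * bit 26 = null bit, bits 25..10 = conversion result (MSB first),
 * bits 9..0 = clocks captured while CS is high (don't care).               */
#define ADS8320_DATA_SHIFT  10u

static inline uint16_t adc_result_from_ssc_frame(uint32_t frame)
{
    return (uint16_t)((frame >> ADS8320_DATA_SHIFT) & 0xFFFFu);
}
```

Note that at 100 ksa/s this arrangement needs 32 bit clocks every 10 us, i.e. a 3.2 MHz bit clock with CS held low for 22 of the 32 periods; it is worth checking that such a clock is still within the ADS8320's rated maximum.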
On Sun, 13 Sep 2009 01:05:07 +0200, Ulf Samuelsson <ulf@atmel.com>
wrote:

>An idea:
>
>Run a timer which is connected to the SSC input clock and ADC clock.
>It also clocks another timer in PWM mode generating
>the ADC chip select.
>
>The ADC will see 22 active and 10 passive bits
>and the SSC will see 32 bits.
>
>(Did not test this)
Hey, that's a wonderful idea!! It opens up a broad array of new possibilities :-) Thanks!
