Dummy questions from a newbie regarding how to use DMA

Until now I have never used a microcontroller with an embedded DMA 
controller, so I have always used the CPU to move data from memory to memory, 
from memory to peripheral, and from peripheral to memory.

The micros I use usually have only a two-byte hardware FIFO (the byte currently 
shifting in and the previous, completely received byte).
So I often implement a software FIFO buffer for receiving data from the UART.  In 
the receive ISR, I move data from the peripheral (UART) to 
memory (FIFO buffer).
I implement a uart_getchar() function that checks whether a new byte is 
present in the FIFO and returns that byte or EOF.  This function is 
called from the application (background), not in the foreground.


#include <stddef.h>
#include <stdint.h>
#include <stdio.h>   /* EOF */

#define ROLLOVER(x, max)   (((x) + 1) >= (max) ? 0 : ((x) + 1))

          uint8_t rx_buff[32];
          size_t rx_size = sizeof(rx_buff);
volatile size_t rx_in;
          size_t rx_out;

/* ISR of a new received character */
void isr_rx(void) {
   /* UART_RX_DATA is a placeholder for the device's receive data register;
      read it first so the byte is consumed even when the FIFO is full */
   uint8_t c = UART_RX_DATA;
   size_t i = ROLLOVER(rx_in, rx_size);
   if(i == rx_out) return; // Rx FIFO full, discard character
   rx_buff[rx_in] = c;
   rx_in = i;
}

/* uart_getchar() function */
int uart_getchar(void) {
   if (rx_out == rx_in) return EOF;  // FIFO empty
   unsigned char data;
   data = rx_buff[rx_out];
   rx_out = ROLLOVER(rx_out, rx_size);
   return data;
}


Now I'm starting to use new microcontrollers that embed a DMA engine, for 
example the SAM C21 Cortex-M0+ from Atmel (I think many Cortex-M0+ micros 
out there integrate a DMA engine).

So I'm asking whether it is possible to avoid the ISR completely and use the 
DMA to move received characters into the FIFO buffer.

I read the datasheet, but I couldn't find a solution to my problem.

First of all, the destination memory address can be fixed or 
auto-incrementing, but there isn't a mechanism to wrap around (the FIFO 
buffer is a *circular* array, so after pushing N bytes, the address 
should start again from the beginning).
Maybe I have to configure the DMA for a transaction with a byte count 
equal to the size of the FIFO buffer.  When the last byte is received, 
the transaction ends and an interrupt can be raised (if correctly 
configured).  In the relevant ISR, a new DMA transaction could be 
started (with the destination memory address equal to the beginning of 
the FIFO buffer).

Another issue is how the application (background) can know how many 
bytes are present in the FIFO buffer, so it can pop and process them as 
it wants.

Reply by ●August 22, 2016

You can link descriptors in a circular manner.
Using two, pointing to each other, is the simplest.
You can work out how many bytes are there by address arithmetic,
since you have access to the DMA descriptors' addresses.
For interrupts, use terminal count interrupts. 
Better granularity if you use many "small" descriptors...
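
As a rough illustration of the two-descriptor idea, here is a small software model in C. It is not the SAM C21 register layout: the struct, the helper functions and the buffer size are hypothetical, and only the name BTCNT comes from this thread. It also assumes that the controller reloads the block count from the descriptor every time it follows the link, which should be checked against the datasheet.

```c
#include <stddef.h>
#include <stdint.h>

#define HALF_SIZE 16u

/* Hypothetical software model of a DMA transfer descriptor.
 * Only "btcnt" echoes a name used in the thread. */
struct dma_desc {
    uint8_t         *dst;    /* start of the destination block */
    uint16_t         btcnt;  /* bytes in this block */
    struct dma_desc *next;   /* descriptor loaded when this block completes */
};

static uint8_t fifo_buf[2 * HALF_SIZE];

/* Two descriptors, one per half of the buffer, pointing at each other. */
static struct dma_desc desc_b;
static struct dma_desc desc_a = { &fifo_buf[0],         HALF_SIZE, &desc_b };
static struct dma_desc desc_b = { &fifo_buf[HALF_SIZE], HALF_SIZE, &desc_a };

/* Model of the engine's working state: the active descriptor and how
 * many bytes of its block are still to be transferred. */
struct dma_chan {
    const struct dma_desc *active;
    uint16_t               remaining;
};

static void chan_start(struct dma_chan *ch, const struct dma_desc *first) {
    ch->active = first;
    ch->remaining = first->btcnt;
}

/* What the modelled engine does for one received byte. */
static void chan_on_byte(struct dma_chan *ch, uint8_t byte) {
    ch->active->dst[ch->active->btcnt - ch->remaining] = byte;
    if (--ch->remaining == 0) {             /* block done: follow the link */
        ch->active = ch->active->next;
        ch->remaining = ch->active->btcnt;  /* count reloaded (assumption) */
    }
}

/* Address arithmetic: offset in fifo_buf where the next byte will land. */
static size_t chan_write_index(const struct dma_chan *ch) {
    size_t block_start = (size_t)(ch->active->dst - fifo_buf);
    return block_start + (size_t)(ch->active->btcnt - ch->remaining);
}
```

With more, smaller descriptors the same arithmetic applies; each completed block is then a smaller unit, which is presumably what "better granularity" refers to if a block-complete interrupt is used.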

Reply by pozz ●August 22, 2016

On 22/08/2016 11:49, raimond.dragomir@gmail.com wrote:
> You can link descriptors in a circular manner.
> Using two, pointing to each other, is the simplest.

Should the two transfer descriptors (linked together in a circular 
manner) be completely identical?

The transfer descriptor includes the total number of data (bytes) of the 
block transfer (BTCNT register).  This value is automatically 
decremented at each new data transfer.  At the end, BTCNT will be zero, 
so it can't be used *as is* for the next linked descriptor.

I should "re-arm" the BTCNT register when a transfer descriptor ends, 
shouldn't I?


> You can know how many bytes are there by address arithmetic,
> you have access to the dma descriptors addresses.

I think I have to check the BTCNT register in SRAM (transfer descriptor), 
which initially stores the total number of data for that "block 
transfer" but is automatically decremented as new data are transferred.

The application should keep count of the number of bytes already 
popped from the FIFO.

while((sizeof(FIFObuf) - BTCNT -
    number_of_bytes_already_popped_from_FIFO) > 0) {

    new_byte = FIFObuf[number_of_bytes_already_popped_from_FIFO++];
    // process new_byte
}

The only problem I see here is when one transfer has just finished and 
the new transfer (identical to the previous one and linked from it) has just 
started.

The BTCNT used in the arithmetic above shouldn't be that of the transfer 
descriptor currently in use, but that of the previous one, until all the 
bytes in the FIFO have been popped by the application.
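
One way to sidestep the "which BTCNT" question is to turn the remaining count into a write index and compare it with a read index that wraps in the same way. The sketch below is only a host-side model: `btcnt_live` is a hypothetical stand-in for wherever the controller exposes the live remaining count, and it is assumed to reload to the buffer size when the transfer restarts at the beginning of the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>   /* EOF */

#define FIFO_SIZE 32u

/* Buffer filled by the DMA, hence volatile. */
static volatile uint8_t FIFObuf[FIFO_SIZE];

/* Hypothetical stand-in for the live remaining-bytes count of the
 * transfer in progress (the role BTCNT plays in this thread). */
static volatile uint16_t btcnt_live = FIFO_SIZE;

/* Next position the application will pop; owned by the application. */
static size_t rd_index;

int uart_getchar(void) {
    /* Position the DMA will write next, folded into 0..FIFO_SIZE-1.
     * A remaining count of 0 and of FIFO_SIZE both map to index 0. */
    size_t wr_index = (FIFO_SIZE - btcnt_live) % FIFO_SIZE;

    if (rd_index == wr_index) return EOF;     /* nothing new */
    uint8_t data = FIFObuf[rd_index];
    rd_index = (rd_index + 1u) % FIFO_SIZE;   /* wraps like the DMA does */
    return data;
}
```

Because only positions are compared, the moment of the wrap needs no special case. The price is that an overrun goes unnoticed: if the DMA writes a full buffer's worth of bytes between two polls, the indices look unchanged and that data is lost.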


> For interrupts, use terminal count interrupts.

In my barebone OS, I usually "poll" the presence of new bytes in the 
FIFO through the uart_getchar(), so I don't use interrupts.


> Better granularity if you use many "small" descriptors...

Could you explain?

Reply by rickman ●August 22, 2016

On 8/22/2016 6:15 AM, pozz wrote:
> On 22/08/2016 11:49, raimond.dragomir@gmail.com wrote:
>> You can link descriptors in a circular manner.
>> Using two, pointing to each other, is the simplest.
>
> Should the two transfer descriptors (linked together in a circular
> manner) be completely identical?

That's up to you and your messages.  Think of it as a ping-pong buffer. 
You don't need to even fill the buffer.  There should be a time out 
somewhere so if no more chars are received, an interrupt is generated to 
say "a message is waiting".  Then the descriptor pointer is changed and 
the next buffer is used for the next message.

Otherwise you need to fill the buffer which may be part of a message or 
more than one message.  If you have a fixed message size, Bob's your uncle.




-- 

Rick C

Reply by pozz ●August 22, 2016

On 22/08/2016 12:27, rickman wrote:
> On 8/22/2016 6:15 AM, pozz wrote:
>> On 22/08/2016 11:49, raimond.dragomir@gmail.com wrote:
>>> You can link descriptors in a circular manner.
>>> Using two, pointing to each other, is the simplest.
>>
>> Should the two transfer descriptors (linked together in a circular
>> manner) be completely identical?
>
> That's up to you and your messages.  Think of it as a ping-pong buffer.
> You don't need to even fill the buffer.  There should be a time out
> somewhere so if no more chars are received, an interrupt is generated to
> say "a message is waiting".  Then the descriptor pointer is changed and
> the next buffer is used for the next message.

I'd like to use the DMA to manage the transfer from the UART to a temporary 
FIFO buffer.  At this level, the details of the protocol (message 
size, fixed or not, preamble, start of frame, end of frame, ...) aren't 
known.

The idea is simply: a byte has been received, so put it here, and the 
application (background) will read it and move it to its final destination 
(pop it from the FIFO buffer) when it has free time.


> Otherwise you need to fill the buffer which may be part of a message or
> more than one message.  If you have a fixed message size, Bob's your uncle.

No, I don't have a fixed message size.


Reply by David Brown ●August 22, 2016

On 22/08/16 10:30, pozz wrote:
> Until now I have never used a microcontroller with an embedded DMA
> controller, so I have always used the CPU to move data from memory to memory,
> from memory to peripheral, and from peripheral to memory.
> 
> The micros I use usually have only a two-byte hardware FIFO (the byte currently
> shifting in and the previous, completely received byte).
> So I often implement a software FIFO buffer for receiving data from the UART.  In
> the receive ISR, I move data from the peripheral (UART) to
> memory (FIFO buffer).
> I implement a uart_getchar() function that checks whether a new byte is
> present in the FIFO and returns that byte or EOF.  This function is
> called from the application (background), not in the foreground.
> 
> 
> #include <stddef.h>
> #include <stdint.h>
> #include <stdio.h>   /* EOF */
> 
> #define ROLLOVER(x, max)   (((x) + 1) >= (max) ? 0 : ((x) + 1))
> 
>          uint8_t rx_buff[32];
>          size_t rx_size = sizeof(rx_buff);
> volatile size_t rx_in;
>          size_t rx_out;
> 
> /* ISR of a new received character */
> void isr_rx(void) {
>   /* UART_RX_DATA is a placeholder for the device's receive data register;
>      read it first so the byte is consumed even when the FIFO is full */
>   uint8_t c = UART_RX_DATA;
>   size_t i = ROLLOVER(rx_in, rx_size);
>   if(i == rx_out) return; // Rx FIFO full, discard character
>   rx_buff[rx_in] = c;
>   rx_in = i;
> }
> 
> /* uart_getchar() function */
> int uart_getchar(void) {
>   if (rx_out == rx_in) return EOF;  // FIFO empty
>   unsigned char data;
>   data = rx_buff[rx_out];
>   rx_out = ROLLOVER(rx_out, rx_size);
>   return data;
> }
> 

Technically, rx_buff and rx_out should be volatile too, but it is
unlikely for it to be a problem (your compiler would have to be inlining
multiple copies of the uart_getchar function - possible with link-time
optimisation - and do some legal but unlikely re-ordering).

And having rx_size as a variable is going to be inefficient compared to
using a compile-time constant and making ROLLOVER a mask.
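
A minimal sketch of what those two suggestions could look like together, assuming the buffer size is a power of two. The names `RX_SIZE`, `RX_MASK` and `rx_push()` are made up for illustration; `rx_push()` stands for the part of the ISR that runs after the byte has been read from the UART.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>   /* EOF */

#define RX_SIZE 32u                /* compile-time constant, power of two */
#define RX_MASK (RX_SIZE - 1u)

_Static_assert((RX_SIZE & RX_MASK) == 0u, "RX_SIZE must be a power of two");

/* Everything shared between the ISR and the application is volatile. */
static volatile uint8_t rx_buff[RX_SIZE];
static volatile size_t  rx_in;     /* written only by the ISR */
static volatile size_t  rx_out;    /* written only by the application */

/* Called from the receive ISR with the byte just read from the UART. */
void rx_push(uint8_t c) {
    size_t next = (rx_in + 1u) & RX_MASK;   /* wrap with a mask, no compare */
    if (next == rx_out) return;             /* FIFO full, discard character */
    rx_buff[rx_in] = c;
    rx_in = next;
}

int uart_getchar(void) {
    if (rx_out == rx_in) return EOF;        /* FIFO empty */
    uint8_t data = rx_buff[rx_out];
    rx_out = (rx_out + 1u) & RX_MASK;
    return data;
}
```

As in the original code, one slot is always left unused so that "full" and "empty" can be told apart, which means a 32-byte buffer holds at most 31 characters.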

> 
> Now I'm starting to use new microcontrollers that embed a DMA engine, for
> example the SAM C21 Cortex-M0+ from Atmel (I think many Cortex-M0+ micros
> out there integrate a DMA engine).
> 
> So I'm asking whether it is possible to avoid the ISR completely and use the
> DMA to move received characters into the FIFO buffer.
> 
> I read the datasheet, but I couldn't find a solution to my problem.
> 
> First of all, the destination memory address can be fixed or
> auto-incrementing, but there isn't a mechanism to wrap around (the FIFO
> buffer is a *circular* array, so after pushing N bytes, the address
> should start again from the beginning).

My experience is only with Freescale's DMA engine (on Kinetis ARMs and
MPC's), which have support for circular buffers for precisely this
reason.  It seems a strange omission from Atmel to have no similar support.

> Maybe I have to configure the DMA for a transaction with a byte count
> equal to the size of the FIFO buffer.  When the last byte is received,
> the transaction ends and an interrupt can be raised (if correctly
> configured).  In the relevant ISR, a new DMA transaction could be
> started (with the destination memory address equal to the beginning of
> the FIFO buffer).

That sounds about right.

Depending on the type of communication you are expecting, and the
resources you have, it might be possible to simply have a large enough
buffer to support all legal incoming packets.

> 
> Another issue is how the application (background) could know how many
> bytes are present in the FIFO buffer so it can pop and process them as
> it wants.

Again, I don't know Atmel's DMA engine, but usually there are values you
can read (byte count, current destination pointer, etc.) that will help
here.

And you also want to check if you really /need/ the DMA here.  If the
processor is not overloaded, an ISR can be simpler and easier - and
often it does not matter if that "wastes" a few percent of your cpu
capacity.  DMA on transmit is often very simple, but for receive the
complications of timings and timeouts can make the DMA less of a win.

Reply by Dave Nadler ●August 22, 2016

On Monday, August 22, 2016 at 7:03:55 AM UTC-4, pozz wrote:
> ...No, I don't have a fixed message size.

A variable sized message means you have to periodically poll
the results, which means stopping the DMA and likely restarting
DMA with a different buffer.

Even for fixed size messages, you'll need to do this in case
a character gets lost!

The fun starts when the DMA behavior is not guaranteed if you
start/stop during transfers: I've worked with chips where the
DMA was unusable because characters would be dropped whilst
switching (e.g. the old NEC V25).

Calculate whether the classic ISR technique's overhead is
such that using DMA is really required. It shouldn't be, but DMA
can also be needed if ISR latency is large and the FIFO is small,
such that the classic ISR may drop characters at high speed.

Hope that helps!
Best Regards, Dave
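
The calculation suggested above can be sketched in a few lines of C. The function names are made up, and every input (baud rate, bits per character, ISR cost in cycles, clock frequency) is an assumption to be replaced with measured values for the real system.

```c
#include <stdint.h>

/* CPU time spent in the receive ISR, in hundredths of a percent,
 * when characters arrive back to back. */
uint32_t isr_load_centipercent(uint32_t baud, uint32_t bits_per_char,
                               uint32_t cycles_per_isr, uint32_t cpu_hz) {
    uint32_t chars_per_sec = baud / bits_per_char;
    uint64_t cycles_per_sec = (uint64_t)chars_per_sec * cycles_per_isr;
    return (uint32_t)(cycles_per_sec * 10000u / cpu_hz);
}

/* Time available to service the receiver before a character is lost,
 * in microseconds, when the hardware can absorb `spare_chars` further
 * characters after raising the interrupt. */
uint32_t isr_deadline_us(uint32_t baud, uint32_t bits_per_char,
                         uint32_t spare_chars) {
    return (uint32_t)((uint64_t)spare_chars * bits_per_char * 1000000u / baud);
}
```

For example, with the assumed figures of 115200 baud, 10 bits per character, 100 cycles per interrupt and a 48 MHz clock, the first function returns 240 (2.40 % of the CPU), and the second returns 86 for one spare character, i.e. the worst-case ISR latency must stay below roughly 86 microseconds.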

Reply by pozz ●August 22, 2016

On 22/08/2016 13:45, Dave Nadler wrote:
> On Monday, August 22, 2016 at 7:03:55 AM UTC-4, pozz wrote:
>> ...No, I don't have a fixed message size.
>
> A variable sized message means you have to periodically poll
> the results, which means stopping the DMA and likely restarting
> DMA with a different buffer.

Why stop the DMA?  The DMA should silently work in the background, even 
while I'm checking its results.



Reply by pozz ●August 22, 2016

On 22/08/2016 13:29, David Brown wrote:
> On 22/08/16 10:30, pozz wrote:
>> First of all, the destination memory address could be fixed or
>> auto-incrementing, but there isn't a mechanism to wrap-around (FIFO
>> buffer is a *circular* array, so after pushing N bytes, the address
>> should start again from the beginning).
>
> My experience is only with Freescale's DMA engine (on Kinetis ARMs and
> MPC's), which have support for circular buffers for precisely this
> reason.  It seems a strange omission from Atmel to have no similar support.

I'm not sure, maybe it is possible to automatically resume the transfer 
descriptor as soon as the transfer is complete... in this way you will 
have a circular buffer managed by DMA without any CPU intervention.


>> Maybe I have to configure the DMA for a transaction with a byte count
>> equal to the size of the FIFO buffer.  When the last byte is received,
>> the transaction ends and an interrupt can be raised (if correctly
>> configured).  In the relevant ISR, a new DMA transaction could be
>> started (with the destination memory address equal to the beginning of
>> the FIFO buffer).
>
> That sounds about right.
>
> Depending on the type of communication you are expecting, and the
> resources you have, it might be possible to simply have a large enough
> buffer to support all legal incoming packets.
>
>>
>> Another issue is how the application (background) could know how many
>> bytes are present in the FIFO buffer so it can pop and process them as
>> it wants.
>
> Again, I don't know Atmel's DMA engine, but usually there are values you
> can read (byte count, current destination pointer, etc.) that will help
> here.

I see.


> And you also want to check if you really /need/ the DMA here.  If the
> processor is not overloaded, an ISR can be simpler and easier - and
> often it does not matter if that "wastes" a few percent of your cpu
> capacity.  DMA on transmit is often very simple, but for receive the
> complications of timings and timeouts can make the DMA less of a win.

What do you mean by "timings" and "timeouts"?  I think those 
complications must be handled even with the "simple" ISR-only approach.

Reply by David Brown ●August 22, 2016

On 22/08/16 14:49, pozz wrote:
> On 22/08/2016 13:29, David Brown wrote:

>> And you also want to check if you really /need/ the DMA here.  If the
>> processor is not overloaded, an ISR can be simpler and easier - and
>> often it does not matter if that "wastes" a few percent of your cpu
>> capacity.  DMA on transmit is often very simple, but for receive the
>> complications of timings and timeouts can make the DMA less of a win.
> 
> What do you mean with "timings" and "timeouts"?  I think those
> complications must be implemented even with the "simple" ISR-only approach.
> 

Indeed they do need to be implemented in the ISR-only approach - but
then you have fewer bits that need to interact.

For example, if you have an incoming telegram and you want to react
quickly when it has been completely received, an ISR can give you that
easily.  When the interrupt for the final character arrives, you trigger
your handling action.  But if you are using a DMA then it will just be
one more character in the buffer - you don't get an interrupt until the
buffer is full.  So you need additional methods, such as timer
interrupts, to regularly poll the DMA buffer to see if the final
character has come in.

So a DMA on receive is good for some things, but adds complications for
other tasks.
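
The timer-polling approach described above could be sketched as follows. This is only an illustration: `struct rx_idle`, `rx_idle_tick()` and the value of `IDLE_TICKS` are hypothetical, and the caller is assumed to obtain the DMA's current write position from whatever the controller exposes.

```c
#include <stdbool.h>
#include <stddef.h>

#define IDLE_TICKS 3u   /* assumed: ticks of silence that end a telegram */

struct rx_idle {
    size_t   last_wr;     /* DMA write position seen on the previous tick */
    size_t   handled_wr;  /* write position when a telegram was last reported */
    unsigned quiet;       /* consecutive ticks without new bytes */
};

/* Call from a periodic timer interrupt with the DMA's current write
 * position.  Returns true once per burst: when bytes have arrived and
 * the line has then stayed quiet for IDLE_TICKS ticks. */
bool rx_idle_tick(struct rx_idle *s, size_t wr_now) {
    if (wr_now != s->last_wr) {                 /* still receiving */
        s->last_wr = wr_now;
        s->quiet = 0;
        return false;
    }
    if (wr_now == s->handled_wr) return false;  /* nothing new to report */
    if (++s->quiet < IDLE_TICKS) return false;  /* not quiet long enough */
    s->handled_wr = wr_now;
    s->quiet = 0;
    return true;
}
```

The reaction time is bounded by the tick period times `IDLE_TICKS`, which is the extra latency compared with an ISR that sees the final character directly. With a circular buffer there is also a blind spot: if exactly one buffer's worth of bytes arrives between two ticks, the write position looks unchanged.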

DMA on transmit, however, is usually very simple because you know
exactly how much you are sending.  It gets "fun" if you want to add to
the DMA buffer while the DMA is running - synchronising between DMA and
the processor is not always a simple task.  It is really easy to make
something that works fine most of the time (and during all your
testing), yet fails if the DMA triggers half-way through the buffer add
function.
