EmbeddedRelated.com

Dummy questions from a newbie regarding how to use DMA

Started by pozz August 22, 2016
Until now I have never used a microcontroller with an embedded DMA
controller, so I have always used the CPU to move data from memory to
memory, from memory to peripheral and from peripheral to memory.

The micros I have used usually have only a two-byte hardware FIFO (the
byte currently shifting in and the previous, completely received byte).
So I often implement a software FIFO buffer for receiving data from the
UART. In the receive-character ISR, I move data from the peripheral
(UART) to memory (the FIFO buffer).
I implement a uart_getchar() function that checks whether a new byte is
present in the FIFO and returns that byte or EOF. This function is
called from the application (background), not from the foreground (ISR).


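#include <stddef.h>
#include <stdint.h>
#include <stdio.h>   /* EOF */

/* Advance a FIFO index by one, wrapping to 0 at max */
#define ROLLOVER(x, max)   (((x) + 1) >= (max) ? 0 : ((x) + 1))

/* Hypothetical, device-specific: returns the byte just received by the UART */
extern uint8_t uart_read_rx_register(void);

          uint8_t rx_buff[32];
          size_t rx_size = sizeof rx_buff;
volatile size_t rx_in;
          size_t rx_out;

/* ISR of a new received character */
void isr_rx(void) {
   uint8_t c = uart_read_rx_register(); // consume the byte even if it is discarded below
   size_t i = ROLLOVER(rx_in, rx_size);
   if(i == rx_out) return; // Rx FIFO full, discard character
   rx_buff[rx_in] = c;
   rx_in = i;
}

/* uart_getchar() function */
int uart_getchar(void) {
   if (rx_out == rx_in) return EOF;  // FIFO empty
   unsigned char data;
   data = rx_buff[rx_out];
   rx_out = ROLLOVER(rx_out, rx_size);
   return data;
}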


Now I'm starting to use newer microcontrollers that embed a DMA engine,
for example the Atmel SAM C21, a Cortex-M0+ (I think many Cortex-M0+
micros out there integrate a DMA engine).

So I'm asking whether it is possible to avoid the ISR completely and use
the DMA to move received characters into the FIFO buffer.

I read the datasheet, but I couldn't find a solution to my problem.

First of all, the destination memory address can be fixed or
auto-incrementing, but there is no wrap-around mechanism (the FIFO
buffer is a *circular* array, so after pushing N bytes the address
should start again from the beginning).
Maybe I have to configure the DMA for a transaction with a byte count
equal to the size of the FIFO buffer. When the last byte is received,
the transaction ends and an interrupt can be raised (if correctly
configured). In the relevant ISR, a new DMA transaction could be
started (with the destination memory address equal to the beginning of
the FIFO buffer).
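A minimal host-side sketch of the restart-on-completion idea above. It does not use the SAM C21 DMAC: `dma_channel_t`, `rx_dma_start()`, `rx_dma_complete_isr()` and `dma_beat()` are hypothetical stand-ins, and `dma_beat()` plays the role of the hardware moving one byte from the UART to memory so the logic can be run on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_SIZE 32u

/* Hypothetical model of one DMA channel, not the SAM C21 DMAC layout. */
typedef struct {
    uint8_t *dst;        /* next destination address */
    size_t   remaining;  /* beats left in the current transaction */
    int      enabled;
} dma_channel_t;

static uint8_t rx_buf[RX_BUF_SIZE];
static dma_channel_t rx_dma;
static volatile unsigned rx_wraps;  /* how many times the buffer was refilled */

/* Start a transaction that covers the whole buffer. */
static void rx_dma_start(void) {
    rx_dma.dst = rx_buf;
    rx_dma.remaining = RX_BUF_SIZE;
    rx_dma.enabled = 1;
}

/* "Transfer complete" ISR: re-arm at the beginning of the buffer. */
static void rx_dma_complete_isr(void) {
    rx_wraps++;
    rx_dma_start();
}

/* Simulates the hardware moving one received byte (UART -> memory). */
static void dma_beat(uint8_t byte) {
    assert(rx_dma.enabled && rx_dma.remaining > 0);
    *rx_dma.dst++ = byte;
    if (--rx_dma.remaining == 0) {
        rx_dma.enabled = 0;
        rx_dma_complete_isr();  /* real hardware would raise the interrupt here */
    }
}
```

On real hardware, any byte that arrives between the end of one transaction and the re-arm in the ISR has to wait in the UART's own small buffer, so the ISR latency still matters.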

Another issue is how the application (background) can know how many
bytes are present in the FIFO buffer, so that it can pop and process
them as it wants.

On 22/08/2016 11:49, raimond.dragomir@gmail.com wrote:
You can link descriptors in a circular manner.
Using two, pointing to each other, is the simplest.
You can know how many bytes are there by address arithmetic,
since you have access to the DMA descriptors' addresses.
For interrupts, use terminal count interrupts.
Better granularity if you use many "small" descriptors...
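A minimal simulation of the linked-descriptor suggestion, assuming a hypothetical descriptor layout (`desc_t` with a destination, a length and a `next` link) rather than the real SAM C21 descriptor format. Four small descriptors cover one 32-byte buffer and the last links back to the first, so the simulated engine wraps around without CPU help; each completed descriptor bumps `tc_count` where a terminal-count interrupt would fire, and the write position comes from address arithmetic on the active descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_SIZE 32u
#define N_DESC      4u                      /* "many small" descriptors */
#define DESC_LEN    (RX_BUF_SIZE / N_DESC)  /* bytes per descriptor */

/* Hypothetical descriptor: destination start, length, link to the next one. */
typedef struct desc {
    uint8_t     *dst;
    size_t       len;
    struct desc *next;
} desc_t;

static uint8_t rx_buf[RX_BUF_SIZE];
static desc_t  ring[N_DESC];

/* Simulated engine state: active descriptor and beats done within it. */
static desc_t  *active;
static size_t   done_in_active;
static volatile unsigned tc_count;  /* terminal-count "interrupts" seen */

static void ring_init(void) {
    for (size_t i = 0; i < N_DESC; i++) {
        ring[i].dst  = &rx_buf[i * DESC_LEN];
        ring[i].len  = DESC_LEN;
        ring[i].next = &ring[(i + 1) % N_DESC];  /* last links back to first */
    }
    active = &ring[0];
    done_in_active = 0;
}

/* One byte moved by the "hardware"; follows the link at the end of a block. */
static void dma_beat(uint8_t byte) {
    active->dst[done_in_active++] = byte;
    if (done_in_active == active->len) {
        tc_count++;              /* terminal count for this small block */
        active = active->next;   /* next descriptor fetched without the CPU */
        done_in_active = 0;
    }
}

/* Address arithmetic: index where the next byte will land. */
static size_t ring_write_index(void) {
    size_t idx = (size_t)(active->dst - rx_buf) + done_in_active;
    assert(idx < RX_BUF_SIZE);
    return idx;
}
```

In this model the engine keeps its progress (`done_in_active`) outside the descriptors, so they never need re-arming; whether a particular controller modifies the descriptors in SRAM or keeps a separate working copy has to be checked in its datasheet. With more, smaller descriptors the terminal-count interrupts arrive more often, which is the finer granularity mentioned above.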

On 22/08/2016 11:49, raimond.dragomir@gmail.com wrote:
> You can link descriptors in a circular manner.
> Using two, pointing to each other, is the simplest.
Should the two transfer descriptors (linked together in a circular manner) be completely identical? The transfer descriptor includes the total number of data items (bytes) of the block transfer (the BTCNT register). This value is automatically decremented at each new data transfer. At the end, BTCNT will be zero, so it can't be used *as is* for the next linked descriptor. Should I "re-arm" the BTCNT register when a transfer descriptor ends?
> You can know how many bytes are there by address arithmetic,
> you have access to the dma descriptors addresses.
I think I have to check the BTCNT register in SRAM (in the transfer descriptor), which initially stores the total number of data items for that "block transfer" but is automatically decremented as new data are transferred. The application should keep count of the number of bytes already popped from the FIFO:

while (number_of_bytes_already_popped_from_FIFO < sizeof(FIFObuf) - BTCNT) {
    new_byte = FIFObuf[number_of_bytes_already_popped_from_FIFO++];
    // process new_byte
}

The only problem I see here is when one transfer has just finished and the new transfer (identical to the previous one and linked from it) has just started. The BTCNT used in the arithmetic above shouldn't come from the currently active transfer descriptor, but from the previous one, until all the bytes in the FIFO have been popped by the application.
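One way to sidestep the "which descriptor's count do I use" problem, sketched under assumptions: if the buffer is treated as one circular area, the next write position can be derived from the remaining count of the active transfer (`write = (SIZE - remaining) % SIZE`), and the application keeps only its own read index, exactly like the ISR-based FIFO. `dma_remaining` is a hypothetical stand-in for the live count the DMA updates, and `uart_getchar_dma()` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>   /* EOF */

#define RX_BUF_SIZE 32u

static uint8_t rx_buf[RX_BUF_SIZE];
static size_t  rx_read;  /* owned by the background code only */

/* Hypothetical stand-in for the live remaining-count of the active transfer
 * (e.g. the BTCNT value the DMA keeps decrementing). */
static volatile size_t dma_remaining = RX_BUF_SIZE;

/* Index the DMA will write next, derived from the remaining count.
 * remaining == 0 (just finished) and == SIZE (just reloaded) both map to 0. */
static size_t rx_write_index(void) {
    size_t remaining = dma_remaining;  /* read the volatile once */
    assert(remaining <= RX_BUF_SIZE);
    return (RX_BUF_SIZE - remaining) % RX_BUF_SIZE;
}

/* Same contract as the ISR-based uart_getchar(): a byte or EOF. */
int uart_getchar_dma(void) {
    if (rx_read == rx_write_index()) return EOF;  /* nothing new */
    uint8_t data = rx_buf[rx_read];
    rx_read = (rx_read + 1) % RX_BUF_SIZE;
    return data;
}
```

The limitation is that the application must never fall a whole buffer behind: if the DMA laps the read index, the overrun cannot be detected from the count alone.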
> For interrupts, use terminal count interrupts.
In my bare-bones OS, I usually poll for new bytes in the FIFO through uart_getchar(), so I don't use interrupts.
> Better granularity if you use many "small" descriptors...
Could you explain?
On 8/22/2016 6:15 AM, pozz wrote:
> On 22/08/2016 11:49, raimond.dragomir@gmail.com wrote:
>> You can link descriptors in a circular manner.
>> Using two, pointing to each other, is the simplest.
>
> Should the two transfer descriptors (linked together in a circular
> manner) completely identical?
That's up to you and your messages. Think of it as a ping-pong buffer. You don't even need to fill the buffer. There should be a timeout somewhere so that, if no more chars are received, an interrupt is generated to say "a message is waiting". Then the descriptor pointer is changed and the next buffer is used for the next message.

Otherwise you need to fill the buffer, which may hold part of a message or more than one message. If you have a fixed message size, Bob's your uncle.
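A minimal sketch of the ping-pong scheme, with the DMA and the idle-line timeout both simulated: `pp_dma_beat()` stands in for the hardware storing one byte, and `pp_timeout_isr()` is a hypothetical handler for whatever receive timeout the hardware or a timer provides. When the timeout fires, the filled buffer is published as a message and the DMA is pointed at the other buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PP_BUF_SIZE 64u

static uint8_t  pp_buf[2][PP_BUF_SIZE];
static unsigned pp_active;  /* buffer the DMA is currently filling */
static size_t   pp_fill;    /* bytes written into the active buffer */

/* Message handed to the application by the timeout ISR. */
static const uint8_t *msg_ptr;
static size_t         msg_len;
static volatile int   msg_ready;

/* Simulated DMA beat into the active buffer (on overflow, bytes are dropped). */
static void pp_dma_beat(uint8_t byte) {
    if (pp_fill < PP_BUF_SIZE) pp_buf[pp_active][pp_fill++] = byte;
}

/* Hypothetical idle-line timeout ISR: no byte for a while means end of
 * message. Publish the filled buffer and point the DMA at the other one. */
static void pp_timeout_isr(void) {
    assert(pp_fill <= PP_BUF_SIZE);
    if (pp_fill == 0) return;  /* line idle, but nothing was received */
    msg_ptr = pp_buf[pp_active];
    msg_len = pp_fill;
    msg_ready = 1;
    pp_active ^= 1u;           /* "change the descriptor pointer" */
    pp_fill = 0;
}
```

The application has to finish with `msg_ptr` before the next timeout swaps the buffers back, and on real hardware the switch itself is the delicate moment, since a byte arriving mid-switch must not be lost.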
-- Rick C
On 22/08/2016 12:27, rickman wrote:
> That's up to you and your messages. Think of it as a ping-pong buffer.
> You don't need to even fill the buffer. There should be a time out
> somewhere so if no more chars are received, an interrupt is generated to
> say "a message is waiting". Then the descriptor pointer is changed and
> the next buffer is used for the next message.
I'd like to use the DMA to manage the transfer from the UART to a temporary FIFO buffer. At this level, the details of the protocol (message size, fixed or not, preamble, start of frame, end of frame, ...) aren't known. It's just "hey, a byte has been received, put it here", so that the application (background) can read it and move it to its final destination (pop it from the FIFO buffer) when it has free time.
> Otherwise you need to fill the buffer which may be part of a message or
> more than one message. If you have a fixed message size, Bob's your uncle.
No, I don't have a fixed message size.
On 22/08/16 10:30, pozz wrote:
> #define ROLLOVER(x, max)   (((x) + 1) >= (max) ? 0 : (x + 1))
>
>           uint8_t rx_buff[32];
>           size_t rx_size;
> volatile size_t rx_in;
>           size_t rx_out;
>
> /* ISR of a new received character */
> void isr_rx(void) {
>    size_t i = ROLLOVER(rx_in, rx_size);
>    if(i == rx_out) return; // Rx FIFO full, discard character
>    rx_buff[rx_in] = c;
>    rx_in = i;
> }
>
> /* uart_getchar() function */
> int uart_getchar(void) {
>    if (rx_out == rx_in) return EOF;  // FIFO empty
>    unsigned char data;
>    data = rx_buff[rx_out];
>    rx_out = ROLLOVER(rx_out, rx_size);
>    return data;
> }
Technically, rx_buff and rx_out should be volatile too, but it is unlikely to be a problem (your compiler would have to inline multiple copies of the uart_getchar function, which is possible with link-time optimisation, and do some legal but unlikely re-ordering).

And having rx_size as a variable is going to be inefficient compared to using a compile-time constant and making ROLLOVER a mask.
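A sketch of the two changes suggested above, assuming the buffer size can be fixed at compile time as a power of two: `ROLLOVER` becomes a single AND with a mask, and all the shared variables are `volatile`. `rx_push()` is a hypothetical helper that the receive ISR would call with the byte it has just read from the UART.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>    /* EOF */

#define RX_SIZE 32u   /* must be a power of two for the mask to work */
#define RX_MASK (RX_SIZE - 1u)
static_assert((RX_SIZE & RX_MASK) == 0, "RX_SIZE must be a power of two");

#define ROLLOVER(x) (((x) + 1u) & RX_MASK)

static volatile uint8_t rx_buff[RX_SIZE];
static volatile size_t  rx_in;   /* written only by the ISR */
static volatile size_t  rx_out;  /* written only by the background code */

/* Called from the receive ISR with the byte read from the UART. */
static void rx_push(uint8_t c) {
    size_t i = ROLLOVER(rx_in);
    if (i == rx_out) return;       /* FIFO full: discard character */
    rx_buff[rx_in] = c;
    rx_in = i;
}

int uart_getchar(void) {
    size_t out = rx_out;
    if (out == rx_in) return EOF;  /* FIFO empty */
    uint8_t data = rx_buff[out];
    rx_out = ROLLOVER(out);
    return data;
}
```

As in the original, one slot stays unused so that `rx_in == rx_out` always means "empty", so a 32-byte buffer holds at most 31 characters.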
> First of all, the destination memory address could be fixed or
> auto-incrementing, but there isn't a mechanism to wrap-around (FIFO
> buffer is a *circular* array, so after pushing N bytes, the address
> should start again from the beginning).
My experience is only with Freescale's DMA engine (on Kinetis ARMs and MPCs), which has support for circular buffers for precisely this reason. It seems a strange omission from Atmel to have no similar support.
> Maybe I have to configure the DMA for a transaction with a byte count
> equals to the size of the FIFO buffer.  When the last byte is received,
> the transaction ends and an interrupt could be raises (if correctly
> configured).  In the relevant ISR, a new DMA transaction could be
> started (with the destination memory address equals to the beginning of
> the FIFO buffer).
That sounds about right. Depending on the type of communication you are expecting, and the resources you have, it might be possible to simply have a large enough buffer to support all legal incoming packets.
> Another issue is how the application (background) could know how many
> bytes are present in the FIFO buffer so it can pop and process them as
> it wants.
Again, I don't know Atmel's DMA engine, but usually there are values you can read (byte count, current destination pointer, etc.) that will help here.

And you also want to check whether you really /need/ the DMA here. If the processor is not overloaded, an ISR can be simpler and easier, and often it does not matter if that "wastes" a few percent of your CPU capacity. DMA on transmit is often very simple, but for receive the complications of timings and timeouts can make the DMA less of a win.
On Monday, August 22, 2016 at 7:03:55 AM UTC-4, pozz wrote:
> ...No, I don't have a fixed message size.
A variable-sized message means you have to periodically poll the results, which means stopping the DMA and likely restarting the DMA with a different buffer. Even for fixed-size messages, you'll need to do this in case a character gets lost!

The fun starts when the DMA behavior is not guaranteed if you start/stop during transfers: I've worked with chips where the DMA was unusable because characters would be dropped while switching (i.e. the old NEC V25).

Calculate whether the classic ISR technique's overhead is such that using DMA is really required. It shouldn't be the case, but DMA can also become necessary if ISR latency is large and the FIFO is small, such that the classic ISR may drop characters at high speed.

Hope that helps!
Best Regards, Dave
On 22/08/2016 13:45, Dave Nadler wrote:
> On Monday, August 22, 2016 at 7:03:55 AM UTC-4, pozz wrote:
>> ...No, I don't have a fixed message size.
>
> A variable sized message means you have to periodically poll
> the results, which means stopping the DMA and likely restarting
> DMA with a different buffer.
Why stop the DMA? The DMA should work silently in the background, even while I'm checking its results.
On 22/08/2016 13:29, David Brown wrote:
> My experience is only with Freescale's DMA engine (on Kinetis ARMs and
> MPC's), which have support for circular buffers for precisely this
> reason. It seems a strange omission from Atmel to have no similar support.
I'm not sure; maybe it is possible to restart the transfer descriptor automatically as soon as the transfer is complete. That way you would have a circular buffer managed by the DMA without any CPU intervention.
> Again, I don't know Atmel's DMA engine, but usually there are values you
> can read (byte count, current destination pointer, etc.) that will help
> here.
I see.
> And you also want to check if you really /need/ the DMA here. If the
> processor is not overloaded, an ISR can be simpler and easier - and
> often it does not matter if that "wastes" a few percent of your cpu
> capacity. DMA on transmit is often very simple, but for receive the
> complications of timings and timeouts can make the DMA less of a win.
What do you mean by "timings" and "timeouts"? I think those complications must be handled even with the "simple" ISR-only approach.
On 22/08/16 14:49, pozz wrote:
> On 22/08/2016 13:29, David Brown wrote:
>> And you also want to check if you really /need/ the DMA here. If the
>> processor is not overloaded, an ISR can be simpler and easier - and
>> often it does not matter if that "wastes" a few percent of your cpu
>> capacity. DMA on transmit is often very simple, but for receive the
>> complications of timings and timeouts can make the DMA less of a win.
>
> What do you mean with "timings" and "timeouts"? I think those
> complications must be implemented even with the "simple" ISR-only approach.
Indeed they do need to be implemented in the ISR-only approach, but then you have fewer parts that need to interact. For example, if you have an incoming telegram and you want to react quickly when it has been completely received, an ISR gives you that easily: when the interrupt for the final character arrives, you trigger your handling action. But if you are using a DMA, then it will just be one more character in the buffer; you don't get an interrupt until the buffer is full. So you need additional methods, such as timer interrupts, to regularly poll the DMA buffer to see if the final character has come in.

So a DMA on receive is good for some things, but adds complications for other tasks.

DMA on transmit, however, is usually very simple because you know exactly how much you are sending. It gets "fun" if you want to add to the DMA buffer while the DMA is running: synchronising between the DMA and the processor is not always a simple task. It is really easy to make something that works fine most of the time (and during all your testing), yet fails if the DMA triggers half-way through the buffer-add function.
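A sketch of the timer-based polling described above, assuming (hypothetically) that telegrams end with a newline and that `dma_write_idx` holds the DMA's current write position in a circular buffer; in real code that position would be derived from the controller's byte count or destination pointer. Each tick examines only the bytes that arrived since the previous tick.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_SIZE  32u
#define END_OF_FRAME '\n'  /* assumption: telegrams end with a newline */

static uint8_t rx_buf[RX_BUF_SIZE];
static volatile size_t dma_write_idx;  /* hypothetical stand-in for the DMA position */
static size_t scan_idx;                /* first byte not yet examined */
static volatile unsigned frames_ready; /* complete telegrams seen so far */

/* Periodic timer ISR: examine only the bytes the DMA has written since the
 * last tick and count end-of-frame markers. */
static void timer_tick_isr(void) {
    size_t end = dma_write_idx;  /* snapshot the DMA position once */
    assert(end < RX_BUF_SIZE);
    while (scan_idx != end) {
        if (rx_buf[scan_idx] == END_OF_FRAME) frames_ready++;
        scan_idx = (scan_idx + 1) % RX_BUF_SIZE;
    }
}
```

The reaction time is now up to one timer period instead of one character time, which is the trade-off compared with an ISR that sees every character.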