How to make efficient RX ISR usage on FreeRTOS

Started by skiddybird December 3, 2012
Hi,
Can anyone tell me how to write a UART RX ISR that receives several kinds of packets of different lengths? The kernel is FreeRTOS. I have excerpted a sample code snippet from one of the FreeRTOS demo ports for the LPC2129, as below.



For this code snippet, I am only interested in the receive path. Whenever one character is received, the ISR puts it in the queue xRxedChars, then wakes up the handler task to decipher and handle the data. But this method is inefficient, because the frequent enqueuing and context switching are time consuming. Just as the author states, the demo port illustrates how to use some APIs and is not written for efficiency. How should the code be modified to suit practical needs? The packets that traverse the UART port are not all the same length; some are longer, others are shorter, and we have no idea what kind of packet will arrive next. Where should the received packets be stored, and when should the handler task be unblocked? If the packet is placed in a global structure in RAM, then the issue of exclusive access to this global structure arises. Imagine there is already some data in the global structure; while the handler task is in the midst of getting the data out of the structure, more data arrives and the RX ISR is invoked. The RX ISR will then write new data into the global structure while the handler is still reading it, which would result in data corruption. That is another problem.

> For this code snippet, I am only interested in the receive path. What it
> does is, whenever 1 character is received, the ISR put it in the queue
> xRxedChars, then wake up the handler task for data deciphering and
> handling. But this method is inefficient, because the frequent enqueuing
> and context switching is time consuming. Just as the author states, the
> demo port is for illustration of how to use some APIs, not for
> efficiency.

Correct. Queuing characters in this way is fine for low bandwidth
communications, things like command console key inputs, etc, but has too
high an overhead for higher volumes of data.

> Then how to modify the code to suit the practical needs?

For optimal performance, if the data throughput requirements warrant it
anyway, you can't beat DMA.

On a USART I normally leave the Rx DMA running all the time because you
don't know when data is going to arrive. The Rx DMA places characters
in a circular buffer, and uses a semaphore to indicate whether there is
data in the buffer or not. The semaphore allows a task to block
(without polling or in fact using any CPU time) on reading data if the
buffer is empty or contains fewer characters than are needed.

The Tx DMA can be used to provide a simple zero copy transmission, which
the sending task can either opt to wait for to complete (again using a
semaphore that is given by the DMA Tx end interrupt), or just continue
on its way while the transmission is in progress.

I have just implemented such a system that is now available, but it is
not on an NXP chip so won't post it here ;o) The principles are the
same on all chips though.

Simpler methods use character by character (or at least FIFO full/empty)
interrupts to achieve the same thing. You can find examples on
http://www.FreeRTOS.org/io - look at the "transfer modes" section at the
bottom (the zero copy Tx transfer mode is due for a little rework to
make it more user friendly).
Regards,
Richard.

+ http://www.FreeRTOS.org
Designed for microcontrollers. More than 7000 downloads per month.

+ http://www.FreeRTOS.org/trace
15 interconnected trace views. An indispensable productivity tool.

Thank you, Richard!
But my puzzle remains. DMA is a good way to move data easily, but MCUs that have no DMA controller need a software mechanism. For the circular buffer, I have written the following code. The USART RX ISR puts every character it gets in the buffer Q_serial, and the background task relay_PC_message() moves the characters from Q_serial to the array rx_PC for further processing. relay_PC_message() is always ready or running. It seems a stupid method, but I just don't know how to incorporate this code snippet into FreeRTOS. The packets that arrive may be of variable length, so if relay_PC_message() is dormant by default, when can the RX ISR wake it up for data processing? I can't rely on the receive count because the length of every packet is probably unique. Nor do I want to decode data in the ISR, because that takes too much time and could affect the reception of subsequent characters. Another drawback of this method is the lack of mutual exclusion protection for Q_serial. It is quite possible that vSerialISR() pre-empts relay_PC_message() while it is decrementing Q_serial.count, and the value of that member is then corrupted.
Therefore, what can I do?

On Tue, Dec 4, 2012 at 4:37 PM, Lei Y wrote:
> The data packets arrived may be of variable length, so when can I wake up
> relay_PC_message() for data processing in the RX ISR if it is in dormant
> state by default?

How many bytes do you need to receive to know the packet size? Are you
using a protocol where there's a length field near the start of the packet,
or are you using an end-of-packet delimiter such as newline?

If your packets have a length field, your rx interrupt can extract the
length and set a flag when that many bytes are received. If the length
field changes between packets, you can wake up something to tell your rx
interrupt how many more bytes to receive when there's enough data for a
packet length to be present. The only trouble with this system is when your
minimum packet length is less than the section of the largest packet before
the length field.
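
The length-field approach can be sketched as a small state machine that the RX ISR feeds one byte at a time. This is only a host-side illustration under stated assumptions: the frame layout (one length byte followed by that many payload bytes), the names `framer_t`, `framer_reset` and `framer_on_byte`, and the `MAX_PAYLOAD` limit are all hypothetical and not taken from the thread. Where the function returns true, a FreeRTOS ISR would typically signal the handler task, for example with `xSemaphoreGiveFromISR()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PAYLOAD 64u   /* assumed upper bound on the payload size */

/* Hypothetical frame layout: [len][len payload bytes]. */
typedef struct {
    uint8_t buf[MAX_PAYLOAD];
    uint8_t expected;   /* payload length from the header, 0 = waiting for header */
    uint8_t received;   /* payload bytes stored so far */
} framer_t;

/* Discard any partial packet and wait for the next length byte. */
void framer_reset(framer_t *f)
{
    assert(f != NULL);
    f->expected = 0;
    f->received = 0;
}

/* Feed one received byte (for example from the RX ISR).
 * Returns true when buf[0 .. expected-1] holds a complete payload.
 * The caller must consume the payload and call framer_reset() before
 * the next packet; bytes arriving before that are dropped. */
bool framer_on_byte(framer_t *f, uint8_t byte)
{
    assert(f != NULL);
    if (f->expected == 0) {
        /* Header state: ignore empty or oversized length values. */
        if (byte == 0 || byte > MAX_PAYLOAD) {
            return false;
        }
        f->expected = byte;
        f->received = 0;
        return false;
    }
    if (f->received < f->expected) {
        f->buf[f->received++] = byte;
    }
    return f->received == f->expected;
}
```

The ISR only compares two counters per byte, so the decoding work stays in the task and the task is woken once per packet rather than once per character.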

If your packets have an end-of-packet delimiter, your rx interrupt can
increment a counter every time it sees one, and your queue pop routine can
decrement it whenever it pulls out a full packet.
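
A minimal host-side sketch of the delimiter-counting idea follows. The names `rx_stream_t`, `rx_isr_put` and `rx_pop_packet`, the buffer size and the choice of newline as delimiter are assumptions made for illustration. In a real FreeRTOS build the decrement of `packets` in the task races with the increment in the ISR, so it would need a short critical section around it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RX_SIZE 128u   /* assumed buffer size, must be a power of two */
#define DELIM   '\n'   /* assumed end-of-packet delimiter */

typedef struct {
    uint8_t data[RX_SIZE];
    volatile uint32_t head;     /* free-running write index, ISR only */
    volatile uint32_t tail;     /* free-running read index, task only */
    volatile uint32_t packets;  /* complete packets waiting in data[] */
} rx_stream_t;

/* ISR side: store one byte and count delimiters.
 * Returns 1 if the byte was stored, 0 if the buffer was full. */
int rx_isr_put(rx_stream_t *s, uint8_t byte)
{
    assert(s != NULL);
    if (s->head - s->tail == RX_SIZE) {
        return 0;                       /* full: the byte is dropped */
    }
    s->data[s->head % RX_SIZE] = byte;
    s->head++;
    if (byte == DELIM) {
        s->packets++;
    }
    return 1;
}

/* Task side: copy one packet, without its delimiter, into out.
 * Returns the number of bytes copied, or -1 if no complete packet is
 * waiting. Bytes that do not fit in out are discarded. */
int rx_pop_packet(rx_stream_t *s, uint8_t *out, size_t out_size)
{
    assert(s != NULL && out != NULL);
    if (s->packets == 0) {
        return -1;
    }
    size_t n = 0;
    for (;;) {
        uint8_t b = s->data[s->tail % RX_SIZE];
        s->tail++;
        if (b == DELIM) {
            break;
        }
        if (n < out_size) {
            out[n++] = b;
        }
    }
    /* On a target this read-modify-write must be protected against
     * the ISR, e.g. by briefly masking the RX interrupt. */
    s->packets--;
    return (int)n;
}
```

The task can block until `packets` is non-zero and then pull out whole packets, so it never has to deal with a partially received one.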
--- In l..., Lei Y wrote:
>The data packets arrived may be of variable length, so when can I wake up relay_PC_message() for data processing in the RX ISR if it is in dormant state by default? I can't rely on the receiving count because the length of every packet is probably unique. Nor do I want to perform data decoding in the ISR, because it takes up too much time and could impact the reception of the subsequent characters.

How do your packets get framed? IP packets are of variable length but they contain a 'length' field. If you are dealing with unframed data with no real concept of a packet, you are going to have a very difficult time avoiding having to wake up the handler code on every character.

For that matter, the handler is going to be pretty difficult to write (for dealing with a true stream) because it would have to be a fairly massive state machine (to remember what it has already seen and what it is expecting) and one missed character could leave it stuck.

I would look hard at the packet definition and have the ISR wake up the handler as each complete packet is received.

>Another drawback of this method is the lack of mutual exclusion protection for the Q_serial. It is quite possible that when relay_PC_message() is decrementing Q_serial.count, vSerialISR() pre-empt relay_PC_message(), thus the value of Q_serial's member will be corrupted. Therefore, what can I do?

You need to provide mutual exclusion around 'count'. Without an RTOS, all you would do is disable interrupts momentarily while you deal with 'count'. It's been a long time since I worked with FreeRTOS but there will be some method for creating a mutex around 'count' but, in effect, it does the same thing.

Before getting all jammed up looking at the details, take a step back and look at the bigger picture. How much help should the ISR provide to the handler? How would the handler like to receive the data? What is a packet? How is the packet framed? How much memory is available for the queue(s)?

Richard
> You need to provide mutual exclusion around 'count'. Without an RTOS, all you would do is disable interrupts momentarily while you deal with 'count'. It's been a long time since I worked with FreeRTOS but there will be some method for creating a mutex around 'count' but, in effect, it does the same thing.

However, you don't really need a 'count', all the insert and extract functions need to do is test for equality of the pointers. Even if the value changes mid stream, it won't matter. The extract function may believe there is one less value than there is and the insert function will believe the queue is full when there is really one spot remaining.
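
As a rough illustration of a queue with no shared 'count', here is a single-producer, single-consumer ring buffer that decides empty and full purely by comparing the two indices. The names `ring_t`, `ring_put` and `ring_get` and the size are hypothetical. The sketch assumes one writer (the ISR), one reader (the task), a single core, and index stores that are atomic on the target; other setups may need memory barriers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RB_SIZE 8u   /* assumed size; one slot is always left unused */

typedef struct {
    uint8_t data[RB_SIZE];
    volatile size_t head;   /* next slot to write, changed by the producer only */
    volatile size_t tail;   /* next slot to read, changed by the consumer only */
} ring_t;

/* Producer side (e.g. the RX ISR). Returns false if the buffer is full. */
bool ring_put(ring_t *rb, uint8_t byte)
{
    assert(rb != NULL);
    size_t next = (rb->head + 1u) % RB_SIZE;
    if (next == rb->tail) {
        return false;           /* full: advancing head would look like empty */
    }
    rb->data[rb->head] = byte;
    rb->head = next;            /* publish only after the data is stored */
    return true;
}

/* Consumer side (e.g. the handler task). Returns false if empty. */
bool ring_get(ring_t *rb, uint8_t *out)
{
    assert(rb != NULL && out != NULL);
    if (rb->tail == rb->head) {
        return false;           /* indices equal: nothing to read */
    }
    *out = rb->data[rb->tail];
    rb->tail = (rb->tail + 1u) % RB_SIZE;
    return true;
}
```

Because each index has exactly one writer, a stale read of the other side's index only makes the buffer look slightly fuller or emptier than it is, which is the harmless case described above.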

It is always worth reading "Fundamental Algorithms" by Knuth. Around page 240 in my second edition.

I would wake up the handler when I saw the end of the packet. The handler could then extract data as fast as it could knowing that it will eventually get to the end flag.

Richard
Thank you, guys!
Following the discussion, I have changed the code as below.



The idea is simple. Whenever a character arrives via the serial port, the USART RX ISR places it in the array rx_PC and starts a timer to detect a pause in the incoming stream. As long as the stream keeps arriving, no pause appears and the timer count keeps being refreshed, so no overflow occurs. At the end of the transmission the timer soon overflows, and the timer ISR gives a semaphore to wake up the handler task to process the received stream.
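
The idle-timeout idea just described can be modelled on a host as two small functions, one standing in for the RX ISR and one for the timer ISR. This is a sketch, not the code originally posted: the names `idle_rx_t`, `idle_rx_on_byte` and `idle_rx_on_tick`, and the constants `RX_MAX` and `IDLE_TICKS`, are assumptions. A software countdown replaces the hardware timer, and the point where `idle_rx_on_tick()` returns true is where the timer ISR would give the semaphore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_MAX     64u   /* assumed capacity of rx_PC */
#define IDLE_TICKS 3u    /* assumed: ticks of silence that end a packet */

typedef struct {
    uint8_t  rx_PC[RX_MAX];
    size_t   len;         /* bytes stored since the last hand-over */
    uint32_t countdown;   /* 0 means the idle timer is stopped */
} idle_rx_t;

/* RX ISR body: store the byte and restart the idle timer. */
void idle_rx_on_byte(idle_rx_t *r, uint8_t byte)
{
    assert(r != NULL);
    if (r->len < RX_MAX) {
        r->rx_PC[r->len++] = byte;   /* bytes beyond RX_MAX are dropped */
    }
    r->countdown = IDLE_TICKS;
}

/* Timer ISR body, called once per tick. Returns true once per packet,
 * on the tick at which the line has been silent for IDLE_TICKS ticks. */
bool idle_rx_on_tick(idle_rx_t *r)
{
    assert(r != NULL);
    if (r->countdown == 0) {
        return false;                /* timer not running: nothing received */
    }
    r->countdown--;
    return r->countdown == 0 && r->len > 0;
}
```

After processing, the handler task would set `len` back to zero so that the next packet starts at the beginning of `rx_PC`.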
In such a situation, if not a single character arrives, the handler task should block when attempting to take the semaphore, and should never reach the statement process_data_in_rx_PC().
But this is not the case. After program startup, the handler task reaches process_data_in_rx_PC() immediately, without waiting for the arrival of any USART received characters. The symptom remains the same even if the serial cable between the target board and the sending device is disconnected.
After some debugging, I found the following.
The macro definition for vSemaphoreCreateBinary() is as below.



If it is changed to the below form, the aforementioned problem will get solved. The handler task blocks if no semaphore is given by the timer ISR, and unblocks if the timer ISR provides a semaphore.
#define vSemaphoreCreateBinary(xSemaphore) (xSemaphore) = xQueueGenericCreate((unsigned portBASE_TYPE)1, semSEMAPHORE_QUEUE_ITEM_LENGTH, queueQUEUE_TYPE_BINARY_SEMAPHORE);

So my question is: why is the binary semaphore given when it is created? I've always thought that when a binary semaphore is used for synchronization between a task and an ISR (or another task), only the ISR or the other task should give it. Have I had the wrong idea all this time?
My UART driver for FreeRTOS uses this technique (LPC 2xxx ARM7)

Use the FIFO in the UART. The interrupt rate is reduced from one per char to one per 10 chars (or whatever you wish, with a 16 byte FIFO).
At each interrupt, copy all bytes in the FIFO to buffer or queue, then dismiss the interrupt. Fast loop.
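
The "fast loop" can be pictured as below. This is a host-side sketch with a simulated UART, not the actual driver: `sim_uart_t`, `sim_uart_data_ready`, `sim_uart_read` and `uart_isr_drain` are hypothetical names. On an LPC2xxx part the two helper functions would correspond to testing the receiver-data-ready bit in the line status register and reading the receive buffer register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH 16u   /* hardware RX FIFO depth mentioned above */

/* Simulated UART receive FIFO (stand-in for the hardware registers). */
typedef struct {
    uint8_t fifo[FIFO_DEPTH];
    size_t  count;   /* bytes the "hardware" has received */
    size_t  rd;      /* bytes already read out */
} sim_uart_t;

bool sim_uart_data_ready(const sim_uart_t *u)
{
    return u->rd < u->count;
}

uint8_t sim_uart_read(sim_uart_t *u)
{
    return u->fifo[u->rd++];
}

/* ISR body: empty the FIFO in one pass and return how many bytes were
 * stored in dst. The FIFO is always drained completely, so bytes that
 * do not fit in dst are read and dropped rather than left behind. */
size_t uart_isr_drain(sim_uart_t *u, uint8_t *dst, size_t dst_size)
{
    assert(u != NULL && dst != NULL);
    size_t n = 0;
    while (sim_uart_data_ready(u)) {
        uint8_t b = sim_uart_read(u);
        if (n < dst_size) {
            dst[n++] = b;
        }
    }
    return n;
}
```

One interrupt then moves up to a FIFO's worth of bytes and needs at most one task wake-up, instead of one per character.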

My issue with FreeRTOS for this is that there are no ring buffers, lists, or variable size message queues. So that's where the overhead lies, in the ISR to task interface.

Nonetheless, my app uses the UART at 115200 baud with about 70% duty cycle on arriving data.
On 13/12/2012 06:01, stevec wrote:
>
> My UART driver for FreeRTOS uses this technique (LPC 2xxx ARM7)
>
> Use FIFO in UART. The interrupt rate is reduced from one per char to one
> per 10 chars (or whatever you wish, 16 byte FIFO).
> At each interrupt, copy all bytes in the FIFO to buffer or queue, then
> dismiss the interrupt. Fast loop.
>
> My issue with FreeRTOS for this is that there are no ring buffers,
> lists, or variable size message queues. So that's where the overhead
> lies, in the ISR to task interface.

Ring buffers are the normal way of doing this. If you don't have DMA
then you are going to have to copy the bytes out one at a time anyway,
unless the hardware somehow lets you memcpy from registers (?), so a
ring buffer is not going to help efficiency in that case.

The latest demos released at Electronica recently (admittedly not on NXP
parts) include a very efficient UART driver that sets up a DMA to
continuously receive data into a ring buffer. Practically no CPU
overhead at all.

How a ring buffer can be coded also depends on how it is filled - in the
same demo there are actually two implementations. One that is filled by
DMA, and another by interrupts (for a CDC device without DMA).

There is already a plan to extend the NXP FreeRTOS+IO demos to include a
DMA transfer mode. Interrupts filling queues are only acceptable for
very low throughput interfaces, such as a command console.

Regards,
Richard.


Thank you, people.
Yet my previous question remains unanswered. Allow me to repeat it.

The macro definition for vSemaphoreCreateBinary() is as below.



For a binary semaphore, is it normal to create it before starting the scheduler? If yes, why does it have to be given immediately after its creation? As a simple example, imagine that a binary semaphore is used for synchronization between a task and an ISR. Then no routine except that ISR should give this semaphore; otherwise the task cannot block to wait for the interrupt, because the semaphore has already been given right after its creation, and the binary semaphore has no effect.
In other words, should the above definition for vSemaphoreCreateBinary() be modified as the following one?
#define vSemaphoreCreateBinary(xSemaphore) (xSemaphore) = xQueueGenericCreate((unsigned portBASE_TYPE)1, semSEMAPHORE_QUEUE_ITEM_LENGTH, queueQUEUE_TYPE_BINARY_SEMAPHORE)

Correct me please.
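
The behaviour reported in this thread, where the first take succeeds before any interrupt has occurred, can be reproduced with a tiny host-side model of a binary semaphore that starts in the "given" state. The type `bin_sem_t` and the functions below are a hypothetical model, not the FreeRTOS API. A semaphore that starts available suits mutual-exclusion use. For task/ISR synchronization, a common idiom that avoids editing the kernel header is to take the semaphore once with a zero block time right after creating it, e.g. `xSemaphoreTake( xSemaphore, 0 )`. To my knowledge, later FreeRTOS releases also provide `xSemaphoreCreateBinary()`, which creates the semaphore empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a binary semaphore: 0 = empty, 1 = available. */
typedef struct {
    unsigned count;
} bin_sem_t;

/* Models creation as described in the thread: created, then given. */
void bin_sem_create_given(bin_sem_t *s)
{
    assert(s != NULL);
    s->count = 1;
}

/* Models a give from the ISR. Returns false if it was already available. */
bool bin_sem_give(bin_sem_t *s)
{
    assert(s != NULL);
    if (s->count != 0) {
        return false;
    }
    s->count = 1;
    return true;
}

/* Models a take with zero block time. Returns false where a real task
 * with a non-zero block time would block instead. */
bool bin_sem_try_take(bin_sem_t *s)
{
    assert(s != NULL);
    if (s->count == 0) {
        return false;
    }
    s->count = 0;
    return true;
}

/* Start-up idiom: create, then drain once so that the first real take
 * waits for the ISR. */
void bin_sem_create_empty(bin_sem_t *s)
{
    bin_sem_create_given(s);
    (void)bin_sem_try_take(s);
}
```

With `bin_sem_create_given()` alone, the first `bin_sem_try_take()` succeeds with no give from an ISR, which matches the symptom described. After `bin_sem_create_empty()`, a take only succeeds once `bin_sem_give()` has been called.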