EmbeddedRelated.com
RTOS Debugging - Tips, Tricks and Tools

Started by stephaneb 6 years ago, 7 replies, latest reply 2 years ago, 8418 views

A few months ago, the topic of debugging embedded systems was addressed: Debugging Embedded Systems - Favorite Tools, Strategies, Best Practices... 

This week, following last week's RTOS vs Bare-Metal #FAQ, I would like to create a similar thread but this time with the focus on the RTOS side of things as far as #debugging is concerned.

If this thread is successful, it will end up being a great resource for many in the embedded systems community who are using an #RTOS in their application to possibly learn new debugging tricks,  discover new debugging tools and hopefully improve their debugging skills.

Thank you for your participation and please do not forget to 'thumbs-up' the posts that are the most insightful and that you believe should be at the top of the thread.

Reply by Keeds, February 2, 2023

I think the #1 most helpful thing you can do is enable any debugging help that is provided by your RTOS. I am used to working with FreeRTOS, which provides a host of helpful options and functions for debugging. I assume that other RTOSes also provide some level of support like this, so I hope this can help all RTOS users.

The first of these tools is configASSERT(). This is a macro that has an equivalent purpose to the standard C assert(). Calls to this macro are sprinkled throughout the FreeRTOS kernel to help with debugging. For example, for Cortex-M, there are some assert checks that make sure you are not using FreeRTOS API calls in an ISR that has a priority that is outside your user-defined kernel priorities range. Very important to configure your interrupt priorities correctly! 
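As a rough illustration of the idea, here is a minimal host-side sketch of a configASSERT() that records where the check failed. The helper `assert_record`, the `g_assert_*` variables and `send_to_queue` are hypothetical names for this example only; on a target you would typically disable interrupts and spin inside the helper instead of returning, so a debugger can inspect the saved location.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record of the most recent failed assertion. */
static const char *volatile g_assert_file = NULL;
static volatile int g_assert_line = 0;
static volatile unsigned g_assert_count = 0;

/* Hypothetical helper: remember where the assertion failed.
 * This host sketch returns; a target build would usually halt here. */
static void assert_record(const char *file, int line)
{
    assert(file != NULL);
    g_assert_file = file;
    g_assert_line = line;
    g_assert_count++;
}

/* Same shape as assert(): do nothing when the expression is true. */
#define configASSERT(x) \
    do { if ((x) == 0) { assert_record(__FILE__, __LINE__); } } while (0)

/* Example check in the style used inside a kernel: reject a null handle. */
static int send_to_queue(const void *queue_handle)
{
    configASSERT(queue_handle != NULL);
    return (queue_handle != NULL) ? 0 : -1;
}
```

Keeping the file and line in volatile variables means they are still readable from the debugger after the system has stopped.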

Then, there is a series of hook functions that you can enable and implement, to either trap your code or simply print to a console or log. I've found the vApplicationStackOverflowHook to be especially helpful. There are even a couple configuration options for this one but essentially it can (not always) detect when a task's stack is overflowing! And we can't forget about vApplicationMallocFailedHook which is a hook for the case when dynamically allocating memory from the FreeRTOS heap fails. 

Next, FreeRTOS provides a set of trace functions that are strategically placed throughout the kernel. These are macros which by default are defined as blank, so even if tracing is turned on in configuration, they won't affect your code. However, you can use a trace tool such as Tracealyzer to make these calls report valuable information about how your system is running. This information includes everything from basic task information, to RTOS API calls, to custom print messages that you can insert into your code. All of this information is then compiled into a chronologically organized view that you can analyze through various graphs and views. Personally, I have only used SystemView, which is a great free tool with limited functionality (and requires a J-Link debugger as I understand). However, I am hoping to have a chance to use Percepio Tracealyzer in the near future, as it looks to have many more features! I believe they are working on adding trace functionality for IP stacks (such as LwIP). 

Speaking of J-Link debuggers, apart from being fantastic tools for general debugging, they provide an additional edge with RTOSes and Cortex-M devices. I mean, specifically, through the use of Segger's RTT. This is a small bit of code that resides on the target, and through the use of some buffers in RAM (only 1 kB works fine) and the internal debug bus, can stream data at an incredible rate. You can print strings to a bi-directional console with blazing speed. Segger advertises that you can print 82 characters in only 1 microsecond on an STM32F407 Cortex-M4 running at 168 MHz. This means you can stream additional data out of your system, or have a console, without significantly affecting the performance of your RTOS system. I have personally used the RTT block (other than for SystemView) to stream high-frequency sampling data from an ADC on a target and later visualize and process it in another tool. Very helpful! 

Reply by jorick, February 2, 2023

If you have access to the RTOS source code, make sure you read it through and become familiar with the code and structures.  Inevitably, there will come a time when you'll need to examine their contents to see why something isn't working properly.

Once I had an app running on FreeRTOS that would suddenly go down to the idle task priority.  While stepping through the code, I saw that a 256-byte buffer was being placed in the stack variables, and the last byte was being set to 0 as a terminator.  That last byte overflowed the stack and got placed right on top of the task priority byte in the TCB.  FreeRTOS didn't detect a stack overflow because the rest of the buffer didn't get written into and the stack check words were still there.

Another time I had an app that used FreeRTOS's tick hook to keep track of time and display the date and time on the screen.  But the time was running about 1% slower than it should have been.  I traced it down to the FreeRTOS timer tick initialization code which didn't have the correct formula.  I fixed the formula and reported it to Richard Barry (the FreeRTOS author) and he had an update within a few days.

Reply by alliv, February 2, 2023

I'd say that to write a good application, you first have to choose a well-maintained RTOS. When you are selecting the right RTOS, the debug environment and the number of supported drivers make all the difference. I use eCos and FreeRTOS a lot, but I am also looking at RIOT, Mynewt, Zephyr, etc. 

At the moment FreeRTOS wins. It is not ideal and has some problems, but in general the amount of support software available and the debug features built into FreeRTOS from the beginning win the competition. (And remember, there are 3 different RTOSes under this umbrella, including one certified for critical applications.)

In multi-threaded applications, I am always concerned about memory usage (1) and, in the case of a stack overflow or hard fault, who caused the problem (2).

For (1) I was using Tracealyzer for static debugging; if you have SWO, it can also be used for online (streaming) debugging. This is a very good professional tool. However, as an alternative, you can always use the free Eclipse plugins from NXP, or use the statistics built into FreeRTOS, which give you the statistics in text format (in some projects I just display that info over an HTTP server).

Below is my FreeRTOSConfig.h for STM32F4 (FR9.0), which I am using in a real project for debugging (originally generated by the STM32CubeMX tool); see (1) and (2) in the /**/ comments:

#ifndef FREERTOS_CONFIG_H
#define FREERTOS_CONFIG_H
/*-----------------------------------------------------------
 * Application specific definitions.
 *
 * These definitions should be adjusted for your particular hardware and
 * application requirements.
 *
 * THESE PARAMETERS ARE DESCRIBED WITHIN THE 'CONFIGURATION' SECTION OF THE
 * FreeRTOS API DOCUMENTATION AVAILABLE ON THE FreeRTOS.org WEB SITE.
 *
 * See http://www.freertos.org/a00110.html.
 *----------------------------------------------------------*/

/* Ensure stdint is only used by the compiler, and not the assembler. */
#if defined(__ICCARM__) || defined(__CC_ARM) || defined(__GNUC__)
    #include <stdint.h>
    extern uint32_t SystemCoreClock;
/* USER CODE BEGIN 0 */
    extern void configureTimerForRunTimeStats(void);
    extern unsigned long getRunTimeCounterValue(void);
/* USER CODE END 0 */
#endif


#define configUSE_PREEMPTION                     1
#define configSUPPORT_STATIC_ALLOCATION          0
#define configSUPPORT_DYNAMIC_ALLOCATION         1
#define configUSE_IDLE_HOOK                      0
#define configUSE_TICK_HOOK                      0
#define configCPU_CLOCK_HZ                       ( SystemCoreClock )
#define configTICK_RATE_HZ                       ((TickType_t)1000)
#define configMAX_PRIORITIES                     ( 7 )
#define configMINIMAL_STACK_SIZE                 ((uint16_t)128)
#define configTOTAL_HEAP_SIZE                    ((size_t)51200)
#define configMAX_TASK_NAME_LEN                  ( 16 )
#define configGENERATE_RUN_TIME_STATS            1                            /* (1) */
#define configUSE_TRACE_FACILITY                 1                            /* (1) */
#define configUSE_STATS_FORMATTING_FUNCTIONS     0                        /* (1) I do not use built-in formatting */
#define configUSE_16_BIT_TICKS                   0
#define configUSE_MUTEXES                        1
#define configQUEUE_REGISTRY_SIZE                8


#define configUSE_RECURSIVE_MUTEXES              1
#define configUSE_COUNTING_SEMAPHORES            1
#define configUSE_PORT_OPTIMISED_TASK_SELECTION  1

#define configCHECK_FOR_STACK_OVERFLOW           1                         /* for (2) */ 
#define configUSE_LIST_DATA_INTEGRITY_CHECK_BYTES   1                     /* for (2) */

#define configFRTOS_MEMORY_SCHEME                4  /*this identifies the memory scheme heap_4 for NXP FreeRTOS plugin */

/* Software timer definitions. */
#define configUSE_TIMERS                         1
#define configTIMER_TASK_PRIORITY                (2)
#define configTIMER_QUEUE_LENGTH                 (3)
#define configTIMER_TASK_STACK_DEPTH             512

/* Co-routine definitions. */
#define configUSE_CO_ROUTINES                    0
#define configMAX_CO_ROUTINE_PRIORITIES          ( 2 )

/* Set the following definitions to 1 to include the API function, or zero
to exclude the API function. */
#define INCLUDE_vTaskPrioritySet            1
#define INCLUDE_uxTaskPriorityGet           1
#define INCLUDE_vTaskDelete                 1
#define INCLUDE_vTaskCleanUpResources       1
#define INCLUDE_vTaskSuspend                1
#define INCLUDE_vTaskDelayUntil             0
#define INCLUDE_vTaskDelay                  1
#define INCLUDE_xTaskGetSchedulerState      1


/* Cortex-M specific definitions. */
#ifdef __NVIC_PRIO_BITS
 /* __NVIC_PRIO_BITS will be specified when CMSIS is being used. */
 #define configPRIO_BITS         __NVIC_PRIO_BITS
#else
 #define configPRIO_BITS         4
#endif


/* The lowest interrupt priority that can be used in a call to a "set priority"
function. */
#define configLIBRARY_LOWEST_INTERRUPT_PRIORITY   15


/* The highest interrupt priority that can be used by any interrupt service
routine that makes calls to interrupt safe FreeRTOS API functions.  DO NOT CALL
INTERRUPT SAFE FREERTOS API FUNCTIONS FROM ANY INTERRUPT THAT HAS A HIGHER
PRIORITY THAN THIS! (higher priorities are lower numeric values). */
#define configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY 5


/* Interrupt priorities used by the kernel port layer itself.  These are generic
to all Cortex-M ports, and do not rely on any particular library functions. */
#define configKERNEL_INTERRUPT_PRIORITY         ( configLIBRARY_LOWEST_INTERRUPT_PRIORITY << (8 - configPRIO_BITS) )
/* !!!! configMAX_SYSCALL_INTERRUPT_PRIORITY must not be set to zero !!!!
See http://www.FreeRTOS.org/RTOS-Cortex-M3-M4.html. */
#define configMAX_SYSCALL_INTERRUPT_PRIORITY  ( configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY << (8 - configPRIO_BITS) )


/* Normal assert() semantics without relying on the provision of an assert.h
header file. */
/* USER CODE BEGIN 1 */
#ifndef taskDISABLE_INTERRUPTS
    #define taskDISABLE_INTERRUPTS     vPortRaiseBASEPRI
#endif

#define configASSERT( x ) if ((x) == 0) {taskDISABLE_INTERRUPTS(); for( ;; );}
/* USER CODE END 1 */

/* Definitions that map the FreeRTOS port interrupt handlers to their CMSIS
standard names. */
#define vPortSVCHandler    SVC_Handler
#define xPortPendSVHandler PendSV_Handler

/* IMPORTANT: This define MUST be commented when used with STM32Cube firmware,
              to prevent overwriting SysTick_Handler defined within STM32Cube HAL */
/* #define xPortSysTickHandler SysTick_Handler */

/* USER CODE BEGIN 2 */
/* Definitions needed when configGENERATE_RUN_TIME_STATS is on */
#define portCONFIGURE_TIMER_FOR_RUN_TIME_STATS              configureTimerForRunTimeStats
#define portGET_RUN_TIME_COUNTER_VALUE                      HAL_GetTick
/* USER CODE END 2 */

/* USER CODE BEGIN Defines */
/* Section where parameter definitions can be added (for instance, to override default ones in FreeRTOS.h) */
/* USER CODE END Defines */


#endif /* FREERTOS_CONFIG_H */


For StackOverflow I have the following code in my port file:

#if (configCHECK_FOR_STACK_OVERFLOW > 0)
void vApplicationStackOverflowHook( TaskHandle_t xTask, char *pcTaskName )
{
    error_handler(1, _ERR_Stack_Overflow);
}
#endif


For HardFault I use the following code to extract useful info in the HardFault_Handler IRQ:

#if (configCHECK_FOR_STACK_OVERFLOW > 0)
volatile uint32_t r0;
volatile uint32_t r1;
volatile uint32_t r2;
volatile uint32_t r3;
volatile uint32_t r12;
volatile uint32_t lr; /* Link register. */
volatile uint32_t pc; /* Program counter. */
volatile uint32_t psr;/* Program status register. */

void prvGetRegistersFromStack( uint32_t *pulFaultStackAddress )
{
/* These are volatile to try and prevent the compiler/linker optimising them
away as the variables never actually get used.  They are declared at file
scope (above) so that the debugger can always show their values. */
    r0 = pulFaultStackAddress[ 0 ];
    r1 = pulFaultStackAddress[ 1 ];
    r2 = pulFaultStackAddress[ 2 ];
    r3 = pulFaultStackAddress[ 3 ];

    r12 = pulFaultStackAddress[ 4 ];
    lr = pulFaultStackAddress[ 5 ];
    pc = pulFaultStackAddress[ 6 ];
    psr = pulFaultStackAddress[ 7 ];

    /* When the following line is hit, the variables contain the register values. */
    for( ;; );
}
#endif


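/**
* @brief This function handles Hard fault interrupt.
* Note: the FreeRTOS reference version of this handler is declared with
* __attribute__((naked)), so that no compiler-generated prologue touches the
* stack before the assembly below reads MSP/PSP.
*/
void HardFault_Handler(void)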
{
  /* USER CODE BEGIN HardFault_IRQn 0 */

#if (configCHECK_FOR_STACK_OVERFLOW > 0)
   __asm volatile
    (
        " tst lr, #4                                                \n"
        " ite eq                                                    \n"
        " mrseq r0, msp                                             \n"
        " mrsne r0, psp                                             \n"
        " ldr r1, [r0, #24]                                         \n"
        " ldr r2, handler2_address_const                            \n"
        " bx r2                                                     \n"
        " handler2_address_const: .word prvGetRegistersFromStack    \n"
    );
#endif
  /* USER CODE END HardFault_IRQn 0 */
  while (1)
  {
  }
  /* USER CODE BEGIN HardFault_IRQn 1 */

  /* USER CODE END HardFault_IRQn 1 */
}

When debugging with the Eclipse ARM plugin from NXP, you will see something like this:

[Image: capture_27247.png]


Using the Tracealyzer tool is basically much easier than manually configuring and using the built-in debugging facilities of FreeRTOS; I would recommend it to everyone who is not familiar with FreeRTOS. To enable Tracealyzer, all you need to do is add the Tracealyzer headers to the build and add a few lines to FreeRTOSConfig.h. And remember, Tracealyzer is a recorder, so it will show you the dynamics of how your application ran before a crash (if any).
#define configUSE_TRACEALYZER 1 
...
...
/* Integrates the Tracealyzer recorder with FreeRTOS */
#if ( configUSE_TRACEALYZER == 1 )
#include "trcRecorder.h"
#endif


Hope that this practical code will help someone and that you will buy me a beer :)

Njoy!

Reply by Tim Wescott, February 2, 2023

Write your own startup code, and fill the heap and stack spaces with 0xdeadbeef.  Or 0xF00D if it's a 16-bit processor.  Or whatever.

I'm looking at a newly-commissioned RTOS application that has a pointer with the value of 0xdeadbeef -- meaning that I made my stack too small and the thing has wandered off into la-la land.

Reply by mr_bandit, February 2, 2023

And ... create your lowest priority task to monitor the stacks. Start at the bottom and look for the "high-water mark" where you hit the end of DEADBEEF, meaning the stack has made it that far (basically what he said).

Set a threshold, say 90% of the stack used. Alert the user instantly and at some rate after that. (They may miss the first time).

ALSO - have a mechanism to report on demand the stack size usage. Make it part of the general system status.

This is handy for another reason: the stack area is part of a precious resource: RAM. Set your stacks bigger than you expect by some large amount. After debugging, look at how much stack is actually used (assuming you are testing things such as the worst case). You can then reduce the stack size to free up RAM for the app.
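The fill-pattern and high-water-mark idea can be sketched in plain C as follows. This is only an illustration under stated assumptions: the stack is descending (index 0 of the array is the limit, the last index is where the stack starts), and all function names here are hypothetical, not part of any RTOS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL 0xDEADBEEFu

/* Paint the whole stack region with the fill pattern. Done once at
 * startup, before the task that owns the stack runs. */
static void stack_paint(uint32_t *stack, size_t words)
{
    assert(stack != NULL);
    for (size_t i = 0; i < words; i++) {
        stack[i] = STACK_FILL;
    }
}

/* Count untouched words from the limit upward. Everything from the
 * first overwritten word to the start of the stack counts as used. */
static size_t stack_used_words(const uint32_t *stack, size_t words)
{
    assert(stack != NULL);
    size_t untouched = 0;
    while (untouched < words && stack[untouched] == STACK_FILL) {
        untouched++;
    }
    return words - untouched;
}

/* Percentage of the stack that has been used at some point (0..100). */
static unsigned stack_used_percent(const uint32_t *stack, size_t words)
{
    assert(words > 0);
    return (unsigned)((stack_used_words(stack, words) * 100u) / words);
}

/* Nonzero when usage has reached the alert threshold, e.g. 90. */
static int stack_over_threshold(const uint32_t *stack, size_t words,
                                unsigned threshold_percent)
{
    return stack_used_percent(stack, words) >= threshold_percent;
}
```

A monitor task could call `stack_over_threshold()` for every task stack and report the percentage as part of the system status. The scan can under-report if the application happens to store the fill value itself at the deepest point. FreeRTOS users get a comparable figure from uxTaskGetStackHighWaterMark().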

I have debugged RTOS apps with a UART and a scope where I could look at a pin. The trick to a UART is to dedicate a task with the second-to-lowest priority. (The lowest is the stack check.) I have several command-line interpreters (CLIs). I put the CLI as the second-lowest task. I can then use it to print out the system status, etc. (Yes, this is old-school.) (Assuming your CLI UART is interrupt driven.)

Here is the trick: Have a message with a buffer, so that a higher-priority task can put a string in the message and send it to the CLI task, which prints it out. That way, the string print does not affect the running time of the original task (other than the sprintf()), AND - the messages are printed in the order received. The CLI can prepend the name of the task, saving on message buffer space. (If you have two tasks that try to printf() at the same time, the messages will be munged by the race condition.) 
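A minimal sketch of that message scheme, written as a single-threaded simulation so it runs on a host. The types and functions (`log_queue_t`, `log_post`, `log_drain_one`) are hypothetical; on a real RTOS the FIFO below would be replaced by the kernel's own queue, or at least protected by a critical section, since this version is not safe against preemption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define LOG_MSG_LEN   48   /* bytes per message text, including terminator */
#define LOG_QUEUE_LEN 8    /* number of messages the queue can hold */

typedef struct {
    const char *task_name;     /* pointer only; the CLI prepends it on output */
    char text[LOG_MSG_LEN];
} log_msg_t;

typedef struct {
    log_msg_t slots[LOG_QUEUE_LEN];
    size_t head;       /* next slot to read */
    size_t count;      /* messages currently queued */
    unsigned dropped;  /* messages rejected because the queue was full */
} log_queue_t;

/* Called by any task. Returns 0 on success, -1 if the queue is full.
 * The sender never waits for the UART. */
static int log_post(log_queue_t *q, const char *task_name, const char *text)
{
    assert(q != NULL && task_name != NULL && text != NULL);
    if (q->count == LOG_QUEUE_LEN) {
        q->dropped++;
        return -1;
    }
    log_msg_t *m = &q->slots[(q->head + q->count) % LOG_QUEUE_LEN];
    size_t n = strlen(text);
    if (n > LOG_MSG_LEN - 1) {
        n = LOG_MSG_LEN - 1;          /* truncate so the terminator fits */
    }
    memcpy(m->text, text, n);
    m->text[n] = '\0';
    m->task_name = task_name;
    q->count++;
    return 0;
}

/* Called only by the CLI task. Formats the oldest message into out and
 * returns 1, or returns 0 when nothing is queued. */
static int log_drain_one(log_queue_t *q, char *out, size_t out_size)
{
    assert(q != NULL && out != NULL && out_size > 0);
    if (q->count == 0) {
        return 0;
    }
    const log_msg_t *m = &q->slots[q->head];
    snprintf(out, out_size, "[%s] %s", m->task_name, m->text);
    q->head = (q->head + 1) % LOG_QUEUE_LEN;
    q->count--;
    return 1;
}
```

Storing only a pointer to the task name keeps each message small, and the `dropped` counter makes it visible when tasks log faster than the CLI can print.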

Memory should ALWAYS be allocated at compile time!!!! DO NOT MALLOC(). (The only exception is a single malloc() on boot, so the difference is style, but I am very conservative in such things. The linker should tell you if you want to use too much memory, instead of having to wait when you load/execute.)

Create buffer pools. INSTRUMENT THEM! Note who allocs from the pool and frees them to the pool. Keep track of pool usage && when the number of free buffers drops below a threshold. Log them when passed from one task to another.

Your design and analysis should tell you how many of what size buffers you should have. One pool per size - KISS. Balance different buffer sizes to number of pools. For example, if you need a bunch of 20 byte buffers, and some 10 byte buffers, have a single pool of 20 byte buffers.
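An instrumented pool along these lines might look like the sketch below. It assumes one pool of fixed-size blocks allocated at compile time; the names (`pool_t`, `pool_alloc`, `pool_free`), the sizes and the use of a small integer task id are all hypothetical choices for illustration. Like any shared structure, a real version needs a critical section or mutex around alloc and free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCKS     4
#define POOL_BLOCK_SIZE 20
#define OWNER_NONE      0xFFu

typedef struct {
    uint8_t  storage[POOL_BLOCKS][POOL_BLOCK_SIZE]; /* sized at compile time */
    uint8_t  owner[POOL_BLOCKS];  /* task id holding each block, or OWNER_NONE */
    size_t   free_count;          /* blocks currently free */
    size_t   min_free;            /* lowest free_count seen so far */
    unsigned alloc_failures;      /* requests refused because the pool was empty */
} pool_t;

static void pool_init(pool_t *p)
{
    assert(p != NULL);
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        p->owner[i] = OWNER_NONE;
    }
    p->free_count = POOL_BLOCKS;
    p->min_free = POOL_BLOCKS;
    p->alloc_failures = 0;
}

/* Hand out a free block and note which task took it. */
static void *pool_alloc(pool_t *p, uint8_t task_id)
{
    assert(p != NULL && task_id != OWNER_NONE);
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        if (p->owner[i] == OWNER_NONE) {
            p->owner[i] = task_id;
            p->free_count--;
            if (p->free_count < p->min_free) {
                p->min_free = p->free_count;
            }
            return p->storage[i];
        }
    }
    p->alloc_failures++;
    return NULL;
}

/* Return a block. Gives 0 on success, or -1 if the pointer is not a
 * currently allocated block of this pool (for example a double free). */
static int pool_free(pool_t *p, void *block)
{
    assert(p != NULL);
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        if (block == (void *)p->storage[i]) {
            if (p->owner[i] == OWNER_NONE) {
                return -1;
            }
            p->owner[i] = OWNER_NONE;
            p->free_count++;
            return 0;
        }
    }
    return -1;
}
```

The `owner` array answers "who is holding the buffers" when the pool runs dry, and `min_free` shows after a test run how close the system came to exhaustion, which helps when sizing the pool.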

Your HW design should ALWAYS have some extra pins you can wiggle to look at them on a scope / logic analyzer. Bring them out to a header you can clip a scope probe onto.

Reply by Solderdot, February 2, 2023

Monitoring stack usage via magic words was already mentioned. Instead of checking for overflows on the lowest prio thread, this should be done on the thread with the highest priority. Just assume that you have 3 threads, with the highest prio one currently executing and suffering a stack overflow, thus destroying the stack of the mid prio thread. The mid prio thread may be scheduled before the low prio thread doing the stack checking, and this might end in a crash with rather fancy effects. If you do not have a core dump it might be difficult to figure out that the high prio thread's stack overflow caused that.

Consider injecting some code into the RTOS scheduler tracing out the scheduling decisions. So you can see at which time which thread started/suspended/resumed/stopped. In a system consisting of more than just a handful of threads such a trace will be very valuable, e.g. for cases with priority inversion or resource conflicts or simply starving due to poorly tuned priorities.
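One possible shape for such a scheduler trace is a small ring buffer that keeps the most recent events, sketched below. The types and function names are hypothetical. In FreeRTOS, a function like `trace_record()` could be called from the traceTASK_SWITCHED_IN() and traceTASK_SWITCHED_OUT() hook macros; other kernels would need a patch at the point where the scheduler switches context. The sketch assumes it is only called from the scheduler, so it has no locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_DEPTH 16   /* events kept; the oldest are overwritten */

typedef enum { EV_SWITCH_IN, EV_SWITCH_OUT } trace_event_t;

typedef struct {
    uint32_t      timestamp;  /* e.g. tick count or a free-running timer */
    uint8_t       task_id;
    trace_event_t event;
} trace_entry_t;

typedef struct {
    trace_entry_t entries[TRACE_DEPTH];
    size_t next;     /* index of the next slot to write */
    size_t stored;   /* number of valid entries, at most TRACE_DEPTH */
} trace_log_t;

/* Record one scheduling event, overwriting the oldest when full. */
static void trace_record(trace_log_t *t, uint32_t now, uint8_t task_id,
                         trace_event_t ev)
{
    assert(t != NULL);
    t->entries[t->next] = (trace_entry_t){ now, task_id, ev };
    t->next = (t->next + 1) % TRACE_DEPTH;
    if (t->stored < TRACE_DEPTH) {
        t->stored++;
    }
}

/* Fetch the i-th oldest stored entry (0 = oldest), or NULL if i is
 * beyond what has been recorded. */
static const trace_entry_t *trace_get(const trace_log_t *t, size_t i)
{
    assert(t != NULL);
    if (i >= t->stored) {
        return NULL;
    }
    size_t oldest = (t->next + TRACE_DEPTH - t->stored) % TRACE_DEPTH;
    return &t->entries[(oldest + i) % TRACE_DEPTH];
}
```

Because the buffer always holds the latest events, it can be dumped after a crash or on demand to see which thread ran when, which is usually enough to spot starvation or a priority inversion.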

I can only support the statement "know your code", and this includes the RTOS code as well, as long as the sources are available. 

In systems where the boot vector is located at address 0x00 (most ARM devices I worked with are set up like that), consider replacing the instruction at 0x00 with a branch to an exception handler. This will reliably help you in identifying places where uninitialized function pointers are being used - w/o that catcher a startup procedure will take place which might even go unnoticed and cause funny effects only some time later. With such a catcher you will directly see the result.

If your microcontroller supports memory protection mechanisms, make sure that your RTOS does so, too. Disable access to other threads' stacks. Disable write access to code memory. Disable execution of code from data memory. Disable access to address regions where neither memory nor registers reside.