Memory Mapped I/O in C
A considerable part of my job as an embedded developer entails brokering interactions between whichever processor my code happens to be running on and the underlying hardware peripherals. Save for older architectures, this often means managing memory-mapped registers. When a device employs this strategy, a portion of the main memory space is not actually memory but a bus connection to a peripheral like GPIO pins, ADC channels, or UART ports. Said peripherals see write operations to certain memory addresses as input commands and can respond with their current state on read accesses.
This is a very convenient approach because typically the kind of low-level programming languages that are best suited for embedded development are already well equipped to work with addressable memory. Moreover, the compiler doesn't need to know specific instructions that may vary from processor to processor to interact with its devices - just a few generic and circumstantial directives.
In most cases, the hassle of working directly with registers is handled by a third-party library that exposes a cushy function API. If you work with low-level code long enough, however, you will eventually find yourself shaking the metal's bare hand in order to do business with the hardware. I also find it an interesting topic and thought exercise, so I'm going to explore how different programming languages fare in this specific aspect.
The obvious starting point is C, a timeless classic and lingua franca of efficient, low-level programming.
Use Case
Just to have a target, let's set the stage as the Raspberry Pi Pico (which hopefully doesn't need any introduction) and embark on the immortal quest for blinking the on-board LED.
This task involves the following registers:
- From the RESETS block:
  - RESET register to initialize the IO bank
  - RESET DONE register to check whether the IO bank has finished initialization
- From the GPIO block:
  - GPIO CTRL to configure the GPIO function to software-controlled I/O
- From the SIO block:
  - SIO GPIO OE to enable the GPIO as an output
  - SIO GPIO OUT to control the output logic level
This is a nice sample set, showcasing what should be expected from a typical register configuration.
Each register spans the length of a machine word - 32 bits in this case - possibly split into subsections of arbitrary bit size; for example, to kickstart the IO bank, one must clear bit 5 in RESET and then wait for that same bit to be set in RESET DONE.
Some registers are part of an array: starting at address 0x40014000 you will find a repeating pattern of STATUS and CONTROL registers for each of the 30 GPIOs. The address of the control register for GPIO n can be thus calculated as 0x40014000 + n*8 + 4.
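As a minimal sketch of that arithmetic, the address computation can be written as a pair of plain functions that are easy to check on a host machine. The names io_bank0_base, gpio_status_addr and gpio_ctrl_addr are made up for illustration and do not come from any SDK.

```c
#include <stdint.h>

/* Base address of the IO bank 0 register block, as given in the text. */
static const uint32_t io_bank0_base = 0x40014000u;

/* Each GPIO owns two consecutive 32-bit registers: STATUS, then CTRL.
 * Both helper names are hypothetical. */
static uint32_t gpio_status_addr(uint32_t n) { return io_bank0_base + n * 8u; }
static uint32_t gpio_ctrl_addr(uint32_t n)   { return io_bank0_base + n * 8u + 4u; }
```

For GPIO 25, the pin used in the examples further down, gpio_ctrl_addr(25) evaluates to 0x400140CC.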
Finally, in a spicy twist of ingenuity, some registers have mirror copies of themselves that allow for trick shots. GPIO OE and GPIO OUT work through normal read-write accesses, but the three words that follow each of them hold additional registers that set, clear, or xor the corresponding bits of the base register, respectively. If you want to change a single bit of one of those registers but keep all the others as they were, you can simply write that bit to one of the mirrors instead of reading the register, changing the value, and writing it back; it's both more efficient and more ergonomic.
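To make the semantics of the mirrors concrete, here is a small host-side model of what a write through each of the four words does to the value held by the base register. It is only a model for reasoning about the behavior, not driver code, and the names alias_t and alias_apply are invented for this sketch.

```c
#include <stdint.h>

/* Which of the four consecutive words is being written. */
typedef enum { ALIAS_WRITE, ALIAS_SET, ALIAS_CLR, ALIAS_XOR } alias_t;

/* Host-side model (not driver code) of how a write through each word
 * changes the value held by the base register. */
static uint32_t alias_apply(uint32_t current, alias_t alias, uint32_t written)
{
    switch (alias) {
    case ALIAS_SET: return current | written;   /* 1s turn bits on  */
    case ALIAS_CLR: return current & ~written;  /* 1s turn bits off */
    case ALIAS_XOR: return current ^ written;   /* 1s flip bits     */
    default:        return written;             /* plain overwrite  */
    }
}
```

In every mirrored case the bits written as 0 leave the base register untouched, which is why no prior read is needed.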
One should also note that all register operations should pass through a pointer marked with the volatile keyword to prevent the compiler from messing with them.
For the purpose of optimization, the compiler may decide from time to time to interpret your code a little creatively. For example, multiple writes of the same value to the same memory location could normally be collapsed to a single one... If that was really just a memory location.
Unfortunately, memory-mapped registers are subject to side effects, so the number and order of accesses is significant for the final behavior of the application; the side effect is the desired outcome, in fact. With volatile, the compiler knows to trust you and not to meddle.
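A short sketch of the kind of code that depends on this guarantee: a function that pulses some bits of a register high and then low again. The name pulse is hypothetical, and the example assumes a register that behaves as plain read-write storage.

```c
#include <stdint.h>

/* Drive the bits in `mask` high and then low again. Both stores must reach
 * the hardware, in this order, for the pulse to appear at all. */
static void pulse(volatile uint32_t *reg, uint32_t mask)
{
    *reg |= mask;   /* volatile access: the compiler may not drop or merge it */
    *reg &= ~mask;  /* volatile access: stays after the one above             */
}
```

Seen as ordinary memory, the register ends up exactly as it started, so without volatile an optimizer would be entitled to remove both statements.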
This concludes the introduction. Now, how do we represent memory-mapped registers in C?
Variables
The immediate and naive approach is to declare a pointer for each register and then work with it. It's undeniably simple and what any seasoned C programmer should feel right at home with.
    #include <stdint.h>

    const uint32_t resets_base = 0x4000C000;
    volatile uint32_t *const resets_reset      = (volatile uint32_t*)resets_base;
    volatile uint32_t *const resets_reset_done = (volatile uint32_t*)(resets_base + 0x8);
Notice the placement of const. Each pointer is a constant, so the compiler can avoid allocating RAM for a value that will never change. Controlling the devices is now just a matter of dereferencing the pointers:
    // Activate the IO block by clearing its reset bit
    *resets_reset &= ~(1u << 5);
    // Wait for the reset to be done
    while ((*resets_reset_done & (1u << 5)) == 0)
        ;

Simple and decently understandable - by C standards at least.
For multiple registers with the same structure, we can define a function:
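    // Enumerators start at 0 so that GPIO_n has the value n
    typedef enum {
        GPIO_0,
        GPIO_1,
        /* ... */
    } gpio_t;

    const uint32_t gpio_bank0_base = 0x40014000;

    volatile uint32_t *gpio_ctrl(gpio_t gpio) {
        // The computed address must be cast to a pointer explicitly
        return (volatile uint32_t*)(gpio_bank0_base + gpio*0x8 + 0x4);
    }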
All of this works, but not without some friction. The main issue is that variables - like every public symbol in C - need to be both defined and declared. As your project grows, the register pointers will probably take residence in their own module with separate source and header files; consequently, you would need to pair an extern declaration for each one of them.
While this could be a good occasion to hide symbols that don't need to be exposed (like resets_base), writing and maintaining a complete list where every entry must be repeated almost identically between .c and .h becomes very tedious very fast.
Functions require the same level of plumbing, and there are a lot of device registers out there - the clock configuration block alone counts 49 unique ones! We could disregard etiquette and just define the variables in the header files directly, to the horror and dismay of every self-respecting C programmer in a 100 km radius; it would work, but I don't recommend it.
So variables would be a feasible route if we could avoid the define/declare dichotomy? More or less. For one, there's the issue of namespacing: you have a lot of registers with a lot of different names and you should provide an efficient way for the developer to find the right one. In a primitive language like C, this means explicitly spelling the logical path of the symbol in its name - i.e. resets_reset_done.
Further along, each register is split into multiple bit fields whose management is a whole other mess: defining memorable enums can help navigate the acceptable values, but writing those to the right place requires a bit (pun intended) more finesse.
For example, setting a GPIO to its SIO configuration means writing the value 5 to the lowest 5 bits of the corresponding CTRL register. Using the function we defined previously, one possible approach would be to write *gpio_ctrl(GPIO_25) = 5. While this achieves the desired result in most cases, it has a nasty side effect: the CTRL register has other fields in its 32 bits, and they all get wiped to zero with this assignment. Maybe that's what you want, but in the general case, it's sloppy programming. You should only modify the part you care about, leaving the rest as it was.
The only way to do so requires a couple of bitwise operations. First, you need to clear out the target bits (typically by "and"ing the register with a negated mask), then you can "or" the value.
    *gpio_ctrl(GPIO_25) =
        (*gpio_ctrl(GPIO_25) & ~(0x1F)) // 5-bit mask to clear the spot
        | 5;                            // Final value

Alternatively, you can reconstruct the whole register value every time you write it, but depending on the context, that could be even more convoluted.
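The clear-then-or sequence can be wrapped once in a helper so that call sites don't repeat it. This is a minimal sketch: reg_write_field and the GPIO_CTRL_FUNCSEL_* macros are names I made up here, with the mask and value taken from the example above.

```c
#include <stdint.h>

/* Replace the field selected by `mask` (already shifted into position)
 * with `value`, leaving every other bit of the register as it was.
 * reg_write_field is a hypothetical helper, not an SDK function. */
static inline void reg_write_field(volatile uint32_t *reg, uint32_t mask,
                                   unsigned shift, uint32_t value)
{
    uint32_t tmp = *reg;              /* one read                          */
    tmp &= ~mask;                     /* clear the field                   */
    tmp |= (value << shift) & mask;   /* insert the value, clipped to fit  */
    *reg = tmp;                       /* one write                         */
}

/* Function select: lowest 5 bits of CTRL, where 5 selects SIO (per the text). */
#define GPIO_CTRL_FUNCSEL_MASK  0x1Fu
#define GPIO_CTRL_FUNCSEL_SHIFT 0u
#define GPIO_CTRL_FUNCSEL_SIO   5u
```

With the gpio_ctrl function from earlier, a call would look like reg_write_field(gpio_ctrl(GPIO_25), GPIO_CTRL_FUNCSEL_MASK, GPIO_CTRL_FUNCSEL_SHIFT, GPIO_CTRL_FUNCSEL_SIO). Going through a local temporary keeps the access to exactly one read and one write of the register.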
Between the repetition and the verbosity, none of this is ideal; let's try a slightly different approach.
Macros
Besides the oxymoron of constant variables, C provides a more powerful way to express information that is known at compile time: macros. Instead of defining (and declaring!) a variable that points to the register, we can (just) define a macro that expands to it.

    #define SIO_BASE (0xD0000000)
    #define SIO_GPIO_OE ((volatile uint32_t*)(SIO_BASE + 0x20))
Notice the volatile uint32_t* cast. Since macros are simply substitutions, we need to state the intended type explicitly. The code that uses the register is not very different from the previous approach:
*SIO_GPIO_OE = 1 << GPIO_25;
The main advantage here is brevity, and it can be improved further. Macros can be nested to avoid repeating the pointer cast every time a new one is defined:
    #define MMIO32(Addr) ((volatile uint32_t*)(Addr))
    #define SIO_BASE (0xD0000000)
    #define SIO_GPIO_OE MMIO32(SIO_BASE + 0x20)
One could go even further and include the dereferencing * in the macro in order to access the register as if it was just a variable, like so:
    // The outer parentheses keep the dereference intact inside larger expressions
    #define SIO_GPIO_OE (*MMIO32(SIO_BASE + 0x20))
    /* ... */
    SIO_GPIO_OE |= 1 << GPIO_25;

I personally prefer keeping the pointer nature explicit, but to each their own.
This solution alleviates the verbosity (partially, as the namespacing issue is still present) but again leaves you on your own to implement bitwise operations. It's frustrating because we are doing menial busywork, something the compiler should be tasked with rather than us developers. If only there was a way to declare the structure of the register map - down to the single bits - and have some tool generate the code for us...
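Macros can also take parameters, which covers register arrays such as the per-GPIO STATUS and CONTROL pairs without a function. The sketch below is one way to write it. IO_BANK0_BASE, IO_BANK0_GPIO_STATUS and IO_BANK0_GPIO_CTRL are names I chose, and MMIO32 is redefined here with an extra uintptr_t cast (my addition) so the snippet also compiles cleanly on a 64-bit host.

```c
#include <stdint.h>

#define MMIO32(Addr)            ((volatile uint32_t*)(uintptr_t)(Addr))
#define IO_BANK0_BASE           (0x40014000u)
/* STATUS and CTRL alternate, 8 bytes per GPIO. The argument is wrapped in
 * parentheses so that IO_BANK0_GPIO_CTRL(a + b) expands correctly. */
#define IO_BANK0_GPIO_STATUS(n) MMIO32(IO_BANK0_BASE + (n) * 8u)
#define IO_BANK0_GPIO_CTRL(n)   MMIO32(IO_BANK0_BASE + (n) * 8u + 4u)
```

Since everything is a compile-time constant, *IO_BANK0_GPIO_CTRL(25) = 5; should compile down to a single store to a fixed address.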
Structures
There is one last approach that solves most of the bumps and spikes seen thus far. It consists in declaring a struct type that mimics the register memory layout exactly; in other words, if you were to lay your structure over the MMIO region, the fields would line up exactly with each register.
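    typedef struct {
        // Registers that precede GPIO_OUT, so that the offsets
        // line up when the struct is laid over SIO_BASE
        uint32_t sio_cpuid;        // 0x00
        uint32_t sio_gpio_in;      // 0x04
        uint32_t sio_gpio_hi_in;   // 0x08
        uint32_t _reserved;        // 0x0C
        uint32_t sio_gpio_out;     // 0x10
        uint32_t sio_gpio_out_set;
        uint32_t sio_gpio_out_clr;
        uint32_t sio_gpio_out_xor;
        uint32_t sio_gpio_oe;      // 0x20
        uint32_t sio_gpio_oe_set;
        uint32_t sio_gpio_oe_clr;
        uint32_t sio_gpio_oe_xor;
    } sio_regs_t;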
When you have such a data type, you can define a pointer to it that points to the beginning of the registers' area; then, using the field pointer arrow syntax (pointer->field), you will be able to access each register through the struct's fields!
    // volatile, as for every other register access
    volatile sio_regs_t *const sio_regs = (volatile sio_regs_t*)SIO_BASE;
    sio_regs->sio_gpio_out_set = 1 << GPIO_25;
This has a number of advantages over both previous options. While the pointer to the register map is technically a variable, the bulk of the information resides in the declaration of the struct; the definition of the pointer costs just one line to get the whole device tree!
Secondly, it is significantly more ergonomic in its definition and usage: the structure can be organized as a tree, and the developer can benefit greatly from autocomplete suggestions when looking for a specific register (without the need to include a pseudo-path in the register's name). Even better, you can split the individual fields into bitfields for maximum precision, so instead of writing a shifted value to the entire register, it is possible to select the exact section you need, down to the individual bit:
    typedef struct {
        // ... Other registers
        union {
            uint32_t sio_gpio_oe;
            struct {
                uint32_t gpio_0 : 1;
                uint32_t gpio_1 : 1;
                uint32_t gpio_2 : 1;
                uint32_t gpio_3 : 1;
                // ... from 4 to 28
                uint32_t gpio_29 : 1;
            } sio_gpio_oe_bits;
        };
        // ... Other registers
    } sio_regs_t;

This may look a bit crooked if you're not used to fiddling with structs and unions. First, I wrapped the field sio_gpio_oe that I had already defined previously in an anonymous union - that's right, structs and unions can be anonymous in C - so that I can still access the whole register in one go if I really need to. The other part of the union is a named struct field made of 30 1-bit wide fields, each accessible on its own.
Working with GPIOs, one quickly notices that many registers share the same simple internal structure - one bit for each GPIO. We can give this a name and use it as a type instead of repeating it every time:
    typedef struct {
        uint32_t gpio_0 : 1;
        uint32_t gpio_1 : 1;
        uint32_t gpio_2 : 1;
        uint32_t gpio_3 : 1;
        // ... Remaining bits
    } gpio_bits_t;

    typedef struct {
        // ... Other registers
        union {
            uint32_t sio_gpio_oe;
            gpio_bits_t sio_gpio_oe_bits;
        };
        union {
            uint32_t sio_gpio_oe_set;
            gpio_bits_t sio_gpio_oe_set_bits;
        };
        // ... Other registers
    } sio_regs_t;

Anonymous struct and union members do not introduce an additional field name; the registers are accessed like so:

    volatile sio_regs_t *const sio_regs = (volatile sio_regs_t*)SIO_BASE;
    sio_regs->sio_gpio_oe_set_bits.gpio_25 = 1;
Exceedingly cool. This applies to fields of any width, both for reading and writing. It would also simplify working with different instances of identical peripherals - for example, there are two UARTs (serial ports) at addresses 0x40034000 and 0x40038000. One could separately define the register structure and instantiate it multiple times, one for each address. The instance can then be passed around to functions as if it was a "device object" of sorts.
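As a rough sketch of the "device object" idea, the snippet below declares a partial UART register map and a function that works on whichever instance it is handed. The type and function names (uart_regs_t, uart_putc, UART0, UART1, UART_FR_TXFF) are mine, only the first few registers are spelled out, and the offsets assume the PL011-style layout of the RP2040 UART, so check them against the datasheet before relying on them.

```c
#include <stdint.h>

/* Partial register map: only the first registers are spelled out.
 * Offsets assume the PL011-style UART layout; verify before use. */
typedef struct {
    volatile uint32_t dr;            /* 0x00: data register    */
    volatile uint32_t rsr;           /* 0x04: receive status   */
    volatile uint32_t _reserved[4];  /* 0x08..0x14             */
    volatile uint32_t fr;            /* 0x18: flag register    */
} uart_regs_t;

#define UART_FR_TXFF (1u << 5)       /* transmit FIFO full */

/* Two instances of the same peripheral, at the addresses given in the text. */
#define UART0 ((uart_regs_t*)(uintptr_t)0x40034000u)
#define UART1 ((uart_regs_t*)(uintptr_t)0x40038000u)

/* The instance is a parameter, so the same code serves both ports. */
static void uart_putc(uart_regs_t *uart, char c)
{
    while (uart->fr & UART_FR_TXFF)
        ;                            /* wait for room in the FIFO */
    uart->dr = (uint8_t)c;
}
```

On the target, uart_putc(UART0, 'A') and uart_putc(UART1, 'A') would run the same code against different hardware. Taking the instance as a parameter also means the function can be exercised on a host against an ordinary struct standing in for the device.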
Instancing can assist in accessing mirrored register maps as well. Suppose we declared a struct, apb_peripherals_t, mapping the totality of the device registers starting at address 0x40000000. The application could declare multiple pointers for each mirrored function, like so:
    volatile apb_peripherals_t *const apb_peripherals       = (apb_peripherals_t*)0x40000000;
    volatile apb_peripherals_t *const apb_peripherals_xor   = (apb_peripherals_t*)0x40001000;
    volatile apb_peripherals_t *const apb_peripherals_set   = (apb_peripherals_t*)0x40002000;
    volatile apb_peripherals_t *const apb_peripherals_clear = (apb_peripherals_t*)0x40003000;
See the RP2040 datasheet, section 2.1.2 for details on this functionality.
The Catch
This is the approach taken by many libraries and it tends to lead to an enjoyable developer experience. Unfortunately, for someone willing to nitpick - like me - there is a sizeable caveat: the C standard technically doesn't guarantee this approach to work. In order to be as easy as possible to implement, the C language specification is quite lax in defining what's part of the language, and a lot of times instead of specifying how the compiled software should act, the behavior is simply left undefined.
The memory layout resulting from structure declaration is one of those grey areas: according to the official document, a C struct must begin right at the start of the allocated memory and its fields must be found in the exact order they are laid out, but nothing else is enforced. In practice, this means that placing two uint32_ts one after the other does not guarantee that they will be neighbors in memory. The compiler is free to add as much space as it wishes between them.
Bitfields are even more finicky: whether the first field gpio_0: 1 in the previous example is placed at the most or least significant bit in the word is left up to the compiler yet again. From the ISO/IEC 9899:202x document, section 6.7.2.1.11:
The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
More competent developers than me will point out that before asking whether the compiler can go to crazy town with the memory layout, I should ponder why it would. While the C standard doesn't enforce a precise choice, most compilers tend in fact to be well-behaved. This freedom is usually intended for situations where the fields of a struct have different sizes (like unsigned int and unsigned long), so padding could be added to ensure that each field is properly aligned (for correctness and/or performance). That being said, given any modern and properly documented C compiler, you can easily make sure that your register structure declaration will actually map to the memory how you expect it to; just because it can insert random padding between equally sized and sequential fields, it doesn't mean that it will.
Moreover, both GCC and Clang allow the user to explicitly require no padding at all - even in case of different field alignments - with the __attribute__((packed)) directive: attached to a struct, this instructs the compiler to avoid padding between fields, giving you more control over its final layout.
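Whichever way the layout is obtained, it can be checked at compile time instead of trusted. The sketch below uses C11's _Static_assert with offsetof on a simplified SIO map. The type name sio_layout_t is hypothetical, and the offsets are the ones I expect from the RP2040 register map (GPIO OUT at 0x10, GPIO OE at 0x20), so treat them as assumptions to verify against the datasheet.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified SIO map, offsets relative to the start of the block. */
typedef struct {
    uint32_t cpuid;          /* 0x00 */
    uint32_t gpio_in;        /* 0x04 */
    uint32_t gpio_hi_in;     /* 0x08 */
    uint32_t _reserved;      /* 0x0C */
    uint32_t gpio_out;       /* 0x10 */
    uint32_t gpio_out_set;   /* 0x14 */
    uint32_t gpio_out_clr;   /* 0x18 */
    uint32_t gpio_out_xor;   /* 0x1C */
    uint32_t gpio_oe;        /* 0x20 */
} sio_layout_t;

/* Compilation fails if the compiler's layout disagrees with these offsets. */
_Static_assert(offsetof(sio_layout_t, gpio_out) == 0x10, "GPIO_OUT must sit at 0x10");
_Static_assert(offsetof(sio_layout_t, gpio_oe)  == 0x20, "GPIO_OE must sit at 0x20");
_Static_assert(sizeof(sio_layout_t) == 0x24, "unexpected padding in sio_layout_t");
```

If a different compiler or a changed flag ever alters the layout, the build breaks immediately rather than the board misbehaving later.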
As for bit fields, you just need to know how your tool behaves. On arm-none-eabi-gcc the first bits you encounter in a struct declaration are placed in the least significant spots first, but this is not guaranteed. Portability problems caused by bitfield implementations are fairly common - either between different architectures or when interacting with C - so tread carefully.
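One way to find out how a given toolchain behaves is to ask it. The following probe sets only the first-declared bit-field of a 32-bit word and returns the resulting raw value. probe_t and first_bitfield_raw are names invented for this sketch, and it assumes the compiler accepts uint32_t as a bit-field type, which is itself implementation-defined.

```c
#include <stdint.h>

typedef union {
    uint32_t raw;
    struct {
        uint32_t first : 1;   /* declared first: lowest or highest bit? */
        uint32_t rest  : 31;
    } bits;
} probe_t;

/* Returns the raw word produced by setting only the first-declared bit. */
static uint32_t first_bitfield_raw(void)
{
    probe_t p = { .raw = 0 };
    p.bits.first = 1;
    return p.raw;
}
```

A compiler that allocates bit-fields starting from the least significant bit yields 1, while one that starts from the other end yields 0x80000000. Running this once per toolchain (or turning it into a startup check) documents the assumption the register structs depend on.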
If you are using a widespread architecture with a well-established compiler, you are fairly safe in using structures to map device registers. You should also absolutely be aware that you may hit some corner case where the tools at your disposal are allowed to behave however they please.
On the other hand, bitwise operations like the ones I described previously are guaranteed to work regardless of how the memory innards are arranged: *x ^= 1 << 2; will always flip the third least significant bit of whatever x is pointing to because that's the meaning of the mathematical operations involved.
Conclusion
C is a language that starts from simple premises and provides surprisingly effective tools. Being able to shape a struct down to the individual bit for read and write access is extremely convenient, and it's a pity that the standard throws it into the pit of undefined behavior; it seems like those features were intended for memory efficiency alone and that precise memory mapping is just a side effect provided by the individual compilers.
If you know how the compiler behaves in your case and do not plan on changing it, structures are a great choice for memory-mapped registers. If - like me - you frequently need to work with multiple, sometimes a little arcane, target architectures and compilers, you might prefer the safety and reassurance that come with working with macros and bitwise operations.
In the end, I find none of the solutions presented here to be entirely satisfying. In more recent times, other languages have included abstractions and tools better suited to access specifically shaped memory without compromising on portability or efficiency; next up will be Rust, then probably Zig and maybe even something else. Stay tuned!
Comments
It's still hardware. Just because it's "memory-mapped" doesn't guarantee that it's readable just like memory. Maintain a word-aligned image in memory and write to I/O space as needed. Also don't fool around with bitfields, screen scrape the data sheet and assign a bit-mask value to the symbols.
Ah, yes! Some locations may not retain what has been written and some may read as constant values - or even random noise. From a purely programming perspective however they behave exactly like memory locations that happen to trigger side effects.
Sorry boss, those memory locations unforeseeably happened to trigger show-stopping side effects. Let's re-spin the board to accommodate the code.
I'm afraid I don't understand your point.
> Maintain a word-aligned image in memory and write to I/O space as needed.
Are you suggesting modifying a copy of the registers and writing it in full when an I/O update is required? Are you against using pointers in C? Should it be done with assembly instructions?
> Also don't fool around with bitfields, screen scrape the data sheet and assign a bit-mask value to the symbols.
The article mentions both how bitfields are unreliable and how you should only access specific sections of the registers through a mask.
> Sorry boss, those memory locations unforeseeably happened to trigger show-stopping side effects. Let's re-spin the board to accommodate the code.
The devices have side effects but even those aren't "unforeseeable". Barring a hardware fault they will behave deterministically; besides, what else if not "the code" should interact with them? I'm very confused about what you're trying to say.
> Notice the placement of const. Each pointer is a constant, so the compiler can avoid allocating RAM for a value that will never change.
Aha! I've been wondering about use cases for constant pointers for a while. Thanks for pointing it out!
apb_peripherals_t *const apb_peripherals = (apb_peripherals_t*)0x40000000;
apb_peripherals_t *const apb_peripherals_xor = (apb_peripherals_t*)0x40001000;
apb_peripherals_t *const apb_peripherals_set = (apb_peripherals_t*)0x40002000;
apb_peripherals_t *const apb_peripherals_clear = (apb_peripherals_t*)0x40003000;
Make sure to use volatile here! (as you pointed out earlier in the article)
Thank you for pointing that out!
One thing that is often overlooked when using the "pointer to volatile bitfield struct" approach is that the compiler has to emit full-width loads & stores for each field that is modified:

    void bad(uint16_t offset, uint16_t size, uint8_t stride, uint8_t priority) {
        // note the ldr and str instructions for each of these assignments
        ThePeripheral->B.offset = offset;
        ThePeripheral->B.size = size;
        ThePeripheral->B.stride = stride;
        ThePeripheral->B.priority = priority;
    }

    void good(uint16_t offset, uint16_t size, uint8_t stride, uint8_t priority) {
        Peripheral p;
        p.B.offset = offset;
        p.B.size = size;
        p.B.stride = stride;
        p.B.priority = priority;
        ThePeripheral->R = p.R;
    }
Full sample on godbolt:
https://godbolt.org/z/r3EaGbWjT
Directly accessing the "pointer-to-union"s provided by vendors is comfortable, but developers really should think twice if code size or performance is important, since both suffer here.