
Really tiny microcontrollers

Started by Paul Rubin February 25, 2013
In article <op.ws30djraj0diun@thanatos.indes.com>, 
sp4mtr4p.boudewijn@indes.com says...
> On Tue, 26 Feb 2013 09:44:43 +0100, Paul Rubin <no.email@nospam.invalid> wrote:
> > Tim Wescott <tim@seemywebsite.com> writes:
> >> I suspect that there will be 8-bit processors for a good long while;
> >> it's just that they'll get pushed further and further down the food
> >> chain (while putting severe pressure on the top of whatever ecological
> >> niche is left for four-bit processors).
> >
> > There's another thing too, which is that the small Cortex M0+ parts have
> > sizes like 32K flash, 4K ram, which is perfectly fine for my purposes
> > and lots of other purposes. Yet it's a low enough amount that there
> > will always be some scarcity of memory, which means burning 32 bits for
> > every pointer when everything fits in 16 bits seems painful. (I don't
> > know the ARM instruction set so maybe there's a way to use 16 bit
> > pointers?)
>
> Unless you use function pointers it should not be necessary to explicitly
> store pointers at all. And even if you use function pointers your
> compiler should allow you to store them as uint16_t.
How does that work if the entity for which you need a pointer is above address 0xFFFF? That would include all the peripheral registers, RAM and Flash on many ARM variants.
> > I'm presuming one can use 16 bit integers without having to
> > run extra code masking off int32's.
>
> Register variables should be 32-bit if you don't want masking. And ARM
> does have instructions for loading and storing 8- and 16-bit signed and
> unsigned values.
Mark Borgerson
In article <512ca096$0$6908$e4fe514c@news2.news.xs4all.nl>, usenet+5@c-
scape.nl says...
> On 02/26/2013 11:49 AM, Boudewijn Dijkstra wrote:
> >> There's another thing too, which is that the small Cortex M0+ parts have
> >> sizes like 32K flash, 4K ram, which is perfectly fine for my purposes
> >> and lots of other purposes. Yet it's a low enough amount that there
> >> will always be some scarcity of memory, which means burning 32 bits for
> >> every pointer when everything fits in 16 bits seems painful. (I don't
> >> know the ARM instruction set so maybe there's a way to use 16 bit
> >> pointers?)
> >
> > Unless you use function pointers it should not be necessary to
> > explicitly store pointers at all. And even if you use function pointers
> > your compiler should allow you to store them as uint16_t.
>
> Depends. For a linked list, for example, storing pointers to structs is
> a natural solution. Of course, in most embedded projects you won't have
> too many of those.
>
> I think the biggest impact will be the size of the stack. Even small
> local loop counters will be stored on the stack as 32 bit entries.
That may be compiler-dependent. I don't think there is any particular reason that you can't have bytes and 16-bit words on the stack. The compiler may have to do some alignment if you also have 32-bit word variables on the stack.

If your code is simple enough and your compiler smart enough, small local loop counters will be in registers.

Mark Borgerson
In article <op.ws3816wyj0diun@thanatos.indes.com>, 
sp4mtr4p.boudewijn@indes.com says...
> On Tue, 26 Feb 2013 12:46:30 +0100, Arlet Ottens <usenet+5@c-scape.nl> wrote:
> > On 02/26/2013 11:49 AM, Boudewijn Dijkstra wrote:
> >
> >>> There's another thing too, which is that the small Cortex M0+ parts
> >>> have sizes like 32K flash, 4K ram, which is perfectly fine for my
> >>> purposes and lots of other purposes. Yet it's a low enough amount that
> >>> there will always be some scarcity of memory, which means burning 32
> >>> bits for every pointer when everything fits in 16 bits seems painful.
> >>> (I don't know the ARM instruction set so maybe there's a way to use
> >>> 16 bit pointers?)
> >>
> >> Unless you use function pointers it should not be necessary to
> >> explicitly store pointers at all. And even if you use function pointers
> >> your compiler should allow you to store them as uint16_t.
> >
> > Depends. For a linked list, for example, storing pointers to structs is
> > a natural solution.
>
> If you have 4K RAM, then a linked list is not a natural solution. ;)
>
> > Of course, in most embedded projects you won't have too many of those.
>
> Save the unavoidable ones like linked DMA transfer descriptors.
>
> > I think the biggest impact will be the size of the stack. Even small
> > local loop counters will be stored on the stack as 32 bit entries.
>
> Although the size of stack entries cannot be lowered, peak stack usage can
> often be reduced by tricks like smart variable relocation. And of course a
> good compiler also helps.
The IAR compiler for the ARM is perfectly happy to use 16-bit stack entries for local loop variables. I compiled the following code:

    void TestStackVars(void){
      volatile uint16_t i,j,k,l,m, result;
      for(i=0;i<10;i++){
        for(j=0;j<10;j++){
          for(k=0;k<10;k++){
            for(l=0;l<4;l++){
              for(m=0;m<4;m++){
                result = i*j*k*l*m;
              }
            }
          }
        }
      }
      printf("Result = %u\n",result);
    }

If I didn't qualify the variables as volatile, the compiler used 5 registers and nothing on the stack. Here is a portion of the generated assembly code:

    // 206 void TestStackVars(void){
    TestStackVars:
            PUSH     {R5-R7,LR}
            CFI R14 Frame(CFA, -4)
            CFI CFA R13+16
    // 207 volatile uint16_t i,j,k,l,m, result;
    // 208 for(i=0;i<10;i++){
            MOVS     R0,#+0
            STRH     R0,[SP, #+8]
            B.N      ??TestStackVars_0
    ??TestStackVars_1:
            LDRH     R0,[SP, #+8]
            ADDS     R0,R0,#+1
            STRH     R0,[SP, #+8]
    ??TestStackVars_0:
            LDRH     R0,[SP, #+8]
            CMP      R0,#+10
            BCS.N    ??TestStackVars_2
    // 209 for(j=0;j<10;j++){
            MOVS     R0,#+0
            STRH     R0,[SP, #+6]
            B.N      ??TestStackVars_3
    ??TestStackVars_4:
            LDRH     R0,[SP, #+6]
            ADDS     R0,R0,#+1
            STRH     R0,[SP, #+6]
    ??TestStackVars_3:
            LDRH     R0,[SP, #+6]
            CMP      R0,#+10
            BCS.N    ??TestStackVars_1
    // 210 for(k=0;k<10;k++){
            MOVS     R0,#+0
            STRH     R0,[SP, #+4]
            B.N      ??TestStackVars_5
    ??TestStackVars_6:
            LDRH     R0,[SP, #+4]
            ADDS     R0,R0,#+1
            STRH     R0,[SP, #+4]
    ??TestStackVars_5:

As you can see, the variable offsets on the stack increase by two, and LDRH and STRH are used to access the variables. When I changed the variables to uint8_t and recompiled, the offsets on the stack were 1, but the compiler did reserve stack space as a multiple of 32 bits (8 bytes in this case) to hold 6 bytes' worth of variables. I think this was done to keep the stack pointer on a 32-bit word boundary.

Mark Borgerson
On 02/26/2013 04:34 PM, Mark Borgerson wrote:

>> I think the biggest impact will be the size of the stack. Even small
>> local loop counters will be stored on the stack as 32 bit entries.
>
> That may be compiler dependent. I don't think there is any particular
> reason that you can't have bytes and 16-bit words on the stack. The
> compiler may have to do some alignment if you also have 32-bit word
> variables on the stack.
>
> If your code is simple enough and your compiler smart enough, small
> local loop counters will be in registers.
I meant that whenever you call a sub-function, the caller-saved registers will be saved on the stack as 32 bit values, even though you may only have been using them as 8 or 16 bit variables. Similarly when you get an interrupt.
In article <512cdf23$0$6898$e4fe514c@news2.news.xs4all.nl>, usenet+5@c-
scape.nl says...
> On 02/26/2013 04:34 PM, Mark Borgerson wrote:
> >> I think the biggest impact will be the size of the stack. Even small
> >> local loop counters will be stored on the stack as 32 bit entries.
> >
> > That may be compiler dependent. I don't think there is any particular
> > reason that you can't have bytes and 16-bit words on the stack. The
> > compiler may have to do some alignment if you also have 32-bit word
> > variables on the stack.
> >
> > If your code is simple enough and your compiler smart enough, small
> > local loop counters will be in registers.
>
> I meant that whenever you call a sub-function, the caller-saved
> registers will be saved on the stack as 32 bit values, even though you
> may only have been using them as 8 or 16 bit variables. Similarly when
> you get an interrupt.
I agree. It's one of those tradeoffs you don't appreciate until you actually look at the assembly code. If you let the compiler use registers for the 8- or 16-bit variables, it has to push more 32-bit registers onto the stack, but the generated code is shorter and faster. If you specify variables on the stack, the compiler will only use as much stack as is needed for the variable and you won't need to push and pop as many registers. However, the code will take more instructions for each operation and will be longer and slower.

As a programmer, you get to balance RAM/stack usage, flash usage, and execution speed. That kind of balancing act is one of the things that separates embedded programmers from Linux and Windows programmers.

Mark Borgerson
On Tue, 26 Feb 2013 14:57:32 +0100, "Boudewijn Dijkstra"
<sp4mtr4p.boudewijn@indes.com> wrote:

>On Tue, 26 Feb 2013 12:46:30 +0100, Arlet Ottens <usenet+5@c-scape.nl> wrote:
>> On 02/26/2013 11:49 AM, Boudewijn Dijkstra wrote:
>>
>>>> There's another thing too, which is that the small Cortex M0+ parts
>>>> have sizes like 32K flash, 4K ram, which is perfectly fine for my
>>>> purposes and lots of other purposes. Yet it's a low enough amount that
>>>> there will always be some scarcity of memory, which means burning 32
>>>> bits for every pointer when everything fits in 16 bits seems painful.
>>>> (I don't know the ARM instruction set so maybe there's a way to use
>>>> 16 bit pointers?)
>>>
>>> Unless you use function pointers it should not be necessary to
>>> explicitly store pointers at all. And even if you use function pointers
>>> your compiler should allow you to store them as uint16_t.
>>
>> Depends. For a linked list, for example, storing pointers to structs is
>> a natural solution.
To Arlet: Agreed, it's a "natural" solution because that is the way it is usually taught in school and in many books. It's NOT the only way to build a linked list, though. In fact, I use a different method in an operating system I wrote, which avoids memory pointers while retaining the linked lists I do need.
>If you have 4K RAM, then a linked list is not a natural solution. ;) ><snip>
To Boudewijn: I routinely use linked lists on microcontrollers (some of which, in my work, have less than 1K RAM) with an operating system I wrote, and I often use them where they make sense for the application. The O/S needs linked lists to support the run queue, the sleep queue, and the semaphore queues, for example.

The method does NOT use memory pointers as data elements; instead it uses a single byte of RAM as an index into an array. It's the simple method one typically uses in languages that do NOT support memory pointer semantics (like early BASIC). It works fine and is very light on RAM usage. It allows either singly or doubly linked lists, so with singly linked lists you can shave off a byte for each thread and for the head/tail of each queue. A minimal system with 3 threads, support for run and sleep queues, 4 semaphore queues, and priority support requires about 20 bytes of RAM for the O/S.

I'd consider 4K RAM to actually be very "roomy" for what I designed this O/S to operate within (not 32-bit ALU systems). Linked lists are always a natural solution for operating systems; it's just a matter of how you implement them.

Jon
On 02/26/2013 06:41 PM, Jon Kirwan wrote:

> I routinely use linked lists with microcontrollers (which in
> some cases I deal with may have less than 1k ram) with an
> operating system I wrote and often use where it makes sense
> for the application. The O/S needs linked lists to support
> the run queue, the sleep queue, and the semaphore queues, for
> example. The method does NOT use memory pointers as data
> elements but instead uses 1 byte ram as an index into the
> "array" to achieve this. It's a very simple method that one
> typically uses in languages which do NOT support memory
> pointer semantics (like early BASIC.) Works fine. Very light
> on ram usage. It allows either singly or doubly linked lists,
> so you can shave off a byte for each thread and the head/tail
> of each queue. A minimal system with 3 threads, support for
> run and sleep queues and 4 semaphore queues and support for
> priorities, requires about 20 bytes of ram for the O/S.
It does have the limitation that the maximum number of items in the list must be statically allocated. Of course, on a memory-restricted embedded CPU you usually need to do that anyway, so in practice it's not a big deal.

By the way, I wrote a simple multitasking scheduler for ARM7/Cortex that keeps the run/sleep/semaphore queues in 32-bit integers. Each task is represented by a bit. Waking up a bunch of tasks is just a bitwise OR operation. Finding the highest-priority task is just a matter of finding the highest set bit, and some ARM cores can do that with a single CLZ (count leading zeroes) instruction.
> I'd consider 4k ram to actually be very "roomy" for what I'd
> designed this O/S to operate within (not 32-bit alu systems.)
> Linked lists are always a natural solution for operating
> systems. It's just a matter of how you implement them.
On Feb 26, 8:06&#4294967295;pm, Arlet Ottens <usene...@c-scape.nl> wrote:
> ...
> By the way, I wrote a simple multi tasking scheduler for ARM7/Cortex
> that keeps the run/sleep/semaphore queues in 32 bit integers. Each task
> is represented by a bit. Waking up a bunch of tasks is just a bitwise OR
> operation. Finding the highest priority task is just a matter of finding
> the highest set bit, and some ARM cores can do that with a single CLZ
> (count leading zeroes) instruction.
> ...
My approach on DPS (which is a full-blown OS for Power, though recently I used its scheduler on a "small" 16K RAM ColdFire, the MCF52211) was (is) different. It is up to each task to state that it is OK, as far as it is concerned, to "nap", by choosing which call it exits through (for cooperative rescheduling only); that is, it can exit and allow a nap, or exit and disallow one. Obviously, if a task does not exit cooperatively but because its time slot has expired, the nap is disallowed. Then the scheduler decides whether to put the machine into a nap for some time or not; it will nap only if all tasks have exited allowing that.

Dimiter

------------------------------------------------------
Dimiter Popoff               Transgalactic Instruments
http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
On Tue, 26 Feb 2013 11:05:53 -0800 (PST), dp <dp@tgi-sci.com>
wrote:

>On Feb 26, 8:06 pm, Arlet Ottens <usene...@c-scape.nl> wrote:
>> ...
>> By the way, I wrote a simple multi tasking scheduler for ARM7/Cortex
>> that keeps the run/sleep/semaphore queues in 32 bit integers. Each task
>> is represented by a bit. Waking up a bunch of tasks is just a bitwise OR
>> operation. Finding the highest priority task is just a matter of finding
>> the highest set bit, and some ARM cores can do that with a single CLZ
>> (count leading zeroes) instruction.
>> ...
>
>My approach on DPS (which is a full blown OS for power but recently
>I used its scheduler on a "small" 16k RAM CF, MCF52211) was (is)
>different.
>It is up to each task to state it is OK as far as it is concerned
>to "nap" by exiting (just for cooperative rescheduling) via this
>or that call (i.e. it can exit and allow nap or exit and disallow
>it). Obviously if a task does not exit cooperatively but because
>its time slot has expired the nap is disallowed. Then the scheduler
>decides whether to put the machine into nap for some time or not;
>it will nap it only if all tasks have exited allowing that.
>
>Dimiter
Agreed. When possible on the micro, I will often "wait" on a halt instruction of the processor when all processes are sleeping or waiting on a semaphore that may be changed by an interrupt event. The timer event ticks, wakes up the processor, and decrements a single counter for the top sleeping-queue entry (delta times are used, so each thread keeps a delta relative to the thread ahead of it, and only the top thread's counter needs to be decremented). If it becomes zero, then all threads with zero deltas (the current one and any following ones that are also zero) are moved to the run queue (by moving exactly ONE index value). Since a thread is now ready to run, it executes. Otherwise, with nothing in the ready queue, the processor can just sit on a halt if that is appropriate.

Jon
On 02/26/2013 08:55 PM, Jon Kirwan wrote:

> When possible on the micro, I will often "wait" on a halt
> instruction of the processor, when all processes are sleeping
> or waiting on a semaphore that may be changed on an interrupt
> event. The timer event ticks, wakes up the processor,
> decrements a single counter for the top sleeping queue entry
> (delta times are used, so all threads keep delta times
> relative to the thread ahead of them and only the top thread
> needs to be decremented) and if it becomes zero, then all
> threads with zero deltas (the current one and any following
> ones with zero, also) are moved to the run queue (by moving
> exactly ONE index value.) Since a thread is now ready to run,
> that is executed. But otherwise, with nothing in the ready
> queue, it can just sit on a halt if that is appropriate.
I just use a timer counter per thread. The timer interrupt decrements all of them in a small loop. Since I never have more than a handful of threads, the loop is really short and simple:

    for( i = 0; i < NR_TASKS; i++, task++ ) {
        if( task->timer && !--task->timer )
            mask |= (1 << i);
    }
    wakeup( mask );

On the other hand, starting/stopping/adjusting timers is just a single write to task->timer.