EmbeddedRelated.com
Forums
The 2026 Embedded Online Conference

Really tiny microcontrollers

Started by Paul Rubin February 25, 2013
On Tue, 26 Feb 2013 21:20:04 +0100, Arlet Ottens
<usenet+5@c-scape.nl> wrote:

>On 02/26/2013 08:55 PM, Jon Kirwan wrote:
>
>> When possible on the micro, I will often "wait" on a halt
>> instruction of the processor, when all processes are sleeping
>> or waiting on a semaphore that may be changed on an interrupt
>> event. The timer event ticks, wakes up the processor,
>> decrements a single counter for the top sleeping queue entry
>> (delta times are used, so all threads keep delta times
>> relative to the thread ahead of them and only the top thread
>> needs to be decremented) and if it becomes zero, then all
>> threads with zero deltas (the current one and any following
>> ones with zero, also) are moved to the run queue (by moving
>> exactly ONE index value.) Since a thread is now ready to run,
>> that is executed. But otherwise, with nothing in the ready
>> queue, it can just sit on a halt if that is appropriate.
>
>I just use a timer counter per thread. The timer interrupt decrements
>all of them in a small loop. Since I never have more than a handful of
>threads, the loop is really short and simple:
>
>    for( i = 0; i < NR_TASKS; i++, task++ )
>    {
>        if( task->timer && !--task->timer )
>            mask |= (1 << i);
>    }
>    wakeup( mask );
>
>On the other hand, starting/stopping/adjusting timers is just a single
>write to task->timer.
It's too easy for me to NOT use that approach. A delta queue is trivial
to implement and requires no looping (unless there is more than one
thread waiting on the same time event.) Threads are inserted into the
sleep queue by time, but only the delta is retained. So if you have a
thread waiting for 8 clocks at the top and one more with a delay of 2
from that (for 10), then inserting a new thread wanting to wait 9 will
first subtract 8 from 9, leaving 1, then will insert itself after the
top entry but before the other entry. The resulting sleep queue, with 3
entries, would be 8, 1, 1 for their delta time values (both the
inserted thread and the one immediately following it would have updated
values.) In this way, I only need ONE timer, and very few instructions
are needed to advance the entire queue, since only the top entry needs
to be decremented.

For very simple systems (I don't always do this), my thread structure is:

    Sleep ----------------------------> [] -> [] -> [] ---.
                                       /                  |
    Ready ------------> [] -> [] -> []                    |
                       /                                  |
    Avail --> [] -> []                                    |
      ^                                                   |
      |___________________________________________________|

It's circular. Moving a thread from sleep to ready is simply moving the
sleep queue pointer by 1.

    Sleep   Ready   Avail   Pointer conditions   Description
    ---------------------------------------------------------------
    Empty   Empty   Empty   A==R, R==S, S==A     Meaningless
            Empty   Empty   A==R, R==S, S==A     All slots sleeping
    Empty           Empty   A==R, R==S, S==A     All slots ready
    Empty   Empty           A==R, R==S, S==A     All slots available
    Empty                   A<>R, R<>S, S==A     No slots sleeping
            Empty           A<>R, R==S, S<>A     No slots ready
                    Empty   A==R, R<>S, S<>A     No slots available
                            A<>R, R<>S, S<>A     No empty queues

It's very easy to check indices for the above conditions (or pointers,
if used instead.) The above is for very simple cases, though, that are
primarily timer-driven, where threads also move frequently from ready
to avail, and all of the pointers move in clockwise (or
counterclockwise, depending on view) arrangements.
I use the more usual methods when this isn't appropriate. But the timer queue design almost always works out well, regardless of the other details. I really like having only ONE entry to check in the frequent and regular case of a timer interrupt event. It keeps variability of latency, absolute latency, and excess CPU usage to minimums.

Jon
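Jon's delta-queue arrangement can be sketched roughly as in the C below. This is a minimal illustrative sketch, not Jon's actual code: the `struct tcb` layout and the names `sleep_insert` and `timer_tick` are invented, and real pointers are used where Jon would use one-byte indices.

```c
#include <stddef.h>

/* Delta-queue sketch: each sleeping thread stores only the delay
 * *relative to the thread ahead of it*, so the timer tick touches
 * just the head of the queue. */
struct tcb {
    unsigned delta;     /* ticks remaining relative to predecessor */
    struct tcb *next;
};

static struct tcb *sleep_head = NULL;

/* Insert a thread that wants to sleep 'ticks' from now. */
void sleep_insert(struct tcb *t, unsigned ticks)
{
    struct tcb **pp = &sleep_head;
    while (*pp && (*pp)->delta <= ticks) {
        ticks -= (*pp)->delta;      /* consume predecessors' deltas */
        pp = &(*pp)->next;
    }
    t->delta = ticks;
    t->next = *pp;
    if (*pp)
        (*pp)->delta -= ticks;      /* successor is now relative to us */
    *pp = t;
}

/* Timer tick: decrement only the head entry; pop every entry whose
 * delta has reached zero (they all expire on the same tick).
 * Returns the list of threads to move to the run queue. */
struct tcb *timer_tick(void)
{
    struct tcb *ready = NULL, **tail = &ready;
    if (sleep_head && sleep_head->delta)
        sleep_head->delta--;
    while (sleep_head && sleep_head->delta == 0) {
        *tail = sleep_head;
        sleep_head = sleep_head->next;
        tail = &(*tail)->next;
        *tail = NULL;
    }
    return ready;
}
```

Inserting delays of 8, 10, and 9 produces exactly the 8, 1, 1 deltas from Jon's example, and each tick costs one decrement regardless of how many threads are asleep.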
On 26/02/13 17:06, Mark Borgerson wrote:
> In article <op.ws3816wyj0diun@thanatos.indes.com>,
> sp4mtr4p.boudewijn@indes.com says...
>>
>> Op Tue, 26 Feb 2013 12:46:30 +0100 schreef Arlet Ottens
>> <usenet+5@c-scape.nl>:
>>> On 02/26/2013 11:49 AM, Boudewijn Dijkstra wrote:
>>>
>>>>> There's another thing too, which is that the small Cortex M0+ parts
>>>>> have sizes like 32K flash, 4K ram, which is perfectly fine for my
>>>>> purposes and lots of other purposes. Yet it's a low enough amount
>>>>> that there will always be some scarcity of memory, which means
>>>>> burning 32 bits for every pointer when everything fits in 16 bits
>>>>> seems painful. (I don't know the ARM instruction set so maybe
>>>>> there's a way to use 16 bit pointers?).
>>>>
>>>> Unless you use function pointers it should not be necessary to
>>>> explicitly store pointers at all. And even if you use function
>>>> pointers your compiler should allow you to store them as uint16_t.
>>>
>>> Depends. For a linked list, for example, storing pointers to structs
>>> is a natural solution.
>>
>> If you have 4K RAM, then a linked list is not a natural solution. ;)
>>
>>> Of course, in most embedded projects you won't have too many of those.
>>
>> Save the unavoidable ones like linked DMA transfer descriptors.
>>
>>> I think the biggest impact will be the size of the stack. Even small
>>> local loop counters will be stored on the stack as 32 bit entries.
>>
>> Although the size of stack entries cannot be lowered, peak stack usage
>> can often be reduced by tricks like smart variable relocation. And of
>> course a good compiler also helps.
>
> The IAR compiler for the ARM is perfectly happy to use 16-bit stack
> entries for local loop variables.
>
> I compiled the following code:
>
> void TestStackVars(void){
>     volatile uint16_t i,j,k,l,m, result;
>     for(i=0;i<10;i++){
>         for(j=0;j<10;j++){
>             for(k=0;k<10;k++){
>                 for(l=0;l<4;l++){
>                     for(m=0;m<4;m++){
>                         result = i*j*k*l*m;
>                     }
>                 }
>             }
>         }
>     }
>     printf("Result = %u\n",result);
> }
>
> If I didn't qualify the variables as volatile, the compiler used
> 5 registers and nothing on the stack.
Making these "volatile" completely changes the semantics here. Of course the compiler makes them 16-bit and in-memory, on the stack - there is almost no other choice it could make (I say /almost/, as there is no requirement in C for there to be a stack at all).

This is a totally different situation from when the compiler stores registers on the stack as part of the calling convention. I expect that IAR's compiler - and most likely all compilers - will store the full 32-bit registers on the stack if it is unable to eliminate the store altogether.
>
> Here is a portion of the generated assembly code:
>
> // 206 void TestStackVars(void){
> TestStackVars:
>         PUSH     {R5-R7,LR}
>         CFI R14 Frame(CFA, -4)
>         CFI CFA R13+16
> // 207 volatile uint16_t i,j,k,l,m, result;
> // 208 for(i=0;i<10;i++){
>         MOVS     R0,#+0
>         STRH     R0,[SP, #+8]
>         B.N      ??TestStackVars_0
> ??TestStackVars_1:
>         LDRH     R0,[SP, #+8]
>         ADDS     R0,R0,#+1
>         STRH     R0,[SP, #+8]
> ??TestStackVars_0:
>         LDRH     R0,[SP, #+8]
>         CMP      R0,#+10
>         BCS.N    ??TestStackVars_2
> // 209 for(j=0;j<10;j++){
>         MOVS     R0,#+0
>         STRH     R0,[SP, #+6]
>         B.N      ??TestStackVars_3
> ??TestStackVars_4:
>         LDRH     R0,[SP, #+6]
>         ADDS     R0,R0,#+1
>         STRH     R0,[SP, #+6]
> ??TestStackVars_3:
>         LDRH     R0,[SP, #+6]
>         CMP      R0,#+10
>         BCS.N    ??TestStackVars_1
> // 210 for(k=0;k<10;k++){
>         MOVS     R0,#+0
>         STRH     R0,[SP, #+4]
>         B.N      ??TestStackVars_5
> ??TestStackVars_6:
>         LDRH     R0,[SP, #+4]
>         ADDS     R0,R0,#+1
>         STRH     R0,[SP, #+4]
> ??TestStackVars_5:
>
> As you can see, the variable offsets on the stack increase
> by two and LDRH and STRH are used to access the variables.
>
> When I changed the variables to uint8_t and recompiled, the
> offsets on the stack were 1, but the compiler did reserve
> stack space as a multiple of 32 bits---8 bytes in this case
> ---to hold 6 bytes worth of variables. I think this was
> done to keep the stack pointer on a 32-bit word boundary.
Again, this is no surprise whatsoever.
> Mark Borgerson
On 26/02/13 17:54, Mark Borgerson wrote:
> In article <512cdf23$0$6898$e4fe514c@news2.news.xs4all.nl>, usenet+5@c-
> scape.nl says...
>>
>> On 02/26/2013 04:34 PM, Mark Borgerson wrote:
>>
>>>> I think the biggest impact will be the size of the stack. Even small
>>>> local loop counters will be stored on the stack as 32 bit entries.
>>>
>>> That may be compiler dependent. I don't think there is any particular
>>> reason that you can't have bytes and 16-bit words on the stack. The
>>> compiler may have to do some alignment if you also have 32-bit word
>>> variables on the stack.
>>>
>>> If your code is simple enough and your compiler smart enough, small
>>> local loop counters will be in registers.
>>
>> I meant that whenever you call a sub-function, the caller-saved
>> registers will be saved on the stack as 32 bit values, even though you
>> may only have been using them as 8 or 16 bit variables. Similarly when
>> you get an interrupt.
>
> I agree. It's one of those tradeoffs you don't appreciate until you
> actually look at the assembly code.
>
> If you let the compiler use registers for the 8 or 16-bit variables,
> it has to push more 32-bit registers onto the stack, but the
> generated code is shorter and faster.
>
> If you specify variables on the stack, the compiler will only
> use as much stack as is needed for the variables and you won't
> need to push and pop as many registers. However, the code
> will take more instructions for each operation and will be
> longer and slower.
>
> As a programmer, you get to balance RAM/stack usage, flash usage, and
> execution speed. That kind of balancing act is one of the things that
> separates embedded programmers from Linux and Windows programmers.
That is the kind of thing that separates programmers obsessed with micro-optimisations from programmers who know what is important.

Leave your local variables as local variables, and let the compiler optimise the code as well as it can. You won't make better code by mucking around trying to make your local variables "volatile" to push them onto the stack. All you will do is make your code much larger, much slower, and much harder to understand.

The chances of /really/ saving more than a tiny amount of stack space are small - the compiler will use the registers for other purposes, and save them anyway on function calls (and for interrupts, it must save /all/ the registers that are used). The odds are actually higher that you will use /more/ stack space - by trying to force the compiler like this, you will hinder its optimiser and limit opportunities for inter-procedural optimisations, which can significantly reduce stack space (as well as making smaller and faster code). The compiler is better at this than you are.

Learn to use your compiler properly - understand its switches and options, and study the generated code. Then let it do its job. Work /with/ your compiler, not against it, and not by ignoring it. /That/ is one of the things that separates embedded programmers from "big system" programmers.
In article <Z-OdnVmQj5RQuLDMnZ2dnUVZ7sGdnZ2d@lyse.net>, 
david.brown@removethis.hesbynett.no says...
> On 26/02/13 17:06, Mark Borgerson wrote:

<snip>

>> If I didn't qualify the variables as volatile, the compiler used
>> 5 registers and nothing on the stack.
>
> Making these "volatile" completely changes the semantics here. Of
> course the compiler makes them 16-bit and in-memory, on the stack -
> there is almost no other choice it could make (I say /almost/, as
> there is no requirement in C for there to be a stack at all).
>
> This is a totally different situation from when the compiler stores
> registers on the stack as part of the calling convention. I expect
> that IAR's compiler - and most likely all compilers - will store the
> full 32-bit registers on the stack if it is unable to eliminate the
> store altogether.

<snip>

>> When I changed the variables to uint8_t and recompiled, the
>> offsets on the stack were 1, but the compiler did reserve
>> stack space as a multiple of 32 bits---8 bytes in this case
>> ---to hold 6 bytes worth of variables. I think this was
>> done to keep the stack pointer on a 32-bit word boundary.
>
> Again, this is no surprise whatsoever.
I understand what you're saying and I pretty much agree. I've hardly ever had to force an ARM compiler to do something or had to resort to assembly language. I've used assembly language in the past for the MSP430 and M68K when the compiler was either inefficient or refused to use an optimum instruction (the DBNE loop instruction on the M68K comes to mind). The compiler certainly is better than I am at producing good ARM code.

However, my example is there to contest the following statement:
>>> I think the biggest impact will be the size of the stack. Even small
>>> local loop counters will be stored on the stack as 32 bit entries.
That's not always true, and I wrote the example to illustrate that fact. I suppose I could have avoided the "volatile" keyword by simply adding a few more levels of nested loops, to the point where the compiler's register allocation algorithm ran out of free registers.

Unless your processor is VERY limited in RAM space, such considerations are probably best left to the compiler. I have run into some tight RAM problems on MSP430 systems where I needed a couple of 512-byte SD buffers, an input data queue, and general variables on a system with just 2KB of RAM. Things were tight, but a long test under real-world conditions with stack markers showed I had almost 5% of the RAM unused for stack space. I later shifted to a processor with more RAM when I found that long delays in SD writes were getting close to input queue overflow.

Mark Borgerson
On Mon, 25 Feb 2013 18:16:40 +0100, Olaf Kaluza <olaf@criseis.ruhr.de>
wrote:

>Paul Rubin <no.email@nospam.invalid> wrote:
>
>>That's good to know about, and impressive. Is it possible to solder
>>that part with a reflow oven and some tweezers, or does it need machine
>>placement?
>
>It is not a problem. I do all my prototypes by hand-placement. But this
>is helpful:
>
>http://www.smtinspection.com/Mantis-Microscope/
You can see under a BGA with that???
Op Tue, 26 Feb 2013 17:06:23 +0100 schreef Mark Borgerson  
<mborgerson@comcast.net>:
> In article <op.ws3816wyj0diun@thanatos.indes.com>,
> sp4mtr4p.boudewijn@indes.com says...
>>
>> Op Tue, 26 Feb 2013 12:46:30 +0100 schreef Arlet Ottens
>> <usenet+5@c-scape.nl>:
>>> On 02/26/2013 11:49 AM, Boudewijn Dijkstra wrote:
>>>
>> [...]
>>
>>> I think the biggest impact will be the size of the stack. Even small
>>> local loop counters will be stored on the stack as 32 bit entries.
>>
>> Although the size of stack entries cannot be lowered, peak stack usage
>> can often be reduced by tricks like smart variable relocation. And of
>> course a good compiler also helps.
>
> The IAR compiler for the ARM is perfectly happy to use 16-bit stack
> entries for local loop variables.
I should know that, and in hindsight I do; thanks for reminding me that the stack is not only used as a call stack but also for local variable frames.
> [...]
>
> When I changed the variables to uint8_t and recompiled, the
> offsets on the stack were 1, but the compiler did reserve
> stack space as a multiple of 32 bits---8 bytes in this case
> ---to hold 6 bytes worth of variables. I think this was
> done to keep the stack pointer on a 32-bit word boundary.
Yes, an 8-byte boundary even, in the latest EABI.

--
Created with Opera's revolutionary e-mail program: http://www.opera.com/mail/
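The rounding Mark observed (6 bytes of locals padded out to 8) is the usual align-up arithmetic; the AAPCS does require the stack pointer to be 8-byte aligned at public interfaces. A minimal sketch of that calculation, with an invented helper name:

```c
/* Round a frame size up to the next multiple of 'align' (a power of
 * two) - the same arithmetic a compiler uses when it pads a 6-byte
 * block of locals out to 8 bytes to keep the stack pointer aligned. */
static unsigned align_up(unsigned size, unsigned align)
{
    return (size + align - 1) & ~(align - 1);
}
```

So six one-byte locals cost eight bytes of frame whether the required alignment is 4 or 8.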
Op Tue, 26 Feb 2013 18:41:02 +0100 schreef Jon Kirwan  
<jonk@infinitefactors.org>:
> On Tue, 26 Feb 2013 14:57:32 +0100, "Boudewijn Dijkstra"
> <sp4mtr4p.boudewijn@indes.com> wrote:
>> Op Tue, 26 Feb 2013 12:46:30 +0100 schreef Arlet Ottens
>> <usenet+5@c-scape.nl>:
>>> On 02/26/2013 11:49 AM, Boudewijn Dijkstra wrote:
>>> [...]
>
>> If you have 4K RAM, then a linked list is not a natural solution. ;)
>> <snip>
>
> [My] method does NOT use memory pointers as data
> elements but instead uses 1 byte of ram as an index into the
> "array" to achieve this.
I am familiar with this technique, but I didn't know that people would call this a "linked list", as there is no conceptual linking going on. I would call this something like indirect indexing.
> [...] It allows either singly or doubly linked lists,
> so you can shave off a byte for each thread and the head/tail
> of each queue.
I may be misunderstanding something here, but I would think that you always have a doubly linked list because you can traverse the array of indices in two directions.
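The index-linked arrangement being discussed can be sketched as below: each node stores a one-byte index of its successor in a fixed array, instead of a full 32-bit pointer. This is only an illustrative sketch; the names (`pool`, `push_front`, `NIL`) are invented, not from anyone's actual kernel.

```c
#include <stdint.h>

/* Index-linked list sketch: instead of storing full (32-bit)
 * pointers, each node holds a 1-byte index of its successor in a
 * fixed array. 0xFF marks "no successor". */
#define NNODES 8
#define NIL    0xFF

struct node {
    uint8_t value;
    uint8_t next;       /* index of next node, or NIL */
};

static struct node pool[NNODES];

/* Push node 'i' onto the front of the list headed by *head. */
static void push_front(uint8_t *head, uint8_t i)
{
    pool[i].next = *head;
    *head = i;
}

/* Sum the values along a list - traversal costs one index per hop. */
static unsigned sum_list(uint8_t head)
{
    unsigned s = 0;
    while (head != NIL) {
        s += pool[head].value;
        head = pool[head].next;
    }
    return s;
}
```

With 8 threads this needs one byte of linkage per node plus one byte per queue head, which is the saving being discussed on a 4K-RAM part.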
Op Tue, 26 Feb 2013 16:31:09 +0100 schreef Mark Borgerson  
<mborgerson@comcast.net>:
> In article <op.ws30djraj0diun@thanatos.indes.com>,
> sp4mtr4p.boudewijn@indes.com says...
>> Op Tue, 26 Feb 2013 09:44:43 +0100 schreef Paul Rubin
>> <no.email@nospam.invalid>:
>>> Tim Wescott <tim@seemywebsite.com> writes:
>>>> I suspect that there will be 8-bit processors for a good long while;
>>>> it's just that they'll get pushed further and further down the food
>>>> chain (while putting severe pressure on the top of whatever
>>>> ecological niche is left for four-bit processors).
>>>
>>> There's another thing too, which is that the small Cortex M0+ parts
>>> have sizes like 32K flash, 4K ram, which is perfectly fine for my
>>> purposes and lots of other purposes. Yet it's a low enough amount
>>> that there will always be some scarcity of memory, which means
>>> burning 32 bits for every pointer when everything fits in 16 bits
>>> seems painful. (I don't know the ARM instruction set so maybe
>>> there's a way to use 16 bit pointers?).
>>
>> Unless you use function pointers it should not be necessary to
>> explicitly store pointers at all. And even if you use function
>> pointers your compiler should allow you to store them as uint16_t.
>
> How does that work if the entity for which you need a pointer is
> above address 0xFFFF?
I was talking in the context of <=64K flash (and note that there are tricks to stretch that limit for function pointers and techniques to place objects below a certain limit).
> That would include all the peripheral > registers, RAM and Flash on many ARM variants.
When I talked about "explicitly storing a pointer", I meant explicitly declaring a pointer variable or constant. This is not needed for peripherals, as they can be accessed via a literal base pointer to a struct; nor for RAM (unless you use dynamic allocation); nor for Flash constants. The only exception in the OP's case is function pointers.
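The "literal base pointer to a struct" technique can be sketched as follows: the register block layout is declared once, and the peripheral is reached through a constant-address cast, so no pointer is ever stored in RAM. The `uart_regs_t` layout, the `UART0_BASE` address, and the enable bit are invented for illustration, not taken from any real part.

```c
#include <stdint.h>

/* Sketch: describe the register block once as a struct, then reach
 * the peripheral through a literal (constant) base address - no
 * pointer variable ever lives in RAM. Layout and address invented. */
typedef struct {
    volatile uint32_t DATA;
    volatile uint32_t STATUS;
    volatile uint32_t CTRL;
} uart_regs_t;

#define UART0_BASE 0x40004000u
#define UART0 ((uart_regs_t *)UART0_BASE)

/* Set the (hypothetical) enable bit in the control register. */
static inline void uart_enable(uart_regs_t *u)
{
    u->CTRL |= 1u;
}
```

A call like `uart_enable(UART0)`, or just `UART0->CTRL |= 1u;`, compiles to a load/store at an immediate address; the 32-bit address is folded into the code, not burned as a pointer in scarce RAM.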
On 27/02/13 02:11, Mark Borgerson wrote:
> In article <Z-OdnVmQj5RQuLDMnZ2dnUVZ7sGdnZ2d@lyse.net>,
> david.brown@removethis.hesbynett.no says...
>>
>> On 26/02/13 17:06, Mark Borgerson wrote:
<snip>
>>> As you can see, the variable offsets on the stack increase
>>> by two and LDRH and STRH are used to access the variables.
>>>
>>> When I changed the variables to uint8_t and recompiled, the
>>> offsets on the stack were 1, but the compiler did reserve
>>> stack space as a multiple of 32 bits---8 bytes in this case
>>> ---to hold 6 bytes worth of variables. I think this was
>>> done to keep the stack pointer on a 32-bit word boundary.
>>
>> Again, this is no surprise whatsoever.
>
> I understand what you're saying and I pretty much agree. I've
> hardly ever had to force an ARM compiler to do something
> or had to resort to assembly language. I've used assembly
> language in the past for the MSP430 and M68K when the compiler
> was either inefficient or refused to use an optimum instruction
> (the DBNE loop instruction on the M68K comes to mind). The compiler
> certainly is better than I am at producing good ARM code.
I found M68K compilers could often generate DBNE loops if you made sure the index was 16-bit, and you were careful about exactly what was in the loop and how the condition was checked. But YMMV.

I haven't had to use much assembly on the msp430 since the early days of mspgcc - the compiler does an excellent job. In general, it is not often that I have to write assembly to improve on a compiler (obviously you need assembly for features that don't exist in C, if the compiler or library does not support it).
> However, my example is there to contest the following statement:
>
>>>> I think the biggest impact will be the size of the stack. Even small
>>>> local loop counters will be stored on the stack as 32 bit entries.
>
> That's not always true and I wrote the example to illustrate that
> fact. I suppose I could have avoided the "volatile" keyword by
> simply adding a few more levels of nested loops to the point where
> the compiler register allocation algorithm ran out of free registers.
I suspect if you did that, they would be stored as 32-bit values on the stack, as that is the fastest size for the processor - the variable would initially be in a register, and would only move onto the stack as an overflow.

I know what you are trying to say here, but the fact is that most compilers will normally store such variables as 32-bit if they have to put them on the stack (which they will not do unless it is absolutely necessary). You are right that they will not /always/ be 32-bit on the stack - but they will be 32-bit most of the time.
> Unless your processor is VERY limited in RAM space, such considerations
> are probably best left to the compiler. I have run into some tight
> RAM problems on MSP430 systems where I needed a couple of 512-byte
> SD buffers, an input data queue, and general variables on a system with
> just 2KB RAM. Things were tight, but a long test under real world
> conditions with stack markers showed I had almost 5% of the RAM
> unused for stack space. I later shifted to a processor with more
> RAM when I found that long delays in SD writes were getting close
> to input queue overflow.
I worked with a COP8 program in assembly which practically filled the 32K OTP rom and 512 bytes of ram. By the end of the last version of the program, bug fixes or feature changes could take minutes to find a solution, but days to find a spare bit (not byte) of ram, or a couple of spare bytes of rom space. It also had an external flash for logging data, with bus cycle times measured in milliseconds.

mvh.,

David
In article <ifadnUS_WvSfn7PMnZ2dnUVZ7sidnZ2d@lyse.net>, 
david@westcontrol.removethisbit.com says...
> On 27/02/13 02:11, Mark Borgerson wrote:

<snip>

> I found M68K compilers could often generate DBNE loops if you made sure
> the index was 16-bit, and you were careful about exactly what was in the
> loop, and how the condition was checked. But YMMV.
>
> I haven't had to use much assembly on the msp430 since the early days of
> mspgcc - the compiler does an excellent job. In general, it is not
> often that I have to write assembly to improve on a compiler (obviously
> you need assembly for features that don't exist in C, if the compiler or
> library does not support it).
>
>> That's not always true and I wrote the example to illustrate that
>> fact. I suppose I could have avoided the "volatile" keyword by
>> simply adding a few more levels of nested loops to the point where
>> the compiler register allocation algorithm ran out of free registers.
>
> I suspect if you did that, they would be stored as 32-bit registers on
> the stack as that is the fastest size for the processor - the variable
> would initially be in a register, and would only move onto the stack as
> an overflow.
Hmmm. I wasn't aware that using LDRH and STRH to store 16-bit half- words took any longer than the 32-bit LDR and STR instructions.
> I know what you are trying to say here, but the fact is that most
> compilers will normally store such variables as 32-bit if they have to
> put them on the stack (which they will not do unless it is absolutely
> necessary). You are right that they will not /always/ be 32-bit on the
> stack - but they will be 32-bit most of the time.
So why did that change when I used the 'volatile' keyword? Using that keyword should not have changed any of the supposed advantages of using 32-bit stack entries.
<snip>

> I worked with a COP8 program in assembly which practically filled the
> 32K OTP rom and 512 byte ram. By the end of the last version of the
> program, bug fixes or feature changes could take minutes to find a
> solution, but days to find a spare bit (not byte) of ram, or a couple of
> spare bytes of rom space. It also had an external flash for logging
> data, with bus cycle times measured in milliseconds.
That brings back terrible memories of the first programs I wrote in C for a 2KB Flash 8051 variant. I still shudder when I think of all the different pointer types!
> mvh.,
>
> David
Mark Borgerson