
Developing/compiling software

Started by Lodewicus Maas September 16, 2009
David Brown wrote:
> Ulf Samuelsson wrote: >> The GNU toolchain can be OK, and it can be horrible. >> If you look at ST's home page you will find some discussion >> about performance of GCC-4.2.1 on the STM32. >> > > Could you provide a link to this? I could not see any such discussion. > > I note that gcc-4.2.1 was the CodeSourcery release two years ago, when > Thumb-2 support was very new in gcc. And if the gcc-4.2.1 in question > was not from CodeSourcery but based on the official FSF tree, then I > don't think it had Thumb-2 at all. It is very important with gcc to be > precise about the source and versions - particularly so since > CodeSourcery (who maintain the ARM ports amongst others) have > target-specific features long before they become part of the official > FSF tree. > >> The rumoured 90 MIPS becomes: >> >> wait for it... >> >> 32 MIPS... >> >> With a Keil compiler you can reach about 60-65 MIPS at least with >> a 72 MHz Cortex-M3. >> >> Anyone seen improvement in later gcc versions? >> > > I would be very surprised to see any major ARM compiler generating code > at twice the speed of another major ARM compiler, whether we are talking > gcc or commercial compilers. To me, this indicates either something odd > about the benchmark code, something wrong in the use of the tools (such > as compiler flags or libraries), or something wrong in the setup of the > device in question (maybe failing to set clock speeds or wait states > correctly). > > If there was consistently such a big difference, I would not expect > gcc-based development tools to feature so prominently on websites such > as ST's or TI (Luminary Micros) - a compiler as bad as you suggest here > would put the devices themselves in a very bad light. > > I haven't used the ST32 devices, but I am considering TI's Cortex-M3 for > a project, so I interested in the state of development tools for the > same core. > >> ... >> On the AVR I noted things like pushing ALL registers >> when entering an interrupt. > > avr-gcc does /not/ push all registers when entering an interrupt. It > does little for the credibility of your other points when you make such > widely inaccurate claims.
In the case I investigated for a customer (more than a year ago now), the interrupt routines took a lot longer to execute, and this caused a lot of grievance.
> > avr-gcc always pushes three registers in interrupts - SREG, and its > "zero" register and "tmp" register because some code sequences generated > by avr-gcc make assumptions about being able to use these registers. > Theoretically, these could be omitted in some cases, but it turns out to > be a difficult to do in avr-gcc, and the advantages are small (for > non-trivial interrupt functions). No one claims that avr-gcc is > perfect, merely that it is very good.
> > Beyond that, avr-gcc pushes registers if they are needed - pretty much > like any other compiler I have used. If your interrupt function calls > an external function, and you are not using whole-program optimisation, > then this means pushing all ABI "volatile" registers - an additional 12 > registers. Again, this is the same as for any other compiler I have > seen. And as with any other compiler, you avoid the overhead by keeping > your interrupt functions small and avoiding external function calls, or > by using whole-program optimisations. > >> The IAR is simply - better - . >> > > I'll not argue with you about IAR producing somewhat smaller or faster > code than avr-gcc. I have only very limited experience with IAR, so I > can't judge properly. But then, you apparently have very little > experience with avr-gcc -
I don't disagree with that. I have both, but I quickly scurry back to the IAR compiler if I need to show off the AVR.

> few people have really studied and compared
> both compilers in a fair and objective test. There is certainly room > for improvement in avr-gcc - there are people working on it, and it gets > better over time. > > But to say "IAR is simply better" is too sweeping a statement to be > taken seriously, since "better" means so many different things to > different people.
OK, let me rephrase: It generally outputs smaller and faster code.
> >> The gcc compiler can be OK, as shown with the AVR32 gnu compiler. >> > > To go back to your original statement, "The GNU toolchain can be OK, and > it can be horrible", I agree in general - although I'd rate the range a > bit higher (from "very good" down to "pretty bad", perhaps). There have > been gcc ports in the past that could rate as "horrible", but I don't > think that applies to any modern gcc port in serious active use. > >>
BR Ulf Samuelsson
Niklas Holsti wrote:
> FreeRTOS info wrote: >> >> "ChrisQ" <meru@devnull.com> wrote in message >> news:sK4vm.199649$AC5.36013@newsfe06.ams2... >>> FreeRTOS info wrote: >>> >>>> >>>> GCC and IAR compilers do very different things on the AVR - the >>>> biggest difference being that IAR use two stacks whereas GCC uses >>>> one. This makes IAR more difficult to setup and tune, and GCC >>>> slower and clunkier because it has to disable interrupts for a few >>>> instructions on every function call. Normally this is not a problem, >>>> but it is not as elegant as the two stack solution for sure. GCC is >>>> very popular on the AVR though, and is good enough for most >>>> applications, especially used in combination with the other free AVR >>>> tools such as AVRStudio. >>>> >>> >>> Can you elaborate a bit as to why 2 stacks are used with IAR ?. >>> Haven't user avr, so have no real experience. The AVR 32 has shadow >>> register sets, including stacks for each processor and exception >>> mode. Thus, separate initialisation on startup, but so do Renasas >>> 80C87 and some arm machines. How does gcc work for arm, for example ?. >> >> >> I have not gone back to check, but from memory (might not be >> completely accurate) the AVR uses two 8 bit registers to implement a >> 16 bit stack pointer. When entering/exiting a function the stack >> pointer has to potentially be updated as two separate operations, and >> you don't want the update to be split by an interrupt occuring half >> way through. > > Adding a bit to Richard's reply: The AVR call and return instructions > update the 16-bit "hardware" stack pointer (to push and pop the return > address) but they do so atomically, so they don't need interrupt > disabling. But gcc uses the "hardware" stack also for data, and must > then update the stack pointer as two 8-bit parts, which needs interrupt > disabling as Richard describes above. > > The IAR compiler uses the AVR Y register (a pair of 8-bit registers > making up a 16-bit number) as the stack pointer for the second, > compiler-defined "software" stack. IAR still uses the hardware stack for > return addresses, so it still uses the normal call and return > instructions (usually), but it puts all stack-allocated data on the > software stack accessed via the Y register. The AVR provides > instructions that can increment or decrement the Y register atomically, > as a 16-bit entity, and the IAR compiler's function prologues/epilogues > often use these instructions. However, sometimes the IAR compiler > generates code that adds or subtracts a larger number (> 1) to/from Y, > and then it must use two 8-bit operations, and must disable interrupts > just as gcc does. > > Conclusion: the frequency of interrupt disabling is probably less in > IAR-generated code than in gcc-generated code, but the impact in terms > of an increased worst-case interrupt response latency is the same. >
One point to remember here is that this only applies to functions that need to allocate a stack frame for data on the stack. The AVR has a fair number of registers, so that a great many functions do not require data to be allocated on the stack, and thus don't need such a stack frame. I had a quick "grep" through a medium-sized project (20K code) for which I happened to have listing files - there were only two functions in the entire project that had a stack frame. For the great majority of the time, it is sufficient to save and restore registers using push and pop. For AVR compilers that use a separate data stack (I am familiar with ImageCraft rather than IAR, but the technique is the same), saving and restoring on the data stack via Y++/Y-- is the same size and speed.

Also note that you only need to disable interrupts if you are changing both the high and the low bytes of the stack pointer. If you know your stack will never be more than 256 bytes (which is very often the case), you can use the "-mtiny-stack" flag to tell avr-gcc that the SP_H register is unchanged by any stack frame allocation, and thus interrupts are not disabled.

There are two advantages of using Y as a data stack pointer rather than using the hardware stack. One is that it is possible to use common routines to handle register saves and restores rather than a sequence of push/pops in each function, which saves a bit of code space (at the cost of a little run-time). Secondly, you don't have to set up a frame pointer to access the data, as Y is already available (the AVR can access data at [Y+index], but not [SP+index]). However, this is a minor benefit - any function that needs a frame will be large enough that the few extra instructions needed are a small cost in time and space. Interrupts do need to be disabled (unless you use -mtiny-stack), but it is only for a couple of clock cycles.

But there are two disadvantages of using Y as a data stack pointer, rather than using a single stack. One is that you have to think about where your two stacks are situated in memory, and how big they must be - it is hard to be safe without wasting data space (especially if you also use a heap). The other is that if your code uses more than one pointer at a time, the compiler must generate code to save and restore Y (maybe also disabling interrupts in the process), or miss out on using it. The AVR has only two good pointers - Y and Z, and a limited third pointer X. Code that uses pointers to structs will see particular benefits of having Y available for general use.

All in all, you cannot make clear decisions as to which method is the "best".

<http://www.nongnu.org/avr-libc/user-manual/FAQ.html#faq_spman>
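To make the "-mtiny-stack" point concrete, here is a minimal sketch of my own (not from any real project - the function name and buffer size are arbitrary) of the kind of function that forces avr-gcc to allocate a stack frame:

    #include <stdint.h>

    /* The local buffer cannot live entirely in registers, so avr-gcc
     * allocates it on the (single) hardware stack and adjusts SPL/SPH in
     * the prologue - with interrupts briefly disabled, unless the code is
     * built with -mtiny-stack. */
    uint8_t checksum(const uint8_t *src, uint8_t len)
    {
        uint8_t buf[16];            /* forces a stack frame */
        uint8_t sum = 0;
        uint8_t i;

        for (i = 0; i < len && i < sizeof buf; i++) {
            buf[i] = src[i];
            sum += buf[i];
        }
        return sum;
    }

Building with something like "avr-gcc -Os -mmcu=atmega168 -mtiny-stack -S checksum.c" (the device name here is just an example) and reading the generated assembly shows whether the prologue still touches SP_H and SREG.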
Ulf Samuelsson wrote:
> David Brown skrev: >> Ulf Samuelsson wrote: >>> The GNU toolchain can be OK, and it can be horrible. >>> If you look at ST's home page you will find some discussion >>> about performance of GCC-4.2.1 on the STM32. >>> >> >> Could you provide a link to this? I could not see any such discussion. >> >> I note that gcc-4.2.1 was the CodeSourcery release two years ago, when >> Thumb-2 support was very new in gcc. And if the gcc-4.2.1 in question >> was not from CodeSourcery but based on the official FSF tree, then I >> don't think it had Thumb-2 at all. It is very important with gcc to >> be precise about the source and versions - particularly so since >> CodeSourcery (who maintain the ARM ports amongst others) have >> target-specific features long before they become part of the official >> FSF tree. >> >>> The rumoured 90 MIPS becomes: >>> >>> wait for it... >>> >>> 32 MIPS... >>> >>> With a Keil compiler you can reach about 60-65 MIPS at least with >>> a 72 MHz Cortex-M3. >>> >>> Anyone seen improvement in later gcc versions? >>> >> >> I would be very surprised to see any major ARM compiler generating >> code at twice the speed of another major ARM compiler, whether we are >> talking gcc or commercial compilers. To me, this indicates either >> something odd about the benchmark code, something wrong in the use of >> the tools (such as compiler flags or libraries), or something wrong in >> the setup of the device in question (maybe failing to set clock speeds >> or wait states correctly). >> >> If there was consistently such a big difference, I would not expect >> gcc-based development tools to feature so prominently on websites such >> as ST's or TI (Luminary Micros) - a compiler as bad as you suggest >> here would put the devices themselves in a very bad light. >> >> I haven't used the ST32 devices, but I am considering TI's Cortex-M3 >> for a project, so I interested in the state of development tools for >> the same core. >> >>> ... >>> On the AVR I noted things like pushing ALL registers >>> when entering an interrupt. >> >> avr-gcc does /not/ push all registers when entering an interrupt. It >> does little for the credibility of your other points when you make >> such widely inaccurate claims. > > In the case I investigated for a customer > (which was more than one year ago) > the interrupt routines took a lot longer time to execute, > and this causes a lot of grievance. >
I don't remember if avr-gcc ever pushed all registers when entering an interrupt, but if so it was much more than a year ago (I have used it for over 6 years). I have no problem believing that an interrupt routine took significantly longer to execute with avr-gcc than with IAR - my issue is only with your reasoning, particularly since you emphasised that "ALL registers" were pushed.

Without knowing anything about the customer, the code, the compiler versions, or the compiler switches used, I would hazard a guess that the interrupt function called an external function in another module (or perhaps in a library). My guess is that IAR did full-program optimisation and inlined the called code into the interrupt handler, and thus avoided saving all the ABI volatile registers since it knew exactly what the called code would need. Full-program optimisation (using the --combine and -fwhole-program flags) is relatively new to avr-gcc, and not yet well known - it is very unlikely that it was used in your comparison. Of course, developers who understand how their tools work and how their target processor works would normally avoid making an external function call from an interrupt routine in the first place.

It is fair to say that the ability to choose compiler options like full-program optimisation through simple dialog boxes is an advantage of IAR over avr-gcc - getting the absolute best out of avr-gcc requires more thought, research and experimenting than it does with a tool like IAR.
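As an illustration of "keep the interrupt function small" (a sketch of my own, not the customer's code - the vector name and device are assumptions), the usual avr-gcc pattern is to set a flag in the ISR and do the heavy work in the main loop:

    #include <avr/interrupt.h>
    #include <stdint.h>

    static volatile uint8_t tick_pending;   /* set in the ISR, consumed in main */

    /* No external calls here, so avr-gcc only saves the few registers the
     * ISR actually uses instead of all the ABI "volatile" registers. */
    ISR(TIMER0_OVF_vect)                    /* vector name depends on the device */
    {
        tick_pending = 1;
    }

    static void handle_tick(void)
    {
        /* ... the heavy work, done outside the ISR ... */
    }

    int main(void)
    {
        /* (timer setup omitted) */
        sei();
        for (;;) {
            if (tick_pending) {
                tick_pending = 0;
                handle_tick();
            }
        }
    }

If the external call really has to stay in the ISR, building everything in one go with the --combine and -fwhole-program flags mentioned above at least gives avr-gcc a chance to see what the callee clobbers.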
> >> >> avr-gcc always pushes three registers in interrupts - SREG, and its >> "zero" register and "tmp" register because some code sequences >> generated by avr-gcc make assumptions about being able to use these >> registers. Theoretically, these could be omitted in some cases, but it >> turns out to be a difficult to do in avr-gcc, and the advantages are >> small (for non-trivial interrupt functions). No one claims that >> avr-gcc is perfect, merely that it is very good. > > > >> >> Beyond that, avr-gcc pushes registers if they are needed - pretty much >> like any other compiler I have used. If your interrupt function calls >> an external function, and you are not using whole-program >> optimisation, then this means pushing all ABI "volatile" registers - >> an additional 12 registers. Again, this is the same as for any other >> compiler I have seen. And as with any other compiler, you avoid the >> overhead by keeping your interrupt functions small and avoiding >> external function calls, or by using whole-program optimisations. >> >>> The IAR is simply - better - . >>> >> >> I'll not argue with you about IAR producing somewhat smaller or faster >> code than avr-gcc. I have only very limited experience with IAR, so I >> can't judge properly. But then, you apparently have very little >> experience with avr-gcc - > > I don't disagree with that. > I have both, but I quickly scurry back to the IAR compiler > if I need to show off the AVR. >
You have colleagues at Atmel who put a great deal of time and effort into avr-gcc. You might want to talk to them about how to get the best out of avr-gcc - that way you can offer your customers a wider choice. Different tools are better for different users and different projects - your aim is that customers have the best tools for their use, and know how to get the best from those tools, so that they will get the best out of your devices.

On the other hand, I fully understand that no one has the time to learn about all the tools available, and you have to concentrate on particular choices. It's fair enough to tell people how well IAR and the AVR go together - but it is not fair to tell people that avr-gcc is a poor choice without better technical justification.
> > > few people have really studied and compared >> both compilers in a fair and objective test. There is certainly room >> for improvement in avr-gcc - there are people working on it, and it >> gets better over time. >> >> But to say "IAR is simply better" is too sweeping a statement to be >> taken seriously, since "better" means so many different things to >> different people. > > OK, let me rephrase: It generally outputs smaller and faster code. >
That is much better - although some day I'd like to hear numbers based on real code examples, generated by someone familiar with both tools. I guess some day I'll need to test out IAR's compiler for myself. But this is certainly an opinion I've heard often enough to make it believable.

If you have any links that actually show numbers, I'd appreciate looking at them. The only independent comparison I have found is on the www.freertos.org page, and that's badly out of date (the avr-gcc version is from 2003; I don't know about the IAR version). There is no size comparison, but avr-gcc beats IAR on most of the speed tests...
>> >>> The gcc compiler can be OK, as shown with the AVR32 gnu compiler. >>> >> >> To go back to your original statement, "The GNU toolchain can be OK, >> and it can be horrible", I agree in general - although I'd rate the >> range a bit higher (from "very good" down to "pretty bad", perhaps). >> There have been gcc ports in the past that could rate as "horrible", >> but I don't think that applies to any modern gcc port in serious >> active use. >> >>> > BR > Ulf Samuelsson >
On Sep 26, 2:30 pm, David Brown <da...@westcontrol.removethisbit.com> wrote:

>Snip interesting stuff<
> I don't remember if avr-gcc ever pushed all registers when entering an > interrupt, but if so it was much more than a year ago (I have used it > for over 6 years).
Must be a lot of code in that interrupt!
Niklas Holsti wrote:
> A small addition to my own posting, sorry for omitting it initially: > > Niklas Holsti wrote: > (I elide most of the context): > >> However, sometimes the IAR compiler generates code that adds or >> subtracts a larger number (> 1) to/from Y, and then it must use two >> 8-bit operations, and must disable interrupts just as gcc does. > > Some AVR models do provide instructions (ADIW, SBIW) that can atomically > add/subtract an immediate number (0..63) to/from the 16-bit Y register. > I assume, but haven't checked, that IAR uses these instructions when > possible, rather than two 8-bit operations in an interrupt-disabled region. >
A very good explanation, thanks. It's the intricacies of an architecture that are sometimes hard to get a big picture of when choosing a processor for a project. I've never used the AVR for any project, and info like this would tend to keep me in the 8051 world for small logic replacement tasks, no matter how constrained it is. AVR32 looks much better though.

In summary then, it looks like the 8-bit AVRs need special compiler support to get the best results, which I wouldn't necessarily expect gcc to provide. I'm quite happy to accept that IAR would produce better code, in much the same way as Keil is arguably the best solution for the 8051. Both are 8-bit legacy architectures, designed before the days of general HLL development. I think if I were trying to find a low-end micro now, the msp430 would be the first port of call, as it is a much more compiler-friendly 16-bit architecture. Stuff like this does matter, as it can have a significant impact on software development timescales and quality...

Regards,

Chris
David Brown wrote:

> Code that uses pointers to structs will see particular benefits of > having Y available for general use. >
More good info - I suspect the AVR is a far better architecture than the 8051. Memories of legacy 8051 hw platforms, multiple code banks, not enough common area and hard work trying to ensure that all the correct data appeared in the selected bank at the right time suggest that there must be a better way. The impact on development timescales can be significant and outweighs any device cost advantage for small to medium volume products.

As you suggest, with more than 2 arguments the best way is to package them up into a structure and pass a pointer to it. Such a structure can also aid encapsulation, as common variables can also be declared within it. Object oriented methods for 8-bit micros indeed :-)...

Regards,

Chris
ChrisQ wrote:
> David Brown wrote: > >> Code that uses pointers to structs will see particular benefits of >> having Y available for general use. >> > > More good info - I suspect avr is a far better architecture than 8051. > Memories of legacy 8051 hw platforms, multiple code banks, not enough > common area and hard work trying to ensure that all the correct data > appeared in the selected bank at the right time suggests that there must > be a better way. The impact on development timescales can be significant > and outweighs any device cost advantage for small to medium volume > products. > > As you suggest, more than 2 arguments and the best way is to package up > into a structure and pass a pointer to it. Such a structure can also aid > encapsulation as common variables can also be declared within it. Object > oriented methods for 8 bit micros indeed :-)... >
No, no - I did not suggest packing function call arguments in a struct! How did you manage to read that from my post? You only need to use such tricks for braindead architectures like the 8051, where you have a hopeless stack and almost no registers, and thus need to pass data via globals or extra structs. (A good compiler will hide these messy implementation details from you, and do a better job than using these tricks manually.)

The AVR has plenty of registers - you pass arguments in these registers by using normal C function calls. If you have so many parameters (or such large parameters) that passing by stack is needed, the compiler handles that fine - there is a minor overhead, but any code that needs it will already be large.

What I said about pointers to structs is that the AVR has two pointer registers that work well with structs - Y and Z (since there are Y+index and Z+index addressing modes). If your compiler dedicates Y to a data stack pointer, it's going to be inefficient at code that could otherwise take advantage of two pointer-to-struct registers.
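A minimal sketch of what I mean (the struct and function below are made up purely for illustration): member accesses through the pointer map directly onto the Y+index / Z+index (ldd/std) addressing modes, with no extra pointer arithmetic:

    #include <stdint.h>

    struct motor {
        uint8_t  duty;
        uint8_t  flags;
        uint16_t position;
    };

    /* With Z (and, if it is not tied up as a stack pointer, Y) available,
     * each member access below can be a single ldd/std instruction using
     * the pointer-plus-displacement addressing mode. */
    void motor_step(struct motor *m, uint8_t delta)
    {
        m->position += delta;
        m->flags |= 0x01;
    }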
ChrisQ wrote:
> Niklas Holsti wrote: >> A small addition to my own posting, sorry for omitting it initially: >> >> Niklas Holsti wrote: >> (I elide most of the context): >> >>> However, sometimes the IAR compiler generates code that adds or >>> subtracts a larger number (> 1) to/from Y, and then it must use two >>> 8-bit operations, and must disable interrupts just as gcc does. >> >> Some AVR models do provide instructions (ADIW, SBIW) that can >> atomically add/subtract an immediate number (0..63) to/from the 16-bit >> Y register. I assume, but haven't checked, that IAR uses these >> instructions when possible, rather than two 8-bit operations in an >> interrupt-disabled region. >> > > A very good explanation and thanks. It's the intricacies of architecture > that is sometimes hard to get a big picture of when choosing a processor > for a project. I've never used avr for any project and info like this > would tend to keep me in the 8051 world for small logic replacement > tasks, no matter how constrained it is. AVR32 looks much better though. >
I'm guessing you wrote this before reading my other post? Remember, these are details that are hidden by the compiler, and the AVR will have executed the necessary pushes, stack pointer manipulation, interrupt disable and whatever before the average 8051 device has managed to push the A register onto the stack. The discussion is about whether gcc's stack arrangement or IAR's stack arrangement is best for producing optimal interrupt code on the AVR - no one would seriously compare it to the 8051.

The AVR32 is a different beast entirely. It shares the same developer (Atmel), and some tools, but other than that it is a totally different processor.
> In summary then, it looks like the 8 bit avr's need special compiler > support to get best results, which I wouldn't necessarily expect gcc to
The AVR needs an AVR compiler - just like any other cpu needs its own compiler. It doesn't need any "special" support or tricks here - every target has its own way of handling function prologues and epilogues.

gcc is best suited to RISC-type architectures with plenty of registers and an orthogonal instruction set. The AVR comes fairly close to that, but with two big exceptions - it is 8-bit (most gcc targets are 32-bit), and it has a separate memory space for flash. avr-gcc does a good job of working around these "non-standard" features, but is occasionally sub-optimal in that regard. This is hugely different from cores like the 8051 or the COP8, which need much more specialised compilers to generate good code.
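For anyone who has not seen how avr-gcc handles the separate flash address space in practice, here is a small sketch using avr-libc's pgmspace facilities (the table contents and names are just an example):

    #include <avr/pgmspace.h>
    #include <stdint.h>

    /* A constant table kept in flash rather than copied into scarce RAM. */
    static const uint8_t sine_table[8] PROGMEM = {
        0, 49, 90, 117, 127, 117, 90, 49
    };

    uint8_t sine_lookup(uint8_t idx)
    {
        /* Flash is a separate address space on the AVR, so a plain C
         * pointer dereference cannot reach it; pgm_read_byte() wraps the
         * LPM instruction instead. */
        return pgm_read_byte(&sine_table[idx & 7]);
    }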
> provide. I'm quite happy to accept that IAR would produce better code, > in much the same way as Keil is arguably the best solution for 8051.
It's a different world entirely. IAR produces better code than gcc (at least, according to popular opinion - I have not yet compared it myself, or seen any independent comparisons) because they have more resources to use in the development of their compiler, and their compiler architecture is probably also more suited to optimising 8-bit code. They have also been working with the AVR developers since before the core was fully specified.

Not to belittle the work of either the avr-gcc or IAR development teams, but writing a solid AVR compiler that produces small and fast code is a fraction of the work needed to make a close-to-optimal 8051 compiler. And if you've got a working multi-target compiler to start with (as both avr-gcc and IAR had), then porting it to the AVR is a practical task. For the 8051, you have to start almost from scratch.
> Both are 8 bit legacy architectures, designed before the days of general > hll development. I think if I were trying to find a low end micro now,
I think you should read a little about the AVR before making such ignorant and incorrect statements. The AVR was specifically designed as a small and low power core that worked well with C - it was developed in cooperation with IAR. The 8051 is legacy, even though there are modern implementations. But the AVR, while not perfect, is about as close to modern cpu design as you get in 8 bits.
> msp430 would be the first point of call, as it is a much more compiler > friendly 16 bit architecture. Stuff like this does matter as it can have > a significant impact on software development timescales and quality... >
The msp430 is certainly very compiler friendly - even more so than the AVR (16-bit registers, plenty of flexible pointers, and a single address space). But it too has its "special issues". For example, the multiplier is implemented as a peripheral, and the state of the multiplier cannot be properly saved by an interrupt. Thus either interrupts must avoid using the multiplier, or the main code must disable interrupts while using the multiplier. /Every/ cpu core has its issues. And the newer msp430 cores with their 20-bit registers totally bugger up their C compiler friendliness.
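A sketch of the second option (disabling interrupts around use of the hardware multiplier). Note that the MPY/OP2/RESLO register names and the interrupt intrinsics below follow TI's msp430 headers as I remember them - treat them as assumptions and check your own toolchain:

    #include <msp430.h>
    #include <stdint.h>

    uint16_t mul16(uint16_t a, uint16_t b)
    {
        uint16_t result;

        __disable_interrupt();   /* an ISR cannot save/restore the multiplier state */
        MPY = a;                 /* first operand, unsigned multiply */
        OP2 = b;                 /* writing OP2 starts the multiplication */
        result = RESLO;          /* low 16 bits of the 32-bit result */
        __enable_interrupt();

        return result;
    }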
David Brown wrote:

> The AVR has plenty of registers - you pass arguments in these registers > by using normal C function calls. If you have so many parameters (or > such large parameters) that passing by stack is needed, the compiler > handles that fine - there is a minor overhead, but any code that needs > it will already be large. > > What I said about pointers to structs is that the AVR has two pointer > registers that work well with structs - Y and Z (since there are Y+index > and Z+index addressing modes). If your compiler dedicates Y to a data > stack pointer, it's going to be inefficient at code that could otherwise > take advantage of two pointer-to-struct registers.
... which is a drawback of the two-stack solution. On the other hand, since the AVR provides no SP-relative addressing, single-stack code must often use one of the Y or Z pointers as a frame pointer, and there we are again. Although the AVR is register-rich, it is "pointer-poor". Some other architectures, such as the H8/300, have more flexible interplay of 8-bit and 16-bit computations.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi . @ .
David Brown wrote:

> > No, no - I did not suggest packing function call arguments in a struct! > How did you manage to read that from my post? You only need to use > such tricks for braindead architectures like the 8051, where you have a > hopeless stack and almost no registers, and thus need to pass data via > globals or extra structs. (A good compiler will hide these messy > implementation details from you, and do a better job that using these > tricks manually.)
I think we were looking at it from opposite sides. I use structure pointers into functions a lot, as one string to the bow of getting some object-oriented functionality without the overhead of C++. It's also useful for sharing variables and restricting global scope to a subsystem, as you can pass a single pointer down through several code layers. The only truly global vars are const data. It also simplifies maintenance and adding functionality. Some think such ideas are crap, but I find it very, very useful, and it's generally very fast and code efficient as well...

Regards,

Chris
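Something like this, as a rough sketch of the pattern Chris describes (the subsystem and all names are hypothetical): the subsystem's state sits in one struct, and only a pointer to it is handed down through the layers:

    #include <stdint.h>

    /* All UART state lives here; nothing is file- or program-global. */
    struct uart_ctx {
        volatile uint8_t *data_reg;   /* hardware data register, set at init */
        uint8_t tx_head;
        uint8_t tx_tail;
        uint8_t tx_buf[32];
    };

    void uart_init(struct uart_ctx *ctx, volatile uint8_t *data_reg)
    {
        ctx->data_reg = data_reg;
        ctx->tx_head = 0;
        ctx->tx_tail = 0;
    }

    void uart_put(struct uart_ctx *ctx, uint8_t c)
    {
        ctx->tx_buf[ctx->tx_head++ & 31] = c;   /* state reached only via ctx */
    }

Each layer that needs the UART just takes a struct uart_ctx * parameter, which is about as close to a C++ object as plain C gets.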