
Developing/compiling software

Started by Lodewicus Maas September 16, 2009
David Brown wrote:
> Ulf Samuelsson wrote: >> The GNU toolchain can be OK, and it can be horrible. >> If you look at ST's home page you will find some discussion >> about performance of GCC-4.2.1 on the STM32. >> > > Could you provide a link to this? I could not see any such discussion. > > I note that gcc-4.2.1 was the CodeSourcery release two years ago, when > Thumb-2 support was very new in gcc. And if the gcc-4.2.1 in question > was not from CodeSourcery but based on the official FSF tree, then I > don't think it had Thumb-2 at all. It is very important with gcc to be > precise about the source and versions - particularly so since > CodeSourcery (who maintain the ARM ports amongst others) have > target-specific features long before they become part of the official > FSF tree. > >> The rumoured 90 MIPS becomes: >> >> wait for it... >> >> 32 MIPS... >> >> With a Keil compiler you can reach about 60-65 MIPS at least with >> a 72 MHz Cortex-M3. >> >> Anyone seen improvement in later gcc versions? >> > > I would be very surprised to see any major ARM compiler generating code > at twice the speed of another major ARM compiler, whether we are talking > gcc or commercial compilers. To me, this indicates either something odd > about the benchmark code, something wrong in the use of the tools (such > as compiler flags or libraries), or something wrong in the setup of the > device in question (maybe failing to set clock speeds or wait states > correctly). > > If there was consistently such a big difference, I would not expect > gcc-based development tools to feature so prominently on websites such > as ST's or TI (Luminary Micros) - a compiler as bad as you suggest here > would put the devices themselves in a very bad light. > > I haven't used the ST32 devices, but I am considering TI's Cortex-M3 for > a project, so I interested in the state of development tools for the > same core. > >> ... >> On the AVR I noted things like pushing ALL registers >> when entering an interrupt. > > avr-gcc does /not/ push all registers when entering an interrupt. It > does little for the credibility of your other points when you make such > widely inaccurate claims.
In the case I investigated for a customer (more than a year ago now), the interrupt routines took a lot longer to execute, and this caused a lot of grievance.
> > avr-gcc always pushes three registers in interrupts - SREG, and its > "zero" register and "tmp" register because some code sequences generated > by avr-gcc make assumptions about being able to use these registers. > Theoretically, these could be omitted in some cases, but it turns out to > be a difficult to do in avr-gcc, and the advantages are small (for > non-trivial interrupt functions). No one claims that avr-gcc is > perfect, merely that it is very good.
> > Beyond that, avr-gcc pushes registers if they are needed - pretty much > like any other compiler I have used. If your interrupt function calls > an external function, and you are not using whole-program optimisation, > then this means pushing all ABI "volatile" registers - an additional 12 > registers. Again, this is the same as for any other compiler I have > seen. And as with any other compiler, you avoid the overhead by keeping > your interrupt functions small and avoiding external function calls, or > by using whole-program optimisations. > >> The IAR is simply - better - . >> > > I'll not argue with you about IAR producing somewhat smaller or faster > code than avr-gcc. I have only very limited experience with IAR, so I > can't judge properly. But then, you apparently have very little > experience with avr-gcc -
I don't disagree with that. I have both, but I quickly scurry back to the IAR compiler if I need to show off the AVR.

> few people have really studied and compared
> both compilers in a fair and objective test. There is certainly room > for improvement in avr-gcc - there are people working on it, and it gets > better over time. > > But to say "IAR is simply better" is too sweeping a statement to be > taken seriously, since "better" means so many different things to > different people.
OK, let me rephrase: It generally outputs smaller and faster code.
> >> The gcc compiler can be OK, as shown with the AVR32 gnu compiler. >> > > To go back to your original statement, "The GNU toolchain can be OK, and > it can be horrible", I agree in general - although I'd rate the range a > bit higher (from "very good" down to "pretty bad", perhaps). There have > been gcc ports in the past that could rate as "horrible", but I don't > think that applies to any modern gcc port in serious active use. > >>
BR Ulf Samuelsson
Niklas Holsti wrote:
> FreeRTOS info wrote: >> >> "ChrisQ" <meru@devnull.com> wrote in message >> news:sK4vm.199649$AC5.36013@newsfe06.ams2... >>> FreeRTOS info wrote: >>> >>>> >>>> GCC and IAR compilers do very different things on the AVR - the >>>> biggest difference being that IAR use two stacks whereas GCC uses >>>> one. This makes IAR more difficult to setup and tune, and GCC >>>> slower and clunkier because it has to disable interrupts for a few >>>> instructions on every function call. Normally this is not a problem, >>>> but it is not as elegant as the two stack solution for sure. GCC is >>>> very popular on the AVR though, and is good enough for most >>>> applications, especially used in combination with the other free AVR >>>> tools such as AVRStudio. >>>> >>> >>> Can you elaborate a bit as to why 2 stacks are used with IAR ?. >>> Haven't user avr, so have no real experience. The AVR 32 has shadow >>> register sets, including stacks for each processor and exception >>> mode. Thus, separate initialisation on startup, but so do Renasas >>> 80C87 and some arm machines. How does gcc work for arm, for example ?. >> >> >> I have not gone back to check, but from memory (might not be >> completely accurate) the AVR uses two 8 bit registers to implement a >> 16 bit stack pointer. When entering/exiting a function the stack >> pointer has to potentially be updated as two separate operations, and >> you don't want the update to be split by an interrupt occuring half >> way through. > > Adding a bit to Richard's reply: The AVR call and return instructions > update the 16-bit "hardware" stack pointer (to push and pop the return > address) but they do so atomically, so they don't need interrupt > disabling. But gcc uses the "hardware" stack also for data, and must > then update the stack pointer as two 8-bit parts, which needs interrupt > disabling as Richard describes above. > > The IAR compiler uses the AVR Y register (a pair of 8-bit registers > making up a 16-bit number) as the stack pointer for the second, > compiler-defined "software" stack. IAR still uses the hardware stack for > return addresses, so it still uses the normal call and return > instructions (usually), but it puts all stack-allocated data on the > software stack accessed via the Y register. The AVR provides > instructions that can increment or decrement the Y register atomically, > as a 16-bit entity, and the IAR compiler's function prologues/epilogues > often use these instructions. However, sometimes the IAR compiler > generates code that adds or subtracts a larger number (> 1) to/from Y, > and then it must use two 8-bit operations, and must disable interrupts > just as gcc does. > > Conclusion: the frequency of interrupt disabling is probably less in > IAR-generated code than in gcc-generated code, but the impact in terms > of an increased worst-case interrupt response latency is the same. >
One point to remember here is that this only applies to functions that need to allocate a stack frame for data on the stack. The AVR has a fair number of registers, so that a great many functions do not require data to be allocated on the stack, and thus don't need such a stack frame. I had a quick "grep" through a medium-sized project (20K code) for which I happened to have listing files - there were only two functions in the entire project that had a stack frame. For the great majority of the time, it is sufficient to save and restore registers using push and pop. For AVR compilers that use a separate data stack (I am familiar with ImageCraft rather than IAR, but the technique is the same), saving and restoring on the data stack via Y++/Y-- is the same size and speed.

Also note that you only need to disable interrupts if you are changing both the high and the low bytes of the stack pointer. If you know your stack will never be more than 256 bytes (which is very often the case), you can use the "-mtiny-stack" flag to tell avr-gcc that the SP_H register is unchanged by any stack frame allocation, and thus interrupts are not disabled.

There are two advantages of using Y as a data stack pointer rather than using the hardware stack. One is that it is possible to use common routines to handle register saves and restores rather than a sequence of push/pops in each function, which saves a bit of code space (at the cost of a little run-time). Secondly, you don't have to set up a frame pointer to access the data, as Y is already available (the AVR can access data at [Y+index], but not [SP+index]). However, this is a minor benefit - any function that needs a frame will be large enough that the few extra instructions needed are a small cost in time and space. Interrupts do need to be disabled (unless you use -mtiny-stack), but it is only for a couple of clock cycles.

But there are two disadvantages of using Y as a data stack pointer, rather than using a single stack. One is that you have to think about where your two stacks are situated in memory, and how big they must be - it is hard to be safe without wasting data space (especially if you also use a heap). The other is that if your code uses more than one pointer at a time, the compiler must generate code to save and restore Y (maybe also disabling interrupts in the process), or miss out on using it. The AVR has only two good pointers - Y and Z, and a limited third pointer X. Code that uses pointers to structs will see particular benefits of having Y available for general use.

All in all, you cannot make clear decisions as to which method is the "best".

<http://www.nongnu.org/avr-libc/user-manual/FAQ.html#faq_spman>
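To make the "-mtiny-stack" point concrete, here is a minimal sketch of my own (not from any real project - the function name and buffer size are arbitrary) of the kind of function that forces avr-gcc to allocate a stack frame:

    #include <stdint.h>

    /* The local buffer cannot live entirely in registers, so avr-gcc
     * allocates it on the (single) hardware stack and adjusts SPL/SPH in
     * the prologue - with interrupts briefly disabled, unless the code is
     * built with -mtiny-stack. */
    uint8_t checksum(const uint8_t *src, uint8_t len)
    {
        uint8_t buf[16];            /* forces a stack frame */
        uint8_t sum = 0;
        uint8_t i;

        for (i = 0; i < len && i < sizeof buf; i++) {
            buf[i] = src[i];
            sum += buf[i];
        }
        return sum;
    }

Building with something like "avr-gcc -Os -mmcu=atmega168 -mtiny-stack -S checksum.c" (the device name here is just an example) and reading the generated assembly shows whether the prologue still touches SP_H and SREG.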
Ulf Samuelsson wrote:
> David Brown skrev: >> Ulf Samuelsson wrote: >>> The GNU toolchain can be OK, and it can be horrible. >>> If you look at ST's home page you will find some discussion >>> about performance of GCC-4.2.1 on the STM32. >>> >> >> Could you provide a link to this? I could not see any such discussion. >> >> I note that gcc-4.2.1 was the CodeSourcery release two years ago, when >> Thumb-2 support was very new in gcc. And if the gcc-4.2.1 in question >> was not from CodeSourcery but based on the official FSF tree, then I >> don't think it had Thumb-2 at all. It is very important with gcc to >> be precise about the source and versions - particularly so since >> CodeSourcery (who maintain the ARM ports amongst others) have >> target-specific features long before they become part of the official >> FSF tree. >> >>> The rumoured 90 MIPS becomes: >>> >>> wait for it... >>> >>> 32 MIPS... >>> >>> With a Keil compiler you can reach about 60-65 MIPS at least with >>> a 72 MHz Cortex-M3. >>> >>> Anyone seen improvement in later gcc versions? >>> >> >> I would be very surprised to see any major ARM compiler generating >> code at twice the speed of another major ARM compiler, whether we are >> talking gcc or commercial compilers. To me, this indicates either >> something odd about the benchmark code, something wrong in the use of >> the tools (such as compiler flags or libraries), or something wrong in >> the setup of the device in question (maybe failing to set clock speeds >> or wait states correctly). >> >> If there was consistently such a big difference, I would not expect >> gcc-based development tools to feature so prominently on websites such >> as ST's or TI (Luminary Micros) - a compiler as bad as you suggest >> here would put the devices themselves in a very bad light. >> >> I haven't used the ST32 devices, but I am considering TI's Cortex-M3 >> for a project, so I interested in the state of development tools for >> the same core. >> >>> ... >>> On the AVR I noted things like pushing ALL registers >>> when entering an interrupt. >> >> avr-gcc does /not/ push all registers when entering an interrupt. It >> does little for the credibility of your other points when you make >> such widely inaccurate claims. > > In the case I investigated for a customer > (which was more than one year ago) > the interrupt routines took a lot longer time to execute, > and this causes a lot of grievance. >
I don't remember if avr-gcc ever pushed all registers when entering an interrupt, but if so it was much more than a year ago (I have used it for over 6 years). I have no problem believing that an interrupt routine took significantly longer to execute with avr-gcc than with IAR - my issue is only with your reasoning, particularly since you emphasised that "ALL registers" were pushed.

Without knowing anything about the customer, the code, the compiler versions, or the compiler switches used, I would hazard a guess that the interrupt function called an external function in another module (or perhaps in a library). My guess is that IAR did full-program optimisation and inlined the called code into the interrupt handler, and thus avoided saving all the ABI volatile registers since it knew exactly what the called code would need. Full-program optimisation (using the --combine and -fwhole-program flags) is relatively new to avr-gcc, and not yet well known - it is very unlikely that it was used in your comparison. Of course, developers who understand how their tools work and how their target processor works would normally avoid making an external function call from an interrupt routine in the first place.

It is fair to say that the ability to choose compiler options like full-program optimisation through simple dialog boxes is an advantage of IAR over avr-gcc - getting the absolute best out of avr-gcc requires more thought, research and experimenting than it does with a tool like IAR.
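As an illustration of "keep the interrupt function small" (a sketch of my own, not the customer's code - the vector name and device are assumptions), the usual avr-gcc pattern is to set a flag in the ISR and do the heavy work in the main loop:

    #include <avr/interrupt.h>
    #include <stdint.h>

    static volatile uint8_t tick_pending;   /* set in the ISR, consumed in main */

    /* No external calls here, so avr-gcc only saves the few registers the
     * ISR actually uses instead of all the ABI "volatile" registers. */
    ISR(TIMER0_OVF_vect)                    /* vector name depends on the device */
    {
        tick_pending = 1;
    }

    static void handle_tick(void)
    {
        /* ... the heavy work, done outside the ISR ... */
    }

    int main(void)
    {
        /* (timer setup omitted) */
        sei();
        for (;;) {
            if (tick_pending) {
                tick_pending = 0;
                handle_tick();
            }
        }
    }

If the external call really has to stay in the ISR, building everything in one go with the --combine and -fwhole-program flags mentioned above at least gives avr-gcc a chance to see what the callee clobbers.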
> >> >> avr-gcc always pushes three registers in interrupts - SREG, and its >> "zero" register and "tmp" register because some code sequences >> generated by avr-gcc make assumptions about being able to use these >> registers. Theoretically, these could be omitted in some cases, but it >> turns out to be a difficult to do in avr-gcc, and the advantages are >> small (for non-trivial interrupt functions). No one claims that >> avr-gcc is perfect, merely that it is very good. > > > >> >> Beyond that, avr-gcc pushes registers if they are needed - pretty much >> like any other compiler I have used. If your interrupt function calls >> an external function, and you are not using whole-program >> optimisation, then this means pushing all ABI "volatile" registers - >> an additional 12 registers. Again, this is the same as for any other >> compiler I have seen. And as with any other compiler, you avoid the >> overhead by keeping your interrupt functions small and avoiding >> external function calls, or by using whole-program optimisations. >> >>> The IAR is simply - better - . >>> >> >> I'll not argue with you about IAR producing somewhat smaller or faster >> code than avr-gcc. I have only very limited experience with IAR, so I >> can't judge properly. But then, you apparently have very little >> experience with avr-gcc - > > I don't disagree with that. > I have both, but I quickly scurry back to the IAR compiler > if I need to show off the AVR. >
You have colleagues at Atmel who put a great deal of time and effort into avr-gcc. You might want to talk to them about how to get the best out of avr-gcc - that way you can offer your customers a wider choice. Different tools are better for different users and different projects - your aim is that customers have the best tools for their use, and know how to get the best from those tools, so that they will get the best out of your devices.

On the other hand, I fully understand that no one has the time to learn about all the tools available, and you have to concentrate on particular choices. It's fair enough to tell people how well IAR and the AVR go together - but it is not fair to tell people that avr-gcc is a poor choice without better technical justification.
> > > few people have really studied and compared >> both compilers in a fair and objective test. There is certainly room >> for improvement in avr-gcc - there are people working on it, and it >> gets better over time. >> >> But to say "IAR is simply better" is too sweeping a statement to be >> taken seriously, since "better" means so many different things to >> different people. > > OK, let me rephrase: It generally outputs smaller and faster code. >
That is much better - although some day I'd like to hear numbers based on real code examples, generated by someone familiar with both tools. I guess some day I'll need to test out IAR's compiler for myself. But this is certainly an opinion I've heard often enough to make it believable.

If you have any links that actually show numbers, I'd appreciate looking at them. The only independent comparison I have found is on the www.freertos.org page, and that's badly out of date (the avr-gcc version is from 2003; I don't know about the IAR version). There is no size comparison, but avr-gcc beats IAR on most of the speed tests...
>> >>> The gcc compiler can be OK, as shown with the AVR32 gnu compiler. >>> >> >> To go back to your original statement, "The GNU toolchain can be OK, >> and it can be horrible", I agree in general - although I'd rate the >> range a bit higher (from "very good" down to "pretty bad", perhaps). >> There have been gcc ports in the past that could rate as "horrible", >> but I don't think that applies to any modern gcc port in serious >> active use. >> >>> > BR > Ulf Samuelsson >
On Sep 26, 2:30 pm, David Brown <da...@westcontrol.removethisbit.com> wrote:

>Snip interesting stuff<
> I don't remember if avr-gcc ever pushed all registers when entering an > interrupt, but if so it was much more than a year ago (I have used it > for over 6 years).
Must be a lot of code in that interrupt!
Niklas Holsti wrote:
> A small addition to my own posting, sorry for omitting it initially: > > Niklas Holsti wrote: > (I elide most of the context): > >> However, sometimes the IAR compiler generates code that adds or >> subtracts a larger number (> 1) to/from Y, and then it must use two >> 8-bit operations, and must disable interrupts just as gcc does. > > Some AVR models do provide instructions (ADIW, SBIW) that can atomically > add/subtract an immediate number (0..63) to/from the 16-bit Y register. > I assume, but haven't checked, that IAR uses these instructions when > possible, rather than two 8-bit operations in an interrupt-disabled region. >
A very good explanation, thanks. It's the intricacies of an architecture that are sometimes hard to get a big picture of when choosing a processor for a project. I've never used the AVR for any project, and info like this would tend to keep me in the 8051 world for small logic replacement tasks, no matter how constrained it is. AVR32 looks much better though.

In summary then, it looks like the 8-bit AVRs need special compiler support to get the best results, which I wouldn't necessarily expect gcc to provide. I'm quite happy to accept that IAR would produce better code, in much the same way as Keil is arguably the best solution for the 8051. Both are 8-bit legacy architectures, designed before the days of general HLL development. I think if I were trying to find a low-end micro now, the msp430 would be the first port of call, as it is a much more compiler-friendly 16-bit architecture. Stuff like this does matter, as it can have a significant impact on software development timescales and quality...

Regards,

Chris
David Brown wrote:

> Code that uses pointers to structs will see particular benefits of > having Y available for general use. >
More good info - I suspect the AVR is a far better architecture than the 8051. Memories of legacy 8051 hw platforms, multiple code banks, not enough common area and hard work trying to ensure that all the correct data appeared in the selected bank at the right time suggest that there must be a better way. The impact on development timescales can be significant and outweighs any device cost advantage for small to medium volume products.

As you suggest, with more than 2 arguments the best way is to package them up into a structure and pass a pointer to it. Such a structure can also aid encapsulation, as common variables can also be declared within it. Object oriented methods for 8-bit micros indeed :-)...

Regards,

Chris
ChrisQ wrote:
> David Brown wrote: > >> Code that uses pointers to structs will see particular benefits of >> having Y available for general use. >> > > More good info - I suspect avr is a far better architecture than 8051. > Memories of legacy 8051 hw platforms, multiple code banks, not enough > common area and hard work trying to ensure that all the correct data > appeared in the selected bank at the right time suggests that there must > be a better way. The impact on development timescales can be significant > and outweighs any device cost advantage for small to medium volume > products. > > As you suggest, more than 2 arguments and the best way is to package up > into a structure and pass a pointer to it. Such a structure can also aid > encapsulation as common variables can also be declared within it. Object > oriented methods for 8 bit micros indeed :-)... >
No, no - I did not suggest packing function call arguments in a struct! How did you manage to read that from my post? You only need to use such tricks for braindead architectures like the 8051, where you have a hopeless stack and almost no registers, and thus need to pass data via globals or extra structs. (A good compiler will hide these messy implementation details from you, and do a better job than using these tricks manually.)

The AVR has plenty of registers - you pass arguments in these registers by using normal C function calls. If you have so many parameters (or such large parameters) that passing by stack is needed, the compiler handles that fine - there is a minor overhead, but any code that needs it will already be large.

What I said about pointers to structs is that the AVR has two pointer registers that work well with structs - Y and Z (since there are Y+index and Z+index addressing modes). If your compiler dedicates Y to a data stack pointer, it's going to be inefficient at code that could otherwise take advantage of two pointer-to-struct registers.
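A minimal sketch of what I mean (the struct and function below are made up purely for illustration): member accesses through the pointer map directly onto the Y+index / Z+index (ldd/std) addressing modes, with no extra pointer arithmetic:

    #include <stdint.h>

    struct motor {
        uint8_t  duty;
        uint8_t  flags;
        uint16_t position;
    };

    /* With Z (and, if it is not tied up as a stack pointer, Y) available,
     * each member access below can be a single ldd/std instruction using
     * the pointer-plus-displacement addressing mode. */
    void motor_step(struct motor *m, uint8_t delta)
    {
        m->position += delta;
        m->flags |= 0x01;
    }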
ChrisQ wrote:
> Niklas Holsti wrote: >> A small addition to my own posting, sorry for omitting it initially: >> >> Niklas Holsti wrote: >> (I elide most of the context): >> >>> However, sometimes the IAR compiler generates code that adds or >>> subtracts a larger number (> 1) to/from Y, and then it must use two >>> 8-bit operations, and must disable interrupts just as gcc does. >> >> Some AVR models do provide instructions (ADIW, SBIW) that can >> atomically add/subtract an immediate number (0..63) to/from the 16-bit >> Y register. I assume, but haven't checked, that IAR uses these >> instructions when possible, rather than two 8-bit operations in an >> interrupt-disabled region. >> > > A very good explanation and thanks. It's the intricacies of architecture > that is sometimes hard to get a big picture of when choosing a processor > for a project. I've never used avr for any project and info like this > would tend to keep me in the 8051 world for small logic replacement > tasks, no matter how constrained it is. AVR32 looks much better though. >
I'm guessing you wrote this before reading my other post? Remember, these are details that are hidden by the compiler, and the AVR will have executed the necessary pushes, stack pointer manipulation, interrupt disable and whatever before the average 8051 device has managed to push the A register onto the stack. The discussion is about whether gcc's stack arrangement or IAR's stack arrangement is best for producing optimal interrupt code on the AVR - no one would seriously compare it to the 8051.

The AVR32 is a different beast entirely. It shares the same developer (Atmel), and some tools, but other than that it is a totally different processor.
> In summary then, it looks like the 8 bit avr's need special compiler > support to get best results, which I wouldn't necessarily expect gcc to
The AVR needs an AVR compiler - just like any other cpu needs its own compiler. It doesn't need any "special" support or tricks here - every target has its own way of handling function prologues and epilogues.

gcc is best suited to RISC-type architectures with plenty of registers and an orthogonal instruction set. The AVR comes fairly close to that, but with two big exceptions - it is 8-bit (most gcc targets are 32-bit), and it has a separate memory space for flash. avr-gcc does a good job of working around these "non-standard" features, but is occasionally sub-optimal in that regard. This is hugely different from cores like the 8051 or the COP8, which need much more specialised compilers to generate good code.
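For anyone who has not seen how avr-gcc handles the separate flash address space in practice, here is a small sketch using avr-libc's pgmspace facilities (the table contents and names are just an example):

    #include <avr/pgmspace.h>
    #include <stdint.h>

    /* A constant table kept in flash rather than copied into scarce RAM. */
    static const uint8_t sine_table[8] PROGMEM = {
        0, 49, 90, 117, 127, 117, 90, 49
    };

    uint8_t sine_lookup(uint8_t idx)
    {
        /* Flash is a separate address space on the AVR, so a plain C
         * pointer dereference cannot reach it; pgm_read_byte() wraps the
         * LPM instruction instead. */
        return pgm_read_byte(&sine_table[idx & 7]);
    }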
> provide. I'm quite happy to accept that IAR would produce better code, > in much the same way as Keil is arguably the best solution for 8051.
It's a different world entirely. IAR produces better code than gcc (at least, according to popular opinion - I have not yet compared it myself, or seen any independent comparisons) because they have more resources to use in the development of their compiler, and their compiler architecture is probably also more suited to optimising 8-bit code. They have also been working with the AVR developers since before the core was fully specified.

Not to belittle the work of either the avr-gcc or IAR development teams, but writing a solid AVR compiler that produces small and fast code is a fraction of the work needed to make a close-to-optimal 8051 compiler. And if you've got a working multi-target compiler to start with (as both avr-gcc and IAR had), then porting it to the AVR is a practical task. For the 8051, you have to start almost from scratch.
> Both are 8 bit legacy architectures, designed before the days of general > hll development. I think if I were trying to find a low end micro now,
I think you should read a little about the AVR before making such ignorant and incorrect statements. The AVR was specifically designed as a small and low power core that worked well with C - it was developed in cooperation with IAR. The 8051 is legacy, even though there are modern implementations. But the AVR, while not perfect, is about as close to modern cpu design as you get in 8 bits.
> msp430 would be the first point of call, as it is a much more compiler > friendly 16 bit architecture. Stuff like this does matter as it can have > a significant impact on software development timescales and quality... >
The msp430 is certainly very compiler friendly - even more so than the AVR (16-bit registers, plenty of flexible pointers, and a single address space). But it too has its "special issues". For example, the multiplier is implemented as a peripheral, and the state of the multiplier cannot be properly saved by an interrupt. Thus either interrupts must avoid using the multiplier, or the main code must disable interrupts while using the multiplier. /Every/ cpu core has its issues. And the newer msp430 cores with their 20-bit registers totally bugger up their C compiler friendliness.
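A sketch of the second option (disabling interrupts around use of the hardware multiplier). Note that the MPY/OP2/RESLO register names and the interrupt intrinsics below follow TI's msp430 headers as I remember them - treat them as assumptions and check your own toolchain:

    #include <msp430.h>
    #include <stdint.h>

    uint16_t mul16(uint16_t a, uint16_t b)
    {
        uint16_t result;

        __disable_interrupt();   /* an ISR cannot save/restore the multiplier state */
        MPY = a;                 /* first operand, unsigned multiply */
        OP2 = b;                 /* writing OP2 starts the multiplication */
        result = RESLO;          /* low 16 bits of the 32-bit result */
        __enable_interrupt();

        return result;
    }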
David Brown wrote:

> The AVR has plenty of registers - you pass arguments in these registers > by using normal C function calls. If you have so many parameters (or > such large parameters) that passing by stack is needed, the compiler > handles that fine - there is a minor overhead, but any code that needs > it will already be large. > > What I said about pointers to structs is that the AVR has two pointer > registers that work well with structs - Y and Z (since there are Y+index > and Z+index addressing modes). If your compiler dedicates Y to a data > stack pointer, it's going to be inefficient at code that could otherwise > take advantage of two pointer-to-struct registers.
... which is a drawback of the two-stack solution. On the other hand, since the AVR provides no SP-relative addressing, single-stack code must often use one of the Y or Z pointers as a frame pointer, and there we are again. Although the AVR is register-rich, it is "pointer-poor". Some other architectures, such as the H8/300, have more flexible interplay of 8-bit and 16-bit computations.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi . @ .
David Brown wrote:

> > No, no - I did not suggest packing function call arguments in a struct! > How did you manage to read that from my post? You only need to use > such tricks for braindead architectures like the 8051, where you have a > hopeless stack and almost no registers, and thus need to pass data via > globals or extra structs. (A good compiler will hide these messy > implementation details from you, and do a better job that using these > tricks manually.)
I think we were looking at it from opposite sides. I use structure pointers into functions a lot, as one string to the bow of getting some object-oriented functionality without the overhead of C++. It's also useful for sharing variables and restricting global scope to a subsystem, as you can pass a single pointer down through several code layers. The only truly global vars are const data. It also simplifies maintenance and adding functionality. Some think such ideas are crap, but I find it very, very useful, and it's generally very fast and code efficient as well...

Regards,

Chris
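Something like this, as a rough sketch of the pattern Chris describes (the subsystem and all names are hypothetical): the subsystem's state sits in one struct, and only a pointer to it is handed down through the layers:

    #include <stdint.h>

    /* All UART state lives here; nothing is file- or program-global. */
    struct uart_ctx {
        volatile uint8_t *data_reg;   /* hardware data register, set at init */
        uint8_t tx_head;
        uint8_t tx_tail;
        uint8_t tx_buf[32];
    };

    void uart_init(struct uart_ctx *ctx, volatile uint8_t *data_reg)
    {
        ctx->data_reg = data_reg;
        ctx->tx_head = 0;
        ctx->tx_tail = 0;
    }

    void uart_put(struct uart_ctx *ctx, uint8_t c)
    {
        ctx->tx_buf[ctx->tx_head++ & 31] = c;   /* state reached only via ctx */
    }

Each layer that needs the UART just takes a struct uart_ctx * parameter, which is about as close to a C++ object as plain C gets.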