
STM32 ARM toolset advice?

Started by John Speth October 7, 2008
On Thu, 16 Oct 2008 00:33:02 -0500, Walter Banks
<walter@bytecraft.com> wrote:

>
> Anton Erasmus wrote:
>> We ported all our 68K Code from commercial compilers to GCC for 68k.
>> This is the same GCC used for Coldfire. The code was significantly
>> faster using GCC.
>
> Anton,
>
> Did you find out where the GCC was faster?
We ported two main types of application. One consists mainly of fairly complex axis transformations and data filtering, while also handling low-latency comms to a host. The main task executes at 100 Hz, and the maximum, minimum and average execution times are calculated every interrupt cycle. If I remember correctly, the GCC code was about 30% faster. The axis transformations are mostly scaled integer with a little bit of floating point.

The second app displayed moving 2D icons over a live video image. No graphics acceleration hardware was available; everything is done in software in a frame buffer. Again, most calculations were done in fixed point, with a little bit of floating point. If I remember correctly, it was 20% faster overall, with some low-level graphics primitive routines almost 80% faster. On the graphics routines, GCC did much better at register allocation.
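For readers unfamiliar with the scaled-integer style described above, here is a minimal C sketch of a 2D rotation done entirely in integer arithmetic. It assumes a Q16.16 format (a real value stored as value × 65536 in an int32_t); the post does not say which scaling Anton's code used, and the names q16_t, q16_mul, rotate_q16 and the Q16_* constants are hypothetical.

```c
#include <stdint.h>

/* Q16.16 scaled integer (an assumption, not the poster's stated format):
   a real value v is stored as the integer v * 65536. */
typedef int32_t q16_t;

#define Q16_ONE   ((q16_t)65536)
#define Q16_COS60 ((q16_t)32768)  /* 0.5 * 65536 */
#define Q16_SIN60 ((q16_t)56756)  /* 0.8660254 * 65536, rounded */

typedef struct {
    q16_t x;
    q16_t y;
} vec2_q16;

/* Convert a whole number (|v| < 32768) to Q16.16. */
static q16_t q16_from_int(int32_t v)
{
    return v * Q16_ONE;
}

/* Multiply two Q16.16 values. The 64-bit intermediate keeps the raw
   product from overflowing; dividing (rather than shifting) keeps the
   result well defined for negative values and truncates toward zero. */
static q16_t q16_mul(q16_t a, q16_t b)
{
    return (q16_t)(((int64_t)a * b) / Q16_ONE);
}

/* Rotate point p by the angle whose cosine and sine are c and s
   (both in Q16.16), using only integer arithmetic. */
static vec2_q16 rotate_q16(vec2_q16 p, q16_t c, q16_t s)
{
    vec2_q16 r;
    r.x = q16_mul(p.x, c) - q16_mul(p.y, s);
    r.y = q16_mul(p.x, s) + q16_mul(p.y, c);
    return r;
}
```

For example, rotating the point (2, 0) by 60 degrees with Q16_COS60 and Q16_SIN60 yields x = 65536 (1.0) and y = 113512 (about 1.73205). The inner loop is nothing but multiplies, adds and register traffic, which is where differences in a compiler's register allocation show up most clearly.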
> Which compilers?
It was the SDS and Microtec compilers. Both compilers got more and more expensive over time. Initially there were big improvements in new versions. We stopped purchasing support when the new versions offered little over the previous ones. Microtec also started adding copy protection, which became a total pain to work with.

Regards
Anton Erasmus
In article <48F6D20E.8335FFB8@bytecraft.com>, walter@bytecraft.com 
says...
>
> Anton Erasmus wrote:
>> We ported all our 68K Code from commercial compilers to GCC for 68k.
>> This is the same GCC used for Coldfire. The code was significantly
>> faster using GCC.
>
> Anton,
>
> Did you find out where the GCC was faster?
>
> Which compilers?
I've not used GCC for the M68K, but I have many years of experience with Codewarrior 68K. I was able to speed up some loops by nearly a factor of two by using the DBRA (decrement and branch) instruction in assembly-language rewrites of C code. This was generally only necessary in very tight loops for high-speed data collection. The instruction set and architecture of the M68K made assembly-language routines much simpler to write than is the case with the ARM.

The other common problem with Codewarrior (and other compilers I've used) is that there seem to be a lot of redundant register loads from stack-based variables. This may be because I generally set optimization to the lowest level. That generally makes it easier to read the assembly-language output and single-step through the code.

Mark Borgerson
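The DBRA point above is about loop shape. A minimal C sketch, assuming a memory-mapped 16-bit data register (modelled here as a plain volatile pointer; copy_up and copy_down are hypothetical names): the count-down loop tests the counter in the same step that decrements it, which is the form the 68K DBRA/DBF instruction implements, while the count-up loop needs a separate compare against n on every pass. Whether a given compiler actually emits DBRA for the count-down form is not guaranteed; Mark reports later in the thread that he never found a Codewarrior setting that did.

```c
#include <stddef.h>
#include <stdint.h>

/* Count-up version: each pass increments i and compares it with n. */
static void copy_up(uint16_t *dst, const volatile uint16_t *port, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = *port;          /* each read of *port is a fresh register read */
}

/* Count-down version: the counter is decremented and tested in one step,
   the loop shape that maps naturally onto a 68K DBRA (DBF) loop. */
static void copy_down(uint16_t *dst, const volatile uint16_t *port, uint16_t n)
{
    while (n--)
        *dst++ = *port;
}
```

The counter is a uint16_t because DBRA decrements only the low 16 bits of its data register, so a 16-bit count matches what the instruction can handle in a single loop.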
Mark Borgerson wrote:
> The other common problem with Codewarrior (and other compilers I've
> used) is that there seem to be a lot of redundant register loads
> from stack-based variables. This may be because I generally
> set optimization to the lowest level. That generally makes it
> easier to read the assembly language output and single-step through the
> code.
You set the compiler flags for low optimisation, and are surprised by getting sub-optimal code? When you need to read or single-step generated assembly, it's often best not to have too low optimisation (or too high) - all these redundant stack accesses make the code hard to follow.
On Oct 16, 10:56 pm, Mark Borgerson wrote:
> The other common problem with Codewarrior (and other compilers I've
> used) is that there seem to be a lot of redundant register loads
> from stack-based variables. This may be because I generally
> set optimization to the lowest level. That generally makes it
> easier to read the assembly language output and single-step through the
> code.
Why are you single stepping the machine instructions of the compiler output so much that this is an issue? Is your compiler unreliable? Paul
In article <d24a67aa-bc21-43c9-9d63-fd6b1cee0436@y29g2000hsf.googlegroups.com>, lacuna@saturnine.org.uk says...
> Why are you single stepping the machine instructions of the compiler
> output so much that this is an issue? Is your compiler unreliable?
When I'm working on peripheral data transfers where I want to transfer as quickly as possible, I quite often look at the generated assembly language. I never did find an optimization level for the M68K compiler where it used the DBRA instructions.

Another reason that I keep the Codewarrior M68K compiler at a low optimization level is that it was recommended by the SBC vendor. This may have something to do with the fact that the compiler was really targeted for the PalmOS, but was being used with another vendor's libraries and hardware. There was a time when you could get Codewarrior for the PalmOS for about $400, while the standard Codewarrior M68K was over $2000.

I generally don't step through the M68K code, as the SBC that I use doesn't have good debug facilities. I do sometimes step through MSP430 code using a JTAG debugger. The compiler that I use (Imagecraft) doesn't have a lot of optimization choices, but it does produce some redundant register loads.

Sorry if I got the two different cases mixed up in the original post.

Mark Borgerson
In article <SKWdnS2drL8lt2XVnZ2dneKdnZydnZ2d@lyse.net>, 
david.brown@hesbynett.removethisbit.no says...
> You set the compiler flags for low optimisation, and are surprised by
> getting sub-optimal code?
>
> When you need to read or single-step generated assembly, it's often best
> not to have too low optimisation (or too high) - all these redundant
> stack accesses make the code hard to follow.
I seem to recall a classic example from an early 8051 compiler: If you set optimization high and to minimize memory, it would overlay variables in the limited RAM space. That made reading the assembly language pretty confusing at times. Mark Borgerson
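The overlaying described above is done by the toolchain: 8051 tools typically analyse the call tree and let functions that can never be active at the same time share the same RAM for their locals. A hand-written analogue in portable C is a union of per-function local blocks, sketched below; the names are hypothetical and the layout is illustrative, not what any particular compiler produces. It also shows why the assembly gets confusing: the same address is overlay.filter.i in one function and overlay.format.len in the other.

```c
#include <stdint.h>

/* Locals of two functions that never run at the same time. */
struct filter_locals { uint8_t i;   uint16_t acc;  };
struct format_locals { uint8_t len; uint16_t rest; };

/* Both sets of locals occupy the same block of RAM. */
static union {
    struct filter_locals filter;
    struct format_locals format;
} overlay;

/* Sum n samples using only the overlaid "filter" locals. */
static uint16_t sum_samples(const uint8_t *s, uint8_t n)
{
    overlay.filter.acc = 0;
    for (overlay.filter.i = 0; overlay.filter.i < n; overlay.filter.i++)
        overlay.filter.acc += s[overlay.filter.i];
    return overlay.filter.acc;
}

/* Count decimal digits using only the overlaid "format" locals;
   this overwrites whatever sum_samples left in the shared RAM. */
static uint8_t count_digits(uint16_t v)
{
    overlay.format.len = 0;
    overlay.format.rest = v;
    do {
        overlay.format.rest /= 10;
        overlay.format.len++;
    } while (overlay.format.rest != 0);
    return overlay.format.len;
}
```

This is only safe because neither function calls the other and neither runs from an interrupt; a compiler that overlays locals automatically relies on the same call-tree guarantee.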

Mark Borgerson wrote:

> I seem to recall a classic example from an early 8051 compiler: If you
> set optimization high and to minimize memory, it would overlay
> variables in the limited RAM space. That made reading the assembly
> language pretty confusing at times.
Mark,

The assembly can look confusing, but in a well-implemented compiler the variable can be followed by its symbolic name as execution steps through the code. Physical RAM locations contain different variables depending on the current PC value. The ChipTools 8051 symbolic debuggers did a good job of tracking code from Keil's 8051 compiler as early as the mid-90s.

The source-level debug information should be able to track a variable even when it temporarily resides in a register. This handles cases where a local variable's location is reassigned instead of its value being moved. With x and y both local:

    y = x;
    x = 29;

This should generate no code for y = x, only a symbol-table change and a change to the source-level debug reference.

Regards,
-- 
Walter Banks
Byte Craft Limited
http://www.bytecraft.com
In article <48F9EE2C.CA6F7E19@bytecraft.com>, walter@bytecraft.com 
says...
> Mark,
>
> The assembly can look confusing, but in a well implemented compiler
> the variable can be followed by symbolic name as the compiled code
> walks through the code. Physical RAM locations contain different
> variables depending on the current PC value. The ChipTools 8051
> symbolic debuggers did a good job of tracking code in Keil's 8051
> compiler as early as the mid 90's
The early '90s is about the time frame when I was using the 8051. IIRC, it was a small-form-factor package with only about 2K of EPROM. At that time I was using that 8051 chip, a PIC variant, the MC68HC16, and the M68K. I TRIED to stick with one chip or another for at least a week to minimize the context-switch overhead, but was not generally successful. IIRC, debuggers at that time generally involved external hardware with emulator pods, which were well above the company budget limits.
> The source level debugging code should be able to track a variable
> even when it temporarily resides in a register. This resolves cases
> where the local variable location is reassigned instead of being moved;
>
> x and y both local
>
>     y = x;
>     x = 29;
>
> This code should not generate any code for y = x only a symbol table
> change and source level debug reference change..
I expect that if I ever go back to an 8051 variant, I will better understand the development system and expect better debugging facilities. However, as I'm in a low-volume market where unit cost is not a major constraint, I'll probably stick with the MSP430 series for very low power systems and one or another of the ARM series where I need more processing power. Mark Borgerson