Reply by James Dabbs May 22, 2006
Rasmus Fink wrote:

> The device is an AT91SAM7S256, so code space is right now not really an
> issue, _YET_ - but the expected product life time is ~7 years, so much
> can still happen...
I would look at GCC V4 because of its *much* better THUMB implementation. With that particular part, in most cases, THUMB is the better choice for both size and performance.
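For what it's worth, selecting THUMB with the GNU toolchain is just a matter of the code-generation flags - a sketch, assuming the usual arm-elf toolchain prefix of that era:

   arm-elf-gcc -mthumb -mthumb-interwork -O2 -c main.c

The -mthumb-interwork flag only matters if some files (typically the startup and exception entry code, which on an ARM7TDMI part like the SAM7 runs in ARM state) still have to be compiled for ARM, so that calls can cross between the two instruction sets.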
Reply by Rasmus Fink May 22, 2006
Hi again,

Thanks for the inputs everyone. Just keep them tips'n tricks comin' :-)

I'll post some test results when I dig further into the project.

Cheers
Rasmus


Reply by Anton Erasmus May 19, 2006
On Wed, 17 May 2006 07:57:20 -0600, "Not Really Me"
<scott@exoXYZtech.com> wrote:

> >"John Devereux" <jdREMOVE@THISdevereux.me.uk> wrote in message >news:87r72t9h2x.fsf@cordelia.devereux.me.uk... >> Rasmus Fink <i.hate.spammming@me.dk> writes: >> >>> Hi, > ><SNIP> > >> Rowley Associates <http://www.rowley.co.uk/> package gcc with their >> own debugger and libraries, this may be an option for you. Not used >> them but looks good. They sell low cost debugging hardware too it >> would appear. >> >> -- >> >> John Devereux > >For GNU tools we have used both Rowley and Microcross. Of the two we find >the Microcross to be generally better. The IDE for debugging seems rather >finicky on the Rowley. The linker config is also better on the Microcross.
One of the big things Rowley supplies is flash programming software for
most of the new Flash ARM MCUs. This is usable even with cheap home-built
JTAG interfaces. What sort of flash programming support does Microcross
provide?

Regards
  Anton Erasmus
Reply by David Brown May 18, 2006
Roberto Waltman wrote:
> David Brown wrote:
>> ... The fun with gcc 4.1 comes when you use the "--combine" and
>> "-fwhole-program" options (along with -O2 or -O3 optimisation). ...
<SNIP>
>
> Interesting, thanks for the information. How stable/reliable is gcc 4.x
> when asked to perform this type of optimization?
I haven't used gcc 4.1 much as yet, and only on the ColdFire, but I've found no problems with it so far. I haven't made use of -combine or -fwhole-program for anything other than small test cases. However, I've not heard of any issues with 4.1 from anyone else - the gcc team have classified it as stable and are already onto 4.2. In the case of the ColdFire, the code generator is pretty stable and hasn't changed much in years, so the changes are all in the front-end and middle-end, which benefit from being shared with common PC ports and thus are extensively tested.
> Also, does anybody know of a tool that would perform these analyses
> and conversions at the C/C++ source code level, so it can be used with
> other compilers?
What you would need would be a tool to collect together all your source
files into one source file. Every "static" name should be changed to have
the original module's name as a prefix, and every global name (except
"main") should be made static. Code that abuses the preprocessor by using
different #defines for the same macro name depending on the module is
going to have problems.
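To make that concrete, here is a hand-written sketch of what such a merged file would have to look like (all the names here are invented for illustration; imagine setBaud was global in uart.c and lastBaud was file-static there):

   /* Merged translation unit (sketch). File-static names from uart.c
    * get a "uart_" prefix to avoid clashes; formerly-global names
    * (everything except main) are made static so the compiler sees
    * every use and can inline or discard them. */

   #define OSC 48000000u                    /* was a constant in another module */

   static volatile unsigned int divLoReg, divHiReg;  /* register stand-ins */

   static unsigned int uart_lastBaud;       /* was "static ... lastBaud" in uart.c */

   static void setBaud(unsigned int newBaud)  /* was global; now static */
   {
       unsigned int divisor = (OSC / 16) / newBaud;
       uart_lastBaud = newBaud;
       divLoReg = (divisor & 0xffffu);
       divHiReg = (divisor >> 16);
   }

   int main(void)                           /* "main" keeps its global name */
   {
       setBaud(19200);                      /* constant-folds away at -O2 */
       return 0;
   }

With everything static, the compiler can see every call, inline setBaud into main, fold the division at compile time, and drop the out-of-line copy.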
Reply by Roberto Waltman May 18, 2006
David Brown wrote:

> ... The fun with gcc 4.1 comes when you use the "--combine" and
> "-fwhole-program" options (along with -O2 or -O3 optimisation). The
> --combine option tells the compiler to take all the C files on the
> command line together and compile them at once, including doing
> inter-procedural optimisations.
<SNIP>
Interesting, thanks for the information. How stable/reliable is gcc 4.x
when asked to perform this type of optimization?

Also, does anybody know of a tool that would perform these analyses and
conversions at the C/C++ source code level, so it can be used with other
compilers?
Reply by David Brown May 18, 2006
John Devereux wrote:
> David Brown <david@westcontrol.removethisbit.com> writes:
>
>> For smaller programs, gcc 4.1 has the potential to produce smaller and
>> faster code by compiling the entire program at once, letting it do
>> inter-procedural optimisations even across modules.
>
> Something related to this that I found makes a big difference for me
> is the compiler switches:
>
>    -ffunction-sections -fdata-sections -Wl,--gc-sections
>
<SNIP>
Yes, this works (on most gcc targets) for modern gcc versions.

The fun with gcc 4.1 comes when you use the "--combine" and
"-fwhole-program" options (along with -O2 or -O3 optimisation). The
--combine option tells the compiler to take all the C files on the
command line together and compile them at once, including doing
inter-procedural optimisations. The -fwhole-program flag can be thought
of as creating a new scope level between global and file static, with
ordinary global or extern data falling in this level. Only "main" and
explicitly declared "externally_visible" items are now at the true
global level. Thus the compiler knows all uses of ordinary global data
and code, and can optimise appropriately.

For example, supposing you have a function in a file "uart.c" such as:

   void setBaud(unsigned int newBaud) {
       unsigned int divisor = (osc / 16) / newBaud;
       divLoReg = (divisor & 0xffff);
       divHiReg = (divisor >> 16);
   }

with "osc" being defined as a constant in a different module. Another
module, say "protocol.c", calls this function as "setBaud(19200)".

In many cases, the setBaud function is only ever called from one place
in the program, and with a constant value. Yet the compiler must
generate the full function, and use an expensive division operation
even though all the values are known at compile time. The traditional
way to improve this is by making setBaud a macro or, better, a static
inline function.

Using the "--combine" option, if uart.c and protocol.c are compiled at
the same time, the compiler can inline the definition of setBaud into
the caller in protocol.c, and reduce the whole thing down to a couple
of memory operations. The code for the setBaud function is still
generated, of course, which is a waste of space. It can be removed
using the "-ffunction-sections" method described by John above, or by
using the "-fwhole-program" flag, which lets the compiler figure out
that it doesn't have to generate code for setBaud at all.

Obviously a function like this one, which is called once, is not
time-critical - but the principle applies.

That's the theory, anyway - I don't know how well it works in practice
other than for a simple test case on the ColdFire.

mvh.,

David
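For concreteness, the whole-program build described above boils down to a single invocation along these lines (a sketch - the file names are invented, and the exact option spelling is worth checking against the gcc 4.1 manual):

   arm-elf-gcc -O2 --combine -fwhole-program uart.c protocol.c main.c -o app.elf

Note that every C file must appear on that one command line - separately compiled objects linked in afterwards are invisible to the inter-procedural optimiser.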
Reply by John Devereux May 17, 2006
David Brown <david@westcontrol.removethisbit.com> writes:

> For smaller programs, gcc 4.1 has the potential to produce smaller and
> faster code by compiling the entire program at once, letting it do
> inter-procedural optimisations even across modules.
Something related to this that I found makes a big difference for me is
the compiler switches:

   -ffunction-sections -fdata-sections -Wl,--gc-sections

This puts every function and every data object into its own section.
The --gc-sections link option then strips out sections that are not
used. This happens even if they are global and appear in the same
module (source file) as items that *are* used.

You also need to modify the link control file, changing

   *(.data) to *(.data.*)
and
   *(.text) to *(.text.*)

to pick up the modified section names.

This then allows you e.g. to write libraries with lots of extra
functions in them, many of which might not get used in every
application.

This seems to work fine in 3.4 (as well as 4.1, presumably).

-- 

John Devereux
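For reference, the relevant part of the GNU ld linker script then ends up looking something like this (a sketch only - real scripts have more sections, and the memory region names here are invented):

   .text :
   {
       *(.text)
       *(.text.*)     /* per-function sections from -ffunction-sections */
   } > rom

   .data :
   {
       *(.data)
       *(.data.*)     /* per-object sections from -fdata-sections */
   } > ram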
Reply by David Brown May 17, 2006
Peter Dickerson wrote:
> "Rasmus Fink" <rfi@lortemail.dk> wrote in message > news:446aec08$0$47058$edfadb0f@dread15.news.tele.dk... >> Thanks for the input, guys. >> >> It's nice to hear that the 4.x.x is considered stable not only by its >> authors. Have any of you compared the code size generated by V3.x.x vs >> 4.x.x ? >> >> The device is a AT91SAM7S256, so code space is right now not really an >> issue, _YET_ - but the expected product life time is ~7 years, so much >> can still happen... >> >> /Rasmus >> >> Richard wrote: >>>> I don't think I have ever found a bug with gcc-arm. >>> Some of the earlier 3.x.x versions had big problems in ISR code > generation >>> and ARM/THUMB interworking. I think this has been fixed for some time > now. > > I found 4.1.0 to be a tiny bit bigger than 3.4.3 but also to feel a bit > faster. I'm guessing a few things product inline code rather than call > support routines resulting in a little bloat in return for speed. > > Peter >
For smaller programs, gcc 4.1 has the potential to produce smaller and faster code by compiling the entire program at once, letting it do inter-procedural optimisations even across modules.
Reply by Not Really Me May 17, 2006
"John Devereux" <jdREMOVE@THISdevereux.me.uk> wrote in message 
news:87r72t9h2x.fsf@cordelia.devereux.me.uk...
> Rasmus Fink <i.hate.spammming@me.dk> writes:
>
>> Hi,
<SNIP>
> Rowley Associates <http://www.rowley.co.uk/> package gcc with their
> own debugger and libraries, this may be an option for you. Not used
> them but looks good. They sell low cost debugging hardware too it
> would appear.
>
> --
>
> John Devereux
For GNU tools we have used both Rowley and Microcross. Of the two we
find the Microcross to be generally better. The IDE for debugging seems
rather finicky on the Rowley. The linker config is also better on the
Microcross.

-- 
Scott
Validated Software Corp.
Reply by Peter Dickerson May 17, 2006
"Rasmus Fink" <rfi@lortemail.dk> wrote in message
news:446aec08$0$47058$edfadb0f@dread15.news.tele.dk...
> Thanks for the input, guys.
>
> It's nice to hear that the 4.x.x is considered stable not only by its
> authors. Have any of you compared the code size generated by V3.x.x vs
> 4.x.x ?
>
> The device is an AT91SAM7S256, so code space is right now not really an
> issue, _YET_ - but the expected product life time is ~7 years, so much
> can still happen...
>
> /Rasmus
>
> Richard wrote:
>>> I don't think I have ever found a bug with gcc-arm.
>>
>> Some of the earlier 3.x.x versions had big problems in ISR code
>> generation and ARM/THUMB interworking. I think this has been fixed
>> for some time now.
I found 4.1.0 to be a tiny bit bigger than 3.4.3 but also to feel a bit
faster. I'm guessing a few things produce inline code rather than call
support routines, resulting in a little bloat in return for speed.

Peter