EmbeddedRelated.com

Code size reduction migrating from PIC18 to Cortex M0

Started by Kvik May 24, 2012
On 23/06/12 01:03, Mark Borgerson wrote:
> In article <svCdnfOCcv-WvHnSnZ2dnUVZ8rednZ2d@lyse.net>,
> david@westcontrol.removethisbit.com says...
>>
>> On 22/06/2012 02:42, Mark Borgerson wrote:
>>> In article<9b55cce9-96db-46f4-909a-1f6500deb237
>>> It seems that GCC just doesn't match up to IAR at producing compact
>>> code at low optimization levels. OTOH, given that EW_ARM costs
>>> several KBucks, it SHOULD do better!
>>
>> The problems here don't lie with the compiler - they lie with the user.
>> I'm sure that EW_ARM produces better code than gcc (correctly used) in
>> some cases - but I am also sure that gcc can do better than EW_ARM in
>> other cases. I really don't think there is going to be a big difference
>> in code generation quality - if that's why you paid K$ for EW, you've
>> probably wasted your money. There are many reasons for choosing
>> different toolchains, but generally speaking I don't see a large
>> difference in code generation quality between the major toolchains
>> (including gcc) for 32-bit processors. Occasionally you'll see major
>> differences in particular kinds of code, but for the most part it is the
>> user that makes the biggest difference.
>>
>> One place where EW_ARM might score over the gcc setup this user has (he
>> hasn't yet said anything about the rest - is it home-made, CodeSourcery,
>> Code Red, etc.?) is that EW_ARM might make it easier to get the compiler
>> switches correct, and avoid this "I don't know how to enable debugging
>> and optimisation" or "what's a warning?" nonsense.
>
> One of the reasons I like the EW_ARM system is that the IDE handles all
> the compiler and linker flags with a pretty good GUI. You can override
> the GUI options with #pragma statements in the code----which I haven't
> found reason to do for the most part.
As I say, we don't know what toolchain package the poster here was using, but there certainly are gcc-based toolchain packages available that handle this fine. We use Code Sourcery for a couple of different processors - they package gcc along with libraries, debugger support, and Eclipse to give similar ease-of-use. Although Code Sourcery is the package I am most familiar with, I know that others such as Code Red are similar. I don't know what the poster uses that makes it apparently so hard to get it right.

Personally, I prefer to use makefiles and explicit compiler flags (or pragmas / function attributes as needed). I think that gives better control, more replicable results, and is more suitable for re-use on different projects, different development hosts, and different tool versions. But that's a matter of taste - and I don't recommend it as the first step for someone unfamiliar with the tools.
>> It hardly needs saying, but when run properly, my brief test with gcc
>> produces the same code here as you get with EW_ARM, and the same
>> warnings about x and y.
>
> That's comforting in a way. While I now use EW_ARM for most of my
> current projects, I spent about 5 years using GCC_ARM on a project
> based on Linux. I would hate to think that I was producing crap code
> all that time! I had some experienced Linux users to set up my dev
> system and show me how to generate good make files, so I probably
> got pretty good results there.
gcc has very extensive static error checking and warning mechanisms, and they've been getting better with each version. It doesn't have MISRA rule checking, which I believe EW has, but otherwise it is top-class. Of course, you have to enable the warnings!
> I'm using EW_ARM for projects that don't have the resources of a
> Linux OS, and I prefer it for these projects.
Here is the key point that makes EW worth the money for /you/ - you prefer it. When choosing tools and judging value for money, questions of code generation quality are normally secondary to what the developer finds most productive - developer time outweighs tool costs.
>> I'm sure that EW_ARM has a similar option, but gcc has a "-fno-common"
>> switch to disable "common" sections. With this disabled, definitions
>> like "unsigned long a, b;" can only appear once in the program for each
>> global identifier, and the space is allocated directly in the .bss
>> inside the module that made the definition. gcc can use this extra
>> information to take advantage of relative placement between variables,
>> and generate addressing via section anchors:
>>
>> Command line:
>> arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb -S testcode.c -Wall -Os
>> -fno-common
>>
>> test:
>> @ args = 0, pretend = 0, frame = 0
>> @ frame_needed = 0, uses_anonymous_args = 0
>> @ link register save eliminated.
>> ldr r3, .L6
>> ldr r0, [r3, #4]
>> adds r2, r0, #5
>> str r2, [r3, #0]
>> bx lr
>> .L7:
>> .align 2
>> .L6:
>> .word .LANCHOR0
>> .size test, .-test
>> .global b
>> .global a
>> .bss
>> .align 2
>> .set .LANCHOR0,. + 0
>> .type a, %object
>> .size a, 4
>> a:
>> .space 4
>> .type b, %object
>> .size b, 4
>> b:
>> .space 4
>> .ident "GCC: (Sourcery CodeBench Lite 2011.09-69) 4.6.1"
>>
>> It's all about learning to use the tools you have, rather than buying
>> more expensive tools.
>
> Which reminds me----when counting bytes in code like this, it's easy to
> forget the bytes used in the constant tables that provide the addresses
> of variables. A 16-bit variable may require a 32-bit table entry.
Indeed - people trying to "hand-optimise" their code often miss out details like that. ("I'll use a 16-bit variable instead of a 32-bit variable to save memory space...".) If you can use section anchors (like above), or a "small data section" (as used by the PPC ABI, though not the ARM, for some reason), then you can avoid most of the individual storage and loads of addresses.
> I started with EW_ARM about three years before I started on the Linux
> project. The original compiler was purchased by the customer---who had
> no preferences, but was developing a project with fairly limited
> hardware resources. They asked what compiler I'd like and I picked
> EW-ARM. At that time, I'd been using CodeWarrior for the M68K for many
> years and EW_ARM had the same 'feel'. When it came time to do the
> Linux project, the transition to GCC took MUCH longer than the
> transition from CodeWarrior to EW_ARM. Of course, much of that was in
> setting up a virtual machine on the PC and learning Linux so that I
> could use GCC.
Learning to develop Linux programs is a lot more than just learning gcc, as you've found out. One gets used to one's tools. I've been using gcc for embedded development for some 15 years, and have used it on perhaps 8 different processor architectures. So for me, gcc is always the obvious choice for new devices, since I am most familiar with it. I actually think there is a fair similarity between modern CodeWarrior and gcc - CW supports many gcc extensions such as the inline assembly syntax and several attributes. On the IDE side, of course, gcc has no IDE - it's a compiler. But gcc is often used with Eclipse, which is what CW now uses for most targets. (The "classic" CW IDE was horrible - if EW has a similar feel, then I'll remember not to buy it!)
> One thing that I missed on the Linux project is that I didn't have a
> debugger equivalent to C-Spy that is integrated into EW_ARM. Debugging
> on the Linux system was mostly "Save everything and analyze later".
There are /lots/ of debugging options for Linux development, that can be much more powerful than C-Spy (depending on the type of programming you are doing, of course). However, it all involves a lot more learning and experimenting than the ease-of-use of an integrated debugger in an IDE.
> Of course, the original poster is discussing the type of code that few
> Linux programmers write----direct interfacing to peripherals. My recent
> experience with Linux and digital cameras was pretty frustrating. I was
> dependent on others to provide the drivers--and they often didn't work
> quite right with the particular camera I was using. That's a story for
> another time, though.
Indeed.

mvh.,

David
This discussion is kind of silly.

Has anyone tried a "simple" program that multiplies two or four
floating point numbers and displays the result to an LCD ??

As has been mentioned, you don't buy a V8 just to watch the cylinders go 
up and down.

You buy a V8 to use it.

How much code space would the PIC18 take to multiply two/four floats ??

hamilton

In article <cOidnQo3hoKmCnjSnZ2dnUVZ7tOdnZ2d@lyse.net>, 
david.brown@removethis.hesbynett.no says...
> On 23/06/12 01:03, Mark Borgerson wrote:
>
> << Snip earlier quoting >>
>
> Personally, I prefer to use makefiles and explicit compiler flags (or
> pragmas / function attributes as needed). I think that gives better
> control, more replicable results, and is more suitable for re-use on
> different projects, different development hosts, and different tool
> versions. But that's a matter of taste - and I don't recommend it as
> the first step for someone unfamiliar with the tools.
I agree. I was glad I had other programmers and sample files to help me through the initial setup of GCC-ARM.
> << Snip earlier quoting >>
>
> gcc has very extensive static error checking and warning mechanisms, and
> they've been getting better with each version. It doesn't have MISRA
> rule checking, which I believe EW has, but otherwise it is top-class.
> Of course, you have to enable the warnings!
EW does have MISRA rule checking---but I haven't started to use it yet.
> > I'm using EW_ARM for projects that don't have the resources of a
> > Linux OS, and I prefer it for these projects.
>
> Here is the key point that makes EW worth the money for /you/ - you
> prefer it. When choosing tools and judging value for money, questions
> of code generation quality are normally secondary to what the developer
> finds most productive - developer time outweighs tool costs.
I found that to be very true when I switched from a very low-end PCB layout program to PADS PCB. The $4K cost of that system has been paid back many times over in time saved in the design and layout of PC boards for customers. Heck, I've even learned to trust the autorouter when it is properly set up. ("Trust, but verify" still applies, though). The autorouter really helps with those fine-pitch QFP STM32 chips! I was able to pull an MSP430 from an existing design and plug in an STM32F205 in just a few days.
>
<< Snip Example Code>>
> >> It's all about learning to use the tools you have, rather than buying
> >> more expensive tools.
> >
> > Which reminds me----when counting bytes in code like this, it's easy to
> > forget the bytes used in the constant tables that provide the addresses
> > of variables. A 16-bit variable may require a 32-bit table entry.
>
> Indeed - people trying to "hand-optimise" their code often miss out
> details like that. ("I'll use a 16-bit variable instead of a 32-bit
> variable to save memory space...".)
>
> If you can use section anchors (like above), or a "small data section"
> (as used by the PPC ABI, though not the ARM, for some reason), then you
> can avoid most of the individual storage and loads of addresses.
It took me a while when I first started looking at disassembled ARM code to realize that constants were being loaded using PC-relative offsets. When I finally figured that out it took me back to the mid-80's when I was writing Macintosh code using position-independent modules. What a rush of nostalgia that was!
> << Snip earlier quoting >>
>
> I actually think there is a fair similarity between modern CodeWarrior
> and gcc - CW supports many gcc extensions such as the inline assembly
> syntax and several attributes. On the IDE side, of course, gcc has no
> IDE - it's a compiler. But gcc is often used with Eclipse, which is
> what CW now uses for most targets. (The "classic" CW IDE was horrible -
> if EW has a similar feel, then I'll remember not to buy it!)
I think I mis-stated part of that. It is really the editor and project file window that are similar between EW and CodeWarrior. The other parts of EW are much better, with fairly straightforward menus and dialog boxes for setting compiler, linker, and debugger options.

I was using CodeWarrior to develop code for the Persistor micro data loggers. That was a pretty tightly constrained development environment, and Persistor provided most of the setup files and initial project files. If you have to start from scratch, I agree that the old CodeWarrior was a true PITA to get set up. The Persistor logger didn't have true debug capability, so I can't comment on the capabilities of CodeWarrior in that regard. I really like the integrated C-Spy debugger in EW, though.
> > One thing that I missed on the Linux project is that I didn't have a
> > debugger equivalent to C-Spy that is integrated into EW_ARM. Debugging
> > on the Linux system was mostly "Save everything and analyze later".
>
> There are /lots/ of debugging options for Linux development, that can be
> much more powerful than C-Spy (depending on the type of programming you
> are doing, of course). However, it all involves a lot more learning and
> experimenting than the ease-of-use of an integrated debugger in an IDE.
One of the problems with debugging the Linux-based system was that the code was controlling an autonomous parafoil supply delivery system. When the system was operating, it started 20,000 feet above and several miles away from the programmer! Thus, the "record everything and analyze later" paradigm.

I did develop a simulated version that grabbed the GPS and other sensor values via hooks and substituted simulated values based on a simple flight model. That sim ran on the target hardware while it sat on the bench. I got really tired of listening to the whining control servos! Unfortunately, the flight model wasn't up to simulating GPS loss, sticky servos, and all the things that can happen to parafoils stressed beyond their flight limits.

Flight algorithms were tested in sims running on Borland CPP Builder on a PC. That allowed good debugging, lots of intermediate variable recording, and graphic displays. The CPP Builder sims were written to use the same C control code that ran on the target hardware.
> > Of course, the original poster is discussing the type of code that few
> > Linux programmers write----direct interfacing to peripherals. My recent
> > experience with Linux and digital cameras was pretty frustrating. I was
> > dependent on others to provide the drivers--and they often didn't work
> > quite right with the particular camera I was using. That's a story for
> > another time, though.
>
> Indeed.
Mark Borgerson
On 23/06/12 17:23, hamilton wrote:
> This discussion is kind of silly.
>
> Has anyone tried a "simple" program that multiplies two or four
> floating point numbers and displays the result to an LCD ??
>
> As has been mentioned, you don't buy a V8 just to watch the cylinders go
> up and down.
>
> You buy a V8 to use it.
>
> How much code space would the PIC18 take to multiply two/four floats ??
>
> hamilton
It is certainly true that you can only get a real-world test using real-world code. The trouble is, real-world code is not very suitable for discussing in a newsgroup. So the best we can do is look at some simple sample functions, and work from there. And when a poster is having such trouble generating good code from a simple "a = b + 5" function, it makes sense to start with that and work up. You bring up another point here, of course - while the PIC18 may have compact code for setting a bit or adding a couple of 8-bit numbers, the ARM code will be more compact (and /much/ faster) for multiplying floats. Code comparisons on wildly different architectures are heavily dependent on the sort of code used.
On Jun 24, 1:37 am, David Brown <david.br...@removethis.hesbynett.no>
wrote:
> ....
> ARM code will be more compact (and /much/ faster) for multiplying
> floats. Code comparisons on wildly different architectures are heavily
> dependent on the sort of code used.
Hi David,
have you tried how fast ARM does at, say, 64 (or 32) bit MAC in a filter loop?

Not so long ago I had to reach the limit for a power CPU (MPC5200B), which is specified at 2 cycles per MAC. It was not trivial at all: doing it DSP-like in a loop took about 10 cycles, mainly because of data dependencies. I had to spread things over many registers until I got there - 2.1 cycles (with load/store included). That was for a filter with hundreds of taps.

How many FP registers do ARM have? It took using 24 out of the 32 to get to 2.1 cycles (although using the same technique over 18 registers yielded about 2.3 cycles; 15 was dramatically worse, as data dependencies began to kick in - likely a 6 stage pipeline).

Dimiter

------------------------------------------------------
Dimiter Popoff Transgalactic Instruments
http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
On 24/06/12 12:19, dp wrote:
> On Jun 24, 1:37 am, David Brown<david.br...@removethis.hesbynett.no>
> wrote:
>> ....
>> ARM code will be more compact (and /much/ faster) for multiplying
>> floats. Code comparisons on wildly different architectures are heavily
>> dependent on the sort of code used.
>
> Hi David,
> have you tried how fast ARM does at say 64 (or 32) bit MAC in a filter
> loop?
> Not so long ago I had to reach the limit for a power CPU (MPC5200B)
> which is specified at 2 cycles per MAC. Was not trivial at all, doing
> it in a loop DSP-like took about 10 cycles mainly because of data
> dependencies. Had to spread things over many registers until I got
> there, 2.1 cycles (with load/store included). That for a filter with
> hundreds of taps.
> How many FP registers do ARM have? It took using 24 out of the 32
> to get to 2.1 cycles (although using the same technique over 18
> registers yielded about 2.3 cycles, 15 was dramatically worse,
> data dependencies began to kick in - likely a 6 stage pipeline).
>
> Dimiter
I haven't tried anything exactly like that (I love doing that sort of thing, but seldom have the need).

On the PPC cores I have used (e200z7 recently), it can make a big difference to the speed of the code when things are spread out over many registers. Most PPC cores have quite long pipelines, and some have superscalar execution, speculative execution or loads, etc., which make it a big challenge getting everything right here. You also need to take into account the cache - getting the computation flow to match the cache flow is vital.

In comparison, the Cortex-M0 is very simple. It has pipelining, but not nearly as deep, and it does not have a cache to consider. Some Cortex-M devices have a bit of cache, and may also have tightly-coupled memory, so there you have to consider the flow of data into and out of the cpu core. But I expect it is easier to get close to peak performance from an M0 than from a typical PPC core.
In article <398b2c50-5de0-4eb0-8775-d4a200ad7a30@a16g2000vby.googlegroups.com>, dp@tgi-sci.com says...
> On Jun 24, 1:37 am, David Brown <david.br...@removethis.hesbynett.no>
> wrote:
>
> << Snip earlier quoting >>
>
> How many FP registers do ARM have? It took using 24 out of the 32
> to get to 2.1 cycles (although using the same technique over 18
> registers yielded about 2.3 cycles, 15 was dramatically worse,
> data dependencies began to kick in - likely a 6 stage pipeline).
>
> Dimiter
I don't think the ARM Cortex-M4 chips have 64-bit FPUs, and their clock speeds top out at about 160MHz. You're not going to get anything near the MPC5200B performance. The closest you might get with an ARM-based chip might be one of the TI OMAP chips, which have a fixed/floating-point DSP coprocessor.

Mark Borgerson
