EmbeddedRelated.com

Code size reduction migrating from PIC18 to Cortex M0

Started by Kvik May 24, 2012
On 23/06/12 01:03, Mark Borgerson wrote:
> In article <svCdnfOCcv-WvHnSnZ2dnUVZ8rednZ2d@lyse.net>,
> david@westcontrol.removethisbit.com says...
>>
>> On 22/06/2012 02:42, Mark Borgerson wrote:
>>> In article<9b55cce9-96db-46f4-909a-1f6500deb237
>>> It seems that GCC just doesn't match up to IAR at producing compact
>>> code at low optimization levels. OTOH, given that EW_ARM costs
>>> several KBucks, it SHOULD do better!
>>
>> The problems here don't lie with the compiler - they lie with the user.
>> I'm sure that EW_ARM produces better code than gcc (correctly used) in
>> some cases - but I am also sure that gcc can do better than EW_ARM in
>> other cases. I really don't think there is going to be a big difference
>> in code generation quality - if that's why you paid K$ for EW, you've
>> probably wasted your money. There are many reasons for choosing
>> different toolchains, but generally speaking I don't see a large
>> difference in code generation quality between the major toolchains
>> (including gcc) for 32-bit processors. Occasionally you'll see major
>> differences in particular kinds of code, but for the most part it is the
>> user that makes the biggest difference.
>>
>> One place where EW_ARM might score over the gcc setup this user has (he
>> hasn't yet said anything about the rest - is it home-made, CodeSourcery,
>> Code Red, etc.?) is that EW_ARM might make it easier to get the compiler
>> switches correct, and avoid this "I don't know how to enable debugging
>> and optimisation" or "what's a warning?" nonsense.
>
> One of the reasons I like the EW_ARM system is that the IDE handles all
> the compiler and linker flags with a pretty good GUI. You can override
> the GUI options with #pragma statements in the code----which I haven't
> found reason to do for the most part.
As I say, we don't know what toolchain package the poster here was using, but there certainly are gcc-based toolchain packages available that handle this fine. We use Code Sourcery for a couple of different processors - they package gcc along with libraries, debugger support, and Eclipse to give similar ease-of-use. Although Code Sourcery is the package I am most familiar with, I know that others such as Code Red are similar. I don't know what the poster uses that makes it apparently so hard to get it right.

Personally, I prefer to use makefiles and explicit compiler flags (or pragmas / function attributes as needed). I think that gives better control, more replicable results, and is more suitable for re-use on different projects, different development hosts, and different tool versions. But that's a matter of taste - and I don't recommend it as the first step for someone unfamiliar with the tools.
>> It hardly needs saying, but when run properly, my brief test with gcc
>> produces the same code here as you get with EW_ARM, and the same
>> warnings about x and y.
>
> That's comforting in a way. While I now use EW_ARM for most of my
> current projects, I spent about 5 years using GCC_ARM on a project
> based on Linux. I would hate to think that I was producing crap code
> all that time! I had some experienced Linux users to set up my dev
> system and show me how to generate good make files, so I probably
> got pretty good results there.
gcc has very extensive static error checking and warning mechanisms, and they've been getting better with each version. It doesn't have MISRA rule checking, which I believe EW has, but otherwise it is top-class. Of course, you have to enable the warnings!
> I'm using EW_ARM for projects that don't have the resources of a
> Linux OS, and I prefer it for these projects.
Here is the key point that makes EW worth the money for /you/ - you prefer it. When choosing tools and judging value for money, questions of code generation quality are normally secondary to what the developer finds most productive - developer time outweighs tool costs.
>> I'm sure that EW_ARM has a similar option, but gcc has a "-fno-common"
>> switch to disable "common" sections. With this disabled, definitions
>> like "unsigned long a, b;" can only appear once in the program for each
>> global identifier, and the space is allocated directly in the .bss
>> inside the module that made the definition. gcc can use this extra
>> information to take advantage of relative placement between variables,
>> and generate addressing via section anchors:
>>
>> Command line:
>> arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb -S testcode.c -Wall -Os
>> -fno-common
>>
>> test:
>> @ args = 0, pretend = 0, frame = 0
>> @ frame_needed = 0, uses_anonymous_args = 0
>> @ link register save eliminated.
>> ldr r3, .L6
>> ldr r0, [r3, #4]
>> adds r2, r0, #5
>> str r2, [r3, #0]
>> bx lr
>> .L7:
>> .align 2
>> .L6:
>> .word .LANCHOR0
>> .size test, .-test
>> .global b
>> .global a
>> .bss
>> .align 2
>> .set .LANCHOR0,. + 0
>> .type a, %object
>> .size a, 4
>> a:
>> .space 4
>> .type b, %object
>> .size b, 4
>> b:
>> .space 4
>> .ident "GCC: (Sourcery CodeBench Lite 2011.09-69) 4.6.1"
>>
>> It's all about learning to use the tools you have, rather than buying
>> more expensive tools.
>
> Which reminds me----when counting bytes in code like this, it's easy to
> forget the bytes used in the constant tables that provide the addresses
> of variables. A 16-bit variable may require a 32-bit table entry.
Indeed - people trying to "hand-optimise" their code often miss out details like that. ("I'll use a 16-bit variable instead of a 32-bit variable to save memory space...".) If you can use section anchors (like above), or a "small data section" (as used by the PPC ABI, though not the ARM, for some reason), then you can avoid most of the individual storage and loads of addresses.
> I started with EW_ARM about three years before I started on the Linux
> project. The original compiler was purchased by the customer---who had
> no preferences, but was developing a project with fairly limited
> hardware resources. They asked what compiler I'd like and I picked
> EW-ARM. At that time, I'd been using CodeWarrior for the M68K for many
> years and EW_ARM had the same 'feel'. When it came time to do the
> Linux project, the transition to GCC took MUCH longer than the
> transition from CodeWarrior to EW_ARM. Of course, much of that was in
> setting up a virtual machine on the PC and learning Linux so that I
> could use GCC.
Learning to develop Linux programs is a lot more than just learning gcc, as you've found out. One gets used to one's tools. I've been using gcc for embedded development for some 15 years, and have used it on perhaps 8 different processor architectures. So for me, gcc is always the obvious choice for new devices, since I am most familiar with it. I actually think there is a fair similarity between modern CodeWarrior and gcc - CW supports many gcc extensions such as the inline assembly syntax and several attributes. On the IDE side, of course, gcc has no IDE - it's a compiler. But gcc is often used with Eclipse, which is what CW now uses for most targets. (The "classic" CW IDE was horrible - if EW has a similar feel, then I'll remember not to buy it!)
> One thing that I missed on the Linux project is that I didn't have a
> debugger equivalent to C-Spy that is integrated into EW_ARM. Debugging
> on the Linux system was mostly "Save everything and analyze later".
There are /lots/ of debugging options for Linux development, that can be much more powerful than C-Spy (depending on the type of programming you are doing, of course). However, it all involves a lot more learning and experimenting than the ease-of-use of an integrated debugger in an IDE.
> Of course, the original poster is discussing the type of code that few
> Linux programmers write----direct interfacing to peripherals. My recent
> experience with Linux and digital cameras was pretty frustrating. I was
> dependent on others to provide the drivers--and they often didn't work
> quite right with the particular camera I was using. That's a story for
> another time, though.
Indeed.

mvh.,

David
This discussion is kind of silly.

Has anyone tried a "simple" program that multiplies two or four
floating point numbers and displays the result to an LCD ??

As has been mentioned, you don't buy a V8 just to watch the cylinders go 
up and down.

You buy a V8 to use it.

How much code space would the PIC18 take to multiply two/four floats ??

hamilton

In article <cOidnQo3hoKmCnjSnZ2dnUVZ7tOdnZ2d@lyse.net>, 
david.brown@removethis.hesbynett.no says...
> On 23/06/12 01:03, Mark Borgerson wrote:
>
> << Snip earlier quoting >>
>
> Personally, I prefer to use makefiles and explicit compiler flags (or
> pragmas / function attributes as needed). I think that gives better
> control, more replicable results, and is more suitable for re-use on
> different projects, different development hosts, and different tool
> versions. But that's a matter of taste - and I don't recommend it as
> the first step for someone unfamiliar with the tools.
I agree. I was glad I had other programmers and sample files to help me through the initial setup of GCC-ARM.
> << Snip earlier quoting >>
>
> gcc has very extensive static error checking and warning mechanisms, and
> they've been getting better with each version. It doesn't have MISRA
> rule checking, which I believe EW has, but otherwise it is top-class.
> Of course, you have to enable the warnings!
EW does have MISRA rule checking---but I haven't started to use it yet.
> > I'm using EW_ARM for projects that don't have the resources of a
> > Linux OS, and I prefer it for these projects.
>
> Here is the key point that makes EW worth the money for /you/ - you
> prefer it. When choosing tools and judging value for money, questions
> of code generation quality are normally secondary to what the developer
> finds most productive - developer time outweighs tool costs.
I found that to be very true when I switched from a very low-end PCB layout program to PADS PCB. The $4K cost of that system has been paid back many times over in time saved in the design and layout of PC boards for customers. Heck, I've even learned to trust the autorouter when it is properly set up. ("Trust, but verify" still applies, though). The autorouter really helps with those fine-pitch QFP STM32 chips! I was able to pull an MSP430 from an existing design and plug in an STM32F205 in just a few days.
>
<< Snip Example Code>>
> >> It's all about learning to use the tools you have, rather than buying
> >> more expensive tools.
> >
> > Which reminds me----when counting bytes in code like this, it's easy to
> > forget the bytes used in the constant tables that provide the addresses
> > of variables. A 16-bit variable may require a 32-bit table entry.
>
> Indeed - people trying to "hand-optimise" their code often miss out
> details like that. ("I'll use a 16-bit variable instead of a 32-bit
> variable to save memory space...".)
>
> If you can use section anchors (like above), or a "small data section"
> (as used by the PPC ABI, though not the ARM, for some reason), then you
> can avoid most of the individual storage and loads of addresses.
It took me a while when I first started looking at disassembled ARM code to realize that constants were being loaded using PC-relative offsets. When I finally figured that out it took me back to the mid-80's when I was writing Macintosh code using position-independent modules. What a rush of nostalgia that was!
> << Snip earlier quoting >>
>
> I actually think there is a fair similarity between modern CodeWarrior
> and gcc - CW supports many gcc extensions such as the inline assembly
> syntax and several attributes. On the IDE side, of course, gcc has no
> IDE - it's a compiler. But gcc is often used with Eclipse, which is
> what CW now uses for most targets. (The "classic" CW IDE was horrible -
> if EW has a similar feel, then I'll remember not to buy it!)
I think I mis-stated part of that. It is really the editor and project file window that are similar between EW and CodeWarrior. The other parts of EW are much better, with fairly straightforward menus and dialog boxes for setting compiler, linker, and debugger options.

I was using CodeWarrior to develop code for the Persistor micro data loggers. That was a pretty tightly constrained development environment, and Persistor provided most of the setup files and initial project files. If you have to start from scratch, I agree that the old CodeWarrior was a true PITA to get set up. The Persistor logger didn't have true debug capability, so I can't comment on the capabilities of CodeWarrior in that regard. I really like the integrated C-Spy debugger in EW, though.
> > One thing that I missed on the Linux project is that I didn't have a
> > debugger equivalent to C-Spy that is integrated into EW_ARM. Debugging
> > on the Linux system was mostly "Save everything and analyze later".
>
> There are /lots/ of debugging options for Linux development, that can be
> much more powerful than C-Spy (depending on the type of programming you
> are doing, of course). However, it all involves a lot more learning and
> experimenting than the ease-of-use of an integrated debugger in an IDE.
One of the problems with debugging the Linux-based system was that the code was controlling an autonomous parafoil supply delivery system. When the system was operating, it started 20,000 feet above and several miles away from the programmer! Thus, the "record everything and analyze later" paradigm.

I did develop a simulated version that grabbed the GPS and other sensor values via hooks and substituted simulated values based on a simple flight model. That sim ran on the target hardware while it sat on the bench. I got really tired of listening to the whining control servos! Unfortunately, the flight model wasn't up to simulating GPS loss, sticky servos, and all the things that can happen to parafoils stressed beyond their flight limits.

Flight algorithms were tested in sims running on Borland CPP Builder on a PC. That allowed good debugging, lots of intermediate variable recording, and graphic displays. The CPP Builder sims were written to use the same C control code that ran on the target hardware.
> > Of course, the original poster is discussing the type of code that few
> > Linux programmers write----direct interfacing to peripherals. My recent
> > experience with Linux and digital cameras was pretty frustrating. I was
> > dependent on others to provide the drivers--and they often didn't work
> > quite right with the particular camera I was using. That's a story for
> > another time, though.
>
> Indeed.
Mark Borgerson
On 23/06/12 17:23, hamilton wrote:
> This discussion is kind of silly.
>
> Has anyone tried a "simple" program that multiplies two or four
> floating point numbers and displays the result to an LCD ??
>
> As has been mentioned, you don't buy a V8 just to watch the cylinders go
> up and down.
>
> You buy a V8 to use it.
>
> How much code space would the PIC18 take to multiply two/four floats ??
>
> hamilton
It is certainly true that you can only get a real-world test using real-world code. The trouble is, real-world code is not very suitable for discussing in a newsgroup. So the best we can do is look at some simple sample functions, and work from there. And when a poster is having such trouble generating good code from a simple "a = b + 5" function, it makes sense to start with that and work up. You bring up another point here, of course - while the PIC18 may have compact code for setting a bit or adding a couple of 8-bit numbers, the ARM code will be more compact (and /much/ faster) for multiplying floats. Code comparisons on wildly different architectures are heavily dependent on the sort of code used.
On Jun 24, 1:37 am, David Brown <david.br...@removethis.hesbynett.no>
wrote:
> ....
> ARM code will be more compact (and /much/ faster) for multiplying
> floats. Code comparisons on wildly different architectures are heavily
> dependent on the sort of code used.
Hi David,
have you tried how fast ARM does at, say, 64 (or 32) bit MAC in a filter loop?

Not so long ago I had to reach the limit for a power CPU (MPC5200B), which is specified at 2 cycles per MAC. It was not trivial at all: doing it DSP-like in a loop took about 10 cycles, mainly because of data dependencies. I had to spread things over many registers until I got there - 2.1 cycles (with load/store included). That was for a filter with hundreds of taps.

How many FP registers do ARM have? It took using 24 out of the 32 to get to 2.1 cycles (although using the same technique over 18 registers yielded about 2.3 cycles; 15 was dramatically worse, as data dependencies began to kick in - likely a 6 stage pipeline).

Dimiter

------------------------------------------------------
Dimiter Popoff Transgalactic Instruments
http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
On 24/06/12 12:19, dp wrote:
> On Jun 24, 1:37 am, David Brown<david.br...@removethis.hesbynett.no>
> wrote:
>> ....
>> ARM code will be more compact (and /much/ faster) for multiplying
>> floats. Code comparisons on wildly different architectures are heavily
>> dependent on the sort of code used.
>
> Hi David,
> have you tried how fast ARM does at say 64 (or 32) bit MAC in a filter
> loop?
> Not so long ago I had to reach the limit for a power CPU (MPC5200B)
> which is specified at 2 cycles per MAC. Was not trivial at all, doing
> it in a loop DSP-like took about 10 cycles mainly because of data
> dependencies. Had to spread things over many registers until I got
> there, 2.1 cycles (with load/store included). That for a filter with
> hundreds of taps.
> How many FP registers do ARM have? It took using 24 out of the 32
> to get to 2.1 cycles (although using the same technique over 18
> registers yielded about 2.3 cycles, 15 was dramatically worse,
> data dependencies began to kick in - likely a 6 stage pipeline).
>
> Dimiter
I haven't tried anything exactly like that (I love doing that sort of thing, but seldom have the need).

On the PPC cores I have used (e200z7 recently), it can make a big difference to the speed of the code when things are spread out over many registers. Most PPC cores have quite long pipelines, and some have superscalar execution, speculative execution or loads, etc., which make it a big challenge getting everything right here. You also need to take into account the cache - getting the computation flow to match the cache flow is vital.

In comparison, the Cortex-M0 is very simple. It has pipelining, but not nearly as deep, and it does not have a cache to consider. Some Cortex-M devices have a bit of cache, and may also have tightly-coupled memory, so there you have to consider the flow of data into and out of the cpu core. But I expect it is easier to get close to peak performance from an M0 than from a typical PPC core.
In article <398b2c50-5de0-4eb0-8775-d4a200ad7a30@a16g2000vby.googlegroups.com>, dp@tgi-sci.com says...
> On Jun 24, 1:37 am, David Brown <david.br...@removethis.hesbynett.no>
> wrote:
>
> << Snip earlier quoting >>
>
> How many FP registers do ARM have? It took using 24 out of the 32
> to get to 2.1 cycles (although using the same technique over 18
> registers yielded about 2.3 cycles, 15 was dramatically worse,
> data dependencies began to kick in - likely a 6 stage pipeline).
>
> Dimiter
I don't think the ARM Cortex-M4 chips have 64-bit FPUs, and their clock speeds top out at about 160MHz. You're not going to get anything near the MPC5200B performance. The closest you might get with an ARM-based chip might be one of the TI OMAP chips, which have a fixed/floating-point DSP coprocessor.

Mark Borgerson
