In article <svCdnfOCcv-WvHnSnZ2dnUVZ8rednZ2d@lyse.net>,
david@westcontrol.removethisbit.com says...
>
> On 22/06/2012 02:42, Mark Borgerson wrote:
> > In article <9b55cce9-96db-46f4-909a-1f6500deb237
> > @j9g2000vbk.googlegroups.com>, peter_gotkatov@supergreatmail.com says...
> >>
> >> On Jun 21, 4:09 am, David Brown <da...@westcontrol.removethisbit.com>
> >> wrote:
> >>> On 20/06/2012 21:21, peter_gotka...@supergreatmail.com wrote:
> >>> It is not a surprise that the code
> >>> size has increased in moving from the PIC18 - differences here will vary
> >>> wildly according to the type of code. But it /is/ a surprise that you
> >>> only have 2K of startup, vector tables, and library code.
> >>
> >> Not all that surprising, here are the sizes in bytes:
> >> .vectors 304
> >> .init 508
> >> __putchar 40
> >> __vprintf 1498
> >> memcpy 56
> >>
> >>>> Instead of a macro I thought about making an ASM inline function that
> >>>> would use the CLZ instruction to do this efficiently but for some
> >>>> reason gcc didn't want to inline any of my functions (C or ASM) in
> >>>> debug mode so I just gave up at that point.
> >>>
> >>> First off, you should not need to resort to assembly to get basic
> >>> instructions working - the compiler should produce near-optimal code as
> >>> long as you let it (by enabling optimisations and writing appropriate C
> >>> code).
> >>
> >> I've tried several ways of writing a countleadingzeroes() function that
> >> would use the Cortex CLZ instruction but even with optimization turned
> >> on it still wouldn't do it.
> >>
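
For reference, a minimal sketch of one way to get CLZ out of gcc
without any assembly, assuming gcc's __builtin_clz() builtin. The
helper name is made up for illustration. The builtin's result is
undefined for a zero argument, so zero is handled separately. With
-mcpu=cortex-m3 and optimisation enabled, gcc should be able to turn
this into a CLZ instruction, though I would check the listing.

```c
#include <stdint.h>

/* Hypothetical helper name, not from the original posts.
 * Assumes unsigned int is 32 bits wide, as it is on the Cortex-M3.
 * __builtin_clz() is undefined for 0, so that case is handled here. */
static inline unsigned count_leading_zeros(uint32_t x)
{
    if (x == 0u) {
        return 32u;   /* all 32 bits are leading zeros */
    }
    return (unsigned)__builtin_clz((unsigned int)x);
}
```

Called as count_leading_zeros(1u) this gives 31, and for 0x80000000u
it gives 0.
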
> >>> Secondly, don't use "ASM functions" - they are normally only needed by
> >>> more limited compilers. If you need to use assembly with gcc, use gcc's
> >>> extended "asm" syntax.
> >>
> >> There are some things like the bootloader that need to be ASM
> >> functions in their own separate .S file anyway since they need to copy
> >> portions of themselves to RAM in order to execute. But a bootloader is
> >> a special case and I do agree that normal code shouldn't need to rely
> >> on ASM functions. I must say I'm not familiar with gcc's extended asm
> >> syntax and although I did look at it briefly it seemed like it was
> >> more complicated than a plain old .S file and it was mostly geared
> >> towards mixing C and ASM together in the same function and accessing
> >> variables by name etc. Not something I needed for a simple bootloader.
> >>
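
For a single instruction the extended asm syntax is fairly compact.
A hedged sketch, not taken from anyone's actual code: the function
name is made up, and the __ARM_FEATURE_CLZ guard assumes a compiler
that defines that macro (older gcc versions may not, in which case
the fallback branch is used).

```c
#include <stdint.h>

/* Hypothetical wrapper around the CLZ instruction using gcc's
 * extended asm. Assumes unsigned int is 32 bits wide. */
static inline uint32_t clz_asm(uint32_t x)
{
#if defined(__arm__) && defined(__ARM_FEATURE_CLZ)
    uint32_t result;
    /* "=r": result is written to any general register,
     * "r" : x is read from any general register.
     * No "volatile": the result depends only on x, so the
     * compiler may still inline, reorder or drop the call. */
    __asm__ ("clz %0, %1" : "=r" (result) : "r" (x));
    return result;
#else
    /* Fallback for other targets; __builtin_clz(0) is undefined. */
    return (x == 0u) ? 32u : (uint32_t)__builtin_clz((unsigned int)x);
#endif
}
```

Because the operands are described by constraints rather than fixed
registers, the compiler picks the registers and can inline the
function like any other C code.
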
> >>> Finally, if you are not getting inlining when debugging it is because
> >>> you have got incorrect compiler switches. You should not have different
> >>> "debug" and "release" (or "optimised") builds - do a single build with
> >>> the proper optimisation settings (typically -Os unless you know what you
> >>> are doing) and "-g" to enable debugging. You never want to be releasing
> >>> code that is built differently from the code you debugged.
> >>
> >> I was fighting with this for a while when I was first handed this
> >> toolchain, and it seems that in debug mode, there is no -O switch at
> >> all and in release mode it defaults to -O1. When I change this to -Os it
> >> does produce the same code as the sample that you posted above from
> >> gcc 4.6.1 (mine is 4.4.4 by the way). However even with manually
> >> adding the -g switch I still don't get source annotation in the ELF
> >> file unless I use debug mode. This effectively limits any development/
> >> debugging to unoptimized code, which still has to fit into the 256K
> >> somehow.
> >>
> >> As for using register keywords and accessing globals through pointers,
> >> I normally don't do this (haven't used the register keyword in years)
> >> and I certainly wouldn't be doing it at all if it didn't have such a
> >> significant effect on the code size:
> >>
> >> unsigned long a,b;
> >>
> >> void test(void) {
> >> B4B0 push {r4-r5, r7}
> >> AF00 add r7, sp, #0
> >> ----------------------------------------
> >> register unsigned long x, y;
> >> a=b+5;
> >> F2402314 movw r3, #0x214
> >> F2C20300 movt r3, #0x2000
> >> 681B ldr r3, [r3, #0]
> >> F1030205 add.w r2, r3, #5
> >> F240231C movw r3, #0x21C
> >> F2C20300 movt r3, #0x2000
> >> 601A str r2, [r3, #0]
> >> ----------------------------------------
> >> x=y+5;
> >> F1040505 add.w r5, r4, #5
> >> ----------------------------------------
> >> }
> >> 46BD mov sp, r7
> >> BCB0 pop {r4-r5, r7}
> >> 4770 bx lr
> >> BF00 nop
> >
> > (38 bytes of code, or 40 counting the trailing alignment nop)
> >
> > I'd be surprised that your compiler did not complain about x and y not
> > being initialized before the addition. EW ARM warned me that y was used
> > before its value was set and that x was set but never used.
> >
> > Here is the code it gave me----with some extra comments added afterwards
> >
> >
> > 111 void test(void){
> > 112 register unsigned long x,y;
> > 113 a = b+5;
> > \ test:
> > \ 00000000 0x.... LDR.N R1,??DataTable8_6
> > \ 00000002 0x6809 LDR R1,[R1, #+0]
> > \ 00000004 0x1D49 ADDS R1,R1,#+5
> > \ 00000006 0x.... LDR.N R2,??DataTable8_7
> > \ 00000008 0x6011 STR R1,[R2, #+0]
> > 114 x = y+5;
> > \ 0000000A 0x1D40 ADDS R0,R0,#+5 // R0 is y
> > // sum not saved
> > 115 }
> > \ 0000000C 0x4770 BX LR ;; return
> > 116
> > (14 bytes of code)
> >
> > Apparently, R0, R1, R2 are scratch registers for IAR and don't need to
> > be saved and restored.
> >
> > Adding actual initialization to x and y and saving the result in b
> > produced the following:
> >
> > In section .text, align 2, keep-with-next
> > 110 void test(void){
> > 111 register unsigned long x=3,y=4;
> > \ test:
> > \ 00000000 0x2003 MOVS R0,#+3
> > \ 00000002 0x2104 MOVS R1,#+4
> > 112 a = b+5;
> > \ 00000004 0x.... LDR.N R2,??DataTable8_6
> > \ 00000006 0x6812 LDR R2,[R2, #+0]
> > \ 00000008 0x1D52 ADDS R2,R2,#+5
> > \ 0000000A 0x.... LDR.N R3,??DataTable8_7
> > \ 0000000C 0x601A STR R2,[R3, #+0]
> > 113 x = y+5;
> > \ 0000000E 0x1D49 ADDS R1,R1,#+5
> > \ 00000010 0x0008 MOVS R0,R1 // x = sum
> > 114 b = x; // this time save the result
> > \ 00000012 0x.... LDR.N R1,??DataTable8_6
> > \ 00000014 0x6008 STR R0,[R1, #+0]
> > 115 }
> > \ 00000016 0x4770 BX LR ;; return
> > 116
> >
> > Still accomplished with scratch registers----no need to save any on the
> > stack. I changed from my default optimization of 'low' to 'none'
> > and got exactly the same code.
> >
> > Finally, I took out the 'register' key word before x and y----and
> > got exactly the same result as above.
> >
> > It seems that GCC just doesn't match up to IAR at producing compact
> > code at low optimization levels. OTOH, given that EW_ARM costs
> > several KBucks, it SHOULD do better!
> >
> >
>
> The problems here don't lie with the compiler - they lie with the user.
> I'm sure that EW_ARM produces better code than gcc (correctly used) in
> some cases - but I am also sure that gcc can do better than EW_ARM in
> other cases. I really don't think there is going to be a big difference
> in code generation quality - if that's why you paid K$ for EW, you've
> probably wasted your money. There are many reasons for choosing
> different toolchains, but generally speaking I don't see a large
> difference in code generation quality between the major toolchains
> (including gcc) for 32-bit processors. Occasionally you'll see major
> differences in particular kinds of code, but for the most part it is the
> user that makes the biggest difference.
>
> One place where EW_ARM might score over the gcc setup this user has (he
> hasn't yet said anything about the rest - is it home-made, CodeSourcery,
> Code Red, etc.?) is that EW_ARM might make it easier to get the compiler
> switches correct, and avoid this "I don't know how to enable debugging
> and optimisation" or "what's a warning?" nonsense.

One of the reasons I like the EW_ARM system is that the IDE handles all
the compiler and linker flags with a pretty good GUI. You can override
the GUI options with #pragma statements in the code, which I haven't
found reason to do for the most part.

>
>
> It hardly needs saying, but when run properly, my brief test with gcc
> produces the same code here as you get with EW_ARM, and the same
> warnings about x and y.
>
That's comforting in a way. While I now use EW_ARM for most of my
current projects, I spent about 5 years using GCC_ARM on a project
based on Linux. I would hate to think that I was producing crap code
all that time! I had some experienced Linux users set up my dev
system and show me how to generate good make files, so I probably
got pretty good results there.

I'm using EW_ARM for projects that don't have the resources of a
Linux OS, and I prefer it for these projects.

>
> I'm sure that EW_ARM has a similar option, but gcc has a "-fno-common"
> switch to disable "common" sections. With this disabled, definitions
> like "unsigned long a, b;" can only appear once in the program for each
> global identifier, and the space is allocated directly in the .bss
> inside the module that made the definition. gcc can use this extra
> information to take advantage of relative placement between variables,
> and generate addressing via section anchors:
>
>
> Command line:
> arm-none-eabi-gcc -mcpu=cortex-m3 -mthumb -S testcode.c -Wall -Os
> -fno-common
>
>
> test:
> @ args = 0, pretend = 0, frame = 0
> @ frame_needed = 0, uses_anonymous_args = 0
> @ link register save eliminated.
> ldr r3, .L6
> ldr r0, [r3, #4]
> adds r2, r0, #5
> str r2, [r3, #0]
> bx lr
> .L7:
> .align 2
> .L6:
> .word .LANCHOR0
> .size test, .-test
> .global b
> .global a
> .bss
> .align 2
> .set .LANCHOR0,. + 0
> .type a, %object
> .size a, 4
> a:
> .space 4
> .type b, %object
> .size b, 4
> b:
> .space 4
> .ident "GCC: (Sourcery CodeBench Lite 2011.09-69) 4.6.1"
>
>
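
A small sketch of the source layout that -fno-common expects, for
anyone who has been relying on common symbols. This is illustrative
only and assumes a and b are shared between modules: the header
carries declarations, and exactly one .c file carries the definitions.

```c
/* What would normally live in a shared header: declarations only. */
extern unsigned long a, b;

/* Exactly one source file holds the definitions. With -fno-common,
 * a second definition of a or b in another module becomes a
 * "multiple definition" link error instead of being merged. */
unsigned long a, b;

void test(void)
{
    a = b + 5;   /* both globals sit in this module's .bss */
}
```

Since both variables are defined in the same module, the compiler
knows their relative placement and can reach them from one base
address, as in the listing above.
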
> It's all about learning to use the tools you have, rather than buying
> more expensive tools.

Which reminds me: when counting bytes in code like this, it's easy to
forget the bytes used in the constant tables that provide the addresses
of variables. A 16-bit variable may require a 32-bit table entry.
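
As a rough worked example of that bookkeeping (the helper name is
made up, and it assumes every table entry is a 32-bit address and
ignores any alignment padding):

```c
#include <stddef.h>

/* Hypothetical helper: total footprint of a function is its
 * instruction bytes plus 4 bytes per address-table entry. */
static size_t total_bytes(size_t code_bytes, size_t table_entries)
{
    return code_bytes + table_entries * 4u;
}
```

For the first EW_ARM listing, total_bytes(14, 2) gives 22 bytes once
the two ??DataTable8 entries are counted, although those entries may
be shared with other functions in the module. For the gcc -fno-common
listing, assuming all five instructions are 16-bit Thumb encodings,
total_bytes(10, 1) gives 14 bytes, since a single .LANCHOR0 word
serves both variables.
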

I started with EW_ARM about three years before I started on the Linux
project. The original compiler was purchased by the customer, who had
no preferences, but was developing a project with fairly limited
hardware resources. They asked what compiler I'd like and I picked
EW_ARM. At that time, I'd been using CodeWarrior for the M68K for many
years and EW_ARM had the same 'feel'. When it came time to do the
Linux project, the transition to GCC took MUCH longer than the
transition from CodeWarrior to EW_ARM. Of course, much of that was in
setting up a virtual machine on the PC and learning Linux so that I
could use GCC.

One thing that I missed on the Linux project was a debugger
equivalent to C-Spy, which is integrated into EW_ARM. Debugging
on the Linux system was mostly "Save everything and analyze later".

Of course, the original poster is discussing the type of code that few
Linux programmers write: direct interfacing to peripherals. My recent
experience with Linux and digital cameras was pretty frustrating. I was
dependent on others to provide the drivers, and they often didn't work
quite right with the particular camera I was using. That's a story for
another time, though.

>
> mvh.,
>
> David

Mark Borgerson