And -- pow! My memory was gone.

Reply by Tim Wescott ● February 22, 2011

On 02/22/2011 11:51 AM, Grant Edwards wrote:
> On 2011-02-22, Tim Wescott<tim@seemywebsite.com>  wrote:
>
>> Mostly I was sharing my amazement about how much of a chunk one
>> (supposedly) itty bitty mathematical function took up, and how much
>> more space the gnu embedded library for the ARM
>
> What is "the Gnu embedded library for the ARM"?
>
> I can think of probably a half-dozen different "libc" implementations
> I've used with gcc.  Which one are you using?
>
>> takes up than the alternative, commercial, tool.  (With the library
>> sprintf, the thing compiles to something like 78kB, which is a
>> barrier to progress given that the processor in question only has
>> 64kB of flash).
>>
>> I do appreciate your suggestion about -ffast-math.
>
> If you're using glibc and hoping for something not-huge, you're going
> to be pretty frustrated.  uClibc and newlib are less huge and still
> pretty full-featured.  There are others that are smaller-still, but
> they often tend to be specific to a particular architecture, and lack
> some features (e.g. they may not provide any of the calls that use the
> heap).
>
Sorry.  Newlib.  I've only ever started from bare metal or used 
vendor-provided libraries.  The free software landscape is an 
interesting one (and this is one of the reasons I'm using it for a hobby 
project -- so I'll be familiar with the pitfalls if I use it 
professionally).

If I could figure out how to keep C++ without using the heap, I'd be a 
happy camper.  I'm not even sure if malloc &c. are getting _called_, or 
just pulled in because C++ uses "new", which uses -- well, you get the 
idea.  For embedded I pretty much avoid dynamic deallocation like the 
plague*; while this is against the C++ desktop paradigm, it lets you use 
a very useful subset of the language without a whole 'heap' of trouble.

* Meaning I'll "new" things at system startup but only if they're going 
to live until power-down.  This gives one great flexibility in making 
portable libraries while still not fragmenting the heap through 
new-delete-new-delete-new sequences.
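
As a rough illustration of the allocate-at-startup-only rule, here is a minimal C sketch of a pool that hands out memory and never takes it back. It is not the code from the project described here (which uses plain C++ "new"); startup_alloc and STARTUP_POOL_BYTES are hypothetical names, and the pool size is an arbitrary assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define STARTUP_POOL_BYTES 1024u   /* hypothetical size; tune per project */

/* uint64_t storage gives the pool 8-byte alignment. */
static uint64_t startup_pool[STARTUP_POOL_BYTES / 8u];
static size_t startup_used;        /* bytes handed out so far */

/* Hands out memory that is meant to live until power-down. There is
   deliberately no matching free(), so the pool cannot fragment.
   Returns NULL when the pool is exhausted. */
void *startup_alloc(size_t size)
{
    /* Round up so that every block stays 8-byte aligned. */
    size_t rounded = (size + 7u) & ~(size_t)7u;

    /* The first test catches wrap-around in the rounding above. */
    if (rounded < size || rounded > STARTUP_POOL_BYTES - startup_used)
        return NULL;

    void *block = (unsigned char *)startup_pool + startup_used;
    startup_used += rounded;
    return block;
}
```

Because nothing is ever returned to the pool, the worst-case memory use is known once initialization has finished.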

-- 

Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" was written for you.
See details at http://www.wescottdesign.com/actfes/actfes.html

Reply by Tim Wescott ● February 22, 2011

On 02/22/2011 12:12 PM, 42Bastian Schick wrote:
> On Tue, 22 Feb 2011 11:05:49 -0800, Tim Wescott<tim@seemywebsite.com>
> wrote:
>
>>> But generally speaking, beware libraries - whether C or C++ - they are
>>> often the cause of extra code space (and wasted run time).
>>
>> There seems to be about 3-4kB of stuff that's specifically associated
>> with C++, starting with malloc (which I know _I'm_ not using) and going
>> from there.
>
> 1st: Be sure to set -fno-exceptions -fno-rtti !

Check!

> 2nd: Check the map file to see what pulls in malloc. If you use
> sprintf, there you are.

I should know this -- how do I do that?  I can see it in the symbol 
table, but I don't know how, from the object file, I can see what ended 
up pulling in malloc.

> 3rd: Get the sprintf() out of the newlib sources and make a local copy
> where you remove the un-needed stuff.

That's what I'm doing, only to find out that lazily using 'pow' pulls in 
a whole bunch-o-memory.

>> gcc 4.5.2 -- is that too out of date?  Maybe there's a beta version that
>> I should be using?
>
> Using the latest build might not always lead to the best results.
> Did you try e.g. CodeSourcery ?

I was having trouble with CodeSourcery several months ago, and on-line 
comments were pointing toward building the latest.

>> I will look into the -ffast-math switch.  Hopefully I can use it without
>> rebuilding the libraries, but I can do that, too.
>
> That won't improve the code size. This switch only tells GCC to be
> a bit less strict w.r.t. IEEE754

Hey! It does too -- it holds things down by ten or twenty whole bytes!!

Reply by Tim Wescott ● February 22, 2011

On 02/22/2011 12:05 PM, Arlet Ottens wrote:
> On 02/22/2011 08:05 PM, Tim Wescott wrote:
>
>> Mostly I was sharing my amazement about how much of a chunk one
>> (supposedly) itty bitty mathematical function took up, and how much more
>> space the gnu embedded library for the ARM takes up than the
>> alternative, commercial, tool. (With the library sprintf, the thing
>> compiles to something like 78kB, which is a barrier to progress given
>> that the processor in question only has 64kB of flash).
>
> It makes sense, though. GCC has a large number of target architectures,
> and therefore these libraries have been written in C to make them easier
> to port.
>
> Commercial tools are usually aimed towards a single target, which makes
> it a lot easier to hand optimize the math libs in assembly.

Even hand optimizing in C for a specific processor, or doing mostly C 
with assembly just in the spots where it really matters, can do 
considerable good for both size and run time.

Reply by Tim Wescott ● February 22, 2011

On 02/22/2011 12:27 PM, Arlet Ottens wrote:
> On 02/22/2011 09:17 PM, 42Bastian Schick wrote:
>> On Mon, 21 Feb 2011 18:17:38 -0800, Tim Wescott<tim@seemywebsite.com>
>> wrote:
>>
>>> As I usually do, I started easy -- which, in my case, means locating the
>>> decimal point with log10, and moving it around using pow(10.0, n).
>>
>> Table ! Table ! Table !
>>
>> Are you using float or double ?
>>
>> What is the range for<n> ?
>>
>>> 'pow' is over 1kB!! And it pulls in something called __pow_754, which
>>> is HUGE -- like, 5kB or something! Eeek! I mean -- the code worked,
>>> but without much room in the end.
>>
>> Ok. Do the math: 1KB => table for pow(10,0) .. pow(10,255)
>>
>
> Or just multiply by 10 in a loop.

Given that in this particular case the code is servicing a 
maintenance/debug interface that never, ever has to go fast, that's 
pretty much what I'm doing now.
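
For reference, a minimal sketch of the multiply-in-a-loop approach in C. The function name pow10_loop is hypothetical, and single precision is an assumption rather than something stated in the thread.

```c
/* Returns 10^n by repeated multiplication, or by repeated division
   when n is negative. Cost is O(|n|), which is fine for a slow
   maintenance/debug interface. */
float pow10_loop(int n)
{
    float result = 1.0f;

    for (; n > 0; --n)
        result *= 10.0f;
    for (; n < 0; ++n)
        result /= 10.0f;

    return result;
}
```

Each step can add a little rounding error, so for large |n| the result may differ from pow() in the last bits.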

Reply by glen herrmannsfeldt ● February 22, 2011

In comp.dsp 42Bastian Schick <bastian42@yahoo.com> wrote:
(snip)

> Ok. Do the math: 1KB => table for pow(10,0) .. pow(10,255)

Or a table of powers of 1e10 (over the range necessary),
and powers of 10 from 0 to 9.  Then one multiply.
(Useful for decimal conversions.)

Or, two tables based on a power of two division probably
makes even more sense.  Powers of 10 from 0 to 15,
powers of 1e16 from -32 to 31 or so.  Not so many bytes at all.
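
A sketch of the two-table idea in C, under some assumptions: it uses double, and it narrows the range of the 1e16 table to -4..4 so that every entry fits in an IEEE 754 double. The names small_pow10, big_pow10 and pow10_table are hypothetical.

```c
/* 10^0 .. 10^15, all exactly representable in an IEEE 754 double. */
static const double small_pow10[16] = {
    1e0, 1e1, 1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
    1e8, 1e9, 1e10, 1e11, 1e12, 1e13, 1e14, 1e15
};

/* (10^16)^q for q = -4 .. 4. */
static const double big_pow10[9] = {
    1e-64, 1e-48, 1e-32, 1e-16, 1e0, 1e16, 1e32, 1e48, 1e64
};

/* Returns 10^n for -64 <= n <= 79 with two lookups and one multiply.
   Out-of-range n returns 0.0 as an error marker in this sketch. */
double pow10_table(int n)
{
    if (n < -64 || n > 79)
        return 0.0;

    int shifted = n + 64;   /* 0 .. 143, so / and % never see a negative */
    return big_pow10[shifted / 16] * small_pow10[shifted % 16];
}
```

The two tables hold 25 doubles, or 200 bytes, which is well under the size quoted for pow().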

-- glen

Reply by Arlet Ottens ● February 22, 2011

On 02/22/2011 10:17 PM, Tim Wescott wrote:
> On 02/22/2011 12:27 PM, Arlet Ottens wrote:
>> On 02/22/2011 09:17 PM, 42Bastian Schick wrote:
>>> On Mon, 21 Feb 2011 18:17:38 -0800, Tim Wescott<tim@seemywebsite.com>
>>> wrote:
>>>
>>>> As I usually do, I started easy -- which, in my case, means locating
>>>> the
>>>> decimal point with log10, and moving it around using pow(10.0, n).
>>>
>>> Table ! Table ! Table !
>>>
>>> Are you using float or double ?
>>>
>>> What is the range for<n> ?
>>>
>>>> 'pow' is over 1kB!! And it pulls in something called __pow_754, which
>>>> is HUGE -- like, 5kB or something! Eeek! I mean -- the code worked,
>>>> but without much room in the end.
>>>
>>> Ok. Do the math: 1KB => table for pow(10,0) .. pow(10,255)
>>>
>>
>> Or just multiply by 10 in a loop.
>
> Given that in this particular case the code is servicing a
> maintenance/debug interface that never, ever has to go fast, that's
> pretty much what I'm doing now.
>

For practical values of 'n' (less than 10), it probably runs faster than 
pow() anyway.

Reply by Jon Kirwan ● February 22, 2011

On Tue, 22 Feb 2011 13:12:58 -0800, Tim Wescott
<tim@seemywebsite.com> wrote:

>On 02/22/2011 12:05 PM, Arlet Ottens wrote:
>> On 02/22/2011 08:05 PM, Tim Wescott wrote:
>>
>>> Mostly I was sharing my amazement about how much of a chunk one
>>> (supposedly) itty bitty mathematical function took up, and how much more
>>> space the gnu embedded library for the ARM takes up than the
>>> alternative, commercial, tool. (With the library sprintf, the thing
>>> compiles to something like 78kB, which is a barrier to progress given
>>> that the processor in question only has 64kB of flash).
>>
>> It makes sense, though. GCC has a large number of target architectures,
>> and therefore these libraries have been written in C to make them easier
>> to port.
>>
>> Commercial tools are usually aimed towards a single target, which makes
>> it a lot easier to hand optimize the math libs in assembly.
>
>Even hand optimizing in C for a specific processor, or doing mostly C 
>with assembly just in the spots where it really matters, can do 
>considerable good for both size and run time.

This is what crossed my mind, Tim, in your case.  I've
written custom log functions which execute more quickly than
an FP div and take up rather little space on an 8051, for
example.  I don't even blink about such things; just sit down
and get 'er done and then run some tests against it in
simulation to make sure it works over the range of inputs.

For specialized cases like this, I'd write specialized code.
There are a few handy books to have about, which I suspect
you may already have.  I'll mention a few that may bear
directly on the topic.  I think PJ Plauger has written a book
on the c library that is worth having around (nicely exposes
all of those sprintf details, for example.)  I also enjoy
having Analog Devices' numerical approximation books they
developed for the ADSP-21xx family (the black, hard-bound
book in particular would be good here.)  They used to give
them away for free (as well as sell them to those who didn't
bother to ask for a free copy and took the still easier way
out and wrote checks.)  Although there are some errors in it,
another nice book with a different kind of segue is "Math
Toolkit for Real-Time Programming," by Jack W. Crenshaw.  And
everyone MUST have an edition or two of "Numerical Recipes."

Jon

P.S. 
Not really related to the above issue but which come to mind
right now because they helped me a lot:

* E Oran Brigham's "The Fast Fourier Transform" (I use the
1974 first edition) -- the one book for FFTs that in my
estimation stands above all others I've experienced.

* Ronald Graham, Donald Knuth, and Oren Patashnik, "Concrete
Mathematics" -- a unique place for this book.  Nothing else
quite like it and a must-have, I think.

* Nicholas John Loy's "An Engineer's Guide to FIR Digital
Filters" -- a very intuitive approach to understanding FIR
filters and applications of them.

Reply by Hans-Bernhard Bröker ● February 22, 2011

On 22.02.2011 22:17, Tim Wescott wrote:
> On 02/22/2011 12:27 PM, Arlet Ottens wrote:
>> On 02/22/2011 09:17 PM, 42Bastian Schick wrote:
>>> On Mon, 21 Feb 2011 18:17:38 -0800, Tim Wescott<tim@seemywebsite.com>
>>> wrote:

>>>> 'pow' is over 1kB!! And it pulls in something called __pow_754,
>>>> which is HUGE -- like, 5kB or something!

So what?  Consider yourself lucky it's not even bigger than that.

Seriously, though, setting aside the *printf() and *scanf() families 
themselves, pow() is easily the most complicated function in the entire 
Standard C Library, even with hardware FPU support.  Without it, it can 
easily be twice as large as that if you want to really cover all the 
corner cases (IEEE754 special data formats, getting it right even at the 
boundaries of precision or range, ...).

>> Or just multiply by 10 in a loop.
>
> Given that in this particular case the code is servicing a
> maintenance/debug interface that never, ever has to go fast, that's
> pretty much what I'm doing now

Oh, come on guys, you can do better than that, even under code size 
pressure.  Has everybody already forgotten about iterated squaring and 
multiplying?  In a nutshell:

	while exponent left
	  if exponent uneven:
	      multiply base into result
	  right-shift exponent by one
	  replace base by its square

Hardly any more complex than iterated multiply, but O(log(exponent)) 
instead of O(exponent).
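
The outline above could be written in C roughly as follows. This is a sketch for non-negative integer exponents only; the name pow_uint and the choice of float are assumptions.

```c
/* Computes base^exponent by iterated squaring and multiplying.
   Takes O(log(exponent)) multiplications. */
float pow_uint(float base, unsigned int exponent)
{
    float result = 1.0f;

    while (exponent != 0u) {
        if (exponent & 1u)       /* exponent uneven */
            result *= base;      /* multiply base into result */
        exponent >>= 1;          /* right-shift exponent by one */
        if (exponent != 0u)
            base *= base;        /* replace base by its square */
    }
    return result;
}
```

The squaring is skipped once no exponent bits remain, which saves one multiply and avoids a needless overflow of base on the last pass.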

Or do yourself a favour and use hexadecimal floating point format 
instead.  _Much_ easier on the binary ALU, standardized format, and 
still (marginally) human-readable.
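
To show why hexadecimal output is so cheap, here is a sketch that writes a float in the style of the C99 "%a" conversion using only shifts and masks. It assumes float is IEEE 754 binary32 and handles only zero and normal numbers; the name float_to_hex is hypothetical, and the fixed six hex digits differ from what a library printf would typically print.

```c
#include <stdint.h>
#include <string.h>

/* Writes value as a hex floating point string, e.g. 10.0f gives
   "0x1.400000p+3". buf must hold at least 20 bytes. Subnormals,
   infinities and NaNs are not handled in this sketch. */
void float_to_hex(float value, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    uint32_t bits;
    char *p = buf;

    memcpy(&bits, &value, sizeof bits);   /* assumes IEEE 754 binary32 */

    /* 23 fraction bits shifted left by one make exactly 6 hex digits. */
    uint32_t mantissa = (bits & 0x007FFFFFu) << 1;
    int exponent = (int)((bits >> 23) & 0xFFu) - 127;

    if (bits >> 31)
        *p++ = '-';
    *p++ = '0';
    *p++ = 'x';
    if ((bits & 0x7FFFFFFFu) == 0u) {     /* +0.0 or -0.0 */
        *p++ = '0';
        exponent = 0;
    } else {
        *p++ = '1';                       /* implicit leading bit */
    }
    *p++ = '.';
    for (int shift = 20; shift >= 0; shift -= 4)
        *p++ = digits[(mantissa >> shift) & 0xFu];

    *p++ = 'p';
    if (exponent < 0) {
        *p++ = '-';
        exponent = -exponent;
    } else {
        *p++ = '+';
    }
    if (exponent >= 100)
        *p++ = (char)('0' + exponent / 100);
    if (exponent >= 10)
        *p++ = (char)('0' + (exponent / 10) % 10);
    *p++ = (char)('0' + exponent % 10);
    *p = '\0';
}
```

No floating point arithmetic is performed at all, so nothing from the math library needs to be linked in for this path.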

Reply by David Brown ● February 22, 2011

On 22/02/11 20:05, Tim Wescott wrote:
> On 02/22/2011 12:15 AM, David Brown wrote:
>> On 22/02/2011 03:17, Tim Wescott wrote:
>>> I'm shaking the box, trying to get the software to settle enough to fit
>>> into a 64kB memory space. This is made difficult by the fact that it's a
>>> human-interface rich environment with a bit of extra fat contributed by
>>> the fact that I'm writing in C++.
>>>
>>
>> C++ doesn't necessarily add fat - it depends on how you use it. If
>> you've got exceptions and RTTI enabled, expect code to be bigger and
>> slower. If you use the STL, expect code to be a /lot/ bigger. If you use
>> a lot of virtual stuff, it may be a lot bigger (depending on the
>> processor) - but it may be smaller than alternative C-style solutions to
>> the same issues (such as with explicit function pointers).
>>
>> But generally speaking, beware libraries - whether C or C++ - they are
>> often the cause of extra code space (and wasted run time).
>
> There seems to be about 3-4kB of stuff that's specifically associated
> with C++, starting with malloc (which I know _I'm_ not using) and going
> from there.
>

Cross-reference outputs from the linker step can be very useful in 
tracing why library code has been linked in.  But one of the unfortunate 
aspects of C++ libraries with all their templates is the horrendous 
mangled names that make this task difficult.

>>> This is for an application that worked just peachy-keen on a captive
>>> vendor's tool set, but which is too big using gnu tools. It's to be
>>> expected, I suppose.
>>>
>>
>> You /expect/ that the gnu tools should make your code much bigger? There
>> are some targets for which there are proprietary tools that are much
>> better than open source alternatives. But on most major gcc targets, the
>> compiler is on a par with the commercial tools in terms of code size and
>> speed. Sure, there will be some variation - but not as large as you are
>> implying here.
>>
>> So what is the target?
>
> Cortex M3
>

OK - useful to know.

>> And what are the libraries? From your post, it looks like the libraries
>> are your main issue - and here there can be a lot bigger differences.
>
> Newlib.
>

OK.  Newlib can be big - it has a lot of features, but it's not the 
smallest library around.

>> And what about tool versions? Where did you get them? There a lot of
>> websites offering downloads of embedded gcc toolchains that are many
>> years out of date.
>
> gcc 4.5.2 -- is that too out of date? Maybe there's a beta version that
> I should be using?
>

No, gcc 4.5 is the current version.  So no problem there.

> This is a fresh build using the "summon-arm-toolchain" script that seems
> to be one's best bet for getting something that'll work on the Cortex M3.
>

There are /many/ ways to get hold of a gcc toolchain, especially for a 
popular target like the ARM.  For most people, it is best to get a 
pre-packaged setup.  There are free builds, and commercial builds, with 
different pros and cons.  For example, Rowley sell a package with their 
own library, which may suit you better.  Personally, I like CodeSourcery 
- they are the people who do most of the work on the ARM port of gcc, 
and you can get packages ranging from free to fairly expensive depending 
on the support packages, additional libraries, integration with Eclipse, 
etc.  The paid-for packages are /much/ cheaper than Keil or IAR - they 
are well within the reach of most hobbyists.

>> And how are you using the tools? Are you using appropriate compiler
>> flags? (The manual has lots of information there, but you can also ask
>> here on c.a.e. or in mailing lists.) For example, for bigger targets,
>> the floating point code will often be IEEE standard - because that's the
>> C standard behaviour. But it means much bigger and slower code than if
>> you ignore the possibility of NaNs and allow slightly looser rules about
>> rounding, etc. You have to use the "-ffast-math" switch to get faster
>> but non-IEEE behaviour - many commercial compilers work like that by
>> default.
>
> I will look into the -ffast-math switch. Hopefully I can use it without
> rebuilding the libraries, but I can do that, too.
>

You shouldn't need to rebuild the libraries.  However, I can't give a 
categoric answer off-hand.  But -ffast-math can make a big difference to 
floating point code.  Obviously you'll also want to use at least -Os 
optimisation for your code too.  And if you want the smallest and 
fastest code, look into the LTO flags for doing full program 
optimisation (though it makes debugging a lot harder).

>>> As soon as I saw all of the I/O junk in the map file I realized that
>>> maybe I shouldn't be using sprintf (even if it _did_ fit before). So I
>>> took it out, and rather than contort my code, I'm writing a replacement
>>> for just the formats that I'm using (yes, I know, bomb in the code).
>>>
>>> As I usually do, I started easy -- which, in my case, means locating the
>>> decimal point with log10, and moving it around using pow(10.0, n).
>>>
>>
>> Are you trying to display floating point data with decimals? There are
>> /much/ more efficient ways to do that than using log10! Let us know your
>> real problem, rather than the issues you are having with your solution,
>> and you can get better help.
>
> I _know_ there's more efficient ways! Like I said -- I started out with
> 'easy', and planned on working from there.
>

Maybe we are looking at this from different angles - using log10 when 
displaying a floating point number sounds like a /hard/ way to do it!
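
One possible way to print a float with decimals without log10 or pow is to scale it to an integer and emit the digits. The sketch below is only an illustration of that idea, not a method anyone in this thread has described; the name format_fixed3, the fixed three decimals and the range limit are all assumptions.

```c
#include <stdint.h>

/* Writes value with exactly three decimals, e.g. 3.14159f gives
   "3.142". buf must hold at least 16 bytes. Assumes |value| is below
   about 4e6 so that the scaled number fits in a uint32_t. */
void format_fixed3(float value, char *buf)
{
    char tmp[12];
    int n = 0;
    char *p = buf;

    if (value < 0.0f) {
        *p++ = '-';
        value = -value;
    }

    /* Scale to thousandths and round to nearest. */
    uint32_t scaled = (uint32_t)(value * 1000.0f + 0.5f);
    uint32_t whole = scaled / 1000u;
    uint32_t frac = scaled % 1000u;

    /* Integer part: digits come out least significant first. */
    do {
        tmp[n++] = (char)('0' + whole % 10u);
        whole /= 10u;
    } while (whole != 0u);
    while (n > 0)
        *p++ = tmp[--n];

    *p++ = '.';
    *p++ = (char)('0' + frac / 100u);
    *p++ = (char)('0' + (frac / 10u) % 10u);
    *p++ = (char)('0' + frac % 10u);
    *p = '\0';
}
```

The only floating point operations left are one multiply, one add and one conversion to integer.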

>>> 'pow' is over 1kB!! And it pulls in something called __pow_754, which is
>>> HUGE -- like, 5kB or something! Eeek! I mean -- the code worked, but
>>> without much room in the end.
>>>
>>
>> This is a library issue.
>
> Obviously.
>
>> What you are seeing is a large general-purpose
>> routine, rather than smaller and more specialised functions. By the
>> name, it is probably the IEEE-754 version of the code and will work
>> correctly even when given awkward data (i.e., it will generate the
>> specified NaNs, etc.). It is probably also a double precision version,
>> which might not be what you need. This can probably be mitigated to some
>> extent by using -ffast-math, and by using float instead of double (if
>> appropriate). But in the end, it might just be a design choice of the
>> library - larger but more general functions are more efficient for
>> larger programs but are less efficient for smaller code.
>
> Ayup. And I'm not terribly inclined to rewrite the library, so will
> probably go with working around the problems.
>
>>> So I'm un-powing my sprintf, shaking my head at the vicissitudes of free
>>> software, and (since it's a hobby project) thinking about processors
>>> with more memory space.
>>>
>>
>> A vast amount of embedded development is done using free software, but
>> it is not always the best choice. /Exactly/ the same applies to
>> commercial tools. You have to pick appropriate tools for the task in
>> hand, and know how to use them effectively.
>
> Well yes. In this case, the 'different bucket' principle applies --
> one's budget for profit-generating work has a few more zeros at the end
> than one's budget for playing around.
>

Of course, you get to learn a lot more when playing around (my boss 
always worries if I say a prospective project sounds "interesting", or 
"a chance to learn").

>> And you have to have tools that suit you, and the way you work. For some
>> people, they like the way particular commercial tools work, and can't
>> get their heads round gcc - they see it as some sort of refugee from the
>> DOS days. Other people find most commercial tools awkward and
>> inconvenient compared to the streamlined ease and consistency of gcc -
>> they see commercial IDEs as too much lipstick on a pig. It's a matter of
>> viewpoint, habit, experience and personal preference.
>
> I vastly prefer the command-line tool approach. If I can't go to a
> command line and type 'make' to kick things off, then I haven't finished
> the job yet. Using an IDE to do your thinking for you is just an
> invitation to broken code in a month or two, and an endless life of
> never knowing if you've gotten all of the critical files archived.
>

I agree with that.  Most tools at least let you use a command line, but 
I've used some commercial tools that will only run from within their 
IDE, which is invariably a terrible editor.

>> But no matter what tools you are using, you'll get on a lot better if
>> you ask for help with your problems, and give specifics, rather than
>> whining that a toolchain you got for free doesn't work in exactly the
>> same way as a toolchain that costs a lot of money.
>
> Mostly I was sharing my amazement about how much of a chunk one
> (supposedly) itty bitty mathematical function took up, and how much more
> space the gnu embedded library for the ARM takes up than the
> alternative, commercial, tool. (With the library sprintf, the thing
> compiles to something like 78kB, which is a barrier to progress given
> that the processor in question only has 64kB of flash).
>
> I do appreciate your suggestion about -ffast-math.
>

Also remember to avoid doubles unless you really need them - they are a 
lot slower than floats, and need larger library code.
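
A common way for doubles to creep in unnoticed is an unsuffixed constant. The two hypothetical functions below differ only in the "f" suffix, yet the first one does its multiply in double precision:

```c
/* The constant 0.1 has type double, so x is converted to double,
   multiplied in double, and the result converted back to float. */
float scale_via_double(float x)
{
    return (float)(x * 0.1);
}

/* The f suffix keeps the whole expression in single precision. */
float scale_in_float(float x)
{
    return x * 0.1f;
}
```

On a target without an FPU, the first version would be expected to pull in the double-precision soft-float helpers (on ARM EABI toolchains, routines such as __aeabi_dmul) as well as the conversions in both directions, though the exact effect depends on the toolchain and flags.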

I'll try and have a look tomorrow at a project I have on the M3 using 
CodeSourcery's tools - I /think/ it uses snprintf (it's not my code, so 
I don't know it all).

Reply by Tim Wescott ● February 22, 2011

On 02/22/2011 03:51 PM, Hans-Bernhard Bröker wrote:
> On 22.02.2011 22:17, Tim Wescott wrote:
>> On 02/22/2011 12:27 PM, Arlet Ottens wrote:
>>> On 02/22/2011 09:17 PM, 42Bastian Schick wrote:
>>>> On Mon, 21 Feb 2011 18:17:38 -0800, Tim Wescott<tim@seemywebsite.com>
>>>> wrote:
>
>>>>> 'pow' is over 1kB!! And it pulls in something called __pow_754,
>>>>> which is HUGE -- like, 5kB or something!
>
> So what? Consider yourself lucky it's not even bigger than that.
>
> Seriously, though, setting aside the *printf() and *scanf() families
> themselves, pow() is easily the most complicated function in the entire
> Standard C Library, even with hardware FPU support. Without it, it can
> easily be twice as large as that if you want to really cover all the
> corner cases (IEEE754 special data formats, getting it right even at the
> boundaries of precision or range, ...).
>
>>> Or just multiply by 10 in a loop.
>>
>> Given that in this particular case the code is servicing a
>> maintenance/debug interface that never, ever has to go fast, that's
>> pretty much what I'm doing now
>
> Oh, come on guys, you can do better than that, even under code size
> pressure. Has everybody already forgotten about iterated squaring and
> multiplying? In a nutshell:
>
>     while exponent left
>       if exponent uneven:
>           multiply base into result
>       right-shift exponent by one
>       replace base by its square
>
> Hardly any more complex than iterated multiply, but O(log(exponent))
> instead of O(exponent).
>
> Or do yourself a favour and use hexadecimal floating point format
> instead. _Much_ easier on the binary ALU, standardized format, and still
> (marginally) human-readable.

Well...

I misnomered* that.  The use case for the system as a whole is to be a 
control systems training platform for embedded software guys.  It's just 
closing a loop at 100Hz or a kHz.  The whole thing could be done in 
integer or other fixed point math, with very light loading on the 
processor.  But I want people to be able to focus on the control systems 
part of things, not picking through all the fussy little details that it 
would make sense to pick through for a true production machine.

So all the math is done in floating point, and presented that way to the 
outside world.  In order to do that, I use a processor that has an 
excess of resources -- except flash.  That'll be fixed when I stop using 
eval boards and actually put processors down on my own board.

* Dang -- 'misnomered' is actually in my spell checker!
