EmbeddedRelated.com

And -- pow! My memory was gone.

Started by Tim Wescott February 21, 2011
On 22/02/11 22:12, Tim Wescott wrote:
> On 02/22/2011 12:05 PM, Arlet Ottens wrote:
>> On 02/22/2011 08:05 PM, Tim Wescott wrote:
>>
>>> Mostly I was sharing my amazement about how much of a chunk one
>>> (supposedly) itty bitty mathematical function took up, and how much more
>>> space the gnu embedded library for the ARM takes up than the
>>> alternative, commercial, tool. (With the library sprintf, the thing
>>> compiles to something like 78kB, which is a barrier to progress given
>>> that the processor in question only has 64kB of flash).
>>
>> It makes sense, though. GCC has a large number of target architectures,
>> and therefore these libraries have been written in C to make them easier
>> to port.
>>
>> Commercial tools are usually aimed towards a single target, which makes
>> it a lot easier to hand optimize the math libs in assembly.
>
> Even hand optimizing in C for a specific processor, or doing mostly C
> with assembly just in the spots where it really matters, can do
> considerable good for both size and run time.
>
Once you are working with a 32-bit processor, you are generally unlikely to do much better by specialising your library code per processor, or by using assembly. It makes a much bigger difference for smaller targets.

The exception is if you are able to make use of particular odd instructions or capabilities that your processor has, but that the compiler cannot generate automatically.
On 02/23/2011 01:04 AM, David Brown wrote:
> On 22/02/11 22:12, Tim Wescott wrote:
>> On 02/22/2011 12:05 PM, Arlet Ottens wrote:
>>> On 02/22/2011 08:05 PM, Tim Wescott wrote:
>>> [...]
>>>
>>> It makes sense, though. GCC has a large number of target architectures,
>>> and therefore these libraries have been written in C to make them
>>> easier to port.
>>>
>>> Commercial tools are usually aimed towards a single target, which makes
>>> it a lot easier to hand optimize the math libs in assembly.
>>
>> Even hand optimizing in C for a specific processor, or doing mostly C
>> with assembly just in the spots where it really matters, can do
>> considerable good for both size and run time.
>>
>
> Once you are at the stage of a 32-bit processor, then you are generally
> unlikely to do much better by specialising your library code per
> processor, or using assembly. It makes a much bigger difference for
> smaller targets.
>
> The exception is if you are able to make use of particular odd
> instructions or capabilities that your processor has, but that the
> compiler cannot generate automatically.
Those odd instructions are typically something you can use in math operations. Often a CPU will have math instructions that are useful but don't align directly with C conventions, such as non-standard multiply/divide bit sizes. Assembly will also give you direct access to the carry and overflow flags, and to other odd instructions such as count-leading-zeroes.

A good C compiler will also provide ways to gain access to those, but possibly at the price of having to write convoluted code and frequently check the assembler output. At that point, it's easier just to write the assembly yourself.

I'm not advocating that the average user go write his own math lib in assembly, but for a vendor trying to sell a high-performance toolchain, hiring an expert to optimize the last cycle out of math code can be worth it.
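[Editor's note: as a concrete illustration of the count-leading-zeroes case above, GCC and Clang expose the instruction through the `__builtin_clz` intrinsic, so no inline assembly is needed. A minimal sketch (the wrapper name is ours):

#include <cassert>
#include <cstdint>

// Count leading zeros of a non-zero 32-bit value. On ARM this typically
// compiles down to a single CLZ instruction; on targets without one, the
// compiler substitutes a short library routine.
static inline int count_leading_zeros(std::uint32_t x)
{
    assert(x != 0);            // __builtin_clz is undefined for 0
    return __builtin_clz(x);   // assumes 32-bit unsigned int
}

int main()
{
    assert(count_leading_zeros(1u) == 31);
    assert(count_leading_zeros(0x80000000u) == 0);
    return 0;
}

This keeps the source portable across GCC-family compilers, though the generated code should still be checked per target, as the posters note.]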
Tim Wescott wrote:

> If I could figure out how to keep C++ without using the heap, I'd be a
> happy camper.
Don't use 'new'. Put your objects on the stack and let subroutines use
them through references as function arguments, e.g.:

static void foo(MyClass& c)
{
    c.do_something();
}

int main()
{
    int some_data;
    ... some code ...
    MyClass my_object(some_data);
    ... some code ...
    foo(my_object);
    ... some code ...
}
> I'm not even sure if malloc &c. are getting _called_, or
> just pulled in because C++ uses "new", which uses -- well, you get the
> idea.
If your classes don't call new, your functions never call new, and you don't use classes from other libraries, then malloc() should not be linked in. That's the theory, though :-(
> For embedded I pretty much avoid dynamic deallocation like the
> plague*; while this is against the C++ desktop paradigm, it lets you use
> a very useful subset of the language without a whole 'heap' of trouble.
>
> * Meaning I'll "new" things at system startup but only if they're going
> to live until power-down. This gives one great flexibility in making
> portable libraries while still not fragmenting the heap through
> new-delete-new-delete-new sequences.
For this approach (and placement new, see
http://en.wikipedia.org/wiki/Placement_syntax) malloc is linked in.

bye
Andreas
--
Andreas Hünnebeck | email: acmh@gmx.de
----- privat ---- | www : http://www.huennebeck-online.de
Fax/Anrufbeantworter: 0721/151-284301
GPG-Key: http://www.huennebeck-online.de/public_keys/andreas.asc
PGP-Key: http://www.huennebeck-online.de/public_keys/pgp_andreas.asc
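[Editor's note: a minimal sketch of the placement-new technique referenced above, constructing an object into statically allocated storage. The class and names here are hypothetical illustrations, not from the thread:

#include <cassert>
#include <new>      // declares the placement form of operator new

struct Motor {               // hypothetical example class
    int speed;
    explicit Motor(int s) : speed(s) {}
};

// Statically allocated, suitably aligned raw storage -- the buffer
// itself involves no heap allocation.
alignas(Motor) static unsigned char motor_buf[sizeof(Motor)];

int main()
{
    // Construct the object in the static buffer at startup.
    Motor* m = new (motor_buf) Motor(42);
    assert(m->speed == 42);

    // Objects that live until power-down often skip destruction on
    // embedded targets; otherwise invoke the destructor explicitly.
    m->~Motor();
    return 0;
}

Whether this pattern pulls malloc into the link, as Andreas says it did for him, depends on the toolchain and library; checking the map file is the only sure test.]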
On 23/02/2011 07:59, Arlet Ottens wrote:
> On 02/23/2011 01:04 AM, David Brown wrote:
>> On 22/02/11 22:12, Tim Wescott wrote:
>>> [...]
>>>
>>> Even hand optimizing in C for a specific processor, or doing mostly C
>>> with assembly just in the spots where it really matters, can do
>>> considerable good for both size and run time.
>>
>> Once you are at the stage of a 32-bit processor, then you are generally
>> unlikely to do much better by specialising your library code per
>> processor, or using assembly. It makes a much bigger difference for
>> smaller targets.
>>
>> The exception is if you are able to make use of particular odd
>> instructions or capabilities that your processor has, but that the
>> compiler cannot generate automatically.
>
> Those odd instructions are typically something you can use in math
> operations. Often a CPU will have math instructions that are useful, but
> don't align directly with C conventions, such as non-standard
> multiply/divide bit sizes. Also, assembly will give you direct access to
> carry and overflow flags, and other odd instructions, such as
> count-leading-zeroes.
>
> A good C compiler will also provide ways to gain access to those, but
> may be at a price of having to write convoluted code, and frequently
> checking assembler output. At that point, it's easier just to write the
> assembly yourself.
>
Sometimes the good C compiler will generate odd instructions automatically - in which case it is better to write the more "natural" C code, as this gives the compiler the best chance of generating even better code. You'll want to check the generated assembly, of course.

However, although compilers have got better at this sort of thing, they are far from perfect. They are also hampered by the limitations of C. For example, there is no good way to write a "rol" or "ror" instruction in C, and few if any compilers would generate it from raw C.

I agree with you that writing the assembly by hand is sometimes the easiest and clearest way - "intrinsic" C function wrappers around odd instructions are okay for an instruction or two, but quickly become messy. A spot of inline assembly can be much clearer than convoluted pseudo-C functions.
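[Editor's note: for reference, the usual shift-and-or idiom for a rotate is shown below. Masking both shift counts keeps the code free of undefined behaviour, and many modern compilers do recognise this exact pattern and emit a single rotate instruction - though, as the post says, only inspecting the generated assembly can confirm it for a given toolchain:

#include <cassert>
#include <cstdint>

// 32-bit rotate-left written in plain C++. The "& 31" masks keep every
// shift in the range 0..31 (shifting a 32-bit value by 32 is undefined
// behaviour), including the n == 0 case.
static inline std::uint32_t rotl32(std::uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

int main()
{
    assert(rotl32(0x80000001u, 1) == 0x00000003u);  // wraps the top bit
    assert(rotl32(0x12345678u, 0) == 0x12345678u);  // rotate by 0 is a no-op
    return 0;
}
]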
> I'm not advocating that the average user go write his own math lib in
> assembly, but for a vendor trying to sell a high-performance toolchain,
> hiring an expert to optimize the last cycle out of math code can be
> worth it.
>
It can be worth the effort - but not always. You (or the compiler vendor) also have to take into account things like maintenance, testability, and stability in the face of changes to the compiler. It is more important to be correct, and to be sure that your code stays correct, than to be small and fast.

There are also different balances, which makes things difficult - often you can write code that is small /and/ fast, but sometimes it's a choice of small /or/ fast. Then there is accuracy and conformance to standards - do you write your libraries to conform exactly to the standards (for IEEE-754, this means special handling of NaNs, signed zeros, etc., which often means a hardware FPU is of only limited use), or do you write your libraries to conform to what your customers actually need (embedded systems seldom need NaNs, etc.)?

Sometimes it /is/ worth writing your own maths code (though it is seldom worth doing it in assembly). For example, it is typically not hard to write a sine function that is smaller and much faster than the maths library sinf(), and yet which is accurate enough for your motor controller or other application.
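[Editor's note: a sketch of the kind of "good enough" sine the post describes - a truncated odd-power polynomial, valid on roughly [-pi/2, pi/2] (wider ranges need argument reduction first). The coefficients are simply the Taylor series through x^7, giving errors on the order of 1e-4 over that range, which is plenty for many control loops and far cheaper than a fully IEEE-conformant sinf():

#include <cassert>
#include <cmath>

// Polynomial sine: x - x^3/6 + x^5/120 - x^7/5040, evaluated in
// Horner form. Accurate to roughly 1e-4 on [-pi/2, pi/2].
static float fast_sinf(float x)
{
    const float x2 = x * x;
    return x * (1.0f + x2 * (-1.0f / 6.0f
                + x2 * (1.0f / 120.0f
                + x2 * (-1.0f / 5040.0f))));
}

int main()
{
    // Compare against the library sine across the valid range.
    for (float x = -1.5f; x <= 1.5f; x += 0.1f) {
        assert(std::fabs(fast_sinf(x) - std::sin(x)) < 1e-3f);
    }
    return 0;
}

A production version would use minimax rather than Taylor coefficients for a better error bound at the same cost, but the structure is the same.]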
On 23/02/2011 09:20, Andreas Huennebeck wrote:
> Tim Wescott wrote:
>
>> If I could figure out how to keep C++ without using the heap, I'd be a
>> happy camper.
>
> Don't use 'new'. Put your objects on the stack and let subroutines use
> them through references as function arguments, e.g.:
>
> [...]
>
>> I'm not even sure if malloc &c. are getting _called_, or
>> just pulled in because C++ uses "new", which uses -- well, you get the
>> idea.
>
> If your classes don't call new and your functions never call new and if you
> don't use classes from other libraries then malloc() should not be linked in.
> That's theory though :-(
>
>> For embedded I pretty much avoid dynamic deallocation like the
>> plague*; while this is against the C++ desktop paradigm, it lets you use
>> a very useful subset of the language without a whole 'heap' of trouble.
>>
>> * Meaning I'll "new" things at system startup but only if they're going
>> to live until power-down. This gives one great flexibility in making
>> portable libraries while still not fragmenting the heap through
>> new-delete-new-delete-new sequences.
>
> For this approach (and placement new, see
> http://en.wikipedia.org/wiki/Placement_syntax)
> malloc is linked in.
>
It is also possible to use static objects that get constructed before main() - this will not lead to any mallocs and will also allow absolute addressing modes which are more efficient on some processors. Of course, you then have to be sure of the ordering of construction of these static objects (or use a two-part constructor and init method).
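[Editor's note: a minimal sketch of the "two-part constructor and init method" pattern mentioned above. The class and names are hypothetical; the point is that the constructor that runs before main() does nothing order-dependent, while init() is called explicitly from main() in a known order:

#include <cassert>

// Hypothetical peripheral driver using the two-part pattern.
class Uart {
public:
    Uart() : initialized_(false), baud_(0) {}  // trivial, order-independent
    void init(int baud) {                      // called explicitly from main()
        baud_ = baud;
        initialized_ = true;
    }
    bool ready() const { return initialized_; }
    int baud() const { return baud_; }
private:
    bool initialized_;
    int baud_;
};

// Statically allocated: constructed before main(), no malloc involved,
// and on some processors the linker can use absolute addressing for it.
static Uart uart0;

int main()
{
    uart0.init(115200);   // explicit ordering, unlike static constructors
    assert(uart0.ready() && uart0.baud() == 115200);
    return 0;
}
]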
Mel <mwilson@the-wire.com> writes:

> And somebody's delay routine brought in the floating-point package
> so they could take their parameter in seconds.
That's what you get for neither reading the documentation, nor obeying
the #warnings that are printed.

The floating-point calculations are done at *compile time* only, and in
order for them to be optimized away, well, you have to enable
optimizations. But that's all documented ...
--
Joerg Wunsch * Development engineer, Dresden, Germany
Atmel Automotive GmbH, Theresienstrasse 2, D-74027 Heilbronn
Geschaeftsfuehrung: Steven A. Laub, Stephen Cumming, Thomas Hoetzel
Amtsgericht Stuttgart, Registration HRB 106594
On 23/02/2011 13:35, Joerg Wunsch wrote:
> Mel <mwilson@the-wire.com> writes:
>
>> And somebody's delay routine brought in the floating-point package
>> so they could take their parameter in seconds.
>
> That's what you get for neither reading the documentation, nor
> obeying the #warnings that are printed.
>
> The floating-point calculations are done at *compile time* only, and
> in order for them to be optimized away, well, you have to enable
> optimizations. But that's all documented ...
>
I take it you are talking about the delay routines in avr-libc (the default library for avr-gcc, for those that don't know) - they work fine, as the floating point calculations are done at compile time. If someone is getting floating point code using avr-libc delay functions, then they are, as you say, not using the code correctly.

But I've seen other people's delay routines which /try/ to work this way, but fail - for various reasons, they end up doing the calculations at run time, even with optimisations enabled.
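[Editor's note: a host-runnable sketch of the pattern being discussed. The conversion from milliseconds to loop iterations is written in floating point, but when the argument and clock frequency are compile-time constants and the function is inlined with optimization enabled, the whole expression folds to an integer and no float code reaches the target. The F_CPU and cycles-per-loop values are assumptions for illustration, not avr-libc's actual constants:

#include <cassert>

constexpr double F_CPU = 8000000.0;           // 8 MHz clock (hypothetical)
constexpr unsigned long CYCLES_PER_LOOP = 4;  // cost of one delay loop (hypothetical)

// With a constant ms argument and -O enabled, this entire expression is
// evaluated by the compiler; passing a runtime-variable ms instead is
// exactly the mistake that drags the float library into the link.
static inline unsigned long delay_loops(double ms)
{
    return static_cast<unsigned long>(ms * (F_CPU / 1000.0) / CYCLES_PER_LOOP);
}

int main()
{
    // 1 ms at 8 MHz with 4 cycles per iteration -> 2000 iterations.
    assert(delay_loops(1.0) == 2000);
    return 0;
}
]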
On 02/23/2011 12:20 AM, Andreas Huennebeck wrote:
> Tim Wescott wrote:
>
>> If I could figure out how to keep C++ without using the heap, I'd be a
>> happy camper.
>
> Don't use 'new'. Put your objects on the stack and let subroutines use
> them through references as function arguments, e.g.:
>
> [...]
>
>> I'm not even sure if malloc &c. are getting _called_, or
>> just pulled in because C++ uses "new", which uses -- well, you get the
>> idea.
>
> If your classes don't call new and your functions never call new and if you
> don't use classes from other libraries then malloc() should not be linked in.
> That's theory though :-(
>
>> For embedded I pretty much avoid dynamic deallocation like the
>> plague*; while this is against the C++ desktop paradigm, it lets you use
>> a very useful subset of the language without a whole 'heap' of trouble.
>>
>> * Meaning I'll "new" things at system startup but only if they're going
>> to live until power-down. This gives one great flexibility in making
>> portable libraries while still not fragmenting the heap through
>> new-delete-new-delete-new sequences.
>
> For this approach (and placement new, see
> http://en.wikipedia.org/wiki/Placement_syntax)
> malloc is linked in.
>
> bye
> Andreas
void * malloc(size_t n)
{
    assert(false);
    return 0;
}

is pretty small -- maybe I'll use that.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" was written for you.
See details at http://www.wescottdesign.com/actfes/actfes.html
On 02/23/2011 02:30 AM, David Brown wrote:
> On 23/02/2011 09:20, Andreas Huennebeck wrote:
>> Tim Wescott wrote:
>>
>>> If I could figure out how to keep C++ without using the heap, I'd be a
>>> happy camper.
>>
>> [...]
>>
>> For this approach (and placement new, see
>> http://en.wikipedia.org/wiki/Placement_syntax)
>> malloc is linked in.
>
> It is also possible to use static objects that get constructed before
> main() - this will not lead to any mallocs and will also allow absolute
> addressing modes which are more efficient on some processors. Of course,
> you then have to be sure of the ordering of construction of these static
> objects (or use a two-part constructor and init method).
That's what I'm doing! Unless I'm creating something dynamically and
I've forgotten it (the code is old, and getting resurrected) -- now I
need to go over the code with a fine-toothed comb.

--
Tim Wescott
Wescott Design Services
http://www.wescottdesign.com

Do you need to implement control loops in software?
"Applied Control Theory for Embedded Systems" was written for you.
See details at http://www.wescottdesign.com/actfes/actfes.html
On 23/02/2011 16:47, Tim Wescott wrote:
> On 02/23/2011 12:20 AM, Andreas Huennebeck wrote:
>> Tim Wescott wrote:
>>
>>> If I could figure out how to keep C++ without using the heap, I'd be a
>>> happy camper.
>>
>> [...]
>>
>> For this approach (and placement new, see
>> http://en.wikipedia.org/wiki/Placement_syntax)
>> malloc is linked in.
>>
>> bye
>> Andreas
>
> void * malloc(size_t n)
> {
>     assert(false);
>     return 0;
> }
>
> is pretty small -- maybe I'll use that.
>
void * malloc(size_t n)
{
    return 0;
}

is smaller. "assert" can bring in all sorts of junk, depending on how it
is defined, and what flags you have enabled - if it tries to print out a
message on stderr, then this may be the cause of some of your problems.
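[Editor's note: a related C++-level variant of the same idea, runnable on a host for demonstration. Rather than stubbing malloc() itself (which is awkward to test on a hosted system, where the C library uses it), this sketch replaces the global operator new with one that records any heap use, avoiding assert()'s stderr machinery entirely. On a target, the recording line could instead spin, toggle a pin, or reset:

#include <cassert>
#include <cstdlib>
#include <new>

static bool g_heap_used = false;

// Replacement global operator new: note every allocation. (A strict
// no-heap build would halt here instead of forwarding to malloc.)
void* operator new(std::size_t n)
{
    g_heap_used = true;
    return std::malloc(n);
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

int main()
{
    int stack_obj = 42;          // stack object: no operator new call
    (void)stack_obj;
    assert(!g_heap_used);        // nothing so far touched the heap

    int* heap_obj = new int(7);  // this goes through our operator new
    assert(g_heap_used && *heap_obj == 7);
    delete heap_obj;
    return 0;
}

Note this only catches C++ allocations; C-library routines that call malloc() directly (sprintf buffers and the like) still need the malloc stub or a linker-level check.]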
