Tomás Ó hÉilidhe wrote:
> On May 10, 9:11 am, "aamer" <raqeeb...@yahoo.com> wrote:
>> Dear all,
>>
>> Are there any hard and fast rules for code optimization in C targeting a
>> processor?
>
>
> I'd advocate using types like "uint_fast8_t" instead of "unsigned
> int"; that way you'll get good performance out of all kinds of
> machine, whether they be 8-Bit, 16-Bit or 5-billion-Bit. For instance
> if you use "unsigned int" on an 8-Bit microcontroller where an 8-Bit
> integer would suffice, then your code will be at least twice as slow
> because multiple instructions are used every time you do simple
> arithmetic.
>
Using the "fast" types can make sense, especially for speed-critical
code. There are advantages in using the size-specific types, however -
specifying "uint8_t" rather than "uint_fast8_t" may let the compiler (or
linter) spot range errors that would not be found if "uint_fast8_t"
boils down to a 32-bit value. Given that the compiler can often
optimise the generated code to use the best sized types available, it's
seldom worth specifying "fast" types explicitly.
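A minimal sketch of the difference, assuming a C99 compiler that provides
<stdint.h> and uint8_t (the function names are made up for illustration).
With the exact-width type, 255 + 1 always wraps to 0; with the "fast" type
the result depends on how wide the implementation chose to make it, which
is the kind of range difference mentioned above.

```c
#include <limits.h>
#include <stdint.h>

/* Width in bits of the exact type: always 8 where uint8_t exists. */
unsigned exact_width_bits(void)
{
    return (unsigned)(sizeof(uint8_t) * CHAR_BIT);
}

/* Width in bits of the "fast" type: at least 8, possibly 16, 32 or 64. */
unsigned fast_width_bits(void)
{
    return (unsigned)(sizeof(uint_fast8_t) * CHAR_BIT);
}

/* 255 + 1 stored back into a uint8_t always wraps to 0. */
unsigned wrap_exact(void)
{
    uint8_t x = 255;
    x = (uint8_t)(x + 1);
    return x;
}

/* Same arithmetic with uint_fast8_t: 0 if the type is 8 bits wide,
 * 256 if the implementation picked anything wider. */
unsigned wrap_fast(void)
{
    uint_fast8_t x = 255;
    x = (uint_fast8_t)(x + 1);
    return (unsigned)x;
}
```

Code that relies on the wrap-around has to use uint8_t (or mask
explicitly); code that only needs "at least 8 bits" can use either.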
> Also I'd advocate using "built-in" parts of the language where
> possible, e.g.:
>
> unsigned arr[12] = {0};
>
That's good advice, except that using "unsigned" contradicts your
previous advice. Personally, I dislike abbreviated types like
"unsigned" - I always write the implicit "int" explicitly.
> instead of:
>
> unsigned arr;
> memset(arr,0,sizeof arr);
>
I presume you meant "unsigned arr[12];" here.
The main reason for using the {} initialiser rather than memset() or
other methods is that it gives clearer and shorter source code - smaller
and faster object code is a bonus (in some circumstances, compilers will
optimise the memset() call to the same code anyway).
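For comparison, a small sketch of the two forms side by side, assuming
the array really is declared with 12 elements (ARR_LEN and the function
name are made up for illustration). For an unsigned int array both leave
every element zero; the initialiser is simply the shorter way to say it.

```c
#include <string.h>

#define ARR_LEN 12

/* Returns 1 if both ways of zeroing leave every element equal to 0. */
int zeroed_both_ways(void)
{
    unsigned int a[ARR_LEN] = {0};  /* elements not listed are zeroed too */
    unsigned int b[ARR_LEN];

    /* memset needs the array itself so that sizeof covers all 12 elements */
    memset(b, 0, sizeof b);

    for (int i = 0; i < ARR_LEN; i++) {
        if (a[i] != 0 || b[i] != 0) {
            return 0;
        }
    }
    return 1;
}
```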
> (Also the former is fully portable for dealing with types like
> pointers and floating point types whose "zero value" might not be all-
> bits-zero)
>
It is virtually impossible to write fully portable code - and totally
impossible within the world of embedded programming. Forget the
machines that have weird values for zeros, or bizarre numbers of bits
(although some DSP's have 16-bit or 32-bit chars), or something other
than two's complement arithmetic, or non-ASCII for their basic character
set. It's not worth it - code suitable for an ARM is not suitable for
running on a 1970's mainframe anyway.
> Another thing would be about the use of the post-increment and post-
> decrement operators in a conditional. For instance:
>
> void strcpy(char *dst, char const *src)
> {
> while (*dst++ = *src++);
> }
>
> The idiom of using *p++ is widespread, but unfortunately its use is no
> longer advisable because hardware has moved on. I think it was the
> PDP11 that had a single instruction for dereferencing a pointer and
> also incrementing it at the same time, thus it was beneficial to use
> *p++ wherever possible -- however modern machines don't have such an
> instruction, so the assembler produced for *p++ when used as the
> conditional in an if statement, for instance, might be sub-optimal. So
> I'd say opt for:
>
> for ( ; *dst = *src; ++dst, ++src) ;
>
In any code review, that form would be taken out and shot. Just because
it is legal in C to write an ugly mess inside a for() statement, does
not mean that it is sensible to write it. It's not even going to
produce smaller or faster code - any compiler that can't produce tight
code for the original while() will produce poor code from this construct
too.
The first idiom is so commonly used that it is clear to any reader -
although I'd have two sets of parentheses (gcc convention to disable a
warning) and perhaps a comment to say that I really meant a single "=".
For less capable compilers, you are probably better with:
    while ((*dst = *src)) {
        dst++;
        src++;
    }
That's far clearer to the reader, and easier for a less sophisticated
compiler.
It's always important to examine the generated assembly code, and learn
to know your target architecture and your compiler's idiosyncrasies if
you want to get the best from it - don't guess randomly at the most
obfuscated expression you can think of.
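As a sketch, here are the two forms discussed above written as complete
functions (the names copy_compact and copy_explicit are made up, to avoid
clashing with the library strcpy). Both copy the terminating '\0' and
stop; which one generates better code is something to check in the
compiler output for the target at hand, not something this sketch can
settle.

```c
/* Compact idiom. The doubled parentheses tell gcc that the single "="
 * is an intentional assignment, not a mistyped "==". */
void copy_compact(char *dst, const char *src)
{
    while ((*dst++ = *src++)) {
        /* empty: the copy, including the terminator, happens above */
    }
}

/* Explicit form: copy one character, stop once the '\0' has been
 * copied, otherwise advance both pointers. */
void copy_explicit(char *dst, const char *src)
{
    while ((*dst = *src)) {
        dst++;
        src++;
    }
}
```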
> Moving on...
>
> On most machines, I would use pointers instead of element indices for
> iterating thru an array. For example:
>
> char *p = arr;
> char const *const pend = arr + LENGTH;
>
> do if ('a' == *p) return 1;
> while (pend != ++p);
>
> instead of:
>
> unsigned i = 0;
>
> do if ('a' == arr[i]) return 1;
> while (LENGTH != ++i);
>
First off, get yourself a decent compiler. It will do the same job, and
let you write the source code using proper array constructs.
Secondly, don't write a loop like that (first or second forms) without
using brackets - it's unclear, and changes can easily break the code.
Third, forget the silly "if (constant == variable)" form of expression
unless you are working for MISRA nazis (i.e., those that think the rules
are unbendable). The logical and sensible ordering when reading such a
comparison is normally "if (variable == constant)". If your compiler
does not spot mistakes such as using a single "=" when you meant "==",
get a better compiler or a better linter.
> The latter, on most architectures, is a hell of a lot slower. But then
> again there are some PC's that have a single instruction for "pointer
> + offset", so I can't discredit that technique altogether.
>
Do you have any evidence whatsoever for such a wild claim? A good
compiler will use pointer instructions for array access, and will do the
strength reduction turning the array loop into an incrementing pointer.
Also, there are plenty of current modern architectures that have array
memory modes that will be used as appropriate.
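A sketch of the same search written both ways with braces and the
conventional operand order (function names and the len parameter are
made up for illustration). Whether the two compile to the same
instructions depends on the compiler and target, so the way to find out
is to compare the generated assembly, as suggested above.

```c
#include <stddef.h>

/* Array-index form: returns 1 if c occurs in the first len bytes. */
int contains_indexed(const char *arr, size_t len, char c)
{
    for (size_t i = 0; i < len; i++) {
        if (arr[i] == c) {
            return 1;
        }
    }
    return 0;
}

/* Pointer form of the same search. */
int contains_pointer(const char *arr, size_t len, char c)
{
    const char *const end = arr + len;

    for (const char *p = arr; p != end; p++) {
        if (*p == c) {
            return 1;
        }
    }
    return 0;
}
```

Unlike the do/while versions quoted above, these also behave correctly
when the length is zero.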
> On all architectures, I advocate the use of look-up tables instead of
> switch statements where applicable, especially when it's possible to
> have a look-up table containing function pointers.
>
You can advocate all you want - fortunately most people will ignore you.
The compiler will almost always generate better code for common switch
cases than a lookup table - and will generate a jump table automatically
as necessary. This will be significantly smaller and faster than a
lookup table of function pointers. (There are plenty of good reasons
for using a table of function pointers as a code construct - it's just
that replacing switch statements is not one of them.)
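To make the two constructs concrete, here is a small sketch of the same
dispatch written as a switch and as a table of function pointers. The
enum, the operations and all names are hypothetical. The switch gives
the compiler the freedom to pick compare chains, a jump table or inline
code; the table form always goes through an indirect call.

```c
enum op { OP_ADD, OP_SUB, OP_MUL, OP_COUNT };

/* switch form: the compiler chooses how to implement the dispatch. */
int apply_switch(enum op o, int a, int b)
{
    switch (o) {
    case OP_ADD: return a + b;
    case OP_SUB: return a - b;
    case OP_MUL: return a * b;
    default:     return 0;
    }
}

static int do_add(int a, int b) { return a + b; }
static int do_sub(int a, int b) { return a - b; }
static int do_mul(int a, int b) { return a * b; }

typedef int (*binop_fn)(int, int);

/* Table form: one function pointer per operation, indexed by the enum. */
static const binop_fn op_table[OP_COUNT] = {
    [OP_ADD] = do_add,
    [OP_SUB] = do_sub,
    [OP_MUL] = do_mul,
};

int apply_table(enum op o, int a, int b)
{
    if ((unsigned)o >= (unsigned)OP_COUNT) {
        return 0;               /* out-of-range value: same as default */
    }
    return op_table[o](a, b);
}
```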
> If you're ever dealing with a struct that has a lot of information in
> it which is common to a "type", then it might be advisable to follow
> C++'s idiom of removing that stuff from the struct and replacing it
> with a pointer to a single object which contains all the relevant
> information for that type (a V-Table, that is).
>
It *might* be, but it sounds very unlikely. What you describe is not a
C++ idiom, and it's not a vtable - you are describing static data members.
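For reference, the arrangement being discussed (data common to a "type"
moved out of each object and reached through a pointer) might look
roughly like this in C. The sensor structs, their fields and the values
are all hypothetical, chosen only to show the layout.

```c
/* Data shared by every object of one kind, stored once. */
struct sensor_type {
    const char *name;
    int min_value;
    int max_value;
};

/* Per-object data plus a pointer to the shared description. */
struct sensor {
    const struct sensor_type *type;
    int reading;
};

static const struct sensor_type thermistor = { "thermistor", -40, 125 };

/* Returns 1 if the reading lies within the limits of the sensor's type. */
int reading_in_range(const struct sensor *s)
{
    return s->reading >= s->type->min_value
        && s->reading <= s->type->max_value;
}
```

Each object shrinks to one pointer plus its own data, at the cost of an
extra indirection on every access to the shared fields.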
> Emmm they're the main ones that come to mind right now.
My main-points (for speed and size) are:
1. Benchmark what's worth optimizing.
2. Do all algorithmic optimizations first.
3. Get a good compiler. If you're using GCC consider compiling a new one.
4. Learn what the restrict-keyword from C99 does. Most compilers support
it these days. Use restrict whenever possible, but never if you're not
sure if it can be applied.
5. Don't use unsigned integers for loop-variables unless you need the
wrap-around feature.
6. Let the compiler decide what to inline and what not. Don't inline
functions just because you think the code will benefit from it.
7. Embedded CPU's often have small caches and slow external memory. Try
to keep your working-set small. Packing multiple booleans or enums in a
single integer may look dirty (less so if you hide the dirty details
with macros), but it can increase the cache efficiency a lot.
And last: It's not worth trying to outsmart the compiler. Changing loops
from indexing to pointer increment style is not worth it anymore. The
compiler will do this job for you.
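Point 7 in the list above could be sketched like this, assuming three
hypothetical status flags packed into one byte with macros hiding the
bit operations. The flag names and the example function are made up;
only the packing technique comes from the post.

```c
#include <stdint.h>

/* One bit per boolean, all stored in a single uint8_t. */
#define FLAG_READY    (1u << 0)
#define FLAG_ERROR    (1u << 1)
#define FLAG_TIMEOUT  (1u << 2)

/* Macros that hide the bit twiddling from the calling code. */
#define FLAG_SET(w, f)    ((w) |= (uint8_t)(f))
#define FLAG_CLEAR(w, f)  ((w) &= (uint8_t)~(f))
#define FLAG_TEST(w, f)   (((w) & (f)) != 0)

/* Sets READY and TIMEOUT, then clears READY again, so only the
 * TIMEOUT bit remains set in the returned byte. */
uint8_t example_status(void)
{
    uint8_t status = 0;

    FLAG_SET(status, FLAG_READY);
    FLAG_SET(status, FLAG_TIMEOUT);
    FLAG_CLEAR(status, FLAG_READY);
    return status;
}
```

Three separate int or bool variables would take three or more bytes;
here they share one, which is where the working-set saving comes from.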
Nils wrote:
0. Starting with clear, well structured, code will help if you need to
optimize it later.
> 1. Benchmark what's worth optimizing
Also use the benchmark to determine if there is a performance issue in
the first place. Though aiming for efficient code is a lofty goal, other
goals like correctness, robustness, maintainability, clarity...etc are
often at least as (and usually more) important. No one will care how
fast your code can produce incorrect results.
> 2. Do all algorithmic optimizations first.
Algorithmic optimizations can improve performance by orders of
magnitude, code optimizations rarely improve performance by more than
30% and usually much less than that.
> 3. Get a good compiler. If you're using GCC consider compiling a new one.
>
> 4. Learn what the restrict-keyword from C99 does. Most compilers support
> it these days. Use restrict whenever possible, but never if you're not
> sure if it can be applied.
>
> 5. Don't use unsigned integers for loop-variables unless you need the
> wrap-around feature.
>
> 6. Let the compiler decide what to inline and what not. Don't inline
> functions just because you think the code will benefit from it.
>
> 7. Embedded CPU's often have small caches and slow external memory. Try
> to keep your working-set small. Packing multiple booleans or enums in a
> single integer may look dirty (less so if you hide the dirty details
> with macros), but it can increase the cache efficiency a lot.
>
> And last: It's not worth trying to outsmart the compiler. Changing loops
> from indexing to pointer increment style is not worth it anymore. The
> compiler will do this job for you.
Or more general: never assume some 'clever trick' will generate faster
or smaller code - instead prove that the 'clever trick' will yield the
desired effect. By prove I mean measure (before and after) and/or check
the compiler output (which also helps to develop a feel for what is
expensive and what not).
Also remember that not all compilers are alike; some compilers optimize
certain code sequences better than others. I have seen too many examples
of people obfuscating the source code assuming they are helping the
compiler to generate more efficient code, while in reality they made
things performance-wise no better and sometimes even worse.
In article <6...@mid.uni-berlin.de>, Nils <n...@cubic.org> wrote:
>My main-points (for speed and size) are:
>5. Don't use unsigned integers for loop-variables unless you need the
>wrap-around feature.
Not necessarily true:
................... while (my_unsigned8var < 5) {}
1BF0:  MOVF   x85,W
1BF2:  SUBLW  04
1BF4:  BNC    1BF8
1BF6:  BRA    1BF0
................... while (my_signed8var < 5) {}
1BF8:  BTFSC  x86.7
1BFA:  BRA    1C02
1BFC:  MOVF   x86,W
1BFE:  SUBLW  04
1C00:  BNC    1C04
1C02:  BRA    1BF8
...................
On this particular combination of target and compiler (PIC18 with CCS C)
unsigned is always faster than signed.
On 2008-05-10, Tom <t...@nospam.com> wrote:
> In article <6...@mid.uni-berlin.de>, Nils <n...@cubic.org> wrote:
>>My main-points (for speed and size) are:
>>5. Don't use unsigned integers for loop-variables unless you need the
>>wrap-around feature.
>
> Not necessarily true:
>
> ................... while (my_unsigned8var < 5) {}
> 1BF0:  MOVF   x85,W
> 1BF2:  SUBLW  04
> 1BF4:  BNC    1BF8
> 1BF6:  BRA    1BF0
> ................... while (my_signed8var < 5) {}
> 1BF8:  BTFSC  x86.7
> 1BFA:  BRA    1C02
> 1BFC:  MOVF   x86,W
> 1BFE:  SUBLW  04
> 1C00:  BNC    1C04
> 1C02:  BRA    1BF8
> ...................
>
> On this particular combination of target and compiler (PIC18 with CCS C)
> unsigned in always faster than signed.
I've seen various other compilers/targets where use of unsigned loop
indexes is faster. For example, one of the tips/tricks listed when using
GCC for the MSP430 target:
  Tips and trick for efficient programming
  [...]
  10. Use unsigned int for indices - the compiler will snip _lots_ of
      code.
On second thought, that might be referring to array indexes instead of
loop indexes. Hmm...
--
Grant Edwards               grante             Yow! I want to read my new
                              at               poem about pork brains and
                           visi.com            outer space ...
On May 10, 8:19 pm, David Brown <david.br...@hesbynett.removethisbit.no>
wrote:
> It is virtually impossible to write fully portable code - and totally
> impossible within the world of embedded programming. Forget the
> machines that have weird values for zeros, or bizarre numbers of bits
> (although some DSP's have 16-bit or 32-bit chars), or something other
> than two's complement arithmetic, or non-ASCII for their basic character
> set. It's not worth it - code suitable for an ARM is not suitable for
> running on a 1970's mainframe anyway.
I write fully-portable code all the time and I find it to be a simple
task a lot of the time. The C Standard provides you with plenty of
information to write fully-portable algorithms and programs.
I myself only use signed integer types when I need to store negative
numbers. Other reasons for going with unsigned are:
1) With signed integer types, you get undefined behaviour upon overflow.
2) On machines other than two's complement, arithmetic can be less
efficient with signed.
3) You can be left with a trap representation if you play around with
the bits of a signed, depending on the system.
I see signed integer types as nasty and so I only use them when I really
have to.
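Reason 1 above can be illustrated with a short sketch (the function
names are made up). Unsigned arithmetic is defined by the C standard to
wrap modulo UINT_MAX + 1, whereas signed overflow is undefined, so
signed code that might overflow has to test the operands before adding.

```c
#include <limits.h>

/* Well defined for every input: UINT_MAX + 1u wraps around to 0. */
unsigned int next_unsigned(unsigned int x)
{
    return x + 1u;
}

/* Signed addition with an explicit range check done before the add.
 * Returns 1 and stores the sum in *out on success, or returns 0 and
 * leaves *out untouched if a + b would overflow. */
int checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return 0;
    }
    *out = a + b;
    return 1;
}
```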
On Sat, 10 May 2008 19:56:03 -0700, Tomás Ó hÉilidhe wrote:
> On May 10, 8:19 pm, David Brown
> <david.br...@hesbynett.removethisbit.no> wrote:
>
>> It is virtually impossible to write fully portable code - and totally
>> impossible within the world of embedded programming. Forget the
>> machines that have weird values for zeros, or bizarre numbers of bits
>> (although some DSP's have 16-bit or 32-bit chars), or something other
>> than two's complement arithmetic, or non-ASCII for their basic
>> character set. It's not worth it - code suitable for an ARM is not
>> suitable for running on a 1970's mainframe anyway.
>
>
> I write fully-portable code all the time and I find it to be a simple
> task a lot of the time. The C Standard provides you with plenty of
> information to write fully-portable algorithms and programs.
"I'm only 21 years of age".
Chances are good that you're pontificating to someone who's been earning
money at this game since before you were an orgasm.
So is your experience vast, or your statement half-vast?
--
Tim Wescott
Control systems and communications consulting
http://www.wescottdesign.com
Need to learn how to apply control theory in your embedded system?
"Applied Control Theory for Embedded Systems" by Tim Wescott
Elsevier/Newnes, http://www.wescottdesign.com/actfes/actfes.html
Walter Banks <w...@bytecraft.com> writes:
> Compiler texts in general devote
> a lot of space to parsing an activity that takes a small
> fraction of the time to implement a compiler.
Most compilers don't need optimization, since most compilers are for
things other than general-purpose programming languages. Therefore
far more compiler writers need to know about parsing than need to
know about optimization.
Try books like _Advanced Compiler Design and Implementation_ by
Steven Muchnick.
Eric
Eric Smith wrote:
> Walter Banks <w...@bytecraft.com> writes:
> > Compiler texts in general devote
> > a lot of space to parsing an activity that takes a small
> > fraction of the time to implement a compiler.
>
> Most compilers don't need optimization, since most compilers are for
> things other than general-purpose programming languages. Therefore
> far more compiler writers need to know about parsing than need to
> know about optimization.
This is a good point.
> Try books like _Advanced Compiler Design and Implementation_ by
> Steven Muchnick.
It was one of the books I was referring to that has good descriptions
of individual optimization techniques but deal with optimization
management and application level optimization strategy very well.
Regards
--
Walter Banks
Byte Craft Limited
Tel. (519) 888-6911
http://www.bytecraft.com
w...@bytecraft.com