
whose 8051 cc overlays static inline stack frames

Started by PPAATT January 10, 2004
Hans-Bernhard,
I agree.

Please allow me to add my two cents to the discussion.
I used the TASKING CC51 v7.0r8 compiler, large memory model.

The code snippet:


#define tbd  /* ... */
#define STATIC_INLINE _inline
extern void x(char chx);

STATIC_INLINE void c(char chc);
STATIC_INLINE void b(char chb);

STATIC_INLINE void c(char chc)
{ 
 tbd; 
 x(chc); 
 tbd; 
}

STATIC_INLINE void b(char chb) 
{ 
 tbd; 
 c(chb); 
 tbd; 
}


void a(char cha) 
{ 
 tbd; 
 b(cha); 
 tbd; 
}


compiled with the following compiler options:

memory model large (using XDATA as default), 
static data overlay allowed with size 20 bytes (for non-reentrant functions)

the following optimization features enabled:

- CSE (common subexpression elimination)
- constant and copy propagation
- peephole optimizer 
- invariant code relocation
- optimization into compound assignments
- code order rearranging
- extra flow optimization pass
- register parameter passing

results in the following assembly output:


; TASKING 8051 C compiler v7.0r8 Build 148 
; options: -ne -It:\tk008024\rel7_0r8\include -Ms -rl -ivo=0x0000 -Ci8051
;          -OAcdFhikLmpsVrtw -c20 -b0 -a20 -A1 -wstrict -s -mid=128
$CASE
 NAME TEST_OVERLAY
; test_overlay.c    1 #define tbd // _nop(); /* ... */ 
; test_overlay.c    2 #define STATIC_INLINE _inline 
; test_overlay.c    3 extern void x(char chx); 
; test_overlay.c    4  
; test_overlay.c    5 STATIC_INLINE void c(char chc); 
; test_overlay.c    6 STATIC_INLINE void b(char chb); 
; test_overlay.c    7  
; test_overlay.c    8 STATIC_INLINE void c(char chc)  
; test_overlay.c    9 {  
; test_overlay.c   10  tbd;  
; test_overlay.c   11  x(chc);  
; test_overlay.c   12  tbd;  
; test_overlay.c   13 } 
; test_overlay.c   14  
; test_overlay.c   15 STATIC_INLINE void b(char chb)  
; test_overlay.c   16 {  
; test_overlay.c   17  tbd;  
; test_overlay.c   18  c(chb);  
; test_overlay.c   19  tbd;  
; test_overlay.c   20 } 
; test_overlay.c   21  
; test_overlay.c   22
; test_overlay.c   23 void a(char cha)
; test_overlay.c   24

 PUBLIC _?a
TEST_OVERLAY_A_DA SEGMENT DATA OVERLAY( 0 )
 RSEG TEST_OVERLAY_A_DA
 PUBLIC _a_BYTE
_a_BYTE: DS 1
; cha = _a_BYTE (register parameter)
TEST_OVERLAY_A_PR SEGMENT CODE
 RSEG TEST_OVERLAY_A_PR
_?a:
 USING 0
 MOV _a_BYTE,R7
; test_overlay.c   25  tbd;
; test_overlay.c   26  b(cha);
 LCALL _?x
; test_overlay.c   27  tbd;
; test_overlay.c   28 }
 RET

; test_overlay.c   29

 EXTRN CODE(_?x)
 EXTRN CODE(SMALL)
 END

Please note that the _inline extended keyword does exactly what is expected:
It places the _inline function's code in the instruction sequence instead of
making a call.
This is very useful for saving some microseconds in time-critical modules.
However, it does increase the code size, which is the traditional trade-off.


To cite the manual:
"With the _inline keyword, a C function can be defined to be inlined by the
compiler. An inline function must be defined in the same source file before
it is 'called'. When an inline function has to be called in several source
files, each file must include the definition of the inline function. This is
typically solved by defining the inline function in a header file.

Not using a function which is defined as an _inline function does not
produce any code. Also during a debug session, the inlined function is not
known.

The pragmas asm and endasm are allowed in inline functions. This makes it
possible to define inline assembly functions. ..."
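
The manual's advice about putting the definition in a header can be sketched as follows. This is only an illustration, assuming C99 `static inline` as a stand-in for the TASKING `_inline` keyword; the header name `util_inline.h` and the functions `low_nibble` and `pack_nibbles` are made up for the example.

```c
/* ---- would live in a shared header, e.g. a hypothetical util_inline.h ---- */
#ifndef UTIL_INLINE_H
#define UTIL_INLINE_H

/* C99 spelling used here (an assumption); with the TASKING compiler
   discussed above the keyword would be _inline, per the quoted manual. */
#define STATIC_INLINE static inline

/* Defined, not just declared, so every file that includes the header
   sees the body before the first "call". */
STATIC_INLINE unsigned char low_nibble(unsigned char v)
{
    return (unsigned char)(v & 0x0Fu);
}

#endif /* UTIL_INLINE_H */

/* ---- would live in each .c file that includes the header ---- */
unsigned char pack_nibbles(unsigned char hi, unsigned char lo)
{
    /* Both uses can be expanded in place; no separate call is required. */
    return (unsigned char)((low_nibble(hi) << 4) | low_nibble(lo));
}
```

If no file uses `low_nibble`, a compiler is free to emit no code for it, which matches the manual's remark about unused inline functions.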

Maybe this helps to resolve the issue.

regards
/jan





Hans-Bernhard Broeker <broeker@physik.rwth-aachen.de> wrote in message
btq9es$2n8$1@nets3.rz.RWTH-Aachen.DE...
> Pat LaVarre <ppaatt@aol.com> wrote:
> [...]
>
> > Thanks for helping me learn I should have said more, sorry we disagree
> > over how elastic the meaning of jargon can be.
>
> IMHO, sloppy use of jargon has no place in what really was a rather
> thinly veiled public accusation of the entire community of '51 C
> compiler makers of being lazy. If you want to voice a complaint, you
> should give a complete and accurate record of the facts.
>
> > Again my C89 fragment was:
>
> > extern void x(char chx);
> > static /* inline */ void c(char chc) { ...; x(chc); ... }
> > static /* inline */ void b(char chb) { ...; c(chb); ... }
> > void a(char cha) { ...; b(cha); ... }
>
> > Ouch now I see I neglected to mention: I also know that in the actual
> > code here shown as "..." ellipses, there were no mentions of cha chb
> > chc. That's the observation that tells me we can store cha chb chc
> > all in the same static byte.
>
> You may be overlooking some of the pickier details of C, most notably
> the "aliasing problem". If there's even a single pointer being used
> to access any char object hidden in those "..."s, the compiler has to
> assume it no longer knows whether that, e.g. cha is still needed even
> after the return of function b(). What appears obvious to you isn't
> necessarily obvious to the compiler, too. It may generally be
> impossible for it to find out.
>
> > Is there no 8051 cc compiler available that can make that same
> > observation and act on it?
>
> I see no reason why they shouldn't --- figuring this out should be no
> unsurmountable obstacle for the kind of static analysis these
> compilers have to run. But, as they say, the proof of the pudding is
> in the eating. Post a complete, compilable example, and people might
> even feed it to their compilers of choice and report on results.
>
> Ah, heck, I'll give it a shot myself...
>
> --
> Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
> Even if all the snow were burnt, ashes would remain.
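
The aliasing concern quoted above can be made concrete with a small sketch. It is not taken from anyone's actual code: the global pointer `alias`, the body given to `x()`, and the variant `a_with_alias()` are all assumptions, chosen only to show a case where `cha` cannot simply share storage or be forgotten across the call to `b()`.

```c
#include <stddef.h>

static char *alias = NULL;   /* hypothetical pointer hidden in the "..." */

/* Stand-in for the external x(): writes through the pointer if one is set. */
void x(char chx)
{
    if (alias != NULL)
        *alias = (char)(chx + 1);
}

static void c(char chc) { x(chc); }
static void b(char chb) { c(chb); }

/* Variant of a() that lets the address of cha escape and returns cha afterwards. */
char a_with_alias(char cha)
{
    alias = &cha;    /* cha's storage is now reachable from x() */
    b(cha);          /* ends in x(), which overwrites cha through alias */
    alias = NULL;    /* do not leave a dangling pointer behind */
    return cha;      /* must be re-read after the call */
}
```

Once the address of `cha` has escaped, the compiler has to keep `cha` alive and distinct until after `b()` returns. In the fragment under discussion no such pointer exists, which is why sharing one byte is possible there.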
> From: CBFalconer ...
> Date: Sun, Jan 11, 2004 6:51 am
> Message-id: <40014FD1.5A99436F@yahoo.com>
> ...
> Surprising it is valid code,
Thanks for saying, but surprising why?
> c:\c\junk>gcc -c -fomit-frame-pointer -O3 junk.c
> c:\c\junk>objdump -dS junk.o
>
> junk.o: file format coff-go32
>
> Disassembly of section .text:
>
> 00000000 <_a>:
> #define tbd /* ... */
> extern void x(char chx);
> static /* inline */ void c(char chc) { tbd; x(chc); tbd; }
>    0: 0f be 54 24 04    movsbl 0x4(%esp,1),%edx
>    5: 89 54 24 04       mov    %edx,0x4(%esp,1)
>    9: e9 f2 ff ff ff    jmp    0 <_a>
>    e: 90                nop
>    f: 90                nop
From the c: prompt and x90 = nop I gather you compiled for x86. Are nop past a jmp spurious in x86? If spurious, why emit them despite -O3?
> junk.c: (in function c)
> junk.c(3,43): Undetected modification possible from call to
> unconstrained function x: x
> An unconstrained function is called in a function body where
> modifications are checked. Since the unconstrained function
> may modify anything, there may be undetected modifications in
> the checked function. (Use -modunconnomods to inhibit warning)
> ...
> junk.c: (in function b)
> junk.c(4,43): Undetected modification possible from call to
> unconstrained function c: c
> ...
> junk.c: (in function a)
> junk.c(5,23): Undetected modification possible from call to
> unconstrained function b: b
Perhaps this English merely says we gave splint only the same separately compiled source file view of code that cc gets without the benefit of an integrated linker, else: Help, lost me?
> > #define tbd /* ... */
> > extern void x(char chx);
> > static /* inline */ void c(char chc) { tbd; x(chc); tbd; }
> > static /* inline */ void b(char chb) { tbd; c(chb); tbd; }
> > void a(char cha) { tbd; b(cha); tbd; }

// Fun to see 'Splint 3.0.1.6 --- 11 Feb 2002' results
// appear as a gratis web service, thank you.
//
// Now I wonder if we prefer the less incisive example:

#define FIXME() do { ((void) 0); } while (0) /* ... */

extern volatile int i;
extern void x(char chx);
extern void a(char cha);

volatile int i = 0;
void x(char chx) { i = chx; }

static /* inline */ void c(char chc) { FIXME(); x(chc); FIXME(); }
static /* inline */ void b(char chb) { FIXME(); c(chb); FIXME(); }

int main(int argc, char * argv[])
{
 argv = argv; FIXME(); b(argc); FIXME();
 return 0;
}
> passes splint.
> However splint -strict is outraged :-)
Also passes the 3.3 gcc -c -Wall -W of Mac OS X 10.3 Developer. (No splint delivered there. Monday I hope to try an x86 Linux.)
> junk.c(2,13): Function x declared but not defined
> A function or variable is declared, but not defined in any
> source code file. (Use -declundef to inhibit warning)

Does this make the example "incomplete"? If yes, what fix do we prefer? How should we express the idea of a side effect not to be omitted from the machine code, since on the 8051 we have no standard libraries?
> junk.c(5,6): Function a declared but not used
> A function is declared but not used. Use /*@unused@*/ in front
> of function header to suppress message. (Use -fcnuse to inhibit
> warning)
> junk.c(5,34): Definition of a

'Does this make the example "incomplete"? If yes, what fix do we prefer?' How should we express the idea of a root entry point? Surely not the int main(int argc, char * argv[]) standard of C89 and Unix?
> junk.c(3,43): Statement has no effect (possible undected
> modification through call to unconstrained
> function x): x(chc)
> Statement has no visible effect --- no values are modified. It
> may modify something through a call to an unconstrained
> function. (Use -noeffectuncon to inhibit warning)
> ...
> junk.c(4,43): Statement has no effect (possible undected
> modification through call to unconstrained
> function c): c(chb)
> ...
> junk.c(5,23): Statement has no effect (possible undected
> modification through call to unconstrained
> function b): b(cha)

Aye, the tbd statements have no effect on purpose. To express this idea of a consciously-empty-statement, I think I remember gcc folk advocate an explicit ((void) 0). I hesitate because I remember gcc -Wall -W rejecting cast-to-void as a way of saying arg-intentionally-not-used in Linux sg utils. But I see now the 3.3 gcc -c -Wall -W of Mac OS X 10.3 Developer does accept cast-to-void as a way of saying zero-intentionally-not-used.
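
For readers unfamiliar with the two idioms mentioned here, a minimal sketch follows. The `FIXME()` macro is the one from the example earlier in this thread; the function `handler` and its parameters are hypothetical, and whether a given compiler accepts the cast silently is an assumption that varies by compiler and warning level.

```c
/* A deliberately empty statement that still takes a trailing semicolon. */
#define FIXME() do { ((void) 0); } while (0)

/* Hypothetical function: 'unused' is part of the signature
   but is intentionally ignored. */
int handler(int used, char *unused)
{
    (void) unused;   /* cast-to-void: states that ignoring it is deliberate */
    FIXME();         /* placeholder with no effect */
    return used + 1;
}
```

Neither idiom is expected to generate any machine code; both exist only to state intent to the compiler and to checkers such as splint.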
> junk.c(2,20): Declaration parameter has name: chx
> A parameter in a function prototype has a name. This is
> dangerous, since a macro definition could be visible here.
> (Use either -protoparamname or -namechecks to inhibit
> warning)
Yes. All the same, naming parameters in a usenet post helps us refer to them.
> junk.c(2,13): Function x exported but not declared in header file
> A declaration is exported, but does not appear in a header file.
> (Use -exportheader to inhibit warning)
> junk.c(5,6): Function a exported but not declared in header file
> junk.c(5,34): Definition of a

Yes.

Pat LaVarre

> From: "Jan Homuth" ...
> Message-id: <btrm87$al4ak$1@ID-139563.news.uni-berlin.de>
> ...
> _inline ... does increase the code size
Only when applied to subroutines actually called more than once. For subroutines called once, inline wins on code size, run time, locality, etc.
> From: Hans-Bernhard Broeker ...
> Message-id: <btrm5s$3lt$1@nets3.rz.RWTH-Aachen.DE>
> ...
> Get your own Keil eval copy and see for yourself.
Does this actually work? Back when I was paid to work 8051, I couldn't talk here. Now that I'm not paid to work 8051, (a) I have little money for 8051 tools and (b) I won't experience a concentrated interest. The model of time-limited eval in anticipation of much money exchanged doesn't fit me now.
> From: "Jan Homuth" ...
> ...
> ; TASKING 8051 C compiler v7.0r8 Build 148
> ; options: -ne -It:\tk008024\rel7_0r8\include -Ms -rl -ivo=0x0000 -Ci8051
> ; -OAcdFhikLmpsVrtw -c20 -b0 -a20 -A1 -wstrict -s -mid=128
> $CASE
> NAME TEST_OVERLAY
> ...
> _a_BYTE: DS 1
> MOV _a_BYTE,R7
> LCALL _?x
> RET

Thanks for the demo. I think I see:

a) LCALL followed by RET. I wonder why that's not an LJMP.

b) One byte allocated for the reused parm, not three or four.

c) Max stack depth of return-from-a and return-from-x, rather than the return-from-a return-from-b return-from-c return-from-x stack I saw before.

Google suggests we're here talking of http://www.tasking.com/

> ; TASKING 8051 C compiler v7.0r8 Build 148
> ...
> To cite the manual:
Not available online? The cited caveats sound normal to me.
> From: Hans-Bernhard Broeker ...
> ...
> > #define tbd /* ... */
> > extern void x(char chx);
> > static /* inline */ void c(char chc) { tbd; x(chc); tbd; }
> > static /* inline */ void b(char chb) { tbd; c(chb); tbd; }
> > void a(char cha) { tbd; b(cha); tbd; }
> ...
> compiler I chose ...
> allocated all of cha, chb and chc to register R7, ...
> and the calls to b, c and x all became plain JMP operations ...
Thanks for the demo, results sound good.
> I didn't test that, but I guess the "linker
> code packing" feature will reduce that even
> further, to make _a consist of just JMP _x.
Good. I remember seeing unreasonable JMP to JMP in machine code.
> It does have an optimization that is supposed
> to "follow through" on chained jumps like
> these and retarget directly to the final one.
Sorry I'm not sure which "it" we mean. Possibly I miss the point of leaving the JMP to JMP in the object. Perhaps the overall development experience improves if only in the linker do we invest time into looking for such silliness, even though in separate compilation by definition we spend link time again for each make, not just for each compile.
> > I mean to say the C compilers I have tried
> > for the 8051 do not deliver space efficiency
> > comparable to what I'm used to seeing in
> > proprietary 8051 asm source code.
>
> Well, as the saying goes, if it's asm code you
> want, I trust it you know where to find it.
Here we may have lost me. I mean to be saying I know of people who are paying for extra chips they don't need, merely because the C compiler they chose wastes space unnecessarily. Human compilers work better, aye, but each version is different and none are reliably available over time. Sounds like the C99 inline keyword is gaining a following, so in time at least that trouble will go away. gcc 3.4 indirect sibcalls are the first ray of hope I've caught for "whose 8051 cc omits the insignificant bytes of call instructions".
> C compilers for '51 can do
> some rather impressive tricks these days,

I find teachers most willing to help me when we both know the teacher has impressed me. I haven't yet met an impressive compiler, not when I openly review its work with the help of a paired dis/assembler and flow analysis.

Pat LaVarre
http://members.aol.com/ppaatt/losslessc/
Pat LaVarre wrote:
> > From: CBFalconer ...
> > ...
> > Surprising it is valid code,
>
> Thanks for saying, but surprising why?
To me, on first glance, it lacked a main, #includes, etc. On second glance, it doesn't need them. However it really should have a #include of the access header, specifying the one function externally visible. Either that or a main.
>
... snip ...
>
> From the c: prompt and x90 = nop I gather you compiled for x86.
>
> Are nop past a jmp spurious in x86? If spurious, why emit them
> despite -O3?
Has to do with controlling data alignment.
>
> > junk.c: (in function c)
> > junk.c(3,43): Undetected modification possible from call to
> > unconstrained function x: x
> > An unconstrained function is called in a function body where
> > modifications are checked. Since the unconstrained function
> > may modify anything, there may be undetected modifications in
> > the checked function. (Use -modunconnomods to inhibit warning)
> > ...

The "splint -strict" run was primarily for amusement. It is only useful when you have annotated the source very thoroughly as to intention and usage etc.

Why was your reply not posted as a reply to my article? The references are fouled up.

--
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
Pat LaVarre wrote:

>> From: "Jan Homuth" ...
>> Message-id: <btrm87$al4ak$1@ID-139563.news.uni-berlin.de>
>> ...
>> _inline ... does increase the code size
>
> Only when applied to subroutines actually called more than once.

Forgive my ignorance but isn't 'called more than once' intrinsic in the definition of subroutine?

Ian
Ian Bell wrote:
> Pat LaVarre wrote:
> >> From: "Jan Homuth" ...
> >>
> >> ...
> >> _inline ... does increase the code size
> >
> > Only when applied to subroutines actually called more than once.
>
> Forgive my ignorance but isn't 'called more than once' intrinsic
> in the definition of subroutine?

No, not from the point of view of the writer. Breaking something up into logical units that perform simple understandable actions correctly facilitates writing accurate code. It prevents creating long monolithic obtuse routines.

There is usually a tradeoff point in numbers of calls where net code becomes smaller. Which is why the inlining decision is better left to the compiler, in many cases.

--
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE worldnet address!
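
The tradeoff point mentioned above can be estimated with simple byte counting. What follows is a rough sketch, not a model of any particular compiler: it assumes the standard 8051 encodings (LCALL is 3 bytes, RET is 1 byte), ignores parameter setup and any ACALL/AJMP savings, and the function names are made up for illustration.

```c
/* Standard 8051 instruction sizes in bytes. */
enum { LCALL_BYTES = 3, RET_BYTES = 1 };

/* Code bytes when the body is kept as one subroutine called from n sites. */
static unsigned called_size(unsigned body_bytes, unsigned n_sites)
{
    return body_bytes + RET_BYTES + n_sites * LCALL_BYTES;
}

/* Code bytes when the body is copied into each of the n sites. */
static unsigned inlined_size(unsigned body_bytes, unsigned n_sites)
{
    return n_sites * body_bytes;
}
```

With a 10-byte body, a single call site costs 14 bytes as a subroutine against 10 bytes inlined; with three call sites it is 20 against 30. Under these assumptions inlining is smaller whenever body_bytes * (n_sites - 1) < 3 * n_sites + 1, which always holds for one call site.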
Pat,

> > _inline ... does increase the code size
>
> Only when applied to subroutines actually called more than once. For
> subroutines called once, inline wins on code size, run time, locality,
> etc.
>

Aaaw.. c'mon. Did you also read the excerpt from the help manual? That's an obvious one. I did not say that in general _inline causes bigger code size. Only if applied more than once it does. You are right there.

> > From: "Jan Homuth" ...
> > ...
> > ; TASKING 8051 C compiler v7.0r8 Build 148
> > ; options: -ne -It:\tk008024\rel7_0r8\include -Ms -rl -ivo=0x0000 -Ci8051
> > ; -OAcdFhikLmpsVrtw -c20 -b0 -a20 -A1 -wstrict -s -mid=128
> > $CASE
> > NAME TEST_OVERLAY
> > ...
>
> > _a_BYTE: DS 1
> > MOV _a_BYTE,R7
> > LCALL _?x
> > RET
>
> Thanks for the demo. I think I see:
>
> a) LCALL followed by RET. I wonder why that's not an LJMP.
>

If it were an LJMP, how would a return be possible? LCALL stores the return address on the 8051's stack. If you use LJMP there is no way to "know" where to return to.

The compiler translates a call to a global object (which function x() is) to an LCALL instruction, since the course of execution will have to return to this point one time or another.

Yes yes yes .... I know: I am not talking about the use of function pointers or RTOS environments. That is an entirely different theater.

void a(char cha) { tbd; b(cha); tbd; }

a calls b which calls c which calls x().
b and c are _inline functions.
a() is a regular function, visible throughout the application, as x() is.
On the "C" side this object can be made visible to other translation units by an "extern" declaration.

Whatever happens in these functions, it has to follow a method agreed upon: this method is to return from a call and implement a call as a call and not as a goto (as you imply by asking for an LJMP instruction).

Thus: LCALL, not LJMP.
> b) One byte allocated for the reused parm, not three or four. >
Yes. That is all there is.

> c) Max stack depth of return-from-a and return-from-x, rather than the
> return-from-a return-from-b return-from-c return-from-x stack I saw
> before.
>
> Google suggests we're here talking of http://www.tasking.com/
>
> > ; TASKING 8051 C compiler v7.0r8 Build 148
> > ...
> > To cite the manual:
>
> Not available online? The cited caveats sound normal to me.

Available with the demo...

regards
/jan

> From: Jan Homuth ...
> ...
> If it were an LJMP how would a return be
> possible ? LCALL stores the return address on
> the 8051' stack. If you use LJMP there is no
> way to "know" where to return to.

Somehow we're speaking past each other out of context.

I'm saying:

    LCALL p
    ...
p:  LCALL q
    RET
q:
    ...
    RET

often may equivalently be written:

    LCALL p
    ...
p:  LJMP q
    ...
q:
    ...
    RET

Am I yet more clear than mud? The second expression of this same idea requires only enough stack to fit one return address. The first, more naive, expression, wastefully requires enough stack to fit two return addresses.

For the example of:

#define tbd /* ... */
extern void x(char chx);
static inline void c(char chc) { tbd; x(chc); tbd; }
static inline void b(char chb) { tbd; c(chb); tbd; }
void a(char cha) { tbd; b(cha); tbd; }

what I call reasonable is:

a:  ljmp x

I think we saw the tasking.com/ compile instead produce:

a:  lcall x
    ret

Situations where an 8051 processor behaves better when asked to lcall, ret rather than ljmp are rare.
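
The stack argument above can be checked with a toy model. This is a sketch only, not an 8051 emulator: it tracks nothing but the stack effect of each executed instruction, relying on the facts that LCALL pushes a 2-byte return address, RET pops it, and LJMP leaves the stack alone. The type and function names are made up for illustration.

```c
#include <stddef.h>

enum op { OP_LCALL, OP_LJMP, OP_RET };

/* Returns the peak number of stack bytes used by a trace of executed
   instructions, listed in execution order. */
static int peak_stack_bytes(const enum op *trace, size_t n)
{
    int depth = 0, peak = 0;
    size_t i;
    for (i = 0; i < n; i++) {
        if (trace[i] == OP_LCALL)
            depth += 2;              /* return address pushed */
        else if (trace[i] == OP_RET)
            depth -= 2;              /* return address popped */
        /* OP_LJMP: no stack effect */
        if (depth > peak)
            peak = depth;
    }
    return peak;
}

/* caller: LCALL p;   p: LCALL q, RET;   q: RET */
static const enum op call_then_ret[] = { OP_LCALL, OP_LCALL, OP_RET, OP_RET };

/* caller: LCALL p;   p: LJMP q;         q: RET */
static const enum op jump_instead[] = { OP_LCALL, OP_LJMP, OP_RET };
```

In this model the first trace peaks at 4 bytes of stack and the second at 2, and in both cases control ends up back in the original caller with the stack balanced, since q's RET consumes the return address that the caller's LCALL pushed.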
> return ... as a goto ...
I use that if a subroutine called only once has some good reason to be stored elsewhere, rather than inline.
> Aaaw.. c'mon. Did you also ... abvious ...
Sorry I misunderstood, not on purpose, honestly, I did and I do customarily review all the text of this thread, and my own drafts of my own text, repeatedly before posting.

Pat LaVarre

> really should have a #include of the access
> header, specifying the one function
> externally visible. Either that or a main.

Help.

1) Is there no portable way for 8051 .c to express the idea of an unspecified side effect that should not be omitted? My attempt was:

extern void x(char chx);

2) What kind of main do we like, if we need an arg? Surely not the Unix:

int main(int argc, char * argv[]) { ...
> > > surprising why? > > > > ... lacked a main, #includes, etc. > > ... > > nop past a jmp spurious ... > > Has to do with controlling data alignment. > ... > > ... "splint -strict" run ... primarily for amusement ...
All clear now thank you.
> references ... fouled up
Sorry this happened, more sorry to hear it bothered you. As yet my news clients cannot simultaneously achieve all of:

1) Available gratis cross-platform (Mac/ Linux/ Windows).
2) Unbroken lines.
3) Correct references.
4) Instant replies.

Usually I give up (4), this time I chose perhaps wrongly to give up (3).

Pat LaVarre
Pat,
A simple matter:

The compiler cannot optimize a call to an external routine in a different
translation unit (C source module).

This would mean having a feature like 'global call optimization'.
That is a good idea.
Thanks for the inspiration.

Since the compiler does not have such a feature, x() must be CALLed.
(Please do not forget that x() has a parameter that is to be passed. The
compiler has calling conventions that must be used consistently.)

I am aware that there is potential for improvement.

Let me ask you a question.
For the code snippet presented, what result do the tools available to you
produce?



grtnx
/jan




Pat LaVarre <ppaatt@aol.com> wrote in message
2695edf1.0401120902.534a010@posting.google.com...
> > From: Jan Homuth ...
> > ...
> > If it were an LJMP how would a return be
> > possible ? LCALL stores the return address on
> > the 8051' stack. If you use LJMP there is no
> > way to "know" where to return to.
>
> Somehow we're speaking past each other out of context.
>
> I'm saying:
>
>     LCALL p
>     ...
> p:  LCALL q
>     RET
> q:
>     ...
>     RET
>
> often may equivalently be written:
>
>     LCALL p
>     ...
> p:  LJMP q
>     ...
> q:
>     ...
>     RET
>
> Am I yet more clear than mud? The second expression of this same idea
> requires only enough stack to fit one return address. The first, more
> naive, expression, wastefully requires enough stack to fit two return
> addresses.
>
> For the example of:
>
> #define tbd /* ... */
> extern void x(char chx);
> static inline void c(char chc) { tbd; x(chc); tbd; }
> static inline void b(char chb) { tbd; c(chb); tbd; }
> void a(char cha) { tbd; b(cha); tbd; }
>
> what I call reasonable is:
>
> a:  ljmp x
>
> I think we saw the tasking.com/ compile instead produce:
>
> a:  lcall x
>     ret
>
> Situations where an 8051 processor behaves better when asked to lcall,
> ret rather than ljmp are rare.
>
> > return ... as a goto ...
>
> I use that if a subroutine called only once has some good reason to be
> stored elsewhere, rather than inline.
>
> > Aaaw.. c'mon. Did you also ... abvious ...
>
> Sorry I misunderstood, not on purpose, honestly, I did and I do
> customarily review all the text of this thread, and my own drafts of
> my own text, repeatedly before posting.
>
> Pat LaVarre
