mixing C and assembly

Started by Lax April 22, 2008
In message <fv1ut9$a74$02$1@news.t-online.com>, Hans-Bernhard Bröker
<HBBroeker@t-online.de> writes
>Walter Banks wrote:
>> Hans-Bernhard Bröker wrote:
>
>>> All of that is correct, but beside the point. For *every* piece of C
>>> code anyone can possibly write, in any C compilers, there's assembler
>>> code that ends up as the exact same machine code. The same is generally
>>> not true for the opposite direction. So compilers can't produce faster
>>> code than assemblers.
>
>> Compilers can produce some machine code that is exceedingly difficult
>> to write and maintain in asm.
>
>Huh? Is something wrong with my writing or with your reading? Where
>in the above did you see me talking about maintainability or
>difficulty? The issue at hand is _speed_ and _size_. No more, no less.
In which case you lose... I can read the C. I can't read the ASM, so I won't be able to see that what you have done is the same as the C, or even correct.... :-)

The whole point is that the C can be as fast and as small as the ASM but MUCH easier to read, debug and maintain. Certainly far faster to write. (BTW I do enjoy writing in asm but that is not the point.)

Also the compilers can do some optimisations that humans find difficult to do. Some optimisations involve the linker, not just the compiler, so I am told by a compiler writer (no, it was not Walter).

So in SOME cases an experienced asm writer MIGHT be able to do smaller, faster code than the compiler, but certainly NOT in the same time frame. Also that particular experienced ASM programmer can probably only do that for one or two MCUs and not for all types of program.

--
Chris Hills, Staffs, England
David Brown wrote:
>
... snip ...
>
> A quick test on avr-gcc 4.2.2, using 16-bit and 8-bit ints rather
> than 32-bit and 16-bit (since it's an 8-bit cpu) reveals that
> avr-gcc is smart enough to do an 8-bit x 8-bit -> 16-bit multiply
> as desired. It's a little harder to see exactly what is
> happening for bigger numbers and for division, since these use
> library calls - certainly the compiler will generalise some of
> these functions. But for the very common case of the multiply
> like this, you get optimal code.
Defining 'optimal' is a varying target. Among others, see Knuth. In particular, in the past I have compromised on an 8 * 16 -> 24 bit heart, two of which, with an addition, produced a 16 * 16 -> 32 multiplication. This had, on the machine of interest (an 8080), significant advantages, i.e. about a 50% decrease in multiplication times.

Other games are available at the compile stage where one operand is constant, especially those where the multiplier consists of some solid string of 1 bits.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

** Posted from http://www.teranews.com **
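For readers who want to see the decomposition spelled out, here is a minimal sketch in portable C of a 16 * 16 -> 32 multiply built from two 8 * 16 -> 24 partial products and one addition. It only illustrates the arithmetic; it is not the original 8080 routine, and the names mul8x16 and mul16x16 are made up for this example.

```c
#include <stdint.h>

/* 8 x 16 -> 24-bit partial product, held in a 32-bit variable.
 * The largest possible result is 0xFF * 0xFFFF = 0xFEFF01, which fits in 24 bits. */
uint32_t mul8x16(uint8_t a, uint16_t b)
{
    return (uint32_t)a * b;
}

/* 16 x 16 -> 32 multiply composed of two 8 x 16 products:
 * a * b = (a_hi * b) * 256 + (a_lo * b) */
uint32_t mul16x16(uint16_t a, uint16_t b)
{
    uint32_t lo = mul8x16((uint8_t)(a & 0xFFu), b); /* low byte of a times b  */
    uint32_t hi = mul8x16((uint8_t)(a >> 8), b);    /* high byte of a times b */
    return (hi << 8) + lo;                          /* shift and add          */
}
```

Whether this split is any faster than a straight 16 * 16 routine depends entirely on the target; the speed-up quoted above is specific to the 8080.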
Walter Banks wrote:
> CBFalconer wrote:
>
... snip ...
>>
>> Well, that looks impressive, but you must be losing something.
>> You must be doing something illegal and non-understandable (to a C
>> programmer) with one or more of indentation, braces placement,
>> illegal statements (a call to foo should never enter bar). I see
>> no reason for bar to exit while foo falls through.
>
> I should have used fixed point type to make the listing fragment
> clearer. This is the source used in the example.
>
> void bar (void);
>
> void foo (void)
>   {
>     NOP();
>     bar();
>   }
>
> void bar (void)
>   {
>     NOP();
>   }
>
> void main (void)
>   {
>     foo();
>     bar();
>   }
Well, that executes foo (and thus bar), followed by bar. I see no savings there from fall-thru. See my message of Sat. 11:13 am EDT -0400.

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.
David Brown wrote:
>
... snip ...
>
> That's just tail call elimination (changing a "call X; ret" into
> a "jmp X"), which is a standard optimisation technique (some
> assemblers will do that for you).
>
> A better example would be:
>
> WriteSpace:
>     ld a, #' '
> WriteChar:
>     st a, outputCharacter
>     ret
>
> with C code:
>
> extern volatile char outputCharacter;
> void WriteChar(char c) {
>     outputCharacter = c;
> }
> void WriteSpace(void) {
>     WriteChar(' ');
> }
But that doesn't do anything, because normal C executes a return on the closing brace. Am I missing something?

--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: <http://cbfalconer.home.att.net>
Try the download section.

Hans-Bernhard Bröker wrote:

> Walter Banks wrote:
> > Hans-Bernhard Bröker wrote:
>
> > These are sequences that are data or address specific that are likely
> > to change or need to be checked each time the code is assembled.
>
> That's why the prudent assembly programmer would secure such tricks with
> assembly-time assertions. I.e. make the assumptions explicit, and make
> sure that the code fails to translate if any of them is no longer true.
It is this type of check that is already embedded in C compilers. Programming in asm is both an exercise in application programming and implementation. In C the focus is on application algorithms with an implementation outline.
> > The whole reason for HLL is to aid in making application code easier
> > to create.
>
> Agreed. But you're still missing the point under discussion.
I don't think so. Most of what I have been saying is: use the correct tool for the job. This is not an asm vs C issue. The work we did that produced the white paper matters because it is proof that C does not have to be at a performance disadvantage to asm.

That said, let's look at the other issues and see where C has an advantage. We are increasingly seeing ISAs that were designed specifically for machine-generated code. Our focus has always been on making the code generation process easier.

Regards

--
Walter Banks
Byte Craft Limited
Tel. (519) 888-6911
http://www.bytecraft.com
walter@bytecraft.com
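To make the exchange about assertions concrete, here is a small sketch of how the same idea (state the assumption, and refuse to translate if it stops holding) can be written explicitly in C11 with static_assert. Everything in it is hypothetical and chosen only for illustration: TABLE_SIZE, the squares table and table_lookup come from neither poster's code. The masking trick in table_lookup is only valid while the table size is a power of two, which is exactly the kind of data-specific assumption discussed above.

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>
#include <stdint.h>

#define TABLE_SIZE 16u   /* hypothetical size; must stay a power of two */

/* Hypothetical lookup table: squares of 0..15. */
static const uint8_t squares[TABLE_SIZE] = {
    0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225
};

/* Make the assumptions explicit: compilation fails if any is violated. */
static_assert(CHAR_BIT == 8, "code assumes 8-bit bytes");
static_assert((TABLE_SIZE & (TABLE_SIZE - 1u)) == 0u,
              "TABLE_SIZE must be a power of two for the index mask");
static_assert(TABLE_SIZE <= 256u, "index must fit in 8 bits");

/* Wrap the index with a mask instead of a modulo or a range check. */
uint8_t table_lookup(uint8_t index)
{
    return squares[index & (TABLE_SIZE - 1u)];
}
```

If someone later changes TABLE_SIZE to, say, 12, the build stops with the message above rather than silently producing wrong lookups.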
In article <6fmdnRynGbho0onVRVnyjAA@lyse.net>, David Brown says...
>
> A quick test on avr-gcc 4.2.2, using 16-bit and 8-bit ints rather than
> 32-bit and 16-bit (since it's an 8-bit cpu) reveals that avr-gcc is
> smart enough to do an 8-bit x 8-bit -> 16-bit multiply as desired.
So at least some compilers do so. Thanks.

Robert

CBFalconer wrote:

> Walter Banks wrote:
> > CBFalconer wrote:
> >
> ... snip ...
> >>
> >> Well, that looks impressive, but you must be losing something.
> >> You must be doing something illegal and non-understandable (to a C
> >> programmer) with one or more of indentation, braces placement,
> >> illegal statements (a call to foo should never enter bar). I see
> >> no reason for bar to exit while foo falls through.
> >
> > I should have used fixed point type to make the listing fragment
> > clearer. This is the source used in the example.
> >
> > void bar (void);
> >
> > void foo (void)
> >   {
> >     NOP();
> >     bar();
> >   }
> >
> > void bar (void)
> >   {
> >     NOP();
> >   }
> >
> > void main (void)
> >   {
> >     foo();
> >     bar();
> >   }
>
> Well, that executes foo (and thus bar), followed by bar. I see no
> savings there from fall-thru. See my message of Sat. 11:13 am EDT
> -0400.
There is a savings.

Look at the listing I posted before. It follows in fixed point type.
Don't start a rant about html please

w..

                       void bar (void);

                       void foo (void)
                         {
0100 9D     NOP            NOP();
                           bar();
                         }

                       void bar (void)
                         {
0101 9D     NOP            NOP();
0102 81     RTS          }

                       void main (void)
                         {
0103 AD FB  BSR  $0100     foo();
0105 20 FA  BRA  $0101     bar();
                         }

                 __MAIN:
FFFE 01 03
CBFalconer wrote:
> David Brown wrote:
> ... snip ...
>> That's just tail call elimination (changing a "call X; ret" into
>> a "jmp X"), which is a standard optimisation technique (some
>> assemblers will do that for you).
>>
>> A better example would be:
>>
>> WriteSpace:
>>     ld a, #' '
>> WriteChar:
>>     st a, outputCharacter
>>     ret
>>
>> with C code:
>>
>> extern volatile char outputCharacter;
>> void WriteChar(char c) {
>>     outputCharacter = c;
>> }
>> void WriteSpace(void) {
>>     WriteChar(' ');
>> }
>
> But that doesn't do anything, because normal C executes a return on
> the closing brace. Am I missing something?
>
You must be missing something :-)

Your example code was not very helpful, because your first version implied that foo is a callable function in its own right - making a combined fall-through foobar would require duplicating the code for foo. Thus Walter did a direct translation to C and generated code that was slightly better than your first assembly code.

In the code I've given, I wrote an assembly function with two distinct entry points, and the typical equivalent C code for it. The question is, will Walter's C compiler generate a fall-through here?
CBFalconer wrote:
> David Brown wrote:
> ... snip ...
>> A quick test on avr-gcc 4.2.2, using 16-bit and 8-bit ints rather
>> than 32-bit and 16-bit (since it's an 8-bit cpu) reveals that
>> avr-gcc is smart enough to do an 8-bit x 8-bit -> 16-bit multiply
>> as desired. It's a little harder to see exactly what is
>> happening for bigger numbers and for division, since these use
>> library calls - certainly the compiler will generalise some of
>> these functions. But for the very common case of the multiply
>> like this, you get optimal code.
>
> Defining 'optimal' is a varying target. Among others, see Knuth.
> In particular, in the past I have compromised on an 8 * 16 -> 24
> bit heart, two of which, with an addition, produced a 16 * 16 -> 32
> multiplication. This had, on the machine of interest (an 8080),
> significant advantages, i.e. about a 50% decrease in multiplication
> times. Other games are available at the compile stage where one
> operand is constant, especially those where the multiplier consists
> of some solid string of 1 bits.
>
Yes, "optimal" can mean different things - code size, speed, stack use and ram size being the most common points. "optimal" also depends on things like shared library code, and any other information that the compiler may have. That's why I restricted my test to a simple 8x8->16 multiply on the AVR - the generated code is simple enough to be optimal in every way.
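For reference, the kind of source that such a test exercises looks roughly like the sketch below. The function name mul8x8 is an assumption for illustration, not the code that was actually compiled. The cast on one operand asks for a 16-bit result; a compiler working under the "as if" rule is then free to use a single 8 x 8 hardware multiply where the target has one, because it can see that both operands started out as 8-bit values.

```c
#include <stdint.h>

/* 8-bit x 8-bit -> 16-bit multiply.
 * Casting one operand to uint16_t keeps the arithmetic unsigned on targets
 * where int is 16 bits (such as the AVR): without it, 255 * 255 would
 * overflow a signed 16-bit int after the usual integer promotions. */
uint16_t mul8x8(uint8_t a, uint8_t b)
{
    return (uint16_t)((uint16_t)a * b);
}
```

Whether a given compiler reduces this to one hardware multiply is something to check in the generated listing rather than assume.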
In article <481452FF.C82B6E1C@bytecraft.com>, Walter Banks says...
>
>
> Robert Adsett wrote:
>
> > mul a,b,c ; b * c -> (a,b)   16bit x 16bit -> 32bit multiply
> > div a,d   ; (a,b)/d -> a     32bit / 16bit -> 16bit divide
> >
> > It's something I do write in asm to take advantage of a processor's
> > scaling capability.
>
> Robert,
>
> A lot of approach depends on processor. We use the "as if"
> rule a lot in code generation. In general 8*8->16 bits will
> use a processor 8*8 if we can. Similarly we grab the MS 8bits
> when we multiply two 8 bit fracts rather than casting and using
> a 32 bit multiply.
Good to know, thanks Walter.

Robert
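The two idioms in this exchange can be written in plain C roughly as follows. This is a sketch under stated assumptions: scale16 and fract8_mul are hypothetical names, and the fraction format (an unsigned 8-bit value read as value/256) is an assumption that may not match the fract types of any particular compiler. The first function is the C spelling of the mul/div pair above; the second keeps only the most significant 8 bits of an 8 x 8 product.

```c
#include <stdint.h>

/* (b * c) / d using a 32-bit intermediate product.
 * The caller must ensure d != 0 and that the quotient fits in 16 bits. */
uint16_t scale16(uint16_t b, uint16_t c, uint16_t d)
{
    uint32_t product = (uint32_t)b * c;   /* 16 x 16 -> 32 */
    return (uint16_t)(product / d);       /* 32 / 16 -> 16 */
}

/* Multiply two unsigned 8-bit fractions (each meaning value/256).
 * The full product is 16 bits wide; the result is its high byte. */
uint8_t fract8_mul(uint8_t x, uint8_t y)
{
    return (uint8_t)(((uint16_t)x * y) >> 8);
}
```

Whether a compiler maps scale16 onto a 16 x 16 -> 32 multiply followed by a 32 / 16 divide instruction, rather than calling full 32-bit library routines, is worth confirming in the listing for the target in question.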