EmbeddedRelated.com

Inline assembler on PowerPC

Started by David R Brooks June 13, 2005
Consider the following (compiler=GCC 3.4.3, host=i686,
target=powerpc-eabi):

typedef void(*pVoid)(void);

static inline bool1 kSetVector(uint1 level, pVoid func, int type) {   
     int r;
     const int code = 0;                                  	
     __asm__ __volatile__ (                  	
     " li 0, %1 \n"  /* code */       
     " mr 3, %2 \n"  /* level */
     " mr 4, %3 \n"  /* func  */               	
     " mr 5, %4 \n"  /* type  */               	
     " sc       \n"  /* System Call: may corrupt regs: result in r3 */
     " mr %0, 3 \n"  /* Return result */      	
     : "=r" (r)
     : "rI" (code), "0" (level), "r" (func), "r" (type)  	
     : "r0", "r3", "r4", "r5", "cc", "memory" /* r3-r5 are overwritten above */
     );                                      	
     return r;
}
...
(void)kSetVector(31, SerialIoInterrupt, 3);

 This compiles, & runs fine (producing the code below). However I
would like to improve the efficiency, by eliminating the "mr"
instructions to move arguments to & from registers. The "sc" needs the
data in precisely the registers shown, so GCC needs to be coaxed into
using those registers itself. 

Generated code (comments added):

  54:h/services.h  **** static inline bool1 kSetVector(uint1 level, pVoid func, int type) {
 203                 .loc 2 54 0
 204 019c 3940001F   li 10,31			 /* level */
 205 01a0 3D200000   lis 9,SerialIoInterrupt@ha	 /* func  */
 206 01a4 39290000   la 9,SerialIoInterrupt@l(9)
 207 01a8 39600003   li 11,3			 /* type  */
 208                .LBB3:
  55:h/services.h  ****      int r;
  56:h/services.h  ****      const int code = 0;
  57:h/services.h  ****      __asm__ __volatile__ (
 209                 .loc 2 57 0
 210 01ac 38000000   li 0, 0
 211 01b0 7D435378   mr 3, 10      /* The "mr's" I want to remove */
 212 01b4 7D244B78   mr 4, 9
 213 01b8 7D655B78   mr 5, 11
 214 01bc 44000002   sc
 215 01c0 7C6A1B78   mr 10, 3      /* result */

 In the x86 builds of GCC, there are "register loading codes" (machine
constraints), such as "c", "a" & "D" in the following example (from
"Using Inline Assembly With gcc" by Clark L. Coleman).

asm ("cld\n\t" "rep\n\t" "stosl" 
      : /* no output registers */ 
      : "c" (count), "a" (fill_value), "D" (dest) 
      : "%ecx", "%edi" );
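Current GCC documentation states that clobber descriptions may not overlap an input or output operand, so the quoted example (which names ECX and EDI both as inputs and as clobbers) may be rejected by newer compilers. A minimal sketch of the same technique with read/write ("+") operands, assuming a GCC-compatible compiler; `fill_words` is a hypothetical name, and non-x86 hosts fall back to a plain C loop:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store fill_value into count 32-bit words at dest. */
static void fill_words(uint32_t *dest, uint32_t fill_value, size_t count)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "c", "a" and "D" pin the operands to (E/R)CX, EAX and (E/R)DI.
       rep stosl counts CX down and advances DI, so those two are
       read/write ("+") operands rather than entries in the clobber list. */
    __asm__ __volatile__ ("cld\n\t"
                          "rep stosl"
                          : "+c" (count), "+D" (dest)
                          : "a" (fill_value)
                          : "cc", "memory");
#else
    /* Portable fallback for non-x86 hosts. */
    while (count--)
        *dest++ = fill_value;
#endif
}
```

The "memory" clobber tells GCC the asm writes through `dest`, since the stored words are not otherwise visible to it as operands.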

 Is there a similar device for the PowerPC, whereby I can tell GCC to
create the values in specific registers, so eliminating the need for
those "mr" instructions?
 TIA,

On Mon, 13 Jun 2005 21:50:28 +0800, David R Brooks wrote:

> [...] The "sc" needs the data in precisely the registers shown, so GCC
> needs to be coaxed into using those registers itself.

Imho, the easiest way is to do it ... in C:

static inline bool1 kSetVector (uint1 level, pVoid func, int type)
{
    register uint1 _level __asm__ ("r3");
    register pVoid _func __asm__ ("r4");
    register int _type __asm__ ("r5");

    _level = level;
    _func = func;
    _type = type;
    __asm__ __volatile__ (
        "li 0, %1 \n"
        "sc       \n"
        : "=r" (_level)
        : "rI" (code)
        : "r0", "cc", "memory");

    return _level;
}

Then gcc will be able to optimise the variable allocation and only
produce mr or lwz where necessary.
The second thing to consider is that this code is more easily readable
than any inline assembly version.
The only drawback is that you have to use the same local variable for
the first argument and the returned value.

[...]

l'indien wrote:
> Imho, the easiest way is to do it ... in C:
> [code snipped]

Of course, you will still get pretty much the same "mr" instructions in
the stand-alone version of the function (if it is generated) - it is
only in in-lined versions that they could be eliminated. And I presume
you are only doing this optimisation for interest and understanding,
not because you are setting vectors so often that a 3 cycle delay here
will be a serious issue?

David

On Tue, 14 Jun 2005 08:59:01 +0200, David Brown wrote:

> l'indien wrote:
>> [...]
> Of course, you will still get pretty much the same "mr" instructions in
> the stand-alone version of the function (if it is generated) - it is
> only in in-lined versions that they could be eliminated.

You won't have any mr in the stand-alone version: as the arguments are
passed in registers r3 onwards, level is already in r3, func in r4 and
type in r5. As the return value is also in r3, there won't be any mr at
all. When I compile this function as a standalone one, I get:

00000000 <kSetVector>:
   0:   38 00 00 00     li      r0,0
   4:   44 00 00 02     sc
   8:   4e 80 00 20     blr

Which is optimal.

> And I presume you are only doing this optimisation for interest and
> understanding, not because you are setting vectors so often that 3
> cycles delay here will be a serious issue?

We always want optimal code, don't we? ;-)

Many thanks. That works with one addition: you still have to mention
all the arguments to the "sc" (_level, _func, _type) on the inputs
line, else GCC will optimise them away.
I got it down to:

static inline bool1 kSetVector (uint1 level, pVoid func, int type)
{
    register uint1 _code  __asm__ ("r0") = 0;
    register uint1 _level __asm__ ("r3") = level;
    register pVoid _func  __asm__ ("r4") = func;
    register int   _type  __asm__ ("r5") = type;

    __asm__ __volatile__ (
        "sc       \n"
        : "=r" (_level)
        : "r" (_code), "0" (_level), "r" (_func), "r" (_type) /* "r", not "rI": keeps r0 loaded */
        : "cc", "memory" );

    return _level;
}
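Two cautions apply to this kind of code. GCC only guarantees that a local register variable occupies its named register when the variable is an asm operand with a register constraint (an immediate alternative such as "I" does not qualify), and the original comment says `sc` "may corrupt regs", yet r4 and r5 are declared as plain inputs. A minimal sketch that makes every `sc` register a read/write operand, assuming a bare-metal powerpc-eabi target as in the thread, `uint8_t` stand-ins for the thread's `uint1`/`bool1`, and (an assumption, also noted in the code) that the handler may corrupt the other volatile EABI registers. On any other host it compiles to a stub that makes no system call:

```c
#include <stdint.h>

/* Assumed stand-ins for the thread's own typedefs. */
typedef uint8_t uint1;
typedef uint8_t bool1;
typedef void (*pVoid)(void);

static inline bool1 kSetVector(uint1 level, pVoid func, int type)
{
#if (defined(__powerpc__) || defined(__PPC__)) && !defined(__linux__)
    /* Bare-metal powerpc-eabi, as in the thread. */
    register long  _code  __asm__ ("r0") = 0;     /* function code      */
    register long  _level __asm__ ("r3") = level; /* arg 1 and result   */
    register pVoid _func  __asm__ ("r4") = func;
    register long  _type  __asm__ ("r5") = type;

    /* "+r": each operand is read by sc and may be changed by it.
       Assumption: the handler may also corrupt the remaining volatile
       EABI registers, so they are listed as clobbers. */
    __asm__ __volatile__ ("sc"
        : "+r" (_code), "+r" (_level), "+r" (_func), "+r" (_type)
        :
        : "r6", "r7", "r8", "r9", "r10", "r11", "r12",
          "ctr", "cc", "memory");
    return (bool1) _level;
#else
    /* Host build: stub only, no system call is made. */
    (void) level; (void) func; (void) type;
    return 1;
#endif
}
```

Usage is unchanged: `(void)kSetVector(31, SerialIoInterrupt, 3);`. When inlined, the extra clobbers usually cost little unless GCC has something live in those registers, and the list can be trimmed if the kernel documents which registers it preserves.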


On Tue, 14 Jun 2005 18:12:31 +0800, David R Brooks wrote:

> Many thanks. That works with one addition: you still have to mention
> all the arguments to the "sc" (_level, _func, _type) on the inputs
> line, else GCC will optimise them away.

You're absolutely right. I have to admit I wrote it down without testing...

> I got it down to:
> [code snipped]

I just have two questions/remarks:
- why don't you initialise _code = code directly? This would make the
  code even easier to read and won't produce more output code.
- I would use the "+r" constraint for _level, to follow the gcc asm
  constraint specifications. But I'm not a specialist on this point, I
  must admit...

Answering your questions:
1. _code is explicitly a constant: being the function code. There are
several similar definitions in the header file, having different names
& corresponding function codes. The number of arguments varies too.
2. "+r", although legal in pure asm, is not accepted by GCC.
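Point 1 describes a header with several thin wrappers that differ only in function code and argument count. One way to keep such wrappers consistent is a shared helper per argument count. The sketch below assumes the same bare-metal target and convention as the thread (function code in r0, arguments from r3, result in r3) and the same assumed volatile-register clobbers; `kSyscall3` and `K_SET_VECTOR` are hypothetical names, and host builds get a stub returning -1:

```c
#include <stdint.h>

typedef void (*pVoid)(void);

/* Hypothetical function code; the thread uses 0 for kSetVector. */
enum { K_SET_VECTOR = 0 };

/* Hypothetical shared helper: code in r0, arguments in r3-r5, result in r3. */
static inline long kSyscall3(long code, long a, long b, long c)
{
#if (defined(__powerpc__) || defined(__PPC__)) && !defined(__linux__)
    register long _code __asm__ ("r0") = code;
    register long _a    __asm__ ("r3") = a;
    register long _b    __asm__ ("r4") = b;
    register long _c    __asm__ ("r5") = c;
    /* Assumption: sc may change r0 and r3-r5 and corrupt the other
       volatile EABI registers. */
    __asm__ __volatile__ ("sc"
        : "+r" (_code), "+r" (_a), "+r" (_b), "+r" (_c)
        :
        : "r6", "r7", "r8", "r9", "r10", "r11", "r12",
          "ctr", "cc", "memory");
    return _a;
#else
    /* Host build: stub only, no system call is made. */
    (void) code; (void) a; (void) b; (void) c;
    return -1;
#endif
}

/* Each public wrapper just fixes the function code. */
static inline long kSetVector(long level, pVoid func, long type)
{
    return kSyscall3(K_SET_VECTOR, level, (long) (intptr_t) func, type);
}
```

Because the helper is inlined and the code is a compile-time constant, GCC can load r0 with a single `li`; helpers for one or two arguments would follow the same pattern with fewer register variables.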

In article <pan.2005.06.14.07.57.25.334427@magic.fr>, 
l_indien_no_more_spams@magic.fr says...
> On Tue, 14 Jun 2005 08:59:01 +0200, David Brown wrote:
>
> > l'indien wrote:
> > And I presume you are only doing this optimisation for interest and
> > understanding, not because you are setting vectors so often that 3
> > cycles delay here will be a serious issue?
>
> We always want optimal code, don't we ? ;-)

Actually no. Readable (human readable) and correct first. Optimal is,
at best, a distant third.

Robert

On Wed, 15 Jun 2005 06:39:40 +0800, David R Brooks wrote:

> Answering your questions:
> 1. _code is explicitly a constant: being the function code. There are
> several similar definitions in the header file, having different names
> & corresponding function codes. The number of arguments varies too.

OK, sorry, I misread your code...

> 2. "+r", although legal in pure asm, is not accepted by GCC.

I tested it: gcc does accept it. "+r" is documented in the gcc
documentation (I'm using gcc 2.95.3 as a PowerPC cross-compiler).

[...]

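GCC's extended-asm documentation lists "+" as the modifier for an operand that the instruction both reads and writes, which is what `"=r"` plus a matching `"0"` input spells out by hand. A small sketch comparing the two spellings on an add-one instruction; the function names are hypothetical, the PowerPC branch uses the "b" constraint because `addi` treats r0 as the constant zero, and other hosts fall back to plain C:

```c
/* Spelling used in the thread: write-only output plus a matching input. */
static int inc_matching(int v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("addl $1, %0" : "=r" (v) : "0" (v) : "cc");
#elif defined(__powerpc__) || defined(__PPC__)
    __asm__ ("addi %0,%0,1" : "=b" (v) : "0" (v));
#else
    v += 1;
#endif
    return v;
}

/* Same operand written with the "+" (read/write) modifier. */
static int inc_plus(int v)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ ("addl $1, %0" : "+r" (v) : : "cc");
#elif defined(__powerpc__) || defined(__PPC__)
    __asm__ ("addi %0,%0,1" : "+b" (v));
#else
    v += 1;
#endif
    return v;
}
```

Both functions are intended to be equivalent; the "+" form simply avoids repeating the operand on the input line.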
On Tue, 14 Jun 2005 22:49:54 -0400, R Adsett
<radsett@junk.aeolusdevelopment.cm> wrote:

>In article <pan.2005.06.14.07.57.25.334427@magic.fr>,
>l_indien_no_more_spams@magic.fr says...
>> We always want optimal code, don't we ? ;-)
>
>Actually no. Readable (human readable) and correct first. Optimal is,
>at best, a distant third.

Optimal implies correct code. One cannot describe anything as an
optimal solution if it does not do what it is supposed to do. Things
that are obscure at first become very "Human Readable" if they are the
optimum solution to a problem. Readable code for even a complete newbie
programmer is total black magic to the average lay person.

Regards
Anton Erasmus
