single ARM instruction to copy C into r0? (page 3)

Reply by Francois Grieu ● February 15, 2007

In article <1171550594.264411.38720@j27g2000cwj.googlegroups.com>,
 "tum_" <atoumantsev_spam@mail.ru> wrote:

> On Feb 15, 1:09 pm, Francois Grieu <fgrieu@gmail.com> wrote:
>> is it worth, neutral, counterproductive or impossible to
>> reformulate STMFD  SP!,{r4} into something like
>>   STR r4, [r13,#-4]
>> (did I get this right?);
> 
> ;) you forgot the '!' at the end (but I had to peep into the manual
> to correct you, I don't know this stuff by heart).

I did mean pre-indexed, but failed to read the ! on the PDF.

"Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> wrote:
]  LDM/STM of one register takes 2 cycles on ARM9 while a
]  LDR takes just 1, so it is best to avoid single register
]  LDMs on ARM9.

Thanks!  I take note that STR r4, [r13,#-4]! is faster than
STMFD SP!,{r4} on the ARM922T, and that the modern (UAL) spelling
of the latter is PUSH {r4}.
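To make the addressing modes concrete, here is a small C model of what the pre-indexed store with writeback does to the stack pointer, next to the matching post-indexed load. It is a sketch only: the Machine type, MEM_WORDS and the function names are hypothetical, not any ARM API. Without the `!`, `STR r4, [r13,#-4]` would store to the same address but leave r13 unchanged, so the slot would not actually be reserved.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64

/* Hypothetical word-addressed model of the stack, for illustration only. */
typedef struct {
    uint32_t mem[MEM_WORDS];
    uint32_t sp;                 /* byte offset into mem, kept a multiple of 4 */
} Machine;

/* STR rt, [sp, #-4]! : pre-indexed with writeback.
   The address sp - 4 is used for the store and also written back to sp,
   which is what STMFD sp!, {rt} (UAL: PUSH {rt}) does to memory and sp. */
static void str_pre_wb(Machine *m, uint32_t rt)
{
    assert(m->sp >= 4 && m->sp % 4 == 0);    /* room left, word aligned */
    m->sp -= 4;
    m->mem[m->sp / 4] = rt;
}

/* LDR rt, [sp], #4 : post-indexed. Load from sp, then sp += 4;
   the single-register equivalent of LDMFD sp!, {rt} (UAL: POP {rt}). */
static uint32_t ldr_post(Machine *m)
{
    assert(m->sp + 4 <= 4 * MEM_WORDS);
    uint32_t v = m->mem[m->sp / 4];
    m->sp += 4;
    return v;
}
```

Only the effect on memory and sp is modelled. The cycle-count difference Wilco mentions is a property of the ARM9 pipeline, not of the semantics.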

>> that kind of thing would be wise on a
>> 680x0 (assuming condition codes do not matter).
> 
> Why would it be wise? (not familiar with 68k)

The 68k also has multiple move instructions to save and restore
several registers; and similar to ARM9, when dealing with a single
register, multiple move is slower (also: less dense) than a standard
move; an additional twist is that the effect on status bits might
not be the same.

The ARM922T looks a bit like a 68030 gone RISC, with lots of nice
additional twists (Thumb, UMLAL).

  Francois Grieu

Reply by Wilco Dijkstra ● February 15, 2007

"Francois Grieu" <fgrieu@gmail.com> wrote in message 
news:fgrieu-5DF930.20364215022007@news-4.proxad.net...
> In article <4S_Ah.11285$Zl6.274@newsfe3-win.ntli.net>,
> "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> proposed:

>> Note this assumes r1 + r3 doesn't overflow, i.e. the array pointed to by
>> r3 doesn't wrap around at the end of memory.
>
> Actually the assumption is stronger, and quite a bit less safe: it is that
> the array pointed to by r3 does not REACH 0xFFFFFFFF.   ADDS r3, r3, r1
> is still a nice trick, if not one that I would dare to promote heavily.

Correct, this is the standard assumption of C/C++, e.g. int arr[100]
implies that &arr[100] != NULL. Many compilers and in particular loop
optimizers use stronger assumptions than that to allow prefetching
and other advanced loop transformations. All this is considered safe,
especially since the first and last 4KB are typically reserved by the OS.
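The carry assumption is easy to check in C. The following hypothetical helper is not part of the original routine. It models `ADDS r3, r3, r1` as a wrapping 32-bit add and derives C from the wrap, under the routine's own precondition that the length is a multiple of 4. With a word-aligned array, "reaching 0xFFFFFFFF" means the end pointer wraps to exactly 0 and C becomes 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models ADDS r3, r3, r1 as a wrapping 32-bit add
   and reports the carry flag (C) that the instruction would set. */
static uint32_t adds_end_pointer(uint32_t base, uint32_t len, unsigned *carry)
{
    assert(len % 4 == 0);          /* routine's precondition: whole words */
    uint32_t end = base + len;     /* unsigned arithmetic wraps modulo 2^32 */
    *carry = (end < base);         /* a wrap is exactly a carry out of bit 31 */
    return end;
}
```

For example, base 0xFFFFFFF0 with 12 bytes gives end 0xFFFFFFFC and C = 0. With 16 bytes, the array's last byte is 0xFFFFFFFF, the end pointer wraps to 0 and C = 1. The loop would then be entered with carry set, and the first ADCS would add a spurious 1.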

> Also I like PUSH {r14} / POP {pc}
> After considerable hunt in ARM DDI 0100E (2000-06-23), I conclude that
> "On architecture version 5 and above" (my target), it is a perfectly
> legitimate idiom to preserve a working register, and return, including
> switching back to Thumb mode as needed.

Yes, architecture 5 made interworking much simpler so you don't
need veneers or special return sequences anymore.
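The difference between architecture versions can be summarised in a simplified C model. It follows the LDM pseudocode in ARM DDI 0100E as we read it. On v5 and above, bit 0 of the word loaded into pc goes to the T bit. On earlier versions, the low two bits are dropped and the state is unchanged. The model considers only a POP executed in ARM state, and its names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t pc;
    bool thumb;                    /* models the CPSR T bit */
} ArmState;

/* Simplified, hypothetical model of POP {..., pc} (LDM with pc in the list)
   executed in ARM state; 'value' is the word loaded from the stack. */
static ArmState pop_pc_arm_state(uint32_t value, bool arch_v5_or_later)
{
    ArmState s;
    if (arch_v5_or_later) {
        s.thumb = (value & 1u) != 0;   /* bit 0 selects the instruction set, like BX */
        s.pc = value & ~1u;
        /* Returning to ARM code with bit 1 set is UNPREDICTABLE. */
        assert(s.thumb || (value & 2u) == 0);
    } else {
        s.thumb = false;               /* v4T: no state change, stays in ARM state */
        s.pc = value & ~3u;            /* low two bits are ignored */
    }
    return s;
}
```

A Thumb caller's return address has bit 0 set. On an ARMv4T core such as the ARM9TDMI, the model therefore stays in ARM state and would execute Thumb code as ARM instructions.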

> This can be put to excellent use in a lot of code; looks like if a
> terminal subroutine needs to preserve some registers for temp usage,
> it pays to make r14 part of the temporary registers pool, and return
> by restoring the saved r14/LR into r15/PC, leaving r14 indeterminate,
> which is allowed by the usual calling conventions.

Yes, this is good for codesize. Good compilers even use STM to create
a small amount of stack:

PUSH {r0-r2, r4, lr} ; push r4, lr and create 12 bytes of stack space
...
POP {r1-r3, r4, pc} ; drop 12 bytes (into r1-r3), restore r4, return via saved lr; r0 kept

High-end ARMs typically transfer 2 registers per cycle and may execute
other instructions in parallel, so it doesn't cost much performance.
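The push/pop pairing above can be traced with a toy register-file model in C. This is a sketch: the Cpu type and the helpers are hypothetical. It encodes the full-descending convention that PUSH stores the lowest-numbered register at the lowest address, and it ignores the interworking behaviour of a load into pc.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 32

/* Hypothetical toy CPU: r[13] = sp (a byte offset into stack[]),
   r[14] = lr, r[15] = pc. */
typedef struct {
    uint32_t r[16];
    uint32_t stack[STACK_WORDS];
} Cpu;

/* PUSH {list} == STMDB sp!, {list}: sp drops by 4 per register and the
   lowest-numbered register lands at the lowest address. */
static void push(Cpu *c, uint16_t list)
{
    uint32_t n = 0;
    for (int i = 0; i < 16; i++)
        n += (list >> i) & 1u;
    assert(c->r[13] >= 4 * n);
    c->r[13] -= 4 * n;
    uint32_t addr = c->r[13];
    for (int i = 0; i < 16; i++)
        if (list & (1u << i)) {
            c->stack[addr / 4] = c->r[i];
            addr += 4;
        }
}

/* POP {list} == LDMIA sp!, {list}: read upwards from sp, lowest register first.
   Interworking on a load into pc is deliberately not modelled. */
static void pop(Cpu *c, uint16_t list)
{
    uint32_t addr = c->r[13];
    for (int i = 0; i < 16; i++)
        if (list & (1u << i)) {
            assert(addr / 4 < STACK_WORDS);
            c->r[i] = c->stack[addr / 4];
            addr += 4;
        }
    c->r[13] = addr;
}
```

For example, push(&c, 0x4017) models PUSH {r0-r2, r4, lr} and pop(&c, 0x801E) models POP {r1-r3, r4, pc}. After the pair, sp is back where it started, r4 and the return address are restored, and whatever the function left in r0 is untouched.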

> Thanks a lot, Wilco Dijkstra. BTW that was fun to see someone with
> your name use   B loopstart ;-)
> Is it a FAQ to ask the relationship with Edsger W. Dijkstra?

Yes... No, there is no relation, but I dislike gotos as well!

Wilco

Reply by tum_ ● February 16, 2007

On Feb 15, 7:36 pm, Francois Grieu <fgr...@gmail.com> wrote:
> In article <4S_Ah.11285$Zl6....@newsfe3-win.ntli.net>,
>  "Wilco Dijkstra" <Wilco_dot_Dijks...@ntlworld.com> proposed:
>
> > (using new UAL syntax):
>
> >; perform result = X+Y (expressed as little-endian radix 2^32)
> >; on entry:
> >;   r0 points to result
> >;   r1 and r2 point to sources X and Y
> >;   r3 length in bytes of X, Y and result, a non-negative multiple of 4
> >; on exit:
> >;   r0 is 1 or 0 depending on whether the result overflows
> >     PUSH    {r14}
> >     ADDS    r3, r3, r1          ; r3 = r3 + r1, r3 points after end of X, C = 0
> >     B       loopstart
> > addloop
> >     LDR     r14, [r1], #4       ; get 32-bit word from X, advance pointer
> >     LDR     r12, [r2], #4       ; get 32-bit word from Y, advance pointer
> >     ADCS    r14, r14, r12       ; C:r14 = r14+r12+C  (the actual arithmetic)
> >     STR     r14, [r0], #4       ; store 32-bit word into result, advance pointer
> > loopstart
> >     EORS    r14, r1, r3         ; Z = (r1==r3), C unchanged, r14 = 0 on exit
> >     BNE     addloop             ; -> loop until r1 reaches r3
> >     ADC     r0, r14, #0         ; r0 = C
> >     POP     {pc}
>
[snip]
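For readers who find the flag juggling hard to follow, here is a C rendering of the same computation. It is a sketch, not Wilco's code; mp_add and nwords (the word count, r3/4 in the assembly) are hypothetical names. The 64-bit intermediate plays the role of the carry flag that ADCS consumes and produces. In the assembly, the flag survives the loop test because EORS with a plain register operand leaves C untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* result = x + y over nwords little-endian 32-bit words.
   Returns the final carry: 1 if the result overflows, else 0. */
static uint32_t mp_add(uint32_t *result, const uint32_t *x,
                       const uint32_t *y, size_t nwords)
{
    assert(nwords == 0 || (result != NULL && x != NULL && y != NULL));
    uint32_t carry = 0;                               /* C = 0 on entry */
    for (size_t i = 0; i < nwords; i++) {
        uint64_t s = (uint64_t)x[i] + y[i] + carry;   /* ADCS: 33-bit sum */
        result[i] = (uint32_t)s;                      /* STR the low 32 bits */
        carry = (uint32_t)(s >> 32);                  /* new C flag */
    }
    return carry;                                     /* ADC r0, r14, #0 */
}
```

Because each word is read before the same index is written, result may alias x or y (in-place addition), as in the assembly.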

> Also I like PUSH {r14} / POP {pc}
> After considerable hunt in ARM DDI 0100E (2000-06-23), I conclude that
> "On architecture version 5 and above" (my target), it is a perfectly

Errm, am I missing something? You mentioned the ARM922T, which has an
ARM9TDMI core, and according to the manual that is ARM architecture v4T.
Or are you just talking about a different target you're working on?

[snip]

>
> Thanks a lot, Wilco Dijkstra. BTW that was fun to see someone with
> your name use   B loopstart ;-)
> Is it a FAQ to ask the relationship with Edsger W. Dijkstra?
>
>    Francois Grieu

:-))

Reply by Francois Grieu ● February 19, 2007

tum_ wrote:
> On Feb 15, 7:36 pm, Francois Grieu <fgr...@gmail.com> wrote:
> > In article <4S_Ah.11285$Zl6....@newsfe3-win.ntli.net>,
> >  "Wilco Dijkstra" <Wilco_dot_Dijks...@ntlworld.com> proposed:
> > [snip]
> [snip]
>
> > Also I like PUSH {r14} / POP {pc}
> > After considerable hunt in ARM DDI 0100E (2000-06-23), I conclude that
> > "On architecture version 5 and above" (my target), it is a perfectly
>
> Errm, am I missing something - you mentioned ARM922T, which is
> ARM9TDMI core and according to the manual it's ARM Architecture v4T.
> Or are you just talking about a different target you're working on?

Thanks, tum_, for pointing out that self-contradiction. Indeed my current
target uses an ARM922T core; I can't figure out why I assumed
architecture version 5, and it looks like POP {pc} will not return to
Thumb mode if needed :-(

   Francois Grieu

Reply by tum_ ● February 20, 2007

On Feb 19, 9:47 pm, "Francois Grieu" <fgr...@gmail.com> wrote:
> tum_ wrote:
> [snip]
> Thanks, tum_, for pointing out that self-contradiction. Indeed my current
> target uses an ARM922T core; I can't figure out why I assumed
> architecture version 5, and it looks like POP {pc} will not return to
> Thumb mode if needed :-(
>
>    Francois Grieu

And thanks to Wilco Dijkstra for mentioning the 'new UAL syntax'; I had
never heard of it before and learned something new. DUI 0204F (from
the ARM site) explains things quite well. (Unfortunately, I don't have
the luxury of choosing development tools; I'm stuck with gcc.)
