Single ARM instruction to copy C into r0?

Hello,

I'm trying to replace the ARM sequence
    MOV  R0, #0
    ADC  R0, R0, #0

with a single ARM instruction that copies C into R0,
clearing bits 1..31. I do not care about the status bits
afterwards, and have no register with a known value.
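As a reference point, here is a small Python model (not ARM code) of what the two-instruction sequence computes. It assumes the architectural definition of ADC without the S suffix, Rd = Rn + Operand2 + C modulo 2^32. The helper names `adc` and `mov0_adc` are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # ARM core registers are 32 bits wide


def adc(rn: int, op2: int, c: int) -> int:
    """ARM ADC (no S suffix): Rd = Rn + Operand2 + C, truncated to 32 bits."""
    return (rn + op2 + (c & 1)) & MASK32


def mov0_adc(c: int) -> int:
    """Model of the sequence  MOV R0,#0 ; ADC R0,R0,#0."""
    r0 = 0              # MOV R0, #0  (whatever R0 held before is discarded)
    r0 = adc(r0, 0, c)  # ADC R0, R0, #0  ->  0 + 0 + C
    return r0


for c in (0, 1):
    print(f"C={c} -> R0={mov0_adc(c)}")
```

The first instruction exists only to give ADC a register with a known value (zero), which is exactly what the question rules out having for free.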

Any idea?

  Francois Grieu

Reply by tum_ ●February 14, 2007

On Feb 14, 5:26 pm, Francois Grieu <fgr...@gmail.com> wrote:
> Hello,
>
> I'm trying to replace the ARM sequence
>     MOV  R0, #0
>     ADC  R0, R0, #0
>
> with a single ARM instruction that copies C into R0,
> clearing bits 1..31. I do not care about the status bits
> afterwards, and have no register with a known value.
>
> Any idea?
>
>   Francois Grieu

Hi Francois,

I gave myself about an hour to think about your riddle, and I don't
think it's possible.
I'd be very glad to be proven wrong.
(Still thinking about it, although it's really time for a nap.)

PS. Why comp.arch.embedded? Try comp.sys.arm....

Reply by Boudewijn Dijkstra ●February 15, 2007

On Wed, 14 Feb 2007 18:26:08 +0100, Francois Grieu
<fgrieu@gmail.com> wrote:
> I'm trying to replace the ARM sequence
>     MOV  R0, #0
>     ADC  R0, R0, #0
>
> with a single ARM instruction that copies C into R0,
> clearing bits 1..31. I do not care about the status bits
> afterwards, and have no register with a known value.
>
> Any idea?

Only one of these instructions will be executed:

MOVCC R0, #0
MOVCS R0, #1

Does that count?
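A short Python model (not ARM code) of this pair; the function name `movcc_movcs` is hypothetical. It shows that the pair leaves C in R0 whatever R0 held before, at the cost of two instruction words.

```python
def movcc_movcs(r0: int, c: int) -> int:
    """Model of the pair  MOVCC R0,#0 ; MOVCS R0,#1.

    MOV without the S suffix leaves the flags alone, so both instructions
    test the same C. Each still occupies a word of code; only the one
    whose condition passes writes R0.
    """
    if c == 0:   # CC (carry clear): condition passes
        r0 = 0   # MOVCC R0, #0
    if c == 1:   # CS (carry set): condition passes
        r0 = 1   # MOVCS R0, #1
    return r0


for c in (0, 1):
    print(f"C={c} -> R0={movcc_movcs(0xDEADBEEF, c)}")
```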


-- 
Made with Opera's revolutionary e-mail program:
http://www.opera.com/mail/

Reply by Peter Dickerson ●February 15, 2007

"Boudewijn Dijkstra" <boudewijn@indes.com> wrote in message 
news:op.tnr849dcy6p7a2@ragnarok.lan...
> On Wed, 14 Feb 2007 18:26:08 +0100, Francois Grieu
> <fgrieu@gmail.com> wrote:
>> I'm trying to replace the ARM sequence
>>     MOV  R0, #0
>>     ADC  R0, R0, #0
>>
>> with a single ARM instruction that copies C into R0,
>> clearing bits 1..31. I do not care about the status bits
>> afterwards, and have no register with a known value.
>>
>> Any idea?
>
> Only one of these instructions will be executed:
>
> MOVCC R0, #0
> MOVCS R0, #1
>
> Does that count?

Surely that depends on your view of what 'executed' means in this context. 
Is there an ARM processor where the unexecuted instruction takes no (extra) 
time? I tend to think of these conditional instructions as going down the
pipeline, having (some of) their results calculated, but then the writeback 
inhibited. To be not executed at all, not taking up any execution resources 
(saving power or time), would require the knowledge of the carry flag 
setting during decode, stalling decode if there is a carry-changing 
instruction ahead in the pipeline (or carry prediction).

Of course, if not executed means having no architectural side effects then
presumably NOP is also not executed. OK, I'm being a pedant.

Anyway, the best I can do in one instruction is SBC R0,Rn,Rn, but this sets
R0 to -1 if no carry and 0 if carry: one low in either case. Perhaps this
can be compensated for later in the instruction stream but we have no info 
on how the result is to be used.
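The "one low" result can be checked with a short Python model (not ARM code). It assumes the architectural definition of SBC without the S suffix, Rd = Rn - Operand2 - NOT(C) modulo 2^32. With both operands equal, the Rn terms cancel and R0 = C - 1. The helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sbc(rn: int, op2: int, c: int) -> int:
    """ARM SBC (no S suffix): Rd = Rn - Operand2 - NOT(C), truncated to 32 bits."""
    return (rn - op2 - (1 - (c & 1))) & MASK32


def sbc_same(rn: int, c: int) -> int:
    """Model of SBC R0, Rn, Rn: the value of Rn does not matter."""
    return sbc(rn, rn, c)


for c in (0, 1):
    print(f"C={c} -> R0=0x{sbc_same(0x12345678, c):08X}")
```

Because the contents of Rn cancel out, this is the one-instruction form that works with no register of known value. The price is that the result is C - 1 rather than C.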

Peter

Reply by tum_ ●February 15, 2007

On Feb 15, 9:53 am, "Peter Dickerson"
<firstname.lastn...@REMOVE.tesco.net> wrote:
> "Boudewijn Dijkstra" <boudew...@indes.com> wrote in message
> news:op.tnr849dcy6p7a2@ragnarok.lan...
>
> > On Wed, 14 Feb 2007 18:26:08 +0100, Francois Grieu
> > <fgr...@gmail.com> wrote:
> >> I'm trying to replace the ARM sequence
> >>     MOV  R0, #0
> >>     ADC  R0, R0, #0
>
> >> with a single ARM instruction that copies C into R0,
> >> clearing bits 1..31. I do not care about the status bits
> >> afterwards, and have no register with a known value.
>
> >> Any idea?
>
> > Only one of these instructions will be executed:
>
> > MOVCC R0, #0
> > MOVCS R0, #1
>
> > Does that count?

;) I'm quite sure it doesn't.
1. This is too obvious
2. Francois specified 'single instruction'.

> Surely that depends on your view of what 'executed' means in this context.
> Is there an ARM processor where the unexecuted instruction takes no (extra)
> time? I tend to think of these conditional instructions as going down the

In a few datasheets that I've read it was explicitly stated that
'unexecuted' conditional instructions take 1 cycle.

> pipeline, having (some of) their results calculated, but then the writeback
> inhibited. To be not executed at all, not taking up any execution resources
> (saving power or time), would require the knowledge of the carry flag
> setting during decode, stalling decode if there is a carry-changing
> instruction ahead in the pipeline (or carry prediction).
>
> Of course, if not executed means having no architectural side effects then
> presumably NOP is also not executed. OK, I'm being a pedant.

AFAIK, there's no NOP in ARM; "mov r0,r0" is used instead (being
pedantic too).


> Anyway, the best I can do in one instruction is SBC R0,Rn,Rn, but this sets
> R0 to -1 if no carry and 0 if carry: one low in either case. Perhaps this
> can be compensated for later in the instruction stream but we have no info
> on how the result is to be used.
>
> Peter

I've been racking my brains over a way to use the RRX shift somehow, but
so far no luck.
I agree that if the OP gave us a slightly larger picture we could be
more productive with proposals but I guess he doesn't want to.
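On the RRX idea: RRX rotates the operand right by one bit through the carry flag. The old C therefore enters bit 31, not bit 0, and bits 30..0 come from the source register, which would need a known value. A small Python model of the shifter operand (the helper name `rrx` is hypothetical):

```python
MASK32 = 0xFFFFFFFF


def rrx(rm: int, c: int):
    """ARM RRX shifter operand: rotate right by one through the carry flag.

    Returns (operand value, shifter carry out). The old C enters bit 31,
    and the old bit 0 becomes the carry out.
    """
    rm &= MASK32
    value = ((c & 1) << 31) | (rm >> 1)
    carry_out = rm & 1
    return value, carry_out


# MOV R0, R0, RRX with R0 assumed to be 0: the carry lands in bit 31, not bit 0.
for c in (0, 1):
    value, _ = rrx(0, c)
    print(f"C={c} -> 0x{value:08X}")
```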

Disclaimer: I'm not familiar with all ARM architectures & variants, so
some of my statements may be wrong.

Reply by Francois Grieu ●February 15, 2007

In article <I4WAh.10129$tz6.6642@newsfe2-gui.ntli.net>,
 "Peter Dickerson" <firstname.lastname@REMOVE.tesco.net> wrote:

> Anyway, the best I can do in one instruction is SBC R0,Rn,Rn, but this sets
> R0 to -1 if no carry and 0 if carry: one low in either case. Perhaps this
> can be compensated for later in the instruction stream but we have no info 
> on how the result is to be used.

The context is producing, then immediately storing, the last word of
the result (m+1 words long) of the addition of two m-word integers in
radix-2^32 representation; thus SBC R0,R0,R0 won't do without a
major functional change.

The feedback seems to confirm my impression that there is no way to
pervert the ARM instruction set into doing what I want. I wish
I knew of a source with examples of useful ARM idioms; my current bible
is the ARM Architecture Reference Manual (ARM DDI 0100E, 2000-06-23),
and it is a bit shy on examples.

  Francois Grieu

Reply by Francois Grieu ●February 15, 2007

In article <1171535295.984758.30100@m58g2000cwm.googlegroups.com>,
 "tum_" <atoumantsev_spam@mail.ru> wrote:

> if the OP gave us a slightly larger picture we could be
> more productive with proposals but I guess he doesn't want to.

I can tell without needing legal advice that
- CPU core is ARM922T
- context is this routine (not tested), performing addition of
  two m-word integers in radix-2^32 representation
- caller will immediately store the returned r0, and
  I do not want to change calling convention

; perform result = X+Y (expressed as little-endian radix 2^32)
; on entry:
;   r0 points to result
;   r1 and r2 point to sources X and Y
;   r3 length in byte of X, Y and result, a non-negative multiple of 4
; on exit:
;   r0 is 1 or 0 depending on if result overflows or not
        STMFD           SP!,{r4-r5}             ; save temp registers used

        ADDS            r3, r3, #0              ; Z = (r3==0), C=0
        ADD             r3, r3, r1              ; r3 = r3 + r1, r3 points after end of X
        BEQ             adddone                 ; -> early abort if Z is set (zero length)
addloop
        LDR             r4, [r1], #4            ; get 32-bit from X, advance pointer
        LDR             r5, [r2], #4            ; get 32-bit from Y, advance pointer
        ADCS            r4, r4, r5              ; C:r4 = r4+r5+C  (the actual arithmetic)
        STR             r4, [r0], #4            ; store 32-bit into result, advance pointer
        TEQ             r1, r3                  ; Z = (r1==r3)
        BNE             addloop                 ; -> loop until r1 reaches r3
adddone
        MOV             R0, #0
        ADC             R0, R0, #0              ; r0 = C (could we save one instruction ?)
        LDMIA           SP!,{r4-r5}             ; restore temp registers used
        BX              LR                      ; return to caller

Optimizing this is actually not critical, but I'm compacting the code to
the max as an intellectual exercise to deeply familiarize myself with ARM.
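A Python reference model (not ARM code) of what the untested routine is meant to compute can be handy for generating test vectors. It takes X and Y as equal-length little-endian lists of 32-bit words rather than pointers plus a byte length. The names `add_words` and `to_int` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def add_words(x, y):
    """Reference model: little-endian radix-2^32 addition of equal-length
    word lists. Returns (result_words, carry), mirroring result = X+Y and
    the 0/1 value the routine leaves in r0."""
    if len(x) != len(y):
        raise ValueError("X and Y must have the same number of words")
    result = []
    carry = 0                      # ADDS r3, r3, #0 clears C
    for xi, yi in zip(x, y):       # one iteration of addloop per word
        s = xi + yi + carry        # ADCS r4, r4, r5
        result.append(s & MASK32)  # STR r4, [r0], #4
        carry = s >> 32            # carry out becomes the new C
    return result, carry           # r0 = C at adddone


def to_int(words):
    """Interpret a little-endian list of 32-bit words as one integer."""
    return sum(w << (32 * i) for i, w in enumerate(words))


x = [0xFFFFFFFF, 0x00000001]
y = [0x00000001, 0xFFFFFFFF]
res, c = add_words(x, y)
print([hex(w) for w in res], c)  # ['0x0', '0x1'] 1
```

The final carry is the "m+1-th word" the caller stores, and the invariant to_int(res) + (c << 32*m) == to_int(x) + to_int(y) gives a simple check for the assembly version.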


  Francois Grieu

Reply by Boudewijn Dijkstra ●February 15, 2007

On Thu, 15 Feb 2007 11:28:16 +0100, tum_ <atoumantsev_spam@mail.ru> wrote:
> On Feb 15, 9:53 am, "Peter Dickerson"
> <firstname.lastn...@REMOVE.tesco.net> wrote:
>> "Boudewijn Dijkstra" <boudew...@indes.com> wrote in message
>> news:op.tnr849dcy6p7a2@ragnarok.lan...
>>
>> > On Wed, 14 Feb 2007 18:26:08 +0100, Francois Grieu
>> > <fgr...@gmail.com> wrote:
>> >> I'm trying to replace the ARM sequence
>> >>     MOV  R0, #0
>> >>     ADC  R0, R0, #0
>>
>> >> with a single ARM instruction that copies C into R0,
>> >> clearing bits 1..31. I do not care about the status bits
>> >> afterwards, and have no register with a known value.
>>
>> >> Any idea?
>>
>> > Only one of these instructions will be executed:
>>
>> > MOVCC R0, #0
>> > MOVCS R0, #1
>>
>> > Does that count?
>
> ;) I'm quite sure it doesn't.
> 1. This is too obvious

Often the obvious solution is accompanied with: "Why didn't I think of  
this before?"

> 2. Francois specified 'single instruction'.

He didn't specify whether it was supposed to be a stored instruction or an  
executed instruction.

>> Surely that depends on your view of what 'executed' means in this context.
>> Is there an ARM processor where the unexecuted instruction takes no (extra)
>> time? I tend to think of these conditional instructions as going down the
>
> In a few datasheets that I've read it was explicitly stated that
> 'unexecuted' conditional instructions take 1 cycle.

Yes.  The execution stage of the pipeline just waits for the next  
instruction to ripple through.




Reply by tum_ ●February 15, 2007

On Feb 15, 10:58 am, Francois Grieu <fgr...@gmail.com> wrote:
> In article <1171535295.984758.30...@m58g2000cwm.googlegroups.com>,
>  "tum_" <atoumantsev_s...@mail.ru> wrote:
> > if the OP gave us a slightly larger picture we could be
> > more productive with proposals but I guess he doesn't want to.
>
> I can tell without needing legal advice that
> - CPU core is ARM922T
> - context is this routine (not tested), performing addition of
>   two m-word integers in radix-2^32 representation
> - caller will immediately store the returned r0, and
>   I do not want to change calling convention
>
> ; perform result = X+Y (expressed as little-endian radix 2^32)
> ; on entry:
> ;   r0 points to result
> ;   r1 and r2 point to sources X and Y
> ;   r3 length in byte of X, Y and result, a non-negative multiple of 4
> ; on exit:
> ;   r0 is 1 or 0 depending on if result overflows or not
>         STMFD           SP!,{r4-r5}             ; save temp registers used
>
>         ADDS            r3, r3, #0              ; Z = (r3==0), C=0
>         ADD             r3, r3, r1              ; r3 = r3 + r1, r3 points after end of X
>         BEQ             adddone                 ; -> early abort if Z is set (zero length)
> addloop
>         LDR             r4, [r1], #4            ; get 32-bit from X, advance pointer
>         LDR             r5, [r2], #4            ; get 32-bit from Y, advance pointer
>         ADCS            r4, r4, r5              ; C:r4 = r4+r5+C  (the actual arithmetic)
>         STR             r4, [r0], #4            ; store 32-bit into result, advance pointer
>         TEQ             r1, r3                  ; Z = (r1==r3)
>         BNE             addloop                 ; -> loop until r1 reaches r3
> adddone
>         MOV             R0, #0
>         ADC             R0, R0, #0              ; r0 = C (could we save one instruction ?)
>         LDMIA           SP!,{r4-r5}             ; restore temp registers used
>         BX              LR                      ; return to caller
>
> Optimizing this is actually not critical, but I'm compacting the code to
> the max as an intellectual exercise to deeply familiarize myself with ARM.
>
>   Francois Grieu

After 20 minutes of thinking, I can't squeeze it any further; let's
see what others say.
All that I can propose is:
1) use r12 instead of r5. r12 doesn't have to be preserved. This will
improve speed & stack usage.
2) swap the BEQ and ADD instructions; this will improve speed in case of
zero length ;-).
3) When size is the issue consider using Thumb (I understand that your
goal is an exercise with ARM, not Thumb).

PS: my previous post still hasn't appeared in the thread (I use Google
Groups); hopefully it will appear later, but I'll paste the link here
just in case:
http://www.ee.ic.ac.uk/pcheung/teaching/ee2_computing/arm/Progtech.pdf

Reply by Francois Grieu ●February 15, 2007

In article <1171539997.883911.255790@j27g2000cwj.googlegroups.com>,
 "tum_" <atoumantsev_spam@mail.ru> wrote:

> use r12 instead of r5. r12 doesn't have to be preserved.
> This will improve speed & stack usage.

Thanks, I had missed that one, although it is implied by
http://www.arm.com/miscPDFs/8031.pdf

It seems that, in a piece of code with only self-references
(no linker veneer) and calling no external code, there is a
carved-in-a-next-as-strong-as-hardware-stone rule that
register r12 belongs to me.

After this optimization, is it worthwhile, neutral, counterproductive
or impossible to reformulate STMFD  SP!,{r4} into something like
  STR r4, [r13,#-4]
(did I get this right?); that kind of thing would be wise on a
680x0 (assuming condition codes do not matter).
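The question above turns on the `!` writeback suffix. Assuming a full-descending stack (the convention STMFD/LDMFD implement):

- `STR r4,[r13,#-4]!` (pre-indexed with writeback) stores at SP-4 and sets SP to that address. This matches what STMFD SP!,{r4} does to memory and SP.
- `STR r4,[r13,#-4]` without `!` stores to the same address but leaves SP unchanged.
- The matching pop is the post-indexed `LDR r4,[r13],#4`.

The toy Python model below (not ARM code) illustrates this. The `ToyStack` class and its method names are hypothetical.

```python
class ToyStack:
    """Hypothetical toy model of SP (r13) and word memory, full-descending stack."""

    def __init__(self, sp: int):
        self.sp = sp
        self.mem = {}

    def str_pre_wb(self, value: int, offset: int) -> None:
        # STR Rd, [r13, #offset]!  : address = SP + offset, then SP = address
        self.sp += offset
        self.mem[self.sp] = value

    def str_pre(self, value: int, offset: int) -> None:
        # STR Rd, [r13, #offset]   : address = SP + offset, SP unchanged
        self.mem[self.sp + offset] = value

    def ldr_post(self, offset: int) -> int:
        # LDR Rd, [r13], #offset   : load from SP, then SP = SP + offset
        value = self.mem[self.sp]
        self.sp += offset
        return value


# Push with writeback, then pop: SP returns to its starting value.
s = ToyStack(0x1000)
s.str_pre_wb(0x1234, -4)               # like STMFD SP!, {r4}
print(hex(s.sp))                       # 0xffc
print(hex(s.ldr_post(4)), hex(s.sp))   # 0x1234 0x1000

# Without "!", the word lands at SP-4 but SP is not moved.
t = ToyStack(0x1000)
t.str_pre(0x1234, -4)
print(hex(t.sp))                       # 0x1000
```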

> When size is the issue consider using Thumb (I understand
> that your goal is an exercise with ARM, not Thumb)

Yes, thanks. Also, in the context, since there is no TEQ in Thumb,
I found no way to loop without interfering with the C bit, which
in turn made some extra instructions necessary; but indeed,
probably still a bit more compact.

  Francois Grieu
