Francois Grieu wrote:
>> When size is the issue consider using Thumb (I understand
>> that your goal is an exercise with ARM, not Thumb)
>
> Yes, thanks. Also, in the context, since there is no TEQ in Thumb,
> I found no way to loop without interfering with the C bit, this
> in turn made some extra instructions necessary; but indeed,
> probably still a bit more compact.

Doesn't EOR fit your needs?

Laurent
single ARM instruction to copy C into r0 ?
Started by ●February 14, 2007
Reply by ●February 15, 2007
Reply by ●February 15, 2007
Laurent wrote:
>> Yes, thanks. Also, in the context, since there is no TEQ in Thumb,
>> I found no way to loop without interfering with the C bit, this
>> in turn made some extra instructions necessary; but indeed,
>> probably still a bit more compact.
>
> Doesn't EOR fit your needs?

No it doesn't, sorry :)

Laurent
Reply by ●February 15, 2007
On Feb 15, 1:09 pm, Francois Grieu <fgr...@gmail.com> wrote:
> In article <1171539997.883911.255...@j27g2000cwj.googlegroups.com>,
> "tum_" <atoumantsev_s...@mail.ru> wrote:
> > use r12 instead of r5. r12 doesn't have to be preserved.
> > This will improve speed & stack usage.
>
> Thanks, had missed that one, although it is implied by
> http://www.arm.com/miscPDFs/8031.pdf
>
> Seems like, in a piece of code with only self-references
> (no linker veneer), and calling no external code, there is a
> carved-in-a-next-as-strong-as-hardware-stone rule that
> register r12 belongs to me.

Yes, if you're dealing with software that conforms to the Procedure Call Standards proposed by ARM. There may be software out there that does not conform.

> After this optimization, is it worth, neutral, counterproductive
> or impossible to reformulate STMFD SP!,{r4} into something like
> STR r4, [r13,#-4]
> (did I get this right?);

;) you forgot the '!' at the end (but I had to peep into the manual to correct you, I don't know this stuff by heart). To the best of my (limited) knowledge, the two instructions are identical in effect/speed/size for ARM7 & 9 cores.

> that kind of thing would be wise on a
> 680x0 (assuming condition codes do not matter).

Why would it be wise? (not familiar with 68k)

> > When size is the issue consider using Thumb (I understand
> > that your goal is an exercise with ARM, not Thumb)
>
> Yes, thanks. Also, in the context, since there is no TEQ in Thumb,
> I found no way to loop without interfering with the C bit, this
> in turn made some extra instructions necessary; but indeed,
> probably still a bit more compact.
>
> Francois Grieu
Reply by ●February 15, 2007
"Francois Grieu" <fgrieu@gmail.com> wrote in message news:fgrieu-DDB419.14093915022007@news-3.proxad.net...> In article <1171539997.883911.255790@j27g2000cwj.googlegroups.com>, > "tum_" <atoumantsev_spam@mail.ru> wrote: > >> use r12 instead of r5. r12 doesn't have to be preserved. >> This will improve speed & stack usage. > > Thanks, had missed that one, although it is implied by > http://www.arm.com/miscPDFs/8031.pdf > > Seems like, in a piece of code with only self-references > (no linker veener), and calling no external code, there is a > carved-in-a-next-as-strong-as-hardware-stone rule that > register r12 belongs to me. > > After this optimization, is it worth, neutral, counterproductive > or impossible to reformulate STMFD SP!,{r4} into something like > STR r4, [r13,#-4] > (did I get this right?); that kind of thing would be wise on a > 680x0 (assuming condition codes do not matter). > > >> When size is the issue consider using Thumb (I understand >> that your goal is an exercise with ARM, not Thumb) > > Yes, thanks. Also, in the context, since there is no TEQ in Thumb, > I found no way to loop without interfering with the C bit, this > in turn made some extra instructions necessary; but indeed, > probably still a bit more compact.I can make it more compact by removing two instructions from outside the loop and adding one inside and changing one slightly. Leave R3 as a count, counting down by 4. Then after the loop R3 is known to be zero so use ADC R0,R3,#0. Peter
Reply by ●February 15, 2007
"Francois Grieu" <fgrieu@gmail.com> wrote in message news:fgrieu-BC8DB0.11584515022007@news-3.proxad.net...> Optimizing this is actually not critical, but I'm compacting the code to > the max as an intellectual exercise to deeply familiarize myself with ARM.How about (using new UAL syntax): PUSH {r14} ADDS r3, r3, r1 ; r3 = r3 + r1, r3 points after end of X, C = 0 B loopstart addloop LDR r14, [r1], #4 ; get 32-bit from X, advance pointer LDR r12, [r2], #4 ; get 32-bit from Y, advance pointer ADCS r14, r14, r12 ; C:r4 = r14+r12+C (the actual arithmetic) STR r14, [r0], #4 ; store 32-bit into result, advance pointer loopstart EORS r14, r1, r3 ; Z = (r1==r3), r14 = 0 BNE addloop ; -> loop until r1 reaches r3 ADC r0, r14, #0 ; r0 = C POP {pc} Note this assumes r1 + r3 doesn't overflow, ie. the array pointed to by r3 doesn't wrap around at the end of memory. Wilco
Reply by ●February 15, 2007
On Feb 15, 3:13 pm, "Peter Dickerson" <firstname.lastn...@REMOVE.tesco.net> wrote:
> "Francois Grieu" <fgr...@gmail.com> wrote in message
> news:fgrieu-DDB419.14093915022007@news-3.proxad.net...
>
> > In article <1171539997.883911.255...@j27g2000cwj.googlegroups.com>,
> > "tum_" <atoumantsev_s...@mail.ru> wrote:
> >
> >> use r12 instead of r5. r12 doesn't have to be preserved.
> >> This will improve speed & stack usage.
> >
> > Thanks, had missed that one, although it is implied by
> > http://www.arm.com/miscPDFs/8031.pdf
> >
> > Seems like, in a piece of code with only self-references
> > (no linker veneer), and calling no external code, there is a
> > carved-in-a-next-as-strong-as-hardware-stone rule that
> > register r12 belongs to me.
> >
> > After this optimization, is it worth, neutral, counterproductive
> > or impossible to reformulate STMFD SP!,{r4} into something like
> > STR r4, [r13,#-4]
> > (did I get this right?); that kind of thing would be wise on a
> > 680x0 (assuming condition codes do not matter).
> >
> >> When size is the issue consider using Thumb (I understand
> >> that your goal is an exercise with ARM, not Thumb)
> >
> > Yes, thanks. Also, in the context, since there is no TEQ in Thumb,
> > I found no way to loop without interfering with the C bit, this
> > in turn made some extra instructions necessary; but indeed,
> > probably still a bit more compact.
>
> I can make it more compact by removing two instructions from outside the
> loop and adding one inside and changing one slightly. Leave R3 as a count,
> counting down by 4. Then after the loop R3 is known to be zero so use ADC
> R0,R3,#0.
>
> Peter

SUBS r3,r3,#4 ? But this will kill the carry... or am I missing something? Sorry, in a bit of a hurry at the moment.
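The concern about SUBS can be illustrated with a small model. On ARM a flag-setting subtraction writes C as "no borrow", so whatever carry the preceding ADCS produced is overwritten. The helper name `subs` below is hypothetical, and the sketch covers only the result and the C flag, assuming unsigned 32-bit inputs.

```python
MASK32 = 0xFFFFFFFF

def subs(rn, op2):
    """Model of ARM SUBS Rd, Rn, op2 for unsigned 32-bit inputs.

    Returns (result, c). ARM sets C = 1 when the subtraction does NOT
    borrow, and the previous value of C plays no part in the outcome.
    """
    result = (rn - op2) & MASK32
    c = 1 if rn >= op2 else 0
    return result, c

# Counting a byte length down by 4: C comes out as 1 on every iteration,
# regardless of the carry the multiword addition needed to keep.
count = 12
flags_seen = []
while count != 0:
    count, c = subs(count, 4)
    flags_seen.append(c)
```

In this model `flags_seen` ends up as all ones, so a plain SUBS-based countdown would feed a carry of 1 into every ADCS after the first.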
Reply by ●February 15, 2007
On Feb 15, 3:19 pm, "Wilco Dijkstra" <Wilco_dot_Dijks...@ntlworld.com> wrote:
> "Francois Grieu" <fgr...@gmail.com> wrote in message
> news:fgrieu-BC8DB0.11584515022007@news-3.proxad.net...
>
> > Optimizing this is actually not critical, but I'm compacting the code to
> > the max as an intellectual exercise to deeply familiarize myself with ARM.
>
> How about (using new UAL syntax):
>
>         PUSH    {r14}
>         ADDS    r3, r3, r1      ; r3 = r3 + r1, r3 points after end of X, C = 0
>         B       loopstart
> addloop
>         LDR     r14, [r1], #4   ; get 32-bit from X, advance pointer
>         LDR     r12, [r2], #4   ; get 32-bit from Y, advance pointer
>         ADCS    r14, r14, r12   ; C:r14 = r14+r12+C (the actual arithmetic)
>         STR     r14, [r0], #4   ; store 32-bit into result, advance pointer
> loopstart
>         EORS    r14, r1, r3     ; Z = (r1==r3), r14 = 0
>         BNE     addloop         ; -> loop until r1 reaches r3
>         ADC     r0, r14, #0     ; r0 = C
>         POP     {pc}
>
> Note this assumes r1 + r3 doesn't overflow, ie. the array pointed to by r3
> doesn't wrap around at the end of memory.
>
> Wilco

Nice. Just another example of a solution that appears obvious after someone has shown it to you ;)))

Nice. EORS doesn't touch the C flag if there are no shifts involved.
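The point about EORS and the C flag can be sketched as a small flag model. For a flag-setting logical instruction, C takes the shifter carry-out, and for a plain register operand with no shift that carry-out is simply the existing C flag. The function name `eors` and its `lsl` parameter are hypothetical, and only the register operand with an optional LSL #1..#31 is modelled.

```python
MASK32 = 0xFFFFFFFF

def eors(rn, rm, c_in, lsl=0):
    """Model of ARM EORS Rd, Rn, Rm{, LSL #lsl} with 0 <= lsl <= 31.

    Returns (result, n, z, c). With lsl == 0 the shifter carry-out is the
    incoming C flag, so C passes through unchanged.
    """
    if lsl == 0:
        operand, shifter_carry = rm, c_in
    else:
        shifter_carry = (rm >> (32 - lsl)) & 1   # last bit shifted out
        operand = (rm << lsl) & MASK32
    result = rn ^ operand
    return result, result >> 31, int(result == 0), shifter_carry
```

With equal pointers and no shift, the model returns a zero result with Z set and C equal to the value passed in, which is what lets the following ADC r0, r14, #0 read the carry from the last ADCS.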
Reply by ●February 15, 2007
"tum_" <atoumantsev_spam@mail.ru> wrote in message news:1171550594.264411.38720@j27g2000cwj.googlegroups.com...> On Feb 15, 1:09 pm, Francois Grieu <fgr...@gmail.com> wrote:>> After this optimization, is it worth, neutral, counterproductive >> or impossible to reformulate STMFD SP!,{r4} into something like >> STR r4, [r13,#-4] >> (did I get this right?); > > ;) you forgot the '!' at the end (but I had to peep into the manual to > correct you, I don't know this stuff by heart). > To the best of my (limited) knowledge, the two instructions are > identical in effect/speed/size for ARM7 & 9 cores.No, LDM/STM of one register is takes 2 cycles on ARM9 while a LDR takes just 1, so it is best to avoid single register LDMs on ARM9. Thumb-2 doesn't support single register LDM/STM although Thumb-1 supports single register PUSH/POP. They are useful for codesize. Wilco
Reply by ●February 15, 2007
On Feb 15, 3:35 pm, "Wilco Dijkstra" <Wilco_dot_Dijks...@ntlworld.com> wrote:
> "tum_" <atoumantsev_s...@mail.ru> wrote in message
> news:1171550594.264411.38720@j27g2000cwj.googlegroups.com...
>
> > On Feb 15, 1:09 pm, Francois Grieu <fgr...@gmail.com> wrote:
> >> After this optimization, is it worth, neutral, counterproductive
> >> or impossible to reformulate STMFD SP!,{r4} into something like
> >> STR r4, [r13,#-4]
> >> (did I get this right?);
> >
> > ;) you forgot the '!' at the end (but I had to peep into the manual to
> > correct you, I don't know this stuff by heart).
> > To the best of my (limited) knowledge, the two instructions are
> > identical in effect/speed/size for ARM7 & 9 cores.
>
> No, LDM/STM of one register is takes 2 cycles on ARM9 while a
> LDR takes just 1, so it is best to avoid single register LDMs on ARM9.
> Thumb-2 doesn't support single register LDM/STM although Thumb-1
> supports single register PUSH/POP. They are useful for codesize.
>
> Wilco

Thanks. ARM9 is still new to me.
Reply by ●February 15, 2007
In article <4S_Ah.11285$Zl6.274@newsfe3-win.ntli.net>, "Wilco Dijkstra" <Wilco_dot_Dijkstra@ntlworld.com> proposed:
> (using new UAL syntax):
>
>; perform result = X+Y (expressed as little-endian radix 2^32)
>; on entry:
>;   r0 points to result
>;   r1 and r2 point to sources X and Y
>;   r3 length in bytes of X, Y and result, a non-negative multiple of 4
>; on exit:
>;   r0 is 1 or 0 depending on if result overflows or not
>         PUSH    {r14}
>         ADDS    r3, r3, r1      ; r3 = r3 + r1, r3 points after end of X, C = 0
>         B       loopstart
> addloop
>         LDR     r14, [r1], #4   ; get 32-bit from X, advance pointer
>         LDR     r12, [r2], #4   ; get 32-bit from Y, advance pointer
>         ADCS    r14, r14, r12   ; C:r14 = r14+r12+C (the actual arithmetic)
>         STR     r14, [r0], #4   ; store 32-bit into result, advance pointer
> loopstart
>         EORS    r14, r1, r3     ; Z = (r1==r3), r14 = 0
>         BNE     addloop         ; -> loop until r1 reaches r3
>         ADC     r0, r14, #0     ; r0 = C
>         POP     {pc}
>
> Note this assumes r1 + r3 doesn't overflow, ie. the array pointed to by r3
> doesn't wrap around at the end of memory.

Actually the assumption is stronger, and quite a bit less safe: it is that the array pointed to by r3 does not REACH 0xFFFFFFFF. ADDS r3, r3, r1 is still a nice trick, if not one that I would dare to promote heavily.

The real gem is EORS r14, r1, r3 and how it leaves R14 zeroed. I had wrongly concluded that "C Flag = shifter_carry_out" meant that C was destroyed by EORS, and now realize it is not, which opens a whole new universe of possibilities. Thanks a lot.

Also I like PUSH {r14} / POP {pc}. After a considerable hunt in ARM DDI 0100E (2000-06-23), I conclude that "On architecture version 5 and above" (my target), it is a perfectly legitimate idiom to preserve a working register, and return, including switching back to Thumb mode as needed.

This can be put to excellent use in a lot of code; it looks like, if a terminal subroutine needs to preserve some registers for temporary use, it pays to make r14 part of the temporary register pool, and to return by restoring the saved r14/LR into r15/PC, leaving r14 indeterminate, which is allowed by the usual calling conventions.

Thanks a lot, Wilco Dijkstra. BTW that was fun to see someone with your name use B loopstart ;-) Is it a FAQ to ask the relationship with Edsger W. Dijkstra?

Francois Grieu
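The remark above about the array not reaching 0xFFFFFFFF can be checked with a small model of ADDS. This is a sketch under assumptions: the helper name `adds` and the example addresses are made up for illustration. It shows that when the last byte of X sits at 0xFFFFFFFF, the end pointer wraps to 0 and, more importantly, ADDS leaves C = 1, so the first ADCS would add a spurious 1.

```python
MASK32 = 0xFFFFFFFF

def adds(rn, op2):
    """Model of ARM ADDS Rd, Rn, op2 for unsigned 32-bit inputs.

    Returns (result, c), where c is the carry out of bit 31.
    """
    total = rn + op2
    return total & MASK32, total >> 32

# 8-byte array ending well below the top of memory: C = 0 as the routine needs.
safe_end, safe_c = adds(0xFFFFFFF0, 8)

# 8-byte array whose last byte is at 0xFFFFFFFF: end pointer wraps, C = 1.
edge_end, edge_c = adds(0xFFFFFFF8, 8)
```

In the edge case the loop itself would still terminate, since r1 also wraps to 0 after two post-increments and then equals r3; it is the initial carry that goes wrong.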