EmbeddedRelated.com

Speeding up a "for" loop (8051)

Started by Martin Griffith March 23, 2008
I'm trying to feed a 7-segment LED driver by bit-banging an STP08C05, and
with an 11.059 MHz clock it takes about 180 us.
Is there any way to speed it up? (I am also going up to a 24 MHz crystal.)


void Led7(char time)
{
   char buff, i;
   bit SDI;

   buff = seg7[time];

   for (i = 0; i < 8; i++)
   {
      SDI = buff & 0x01;
      if (SDI)
         { P1 |= 0x20; }
      else
         { P1 &= ~0x20; }

      buff = buff >> 1;

      P1 |= 0x80;   // clock
      P1 &= ~0x80;
   }
}

Thanks


martin
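
For a rough idea of what the faster crystal buys on its own: if the part is a classic 8051 core with 12 oscillator clocks per machine cycle (an assumption, since the post does not name the part), run time scales inversely with the crystal frequency. Below is a small host-side C sketch of that arithmetic. The function names are made up for illustration.

```c
#include <assert.h>

/* Oscillator clocks per machine cycle on a classic 8051 core.
   Assumption: many derivative parts use fewer, so check the datasheet. */
#define CLOCKS_PER_MACHINE_CYCLE 12.0

/* Machine cycles used by a routine that takes t_us microseconds
   with a crystal of f_mhz megahertz. */
static double machine_cycles(double t_us, double f_mhz)
{
    assert(t_us >= 0.0 && f_mhz > 0.0);
    return t_us * f_mhz / CLOCKS_PER_MACHINE_CYCLE;
}

/* Time in microseconds for the same number of machine cycles
   with a different crystal. */
static double time_us(double cycles, double f_mhz)
{
    assert(cycles >= 0.0 && f_mhz > 0.0);
    return cycles * CLOCKS_PER_MACHINE_CYCLE / f_mhz;
}
```

Under that assumption, machine_cycles(180.0, 11.059) comes to about 166 machine cycles, and time_us() of that count at 24.0 gives about 83 us. The crystal change alone would roughly halve the time, before any change to the code.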
Martin Griffith wrote:
> I'm trying to feed a 7seg LEDdriver bit bashing a STP08C05 and with a
> 11.059M clock it takes about 180uS.
> Any way to speed it up, (and I am going up to 24Mhz xtal) ?

Playing around with SDCC and counting cycles with a WCET tool,
assuming a vanilla 8051 with 12 clocks per machine cycle, the
fastest looping version I found is 145 us:

void Led7btx (unsigned char time)
{
   unsigned char buff, i;

   buff=seg7[time];
   P1 &= ~0x20;
   i = 8;
   do
   {
      if (buff & 0x01) { P1 ^= 0x20;}

      buff = buff >> 1;

      P1 |= 0x80; //clock
      P1 &= ~0x80;

      i --;
   }
   while (i != 0);
}

This assumes that seg7[] is in xdata and has been modified to show
the changes in segment on/off state, counting from bit 0 to bit 7
and starting from '0' state, so that P1 can be updated with an xor
after being initialized to a '0' state before the loop.

Unrolling the loop in this function gives 92 us:

void Led7fx (unsigned char time)
{
   unsigned char buff;

   buff = seg7[time];

   P1 &= ~0x20;

   if (buff & 0x01) { P1 ^= 0x20; }  // 1
   P1 |= 0x80; P1 &= ~0x80;

   if (buff & 0x02) { P1 ^= 0x20; }  // 2
   P1 |= 0x80; P1 &= ~0x80;

   if (buff & 0x04) { P1 ^= 0x20; }  // 3
   P1 |= 0x80; P1 &= ~0x80;

   if (buff & 0x08) { P1 ^= 0x20; }  // 4
   P1 |= 0x80; P1 &= ~0x80;

   if (buff & 0x10) { P1 ^= 0x20; }  // 5
   P1 |= 0x80; P1 &= ~0x80;

   if (buff & 0x20) { P1 ^= 0x20; }  // 6
   P1 |= 0x80; P1 &= ~0x80;

   if (buff & 0x40) { P1 ^= 0x20; }  // 7
   P1 |= 0x80; P1 &= ~0x80;

   if (buff & 0x80) { P1 ^= 0x20; }  // 8
   P1 |= 0x80; P1 &= ~0x80;
}

The SDCC code from the unrolled function looks quite tight, but I
think there are some unnecessary moves from R2 into A that could be
avoided in assembly language.

Please note that this code is not tested, just compiled and
analysed with a cycle-counting tool.

By the way, I'm not familiar with 7-segment drivers, but I was
surprised to see that the code sends 8 bits, not 7. Is that right?
What is the 8th bit used for?

HTH,

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .
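
The xor scheme above needs seg7[] to hold "change" patterns rather than plain segment patterns. Here is a minimal host-side sketch of how such a table entry could be derived and checked, assuming bits are sent LSB first and the data line starts low, as in the posted code. The names to_toggle_pattern, shift_out_model and check_all_patterns are hypothetical, and this is a PC-side model, not 8051 code.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a plain segment pattern into the "change" encoding: bit k of the
   result is 1 when the data line must toggle before clock pulse k.
   Bits go out LSB first and the line starts low, so bit 0 is compared
   against 0. */
static uint8_t to_toggle_pattern(uint8_t plain)
{
    return (uint8_t)(plain ^ (uint8_t)(plain << 1));
}

/* Host-side model of the xor loop: returns the byte the driver would see,
   with the level sampled on clock pulse k stored in bit k. */
static uint8_t shift_out_model(uint8_t toggles)
{
    uint8_t line = 0;       /* data line, cleared before the loop */
    uint8_t received = 0;
    for (int k = 0; k < 8; k++) {
        if (toggles & 0x01) {
            line ^= 1;      /* corresponds to P1 ^= 0x20 */
        }
        toggles >>= 1;
        received |= (uint8_t)(line << k);   /* sampled on the clock pulse */
    }
    return received;
}

/* Self-check: every possible pattern survives encoding plus shift-out. */
static void check_all_patterns(void)
{
    for (int p = 0; p < 256; p++) {
        assert(shift_out_model(to_toggle_pattern((uint8_t)p)) == (uint8_t)p);
    }
}
```

As an example, a plain pattern of 0x3F (bits 0 to 5 on) encodes to 0x41: one toggle before the first clock to raise the line, and one before the seventh to drop it again.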
On Sun, 23 Mar 2008 14:14:52 +0200, in comp.arch.embedded Niklas
Holsti <niklas.holsti@tidorum.invalid> wrote:

>By the way, I'm not familiar with 7-segment drivers, but I was
>surprised to see that the code sends 8 bits, not 7. Is that right?
>What is the 8th bit used for?

Thanks. Bit 8 is for the decimal point :)


martin
Martin Griffith wrote:
> I'm trying to feed a 7seg LEDdriver bit bashing a STP08C05 and with a
> 11.059M clock it takes about 180uS.
> Any way to speed it up, (and I am going up to 24Mhz xtal) ?

0) Check the assembler code the compiler generates.

1) Try using bit instructions to access P1:

   P1.5 = SDI;
   P1.7 = 1;   // clock
   P1.7 = 0;

   (You have to check how your C compiler accesses bits in port 1.)

2) Hand-write the for loop in assembler, using:

   DJNZ to control the loop
   the carry flag to move your data between the accumulator and the
   port bit, using shifts

The loop should look like this, with the datum to be sent in ACC and
the bit counter in R1:

        MOV  R1,#8
loop:
        RRC  A          ; moves the LSB into C and shifts the accumulator
        MOV  P1.5,C
        SETB P1.7
        CLR  P1.7
        DJNZ R1,loop

Please TRIPLE check that this code fragment is right.
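
One way to check the assembler loop without hardware is to model it on a PC. The sketch below mimics the carry-flag behaviour of RRC and adds up machine cycles. The per-instruction counts in the comments are the usual figures for a classic 12-clock 8051, quoted from memory here, so treat them as assumptions and confirm them against the instruction set table for the actual part. The struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Result of modelling the loop on the host. */
struct shift_result {
    uint8_t received;   /* bit k = data line level at clock pulse k */
    unsigned cycles;    /* machine cycles, assuming a classic 12-clock core */
};

static struct shift_result rrc_loop_model(uint8_t acc)
{
    struct shift_result r = { 0, 0 };
    uint8_t carry = 0;  /* C flag; its initial value never reaches the pin */
    uint8_t r1 = 8;     /* MOV R1,#8 (assumed 1 cycle) */
    unsigned k = 0;
    r.cycles += 1;

    do {
        /* RRC A: bit 0 goes to C, the old C goes to bit 7 (assumed 1 cycle) */
        uint8_t new_carry = acc & 0x01;
        acc = (uint8_t)((acc >> 1) | (carry << 7));
        carry = new_carry;
        r.cycles += 1;

        /* MOV P1.5,C (assumed 2), then SETB P1.7 and CLR P1.7 (1 each) */
        r.received |= (uint8_t)(carry << k);
        k++;
        r.cycles += 2 + 1 + 1;

        r.cycles += 2;  /* DJNZ R1,loop (assumed 2 cycles) */
    } while (--r1 != 0);

    assert(k == 8);     /* exactly eight clock pulses were produced */
    return r;
}
```

In this model the bits leave LSB first, matching the C version, and the loop comes to 57 machine cycles. With 12 clocks per cycle at 11.059 MHz that would be about 62 us, not counting the table lookup or call overhead.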
Martin Griffith wrote:
> On Sun, 23 Mar 2008 14:14:52 +0200, in comp.arch.embedded Niklas
> Holsti <niklas.holsti@tidorum.invalid> wrote:
>
>>Martin Griffith wrote:
>>
>>>I'm trying to feed a 7seg LEDdriver bit bashing a STP08C05 and with a
>>>11.059M clock it takes about 180uS.
>>>Any way to speed it up, (and I am going up to 24Mhz xtal) ?
>>> ...
>>
>>Playing around with SDCC and counting cycles with a WCET tool,
>>assuming a vanilla 8051 with 12 clocks per machine cycle, ...
>>
>>This assumes that seg7[] is in xdata and has been modified to show
>>the changes in segment on/off state, counting from bit 0 to bit 7
>>and starting from '0' state, so that P1 can be updated with an xor
>>after being initialized to a '0' state before the loop.
>>
>>Unrolling the loop in this function gives 92 us:

and using bit access to P1 reduces that to 65 us:

__bit __at (0x95) P1_5;
__bit __at (0x97) P1_7;

void Led7fx_bit (unsigned char time)
{
   unsigned char buff;

   buff = seg7[time];

   P1_5 = 0;

   if (buff & 0x01) { P1_5 ^= 1; }  // 1
   P1_7 = 1; P1_7 = 0;

   if (buff & 0x02) { P1_5 ^= 1; }  // 2
   P1_7 = 1; P1_7 = 0;

   if (buff & 0x04) { P1_5 ^= 1; }  // 3
   P1_7 = 1; P1_7 = 0;

   if (buff & 0x08) { P1_5 ^= 1; }  // 4
   P1_7 = 1; P1_7 = 0;

   if (buff & 0x10) { P1_5 ^= 1; }  // 5
   P1_7 = 1; P1_7 = 0;

   if (buff & 0x20) { P1_5 ^= 1; }  // 6
   P1_7 = 1; P1_7 = 0;

   if (buff & 0x40) { P1_5 ^= 1; }  // 7
   P1_7 = 1; P1_7 = 0;

   if (buff & 0x80) { P1_5 ^= 1; }  // 8
   P1_7 = 1; P1_7 = 0;
}

>>What is the 8th bit used for?
> bit 8, for the decimal point :)
Ah.

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .
On Sun, 23 Mar 2008 13:23:17 +0100, in comp.arch.embedded mmm
<mmm@john.bluto.blutarsky.it> wrote:

>1) try to use bit instruction to access P1 :
>
>P1.5 = SDI;
>P1.7 = 1 ; // clock
>P1.7 = 0 ;
>
>(you have to check how your C compiler access bits in port 1 )

I'm not very good at programming, so I'm just trying Niklas's approach.
Thanks for the bit instruction tip.


martin
On Sun, 23 Mar 2008 14:44:32 +0200, in comp.arch.embedded Niklas
Holsti <niklas.holsti@tidorum.invalid> wrote:

>and using bit access to P1 reduces that to 65 us:
>
>__bit __at (0x95) P1_5;
>__bit __at (0x97) P1_7;

Thanks. I'm reading the Raisonance documents now; _bit is different
there, it appears.
It's a vast improvement over my original :)


martin
Niklas Holsti wrote:

Correcting my own post:

> __bit __at (0x95) P1_5;
> __bit __at (0x97) P1_7;

To follow the SDCC manual, it is better to write "__sbit" than
"__bit" in these declarations. However, it seems to make no
difference in the machine code, maybe because of the "__at".

Martin, which compiler are you using? SDCC or a commercial one?

-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .
On Sun, 23 Mar 2008 15:22:40 +0200, in comp.arch.embedded Niklas
Holsti <niklas.holsti@tidorum.invalid> wrote:

>Niklas Holsti wrote:
>
>Correcting my own post:
>
>> __bit __at (0x95) P1_5;
>> __bit __at (0x97) P1_7;
>
>To follow the SDCC manual, it is better to write "__sbit" than
>"__bit" in these declarations. However, it seems to make no
>difference in the machine code, maybe because of the "__at".
>
>Martin, which compiler are you using? SDCC or a commercial one?

I'm using the 4k-limited Raisonance compiler, freeware which is good
enough for me. I am not really a programmer; I just have ideas that
need a micro, so I attempt to program.

I'm making a SMPTE timecode reader, which has interrupts every 250 us
or so, so I'm trying to speed things up. There are some (bad) videos,
all very short, here:
http://www.youtube.com/user/topofahillinspain

It's going quite well.

Back to the docs....


martin
Martin Griffith wrote:
> I'm making a SMPTE timecode reader, which has interrupts every 250uS,
> or so, so I'm trying to speed things up. There are some (bad) videos,
> all very short, here:
> http://www.youtube.com/user/topofahillinspain

Are you making the timecode grabber code on the PC available?