Least processor-intensive serial interrupt routine for 8051

Started by mik3ca 2 weeks ago · 4 replies · latest reply 2 weeks ago · 40 views

I'm trying to figure out if I can somehow save some more clock cycles in these routines.

Basically I have a transmit and a receive function here for the serial port. OK, I did omit the interrupt enable and initial serial port setup, but that part I can do. What I am particularly interested in is making a ring buffer for the receive routine and using fewer clock cycles to add data to it.

Currently my receive routine clocks in at about 25 clock cycles (roughly 13 µs on my AT89S52 with a 22.1184 MHz crystal).

How can I optimize the code below (particularly the functions) to make it use fewer clock cycles?

STX equ 30h
CSUMT equ STX+10 ;end of transmit buffer = 3Ah
;Mask needs to be CFh for the rollover to work.
;The serial buffer is 16 bytes ranging from C0h to CFh
;Anding Dxh with CFh = Cxh
RBMASK equ 0CFh 
RBSTART equ 0C0h
SBANK equ 08h
SIR0 equ SBANK ;SIR0=R0 of bank 1
org 0h
mov RXPA,#RBSTART ;RXPA = tail of circular receive buffer
mov SIR0,#RBSTART ;R0 of bank1 = head of circular receive buffer
;Here we fill transmit buffer with data...
;then send byte to start transmission
mov SIR1,#STX-1 ;R1 of bank1 = position of linear transmit buffer
;...do other irrelevant stuff here
org 23h
jbc RI,rxdf ;-2c
txdf:
  ;Data transmit character from linear buffer routine
  clr TI ; -1c
  mov SPSW,PSW         ;Save old PSW -2c
  mov PSW,#SBANK       ;Load our bank to get R1 -2c
  cjne R1,#CSUMT,notxe ;See if we're at end of buffer -2c
    setb REN           ;..we are so enable receiver -1c
    sjmp notxe2        ;and skip nonsense. -2c
notxe:
  inc R1               ;here we aren't at end so increment pointer -1c
  mov SBUF,@R1         ;and send out next character -2c
notxe2:
  mov PSW,SPSW         ;restore PSW and exit -2c
reti    ; -2c
;Total usage under worst case: 16 cycles (9 us)
rxdf:
  mov SPSW,PSW         ;Save PSW - 2c
  mov PSW,#SBANK       ;Use our bank of registers -2c
  mov R2,A             ;Save Accumulator (1 cycle less VS push ACC) -1c
  mov A,R0             ;Load current head position -1c
  inc A                ;temporarily add 1 -1c
  anl A,#RBMASK        ;and wrap-around if past 16th byte -1c
  xrl A,RXPA           ;See if current position + 1 = tail -1c
  jz bufovr            ;It is, so the buffer is full and the byte is dropped -2c
    mov @R0,SBUF       ;It isn't so we store incoming byte -2c
    inc R0             ;and advance head -1c
    anl SIR0,#RBMASK   ;and wrap-around if past 16th byte -2c
bufovr:
  mov A,R2             ;Restore accumulator -1c
  mov PSW,SPSW         ;and restore PSW -2c
reti                   ;and exit -2c
;Total usage under worst case: 25 cycles (13 us)

P.S. The negative numbers followed by "c" in the comments are the number of clock cycles each instruction takes to complete. Example: -2c = 2 clock cycles.
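
For context on how the head and tail pointers are meant to work together, here is a minimal sketch of the foreground side that would drain the receive ring buffer. It is not part of the original post: the routine name rxget, its labels, and the address chosen for RXPA are assumptions made for illustration (the post uses RXPA but never shows where it is defined). RBMASK and SIR0 repeat the post's equates so the fragment stands alone; drop them if assembling alongside the original code.

```asm
;Hypothetical foreground (non-interrupt) read routine, not from the original post
RBMASK equ 0CFh       ;same mask as in the post
SIR0   equ 08h        ;R0 of bank 1 = head, written only by the ISR
RXPA   equ 2Fh        ;ASSUMED address; the post does not define RXPA

;rxget: returns C=1 with the oldest byte in A, or C=0 if the buffer is empty.
;Assumes the caller is not running in register bank 1. Clobbers R0 and B.
rxget:
  mov A,RXPA          ;A = tail
  xrl A,SIR0          ;head = tail means the buffer is empty
  jz rxnone
  mov R0,RXPA         ;R0 of the caller's bank = tail
  mov B,@R0           ;read oldest byte (C0h-CFh RAM is reachable only indirectly)
  mov A,R0
  inc A               ;advance tail
  anl A,#RBMASK       ;and wrap D0h back to C0h
  mov RXPA,A          ;publish the new tail with a single write
  mov A,B             ;return the byte in A
  setb C
  ret
rxnone:
  clr C
  ret

;Example call site:
;  lcall rxget
;  jnc nodata         ;C=0: nothing received yet
;  ...                ;C=1: A holds the received byte
```

The byte is read before the tail is updated, so the interrupt routine cannot overwrite a slot that has not been consumed yet. Because "empty" is head = tail and "full" is head + 1 = tail, this scheme holds at most 15 bytes in the 16-byte buffer.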

Reply by MichaelKellett, January 4, 2019

I have to ask this: why don't you use a modern processor with DMA, 32-bit data paths, and 10x or more the instruction rate? It won't cost any more and it will come in a sensible package.

I gave up on hand-optimizing classic 8051 assembler 18 years ago (when it was the core processor in a Cypress USB interface chip).


Reply by matthewbarr, January 4, 2019

Even in the 8051 world you can find inexpensive devices with 64k code space and more, and with 2k-8k XRAM. As an added bonus you can even program them in C: Silicon Labs offers a free Keil 8051 tool chain with no code size limits.

A long time ago I took an Operating Systems class from a British professor named Roy Campbell at UIUC. I had done a lot of 8080/Z80 development in assembler, and was talking with him after class one day, explaining that I enjoyed low-level programming and being close to the hardware.

He looked at me and smiled, and in his awesome British accent said, "Oh, a bit twiddler, eh?"

Reply by jorick, January 4, 2019

I can see only one place to save a couple of clock cycles.  In txdf, you can replace the sjmp notxe2 with mov PSW,SPSW; reti, saving the 2 clocks for the jump.  Other than that, it looks like your code is as optimized as possible.
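
A minimal sketch of that change, reusing the labels and cycle annotations from the original post (nothing here is new apart from the two replaced instructions). Going by those annotations, the end-of-buffer path drops from 16 to 14 cycles, while the path that sends the next character stays at 16, so the quoted worst case would not change.

```asm
;Transmit branch with sjmp notxe2 replaced by an inline exit
  cjne R1,#CSUMT,notxe ;See if we're at end of buffer -2c
    setb REN           ;..we are so enable receiver -1c
    mov PSW,SPSW       ;restore PSW here instead of jumping -2c
    reti               ;and exit directly -2c
notxe:
  inc R1               ;not at end so increment pointer -1c
  mov SBUF,@R1         ;and send out next character -2c
  mov PSW,SPSW         ;restore PSW and exit -2c
  reti                 ; -2c
```

With the exit duplicated like this, the notxe2 label is no longer needed; the cost is three extra bytes of code space.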

Reply by mik3ca, January 4, 2019

I realized one second after reading your answer that it's the most brilliant answer. Yes, I know the world is for the latest and greatest. Well, the nice news is I ordered an AT89LP52 through a local shop. The datasheet claims it can be used as a drop-in replacement for the AT89S52 but offers 10x more speed with the same crystal. I'll see what happens in a few days, because the local shop has to buy the part through Digi-Key, plus they charge me their own handling fees. I'm told it's $5/chip tax included, which I accept. Not bad considering I won't have to deal with the FedEx or UPS brokerage nonsense fees, plus it's expected to arrive in about 1 week.