Data packet format that transfers at best speed - 8051

Started by mik3ca 3 weeks ago19 replieslatest reply 1 week ago40 views

After analyzing my own code 500,000 times, I'm starting to learn my serial data format isn't perfect.

Currently, my packet is arranged as 11 bytes as follows:

Byte 1: Sender
Byte 2: Receiver
Bytes 3 - 10: Data
Byte 11: 8-bit CRC checksum of all previous bytes

I'm trying to do data transfer at 56kbps between two micros using this data format both in a wired and wireless environment, and I couldn't get a decent speed for transmission. I'm trying to have the packet transmitted out and data received within 10ms but that's not successful. I can however do it in 20ms.

I have verified other code that can possibly slow down the transmission and those routines take only a few milliseconds to execute at most so the only thing left to blame is my packet. The micros in question are AT89S52 with 22.1184Mhz crystal attached.

The reason why I'm blaming the packet is because if the data for any reason is out of sync (example: if power to one micro is applied a few microseconds later than the other micro), then in this situation, I could waste data time.

I'll explain my issue by example:

Say micro 2 expects the data string: 11 xx xx xx xx xx xx xx xx xx 77

Micro 1 this time manages to send the correct string.

micro 2 responds with correct data string: 22 xx xx xx xx xx xx xx xx xx 88

That's normal. 

But what if micro 1 has problems and sends something like 11 xx xx xx xx *power cut* 11 xx xx xx xx 

Then my protocol will have micro 2 pick up the full packet until the checksum byte which is several wasted bytes picked up and a waste of time.

In the end, What I'm trying to do is...

In the end, I'm trying to create a system so that one micro is the master and sends a packet out to each of the connected micros (about 24) wirelessly in turn and when each micro receives the valid data, it sens the correct response. The process repeats infinitely. One byte in the packet identifies who is supposed to receive the packet.

Is there a better way I can do this without causing any micro to receive up to 10 wasted bytes due to disconnection or power outage or the like?

I mean the only thing I can think of off the top of my head is to make an ultra-short packet but the byte size requirement per packet is quadrupled.

Example packet:

Byte 1: Sender
Byte 2: Receiver
Byte 3: One byte of data
Byte 4: Checksum

If I did that for the entire 8 bytes of data, then I'll need 32 bytes as opposed to 11 bytes to send out.

What do you guys think? How do I fix my issue?

[ - ]
Reply by mirceacJanuary 1, 2019

As with any shared transmission media, no matter if wired or wireless (but it's many times as bad when one goes wireless) you have to deal with collisions, lost data and other stuff that will mess with your timings and synchronization. There is no way around it and many PHD thesis were written on the subject and patents filled.

Also, not a small number of projects were working somehow in the lab conditions and failing miserable in production because noise and perturbation.

So IMNSHO, without long and careful simulation of the worst case possible, adjusting the protocol and packet size for the highest reliability and testing in noisy conditions, you won't be able to produce a report to explain in a ćlear way that what you want it's 169% IMPOSSIBLE, especially considering radio modems, that while offering a nice & simple interface for sending data, they're totally isolating you from how the radio protocol actually works, I can tell it based on my experience with Wireless mBUS, where a customer wanted to get ALL the data packages sent by a meter in an industrial environment with electrical noise and other meters around sharing the bandwidth. Of course, naive back of napkin calculations showed that should be possible.

So sorry to disappoint you, but that will be a difficult if not impossible project.

[ - ]
Reply by hodgecJanuary 1, 2019

In the case you described, one device lost power and/or stop communicating before it could finish the transmission.  The result of this is garbage( unfinished ) packet in the data stream.  What you need is the ability to gracefully handle this problem. This means you need the ability to detect it occurred and then recover from it.

The first problem is detecting it.  One way to accomplish this is to use a fixed length packet.  What this does for you is allows you to timeout the transmission.  You know from the baud rate that an 11 byte packet will complete in ~2 ms.  So when you receive the first character, you know that the remaining packet will arrive in less than 2 ms.  So at the end of 2 ms if you didn't receive 11 bytes then an error occurred.  The fixed length packet simplifies the problem, but this can done using a variable length packet also.  I'll discuss that in a minute.
Now that you know an error occured the next problem is resyncing to the the data stream.  For this you could use a sync pattern  By extending the protocol to always begin the packet with a 2 byte sync pattern say 0x55, 0xAA then you can always tell where in the data stream a new transmission starts. However, the sync pattern must be chosen such that it does not occur naturally within the data stream.  The sync pattern can be as many bytes as needed to ensure you can sync to the data, but the longer it is the more data you have to send thus more time for transmission.

Additionally, assuming the sync pattern chosen does not occur naturally within the data stream and you are using a fixed length packet, then if you encounter the sync pattern before the packet is complete once again you have an error.

As I stated earlier, the above can also be done using a variable length packet but it requires that you put a byte count into the packet so you know how many bytes are to be expected.  The byte count is then used to set the timeout.  If this method is chosen the byte count should be the the first byte after the sync pattern so as to minimize the chance for errors.  Since, you know the maximum amount of data that will be transfer. If you receive a byte count greater than the maximum, then this is also an error.

The above suggestions increases the amount of transmission data but is should still fit within the 10ms you are attempting to achieve.  

Additionally CustomSarge's idea of: "You can compress the addresses into 1 byte - MSb is master, bits 6-0 are slave".  would help reduce the overall byte count some.

For the situation were you are using wired communications and it is taking 20 ms instead of 10ms you expected, this likely a coding error somewhere. I would ensure that the routines processing the UART and not putting gaps in the data stream.

When transmission is done wireless there are a lot of factors that come in to play.  These include but limited to over-the-air collisions, transmitter power up time, and over-the-air transmission rates.

[ - ]
Reply by mik3caJanuary 1, 2019

You must have typed this as I was putting your suggestions in practice.

Sadly, these ideas made the reception slower.

Now my new packet format is as follows:

Byte 1: 0FEh - Value to tell all clients to reset their receiver state
Byte 2: 0nh
Byte 3: 0nh
Byte 20: 0nh
Byte 21: 0nh
Byte 22: 0nh
Byte 23: 0nh

First byte I explained. Then the original 8 bytes of data including the checksum are all divided up into nibbles and each byte will have the nibble in the LSB side of the byte (will replace n in packet format shown above with 4-bit value). This does inflate my packet from 11 bytes to 23 bytes but it gives me the ability to detect the starting byte.

I set my over-the-air transmission rate and UART rate on both the radio module and micro to 56kbps to prevent choking on the line. Should I just set the over-the-air rate to 115kbps and leave the rest at 56kbps or will that make matters worse?

[ - ]
Reply by hodgecJanuary 2, 2019

I'm not sure understand your new packet format but it seem to be a bit over complicated and I'm not sure it accomplishes any thing. Additionally, I'm not sure you have a guarantee of being able to detect the starting byte. Unless your normal data is always ASCII, then 0xFE could occur in the normal data.  This would have the effect of a false start of packet. My suggestion was as follows:

Byte 1: 0x55
Byte 2: 0xAA
Byte 3: Sender
Byte 4: Receiver
Bytes 5 - 12: Data
Byte 13: 8-bit CRC checksum of all previous byte

Using this then you always look for the 2 byte sequence of 0x55, 0xAA to indicate start of packet.  This is not fail proof either, but because it uses 2 bytes then the likely-hood of a false positive is reduced.  To further reduce the chance for a false positive one would need to escape the sync pattern if it occurred within the data stream.  To escape the pattern within the data then you could simple place a 0x7E in front of a 0x55 or 0xAA if they occur in the data portion only.


55 AA 11 22 00 01 02 80 7E 55 7E AA 32 30 00 00 <cksum>

Notice the 7E 55 and 7E AA.

Now referring back to your new format; If I understood then you are sending 0xFE to tell all receiving devices to reset their receive buffers.  Does this mean that all receiving devices will responded?

If so this is bad for radio transmission, because one and only one radio can transmit at any given time. If more than one radio attempts to transmit at the same time then a collisions will occur and each transmitting device will have to retry.  This will certainly cause the transmission to take a lot longer time and this will be even worse if the transceivers are not using some form of CSMA/CD (Carrier Sense Multiple Access / Collision Detection). 

As for your question regarding changing the Over-the-Air Rate to 115K, it can make things worse if there is RF interference within the environment. For now my recommendation is to get it working properly first, then you can attempt to increase the Over-the-Air rate.

I believe you have a couple of problems in this design.  My recommendation is to use only 2 transceivers Master and Slave.  Get the 2 units working reliably together.  Test them until you are sure everything is working well.  Then start adding more slave units.  

[ - ]
Reply by mik3caJanuary 2, 2019

The reset (code 0FEh) is sent as a synchronization to all slaves to reset their internal pointers to the data so that when the next character is sent by the master, the receivers will think its the 1st data character instead of the nth character.

And yes I'm currently dealing with one master and one slave at this time.

[ - ]
Reply by matthewbarrJanuary 1, 2019

It sounds like you may have a problem reliably framing receive packets, particularly in the presence of errors where you have to flush data so you have an empty buffer when you receive the first byte of the next packet. This is the issue I had in mind in an earlier response when I mentioned flushing the RX packet buffer. You have this issue in the normal case when a received packet is addressed to another node, and also in a variety of error situations. You do not have a unique start or end of packet identifier, and you have a 1/256 probability of randomly getting a valid checksum.

As long as your packet contains raw binary data, any byte is possible and you do not have a unique start of message byte. MODBUS defines one, but it can appear in the data and MODBUS serial relies on enforcement of minimal time gaps between message bytes to keep things straight. Note, if you swap source/destination address bytes in your packet format, each destination now has a unique first message byte that is its address. You get the equivalent of MODBUS SOM functionality for free without adding a byte to the packet format. There are other ways to go that give you truly unique packet delimiters.

As an alternative to changes inspired by MODBUS, have you heard of COBS? A colleague pointed me at COBS some time ago, it stands for Consistent Overhead Byte Stuffing. It's a very low overhead byte-stuffing scheme that removes NULs (0x00) from data. You can send your encoded packet with a trailing NUL, the receiver now has a unique end of data that cannot appear in the data. As a starting point see the Wikipedia page.

On the transmit side you COBS encode your packet and send it followed by a NUL. You can also add a leading NUL as start of data, but that isn't really necessary and typically isn't done. For your 11 byte packet the maximum overhead including trailing NUL should be two additional bytes, three if you add a leading NUL.

On the receive side you might have two states, START (wait for first byte) and DATA (decode and look for trailing NUL.) One beauty of COBS is that you can decode on-the-fly as each byte is received. The papers I've found don't describe this, but it's pretty easy to unroll the decode loop and define a stateful function that takes the next RX byte and produces decoded output. When you restart packet reception you have to initialize COBS decode state in addition to your RX buffer state, etc. This on-the-fly processing reduces the latency from last byte in to RX packet ready because you don't have to decode and process the entire packet, you've already done most of the work which can also include checksum accumulation.

When you are in the DATA state and receive a trailing NUL, you should have a decoded packet with a valid checksum. You have to be careful in the DATA state not to overrun your RX data buffer. You'll probably flush and return to the START state any time there is a gap in RX data, or on buffer overrun after a NUL is received. With both leading and trailing NUL there should be no need for gap detection timeout handling.

Assuming you add 2 bytes overhead to each 11 byte packet, your packet time goes from 1.96 ms to 2.32 ms, adding 0.357 ms to the transmission time. You'll also add a little processing time for COBS encode and decode, although you can hide decode processing as described above. What you get for this is simple, reliable and fast RX packet handling and error rejection.

With all that said...

My concern is that you have not done a thorough job characterizing and understanding the system latencies that are taking you to your 20 ms number. If you are unable to achieve your performance target using two nodes with hardwired UART connections, you have to be missing something. Hopefully you have a PING message/response that you can use to characterize delays. You might consider pulsing I/O ports on master send, client receive, client send and master receive, and observe these with a scope. This should help you quickly pin down your delays and see if one or more of them is off in the weeds. Until you go through this exercise or equivalent, you are basically guessing at the problem source and hoping you fix it with a related change. You might get lucky, you might not. Debug is like detective work, some guys like to gather facts, others like to go with their gut. I think you can tell which camp I'm in!

I would certainly do that before I implemented COBS or any other non-trivial change and found out that it didn't help. I really think you have a firmware packet handling issue or driver issue somewhere, particularly if you get this 20 ms number for a PING message/response involving minimal processing and overhead.

Budget and resources depending, if you're looking for a low-cost oscilloscope you might consider the Rigol DS1054Z. It's a great scope for the money, and serial decode and other options are bundled with the scope. I've been very impressed with it, and love that I can connect it to my LAN and run a shell script on my PC to extract a screen image and save it to a file. The best price I found was where they give you a modest price break if you create an account. (I have no incentives or relationship with Rigol or, just sharing my experience.)

One additional change to your packet format you might consider is an embedded byte count after the two address bytes. This could be the total decoded packet length or just the data payload length. (Payload length is my choice, mapping values 0-255 to payload lengths 1-256, disallowing a 0-byte payload, allowing up to a 256 byte payload with an 8-bit byte count value.) This is a nice consistency check for the total decoded packet byte count.

[ - ]
Reply by mik3caJanuary 2, 2019

Ok, I'm showing my entire serial routine. Only thing I didn't include was netdata and serialmem address and I didnt include setting up SCON and IE and IP registers. Also, I set all the memory to zero before routines start.

Is there any way I can improve it?

And another question... Since I'm using AT89S52 with 22.1184Mhz crystal, is it true that at 56Kbps that I can execute up to 320 clock cycles worth of instructions before RI is raised assuming data is continuously coming in at 56kbps?

    RCHARTIMEOUT equ 0FAh ;1 received character timeout - 750uS 
    RCHARTIMEOUTL equ 099h ;for modbus compliance
    STARTPKT equ 0FEh ;start of packet ID. (high nibble. low nib=0)
    CRCSEED equ 0h ;Base number (polynomial) for CRC calculations
    SBANK equ 08h ;RS1:RS0=0:1
    SIR0 equ SBANK
    SIR1 equ SBANK+1
    SIR4 equ SBANK+4
    SIR6 equ SBANK+6
    ;Netdata and serialmem are defined elsewhere
    ;Cached network data. 32 bytes. can use extd memory
    TXDAT equ NETDATA+10h
    ;Serialmem must be local - 36 bytes ram.
    STX equ SERIALMEM+10h
    SBU equ SERIALMEM+20h
    CSUMR equ SBU  ;checksum recv val to compare
    SPSW equ SBU+1h  ;backup
    TXR bit SERBITS.0 ;packet transmitted
    RXR bit SERBITS.1 ;packet received
    SERPACKET bit SERBITS.2 ;packet received
    GOTANYBYTE bit SERBITS.3 ;1 byte received of any raw serial byte
    PNIBBLET bit SERBITS.4 ;packet nibble xmit
    PNIBBLE bit SERBITS.5 ;packet nibble
    MYPACKET bit SERBITS.7 ;packet received for me
    MASKMYPKT equ 080h 
    MASKRCVD equ 0Eh 
    ;Meaning of byte offsets in packet
    SF_ADDR1 equ 0h ;first address
    SF_ADDR2 equ 1h ;2nd address
    SF_DP0 equ 2h  ;data 0
    SF_DP1 equ 3h 
    SF_DP2 equ 4h 
    SF_DP3 equ 5h 
    SF_DP4 equ 6h 
    SF_DP5 equ 7h 
    SF_DP6 equ 8h 
    SF_DP7 equ 9h ;data 7
    SF_CSUM equ 0Ah ;checksum
    CSUMT equ STX+SF_CSUM ;checksum transmit value
    CSUMRA equ SRX+SF_CSUM ;checksum receive value address
    ;Timer routine start
    org 0Bh
    ljmp tmr0
    ;Serial routine start:
    org 23h
      jbc RI,rxdf
      jbc TI,txdf
    ;Previous character was sent at this point
        cpl PNIBBLET        ;<- Variable to define nibble to use for transmit
        mov SPSW,PSW        ;<- Save old bank and switch
        mov PSW,#SBANK      
        mov R2,A            ;<- Save old A
        jnb PNIBBLET,ntxn
          mov A,R1          ;Executes on odd nibbles
          xrl A,#CSUMT
          jnz ntxend        ;See if we are in checksum field
            setb TXR        ;We are so we are done
            clr GOTANYBYTE  ;Tell timer we have nothing
            setb TF0        ;and timeout to make reset
            setb REN        ;turn receiver on
            setb IT0        ;enable timer
            ajmp exitint2   ;restore used registers and exit
          inc R1            ;we aren't in checksum field so advance
        mov A,@R1           ;Get byte
        jnb PNIBBLET,ntxn1  ;See which nibble it is
        swap A              ;and adjust
        anl A,#0Fh          ;we only want nibble
        mov SBUF,A          ;send 0x on the wire
        cjne R1,#CSUMT,ntxend2 ;See if current byte is checksum field
          ajmp exitint2        ;it is so just exit
        mov R5,DPH  ;Save DPTR
        mov R7,DPL
        xrl A,CSUMT ;Calculate checksum. CSUMT has rolling checksum value
        mov DPTR,#CRCT  ;DPTR=lookup table
        movc A,@A+DPTR  ;load value from lookup table
        mov CSUMT,A     ;store updated value in CSUMT
    ajmp exitint    ;Restore all registers and Exit
    ;A character was received here
        setb TF0 ;Run timer after this
        jbc RB8,rcvcont ;See if last bit of 10-bit UART data is 1
          clr GOTANYBYTE  ;It isn't so wireless is messed
          sjmp exitint3   ;so exit
        setb GOTANYBYTE ;We got a byte.
        jb SERPACKET,exitint3 ;See if we already have a packet
        mov SPSW,PSW      ;We don't so save registers and A
        mov PSW,#SBANK
        mov R2,A
        mov A,SBUF        ;Set A=R3=Serial input
        mov R3,A
        xrl A,#STARTPKT   ;See if the input is our starting packet value
        jnz rxredo
          clr GOTANYBYTE  ;It is so prepare for reset
          sjmp exitint2   ;and restore used registers and exit
        mov R5,DPH  ;Save DPTR
        mov R7,DPL
        mov A,R0      ;See if our slot is the checksum slot
        xrl A,#CSUMRA
        jz nocsr
          mov A,R3    ;It isn't so calculate checksum
          xrl A,CSUMR
          mov DPTR,#CRCT
          movc A,@A+DPTR
          mov CSUMR,A ;and update checksum value
        mov A,R3    ;Get low nibble of the incoming data
        anl A,#0Fh
        cpl PNIBBLE
        jnb PNIBBLE,nrxn1 ;See where to put it
          swap A            ;Put it as high nibble
          mov @R0,A         
          sjmp exitint      ;and exit for now
        xch A,@R0   ;Put next incoming nibble as low nibble to complete byte.
        orl A,@R0   ;These 3 instructions = XRL @R0,A but that isn't valid on 8051
        mov @R0,A
        mov A,R0   
        subb A,#SRX ;See what state we are at.
        rl A
        mov DPTR,#rtable
        jmp @A+DPTR ;and go to the state
        ;Here the state requires increment
        mov A,@R0
        inc R0
      mov DPH,R5 ;restore DPTR
      mov DPL,R7
      mov A,R2      ;restore A and PSW
      mov PSW,SPSW
    reti    ;and exit
    ;Byte 1 received. this function called
    ajmp rxdfc
    ;Byte 2 received. this function called
      mov A,@R0
      anl A,#1Fh
      cjne A,SIR4,rxdfc ;checks to see if data doesn't match bad data
        clr GOTANYBYTE ;its bad here
    ajmp exitint
    ;Last byte in packet. this is processed
      mov A,@R0
      clr GOTANYBYTE ;We make a reset
      cjne A,CSUMR,exitint
      ;here checksum is valid
      mov A,SRX
      anl A,#1Fh
      cjne A,SIR4,exitint ;see if the packet is for us
      orl SERBITS,#MASKMYPKT ;= setb mypacket
      mov R0,#RXDAT ;local buffer to prevent split data
      ;copy data to local memory so it doesnt get overwritten
      mov @R0,SRX+SF_ADDR2 
      inc R0
      mov @R0,SRX+SF_DP0
      inc R0
      mov @R0,SRX+SF_DP1
      inc R0
      mov @R0,SRX+SF_DP2
      inc R0
      mov @R0,SRX+SF_DP3
      inc R0
      mov @R0,SRX+SF_DP4
      inc R0
      mov @R0,SRX+SF_DP5
      inc R0
      mov @R0,SRX+SF_DP6
      inc R0
      mov @R0,SRX+SF_DP7
    ajmp exitint
    ;The states (Byte positions)
    ajmp raddrme ;1st byte = my address
    ajmp raddrthem ;2nd byte = their address
    ajmp rxdfc ;3rd byte = data
    ajmp rxdfc
    ajmp rxdfc
    ajmp rxdfc
    ajmp rxdfc
    ajmp rxdfc
    ajmp rxdfc
    ajmp rxdfc 
    ajmp rchksum ;last byte = checksum
    ajmp resetrc ;Sanity check: any value too high=reset
    ajmp resetrc
    ajmp resetrc
    ajmp resetrc
    ajmp resetrc
      clr GOTANYBYTE
    ajmp exitint
    ;Start transmission
      jnb TXR,$ ;stall until last transmission is done
      clr TXR
      clr IT0   ;Shut off timer
      clr PNIBBLET  ;reset nibble position
      mov CSUMT,#CRCSEED ;reset checksum for transmitter
      mov SIR1,#STX-1    ;Set data offset to 1 before since transmit routine increments first
      mov SBUF,#STARTPKT ;Send special start byte
    ;Low priority timer. Serial has high priority
    ;This is called when timeout happens or when serial is called
      mov TL0,#RCHARTIMEOUTL ;reset timeout value
      jbc GOTANYBYTE,notmo ;See if we got valid byte
        clr PNIBBLE ;We don't or end of packet is reached so reset receive nibble
        mov CSUMR,#CRCSEED ;and reset receive checksum seed
        mov SIR0,#SRX ;and reset receive pointer
        mov ACCR,A ;save accumulator
        jbc SERPACKET,dopacket
        mov A,ACCR ;restore accumulator
    ;Stuff happens here when valid packet is received
    ljmp ndp
[ - ]
Reply by matthewbarrJanuary 2, 2019

Wow. Kudos for nutting this out in assembler and getting it to the point of functional even with some performance issues. It isn't obvious why you're coding in assembler instead of C. I have no experience with the device you're using but would think you could get information on a C tool chain from Microchip. A quick tour of their web site led me to a UART code example here for their 8051 family in both C and assembler. There is also the option of using SDCC, I've used it successfully with Silicon Labs' C8051 devices, it used to be more popular before Silicon Labs began offering a free version of the 8-bit Keil tools with no code size limits. Some times you have to write a little bit of assembler, but in general you'll be much more productive, code fewer bugs, and find things easier to maintain in C. I wrote a LOT of 2560 and Z80 assembler in the early 80's, and I don't miss it very much at all. You can usually do all the bit-twiddling you need to do in C.

One comment about your code, it looks like you're doing all your receive processing in the ISR. In general you should keep your ISRs as short as possible, they're a classic less is more and more is less situation. A better approach is to define a circular buffer and have your receive interrupt simply add data to the buffer and return. You then have a function or subroutine, eg. rxprocess(), that you call repeatedly, for example from a main() super loop, that looks for a non-empty buffer and processes the data. You could have a getc() function or subroutine that gets a character from the buffer. The standard getc() is blocking which you absolutely don't want. You'll need a way to either test the RX buffer for empty condition, or for getc() to return a character and valid status.

This applies to the transmit side as well, use a circular buffer and transmit interrupt. The transmit ISR disables the TX interrupt when it empties the buffer, and you could have a putc() function or subroutine that adds a character to the buffer and enables the TX interrupt when adding to an empty buffer. You can also construct a blocking writeln() routine that iterates using getc() to add multiple characters to the transmit buffer. Since the transmit interrupt is running, writeln() can safely block and wait for a buffer spot to free up (shouldn't take long), but you should size your TX buffer to hold a complete packet and avoid this case. 

It is important to make sure that the getc()/putc() routines pay attention if they access volatile variables (those that can be changed by the ISR.) Assume your circular buffer has a read and write pointer, the function that adds data to the buffer updates the write pointer, the function that removes data updates the read pointer. The putc() routine has to be very careful that it doesn't incorrectly decide it doesn't have to re-enable the TX interrupt because a transmit interrupt fired and emptied the buffer between the buffer empty test and the add character update. One way to guarantee this is to disable interrupts around test and add, a more clever method is to add the character first, then test to see if there is only one character in the buffer and re-enable the TX interrupt if so.

There are good reasons for structuring your code this way. One is that your interrupt service time is as short as possible, ISRs should be brief and non-blocking. Another is that you have much more time to handle received data. You'll also find you can reuse this ISR and buffer method in other projects that use a UART. The scheme is general and doesn't care about data content, it just shovels bytes around. You can use it anywhere and it will serve you well.

Your concern about amount of code you can execute per character time goes away with the circular buffer strategy. You no longer care about the worst-case path through that complex receive interrupt logic. All you have to do is get data from your circular buffer fast enough to prevent overflow, and you can increase the buffer size to give yourself more time. You now have a LOT more time to perform your RX processing without worrying about data loss. It is also easy to detect buffer overrun by setting a flag if your RX interrupt encounters a full buffer.

This scheme doesn't apply to just UART handling, it applies in general to most low-level interrupt-driven TX/RX interfaces. I know I'm talking about a re-write and a new tools chain, you may well not want to go there on this project and I understand. 

[ - ]
Reply by mik3caJanuary 2, 2019

I put your text in italics and my relevant replies follow.

Wow. Kudos for nutting this out in assembler and getting it to the point of functional even with some performance issues...

And its the performance issues I need to nail down.

It isn't obvious why you're coding in assembler instead of C.

I could do hard-core machine coding by using hex codes but that's way worse. The thing with C is that you're relying on whoever made the functions to convert C code to assembly code to make the right opcodes for you. Sadly, computers crash year after year. See these videos:

A quick tour of their web site led me to a UART code example here

I'll take a look

Some times you have to write a little bit of assembler, but in general you'll be much more productive, code fewer bugs, and find things easier to maintain in C. Ok C is easier to maintain but I want my code to be 100% bug free and 0% hackable so I'm going straight down to the hardware level.

The way I presented my code is different from how I format it for myself. this is because I made myself a front-end compiler for my code in C and then that way I can group relevant code together on the same line. So I could do code like this:

mov A,1h/mov B,2h/setb C/ret

instead of:

mov A,1h
mov B,2h
setb C

Then my own compiler converts the code so the assembler can process it. in the above snippets, my compiler would convert the top snippet to the bottom snippet. this saves me from having 100,000 vertical lines of code.

One comment about your code, it looks like you're doing all your receive processing in the ISR...

Almost. I'm trying to make this code adaptable to both the master and slave. As soon as slave receives the packet, it must reply to the master asap. So I thought it would be wise that on each character receive that everything else gets interrupted while the character is loaded into its memory space with the checksum being calculated at the same time. If I waited until all the bytes were received first before calculating the checksum, wouldn't that be worse? because on the last byte, more processing is required?

In general you should keep your ISRs as short as possible

Thats my objective

A better approach is to define a circular buffer and have your receive interrupt simply add data to the buffer and return.

Considering I used up all the register banks, this may be a challenge to do. I gotta weigh this idea out first and see...

There are good reasons for structuring your code this way. One is that your interrupt service time is as short as possible, ISRs should be brief and non-blocking. Another is that you have much more time to handle received data.

I'm trying to understand the last line. Because in the interrupt I have now, the data is checked immediately, where as with your idea, the interrupt must end and the program must continue to run from where it left off until it reaches the point of where data is to be checked. For example:

Your idea (in general):

  mov @R1,SBUF
  inc R1

lcall function1
lcall function2
lcall function3
lcall processcharacter
lcall function4 ;**
ljmp mainline

If in the above case, the interrupt happened inside function4, and I used your idea then character will be inserted into the buffer then the first three functions have to be processed before the character gets processed. The is my setup right now:

My idea (in general):

  lcall processcharacter

lcall function1
lcall function2
lcall function3
lcall function4 ;**
ljmp mainline

In my code the character is processed instantly.

Your concern about amount of code you can execute per character time goes away with the circular buffer strategy.

I can agree I can execute more code between character times but question is will the right code be executed in time?

All you have to do is get data from your circular buffer fast enough to prevent overflow, and you can increase the buffer size to give yourself more time.

Sad news is my clients are literally almost out of ram but the server has lots of ram. I'll be lucky if my clients have 8 or 16 bytes free. Maybe a big buffer is what I need after all.

[ - ]
Reply by matthewbarrJanuary 2, 2019

Here's the argument for having more time to process RX data with a simple ISR and buffer.

With an RX buffer of N character times and a simple ISR, you basically have N character times before you need to start pulling data from the buffer to avoid dropping data. If N holds a complete packet you have all the time you need. The function processing RX buffer data should loop and attempt to empty the buffer each time it is called, as opposed to processing only one character as in your ISR. The assumption here is that it's called frequently enough to keep up generally, and only in rare circumstances will it iterate over several characters.

This also means you'll have most of the checksum computation done before the last byte arrives. Your latency from last byte to RX packet valid will be slightly worse, but the difference should be very small. You're not waiting to process the entire packet, you're still doing it on-the-fly as characters come in, you're just doing it from your main loop instead of from the ISR.

UART RX hardware typically has a shift register and data register or FIFO, this can give you some slack but if your ISR per-character handling is not keeping up with the incoming character rate, you're likely going to get behind and drop data at some point. This may or may not be the nature of your performance issue, you should nail down your min/max/average system latencies. Again, I suggest a hardwired UART connection, PING message/response, pulse I/O ports on key events and gather data. You might also check your UART hardware for RX overrun indications and make this visible if it occurs.

All functions you call from your main loop should be stateful and non-blocking. By this I mean that they should never spin waiting for an event. There should be a state associated with the event and in that state the function should test/exit, test/exit, ... until the event fires and you advance to the next state. Stateful and non-blocking.

Sadly I cannot argue that any of this will fix your problem, these are general design practices that work well. With regard to a change you might make, you could consider sending RX addr first, using this as a unique SOM (start of message) in each receiver. The problem you have is that you may see SOM elsewhere in your data and be fooled. Assuming your transmit side doesn't produce dead time between characters, re-starting packet reception after a several character time gap in RX data should help with this. Basically you want to detect SOM after a gap in RX data, and ignore it in back-to-back RX character data. That is essentially what the MODBUS serial standard does, and it requires no gaps in the transmitted character stream.

You have some interesting rationalizations for not using a compiler in 2019!

The Microsoft failures you mention have everything to do with product complexity and verification, and little if anything to do with use of a compiler. If Microsoft wrote everything in assembler they would be worse off in most every imaginable way: cost, schedule, quality, features, maintainability, you name it. If you tried to keep any one of cost, schedule or quality where it is now, the other two would suffer dramatically writing everything in assembler.

I'm not suggesting compiler bugs don't exist, they do. In 6 years I managed to trip over one SDCC bug, I fooled its dependency analysis by trying to be clever with assignment order and save a line of source code. Commercial offerings like Keil and IAR are typically going to be better in terms of quality and correctness of generated code, and I suspect GCC is closer to Keil and IAR.

I don't understand your comment about your assembler code being "0% hackable." Your attack surface is pretty much the same, particularly in an embedded device, whether your image is assembled and linked from hand-generated vs. compiler-generated instruction sequences.

If you chose the AT89S52 because it comes in a DIP package, I'm not sure what your other device options are. If that is not a constraint you might consider the Silicon Labs C8051 devices for future projects. They have a wide variety of offerings with lots of memory resource on die. Many variants offer 2k to 8k of XRAM (indirect addressing mode) in addition to the standard 256 bytes of RAM. Documentation is great, lots of example software, free tool chains (Keil, SDCC) and IDEs, and inexpensive development kits with power supply, programming adapter and USB cables included. It's nice not having your hands tied with resource and tool chain constraints.

[ - ]
Reply by mik3caJanuary 2, 2019

I did order some AT89S8253's online which seems to have an X2 option (double speed) but I'll have to wait a while for them.

In the meantime, could I get a speed boost if I put the above serial routine in a separate microcontroller and have it buffer only the data then it will then become the serial port that only exchanges the data with the system? I'll try and see.

[ - ]
Reply by mik3caJanuary 2, 2019
In the midst of trying super compressed interrupts...

Ok, so I was beginning to modify my code to shorten time spent in interrupts and after calculating things, I was able to shorten the time down by half. Now If I continue with this interrupt approach that someone suggested, I probably do not have enough free memory left to process them all.

This is because I need two pointers. One as the "head" and the other as a "tail". (That's what the guy at calls them: Basically the head pointer advances to the next free memory space in the circular buffer after a character is loaded in and the tail is advanced when the user routine wants the character. And for proper alignment (to allow me to use AND logic on pointers), I need to reserve 16 bytes for the incoming buffer plus 2 bytes for pointers.

That's only for extracting the characters one at a time from the buffer as the non-priority function needs them.

I also need 2 extra variables. One to store the nibble and one to store which byte number is being picked up so that the system knows if the checksum result should be calculated or not.

Then I need another buffer of the packet size and fill it with the raw characters as each one is processed. Then finally the checksum is calculated, and if it is correct then the buffer gets copied over to the userspace buffer which the application can use for more processing. Without the userspace buffer, I'd lose characters on the fly. So in the end, I'm looking at the need of 52 bytes of memory, but if I can somehow lower the memory requirement without corrupting data and without making the program slower, I'm open to ideas on that.

[ - ]
Reply by mr_banditJanuary 8, 2019

A few things:

1) Matthewbarr is spot-on. He is saying everything I would tell you. This also includes CustomSarge - we are all heavily scarred veterans of the embedded wars. 

2) you don't need pointers - indexes work on circular buffers. And - if you put the buffer on the right boundary, and a power of 2 length (say 16 bytes on a 16-byte boundary, you can do a modulo (mask) for the head and tail indexes to save code space.

3) You can process the buffer on the fly - reading the input buffer - and use the results only for well-formed packets. Basically, if the head & tail indexes are !=, you have a byte. No need for a  separate buffer. This appears to be what you wee doing in an ISR, but it gives you a lot more flexibility. Divide and conquer really helps and keeps your code simple.

4) You may be locked into an 8051 for this project. Seriously look for alternatives in future projects. I happen to be an Atmel freak - they have a lot of good tools && are competitive. Good peripherals. And - Arduinos use them, so you can get hardware that makes prototyping much easier with built-in serial and USB.You can program in C (woowoo).

[ - ]
Reply by mr_banditJanuary 8, 2019

Echoing @matthewbarr: you really need an oscope to be in this biz. The two best tools I own is my Estwing 20oz framing hammer and my Tek scope.

I highly recommend the Rigol DS1054Z

It has the same basic capability as my (10+yo) Tek 2024B $2400 scope. I have used the Rigol && was very impressed. When my Tek gives out, or I decide I need a second scope, it is at the head of the queue.

Pro tip: always go for as many oscope inputs - 4 is way better than 2.

You will not hear it from me there is a tool (ask Mr Google) that will convert the 50MHz to 100MHz...

There is an old trick with a spare pin - assert it at the start of something, and deassert at the end && measure the time with the scope. Useful for debugging ISRs. Also, if you really need to optimize a chunk of code, you can toggle the pin for different sections to see the timing for each section.

(You should always have at least one spare pin, preferably several. Bring them out to berg headers.

Also - get a case with the foam cubes to store/carry the scope in. Roughly $50 for the low end, or go for the gusto && get a Pelican (or clone). Make sure you have enough room for the cables/probes/power cord/etc. 

[ - ]
Reply by MichaelKellettJanuary 1, 2019

If I've understood the problem correctly:

The master can send out a command packet which instructs all the slaves to send data at specified delay times from the command packet. You can use a bit field  to specify which of the slaves should speak (4 bytes for that) and a further byte to say how much delay per active slave.

This way you need about half the bandwidth.


[ - ]
Reply by CustomSargeJanuary 1, 2019

Couple things: You can compress the addresses into 1 byte - MSb is master, bits 6-0 are slave. Many protocols are possible, but 2 easy ones:

Proto 1: master->slave1, slave1>master, slave2 delay then slave2->master, slave3 delay then slave3->master etc. A poll->response round robin for slaves all having equal tasks and response frequency.

Proto 2: master->slaveN, slaveN->master etc. A poll->response directly addressed for slaves having different tasks requiring different polling frequencies.

Neither has any bearing on data length: short, long, a "type" or # in the header. Totally dependent on system behavior. But, both require packet and/or byte timeout with CRC check and a "Nak" sent during the responder delay signalling retransmission required. The delay timeout without Nak infers good transfer.

I built a system of lifts requiring syncing of heights in both up/down. It was like Proto 1 except the master/slave addressing was only at initialization, any unit could command a move thereafter. During a move, active units sent their position in unit address order, all units kept track of all positions. Wired at first, then wireless using spread-spectrum FM, from 1 to 16 units. This was several yeas ago, but the radios worked fine, even with arc welders and other machinery nearby. Bit rate was 115.2k, but developed at 9.6k, then 28.8k.

You may find the need to streamline packet processing in assembler. They can be called functions or inline. My system was all assembler ~4900 lines.

[ - ]
Reply by mik3caJanuary 1, 2019

Thanks for your ideas.

After further research before arriving here again, I will try some ideas from the modbus protocol ( especially the use of a unique character as the start of stream flag.

[ - ]
Reply by CustomSargeJanuary 1, 2019

Enter "ASCII" in Wikipedia - the nice chart in lower right of page 1 is Identical to a card I've had a Long Time. Everything you'll need for protocol commands are there.

One cute way I've used is to have 2 interrupts, one on a timer, other on rx character ready. The int on rx rdy resets the timer, which is setup to be Just longer than a char rx time. When the timer int beats the rx, string is done.

[ - ]
Reply by mik3caJanuary 1, 2019


I use an idea similar to yours except that the timer acts as a timeout interrupt but the serial routine forces a timeout when the full packet is received to minimize delay. After all, the timeout routine causes a reception reset which is what I want.