## Data packet format that transfers at best speed - 8051

Started 3 weeks ago · 19 replies · latest reply 1 week ago · 40 views

After analyzing my own code 500,000 times, I'm starting to learn my serial data format isn't perfect.

Currently, my packet is arranged as 11 bytes as follows:

Byte 1: Sender
Byte 2: Receiver
Bytes 3 - 10: Data
Byte 11: 8-bit CRC checksum of all previous bytes


I'm trying to do data transfer at 56 kbps between two micros using this data format in both wired and wireless environments, and I can't get a decent transmission speed. I'm trying to have the packet transmitted and the data received within 10 ms, but that hasn't been successful. I can, however, do it in 20 ms.

I have checked the other code that could possibly slow down the transmission, and those routines take at most a few milliseconds to execute, so the only thing left to blame is my packet. The micros in question are AT89S52s with a 22.1184 MHz crystal attached.

The reason I'm blaming the packet is that if the data gets out of sync for any reason (for example, if power is applied to one micro a few microseconds later than to the other), I waste transmission time.

I'll explain my issue by example:

Say micro 2 expects the data string: 11 xx xx xx xx xx xx xx xx xx 77

Micro 1 this time manages to send the correct string.

micro 2 responds with correct data string: 22 xx xx xx xx xx xx xx xx xx 88

That's normal.

But what if micro 1 has problems and sends something like 11 xx xx xx xx *power cut* 11 xx xx xx xx

Then my protocol has micro 2 keep picking up bytes until what it thinks is the checksum byte, which means several wasted bytes picked up and a waste of time.

In the end, what I'm trying to do is create a system where one micro is the master and sends a packet out wirelessly to each of the connected micros (about 24) in turn, and when each micro receives valid data, it sends the correct response. The process repeats indefinitely. One byte in the packet identifies which micro is supposed to receive the packet.

Is there a better way I can do this without causing any micro to receive up to 10 wasted bytes due to disconnection or power outage or the like?

I mean the only thing I can think of off the top of my head is to make an ultra-short packet but the byte size requirement per packet is quadrupled.

Example packet:

Byte 1: Sender
Byte 2: Receiver
Byte 3: One byte of data
Byte 4: Checksum


If I did that for the entire 8 bytes of data, then I'll need 32 bytes as opposed to 11 bytes to send out.

What do you guys think? How do I fix my issue?

[ - ]

As with any shared transmission medium, no matter if wired or wireless (but it's many times worse when you go wireless), you have to deal with collisions, lost data and other things that will mess with your timings and synchronization. There is no way around it; many PhD theses have been written on the subject and patents filed.

Also, more than a few projects have worked somehow under lab conditions and failed miserably in production because of noise and interference.

So IMNSHO, without long and careful simulation of the worst possible case, adjusting the protocol and packet size for the highest reliability, and testing in noisy conditions, you won't be able to produce a report explaining in a clear way that what you want is 169% IMPOSSIBLE. This is especially true with radio modems: while they offer a nice & simple interface for sending data, they totally isolate you from how the radio protocol actually works. I can tell you this from my experience with Wireless mBUS, where a customer wanted to get ALL the data packages sent by a meter in an industrial environment with electrical noise and other meters around sharing the bandwidth. Of course, naive back-of-napkin calculations showed that it should be possible.

So sorry to disappoint you, but that will be a difficult if not impossible project.

[ - ]

In the case you described, one device lost power and/or stopped communicating before it could finish the transmission. The result is a garbage (unfinished) packet in the data stream. What you need is the ability to gracefully handle this problem. This means you need the ability to detect that it occurred and then recover from it.

The first problem is detecting it. One way to accomplish this is to use a fixed-length packet. What this does for you is allow you to time out the transmission. You know from the baud rate that an 11-byte packet will complete in ~2 ms. So when you receive the first character, you know that the remainder of the packet will arrive in less than 2 ms. So if at the end of 2 ms you haven't received 11 bytes, then an error occurred. The fixed-length packet simplifies the problem, but this can be done using a variable-length packet also. I'll discuss that in a minute.
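As a sanity check on that arithmetic, here is a small C sketch (C is used only for illustration since the thread's code is 8051 assembly, and the function names are mine): at 56000 baud with 8N1 framing, each byte costs 10 bits on the wire, so an 11-byte packet takes just under 2 ms.

```c
#include <stdint.h>

/* Microseconds needed to ship n bytes at the given baud rate (8N1:
 * start bit + 8 data bits + stop bit = 10 bits per byte). */
static uint32_t packet_time_us(uint32_t n_bytes, uint32_t baud)
{
    return (n_bytes * 10u * 1000000u) / baud;
}

/* An illustrative receive timeout: the packet time plus a 25% margin. */
static uint32_t rx_timeout_us(uint32_t n_bytes, uint32_t baud)
{
    return packet_time_us(n_bytes, baud) + packet_time_us(n_bytes, baud) / 4u;
}
```

For the 11-byte packet at 56 kbps this gives 1964 us, matching the ~2 ms figure above.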

Now that you know an error occurred, the next problem is resyncing to the data stream. For this you could use a sync pattern. By extending the protocol to always begin the packet with a 2-byte sync pattern, say 0x55, 0xAA, you can always tell where in the data stream a new transmission starts. However, the sync pattern must be chosen such that it does not occur naturally within the data stream. The sync pattern can be as many bytes as needed to ensure you can sync to the data, but the longer it is, the more data you have to send and thus the more time for transmission.

Additionally, assuming the sync pattern chosen does not occur naturally within the data stream and you are using a fixed length packet, then if you encounter the sync pattern before the packet is complete once again you have an error.

As I stated earlier, the above can also be done using a variable-length packet, but it requires that you put a byte count into the packet so you know how many bytes are to be expected. The byte count is then used to set the timeout. If this method is chosen, the byte count should be the first byte after the sync pattern so as to minimize the chance for errors. Since you know the maximum amount of data that will be transferred, if you receive a byte count greater than the maximum, then this is also an error.
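A hedged C sketch of the receiver just described (sync pattern, then byte count, then data), for illustration only: the state names, the `MAX_PAYLOAD` limit, and the reset-on-bad-count behavior are my assumptions, and a real version would also reset the state machine on the receive timeout discussed above.

```c
#include <stdint.h>

#define SYNC0 0x55
#define SYNC1 0xAA
#define MAX_PAYLOAD 32u   /* illustrative maximum byte count */

enum rx_state { RX_SYNC0, RX_SYNC1, RX_COUNT, RX_DATA };

struct rx {
    enum rx_state state;
    uint8_t count;            /* expected payload bytes */
    uint8_t got;              /* payload bytes received so far */
    uint8_t buf[MAX_PAYLOAD];
};

/* Feed one received byte; returns 1 when a complete frame is in rx->buf,
 * 0 otherwise.  An out-of-range byte count resets the state machine. */
static int rx_byte(struct rx *r, uint8_t b)
{
    switch (r->state) {
    case RX_SYNC0:
        if (b == SYNC0) r->state = RX_SYNC1;
        return 0;
    case RX_SYNC1:
        /* 0x55 0x55 0xAA must still sync, so 0x55 keeps us here. */
        r->state = (b == SYNC1) ? RX_COUNT : (b == SYNC0 ? RX_SYNC1 : RX_SYNC0);
        return 0;
    case RX_COUNT:
        if (b == 0 || b > MAX_PAYLOAD) { r->state = RX_SYNC0; return 0; }
        r->count = b;
        r->got = 0;
        r->state = RX_DATA;
        return 0;
    case RX_DATA:
        r->buf[r->got++] = b;
        if (r->got == r->count) { r->state = RX_SYNC0; return 1; }
        return 0;
    }
    return 0;
}
```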

The above suggestions increase the amount of transmission data, but it should still fit within the 10 ms you are attempting to achieve.

Additionally CustomSarge's idea of: "You can compress the addresses into 1 byte - MSb is master, bits 6-0 are slave".  would help reduce the overall byte count some.

For the situation where you are using wired communications and it is taking 20 ms instead of the 10 ms you expected, this is likely a coding error somewhere. I would ensure that the routines processing the UART are not putting gaps in the data stream.

When transmission is done wirelessly there are a lot of factors that come into play. These include, but are not limited to, over-the-air collisions, transmitter power-up time, and over-the-air transmission rates.

[ - ]

You must have typed this as I was putting your suggestions in practice.

Now my new packet format is as follows:

Byte 1: 0FEh - Value to tell all clients to reset their receiver state
Byte 2: 0nh
Byte 3: 0nh
...
Byte 20: 0nh
Byte 21: 0nh
Byte 22: 0nh
Byte 23: 0nh


The first byte I explained above. Then the original bytes of data, including the checksum, are all divided into nibbles, and each transmitted byte carries one nibble in its low half (the n in the packet format above is a 4-bit value). This does inflate my packet from 11 bytes to 23 bytes, but it gives me the ability to detect the starting byte.
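The nibble scheme above can be sketched in C as follows (for illustration only; which nibble is sent first is an assumption here, and the names are mine). Because every encoded byte has a zero high nibble, the 0xFE start byte can never appear inside the data.

```c
#include <stdint.h>

/* Split one data byte into two on-the-wire bytes, each holding one
 * nibble in its low half.  High nibble is assumed to go first. */
static void nibble_split(uint8_t b, uint8_t out[2])
{
    out[0] = (uint8_t)(b >> 4);
    out[1] = (uint8_t)(b & 0x0F);
}

/* Reassemble a data byte from the two received nibble bytes. */
static uint8_t nibble_join(uint8_t hi, uint8_t lo)
{
    return (uint8_t)((hi << 4) | (lo & 0x0F));
}
```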

I set my over-the-air transmission rate and UART rate on both the radio module and micro to 56kbps to prevent choking on the line. Should I just set the over-the-air rate to 115kbps and leave the rest at 56kbps or will that make matters worse?

[ - ]

I'm not sure I understand your new packet format, but it seems a bit overcomplicated and I'm not sure it accomplishes anything. Additionally, I'm not sure you have a guarantee of being able to detect the starting byte. Unless your normal data is always ASCII, 0xFE could occur in the normal data. This would have the effect of a false start of packet. My suggestion was as follows:

Byte 1: 0x55
Byte 2: 0xAA
Byte 3: Sender
Byte 4: Receiver
Bytes 5 - 12: Data
Byte 13: 8-bit CRC checksum of all previous bytes

Using this, you always look for the 2-byte sequence 0x55, 0xAA to indicate start of packet. This is not foolproof either, but because it uses 2 bytes, the likelihood of a false positive is reduced. To further reduce the chance of a false positive, one would need to escape the sync pattern if it occurred within the data stream. To escape the pattern within the data, you could simply place a 0x7E in front of a 0x55 or 0xAA if they occur in the data portion only.

Example:

55 AA 11 22 00 01 02 80 7E 55 7E AA 32 30 00 00 <cksum>

Notice the 7E 55 and 7E AA.
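A C sketch of the escaping step just shown, for illustration (function name is mine). As the reply notes, this is not foolproof: a raw 0x7E in the data would also need handling in a complete scheme.

```c
#include <stddef.h>
#include <stdint.h>

#define ESC   0x7E
#define SYNC0 0x55
#define SYNC1 0xAA

/* Prefix 0x7E to any 0x55 or 0xAA in the data portion, as described
 * above.  Returns the encoded length; out must hold up to 2*n bytes. */
static size_t escape_payload(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == SYNC0 || in[i] == SYNC1)
            out[o++] = ESC;
        out[o++] = in[i];
    }
    return o;
}
```

Run on the data portion of the example above, this produces the `7E 55` and `7E AA` pairs shown.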

Now referring back to your new format: if I understood, then you are sending 0xFE to tell all receiving devices to reset their receive buffers. Does this mean that all receiving devices will respond?

If so, this is bad for radio transmission, because one and only one radio can transmit at any given time. If more than one radio attempts to transmit at the same time, then a collision will occur and each transmitting device will have to retry. This will certainly cause the transmission to take a lot longer, and it will be even worse if the transceivers are not using some form of CSMA/CD (Carrier Sense Multiple Access / Collision Detection).

As for your question regarding changing the over-the-air rate to 115K: it can make things worse if there is RF interference in the environment. For now my recommendation is to get it working properly first; then you can attempt to increase the over-the-air rate.

I believe you have a couple of problems in this design.  My recommendation is to use only 2 transceivers Master and Slave.  Get the 2 units working reliably together.  Test them until you are sure everything is working well.  Then start adding more slave units.

[ - ]

The reset (code 0FEh) is sent as a synchronization to all slaves to reset their internal data pointers, so that when the next character is sent by the master, the receivers will treat it as the 1st data character instead of the nth character.

And yes I'm currently dealing with one master and one slave at this time.

[ - ]

It sounds like you may have a problem reliably framing receive packets, particularly in the presence of errors where you have to flush data so you have an empty buffer when you receive the first byte of the next packet. This is the issue I had in mind in an earlier response when I mentioned flushing the RX packet buffer. You have this issue in the normal case when a received packet is addressed to another node, and also in a variety of error situations. You do not have a unique start or end of packet identifier, and you have a 1/256 probability of randomly getting a valid checksum.

As long as your packet contains raw binary data, any byte is possible and you do not have a unique start of message byte. MODBUS defines one, but it can appear in the data and MODBUS serial relies on enforcement of minimal time gaps between message bytes to keep things straight. Note, if you swap source/destination address bytes in your packet format, each destination now has a unique first message byte that is its address. You get the equivalent of MODBUS SOM functionality for free without adding a byte to the packet format. There are other ways to go that give you truly unique packet delimiters.

As an alternative to changes inspired by MODBUS, have you heard of COBS? A colleague pointed me at COBS some time ago, it stands for Consistent Overhead Byte Stuffing. It's a very low overhead byte-stuffing scheme that removes NULs (0x00) from data. You can send your encoded packet with a trailing NUL, the receiver now has a unique end of data that cannot appear in the data. As a starting point see the Wikipedia page.

On the transmit side you COBS encode your packet and send it followed by a NUL. You can also add a leading NUL as start of data, but that isn't really necessary and typically isn't done. For your 11 byte packet the maximum overhead including trailing NUL should be two additional bytes, three if you add a leading NUL.
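The transmit side described above can be sketched in C as follows (illustrative only; the function name is mine). This is standard COBS encoding with the trailing NUL appended, valid for packets shorter than 254 bytes, which is plenty for an 11-byte packet.

```c
#include <stddef.h>
#include <stdint.h>

/* COBS-encode n bytes (n < 254) and append the 0x00 end-of-frame
 * delimiter.  Each block starts with a code byte giving the offset to
 * the next zero; zeros themselves are removed from the data. */
static size_t cobs_send(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 1, code_pos = 0;
    uint8_t code = 1;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == 0) {          /* close the current block */
            out[code_pos] = code;
            code_pos = o++;
            code = 1;
        } else {
            out[o++] = in[i];
            code++;
        }
    }
    out[code_pos] = code;
    out[o++] = 0x00;               /* unique end-of-frame delimiter */
    return o;
}
```

An 11-byte packet encodes to at most 13 bytes including the trailing NUL, consistent with the two-byte overhead figure used below.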

On the receive side you might have two states, START (wait for first byte) and DATA (decode and look for trailing NUL.) One beauty of COBS is that you can decode on-the-fly as each byte is received. The papers I've found don't describe this, but it's pretty easy to unroll the decode loop and define a stateful function that takes the next RX byte and produces decoded output. When you restart packet reception you have to initialize COBS decode state in addition to your RX buffer state, etc. This on-the-fly processing reduces the latency from last byte in to RX packet ready because you don't have to decode and process the entire packet, you've already done most of the work which can also include checksum accumulation.

When you are in the DATA state and receive a trailing NUL, you should have a decoded packet with a valid checksum. You have to be careful in the DATA state not to overrun your RX data buffer. You'll probably flush and return to the START state any time there is a gap in RX data, or on buffer overrun after a NUL is received. With both leading and trailing NUL there should be no need for gap detection timeout handling.
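The on-the-fly decode described above might look like this in C (a sketch under my own naming and return-code conventions, not from any particular library): each received wire byte is fed in as it arrives, and decoded bytes come out immediately, so checksum accumulation can also happen per byte.

```c
#include <stdint.h>

#define COBS_NONE (-1)  /* this input byte produced no decoded output */
#define COBS_END  (-2)  /* trailing NUL seen: frame complete */

struct cobs_rx {
    uint8_t remaining;    /* data bytes left in the current block */
    uint8_t append_zero;  /* emit a 0x00 when this block ends mid-frame */
};

static void cobs_rx_reset(struct cobs_rx *s)
{
    s->remaining = 0;
    s->append_zero = 0;
}

/* Feed one wire byte.  Returns COBS_NONE, COBS_END, or a decoded
 * byte value (0..255).  State is reset on end of frame. */
static int cobs_rx_byte(struct cobs_rx *s, uint8_t b)
{
    if (b == 0) {                    /* end-of-frame delimiter */
        cobs_rx_reset(s);
        return COBS_END;
    }
    if (s->remaining == 0) {         /* b is a code byte */
        int out = s->append_zero ? 0 : COBS_NONE;
        s->remaining = (uint8_t)(b - 1);
        s->append_zero = (b != 0xFF);
        return out;
    }
    s->remaining--;
    return b;                        /* ordinary data byte */
}
```

A real receiver would also re-run `cobs_rx_reset()` when flushing after a gap or buffer overrun, matching the restart behavior described above.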

Assuming you add 2 bytes overhead to each 11 byte packet, your packet time goes from 1.96 ms to 2.32 ms, adding 0.357 ms to the transmission time. You'll also add a little processing time for COBS encode and decode, although you can hide decode processing as described above. What you get for this is simple, reliable and fast RX packet handling and error rejection.

With all that said...

My concern is that you have not done a thorough job characterizing and understanding the system latencies that are taking you to your 20 ms number. If you are unable to achieve your performance target using two nodes with hardwired UART connections, you have to be missing something. Hopefully you have a PING message/response that you can use to characterize delays. You might consider pulsing I/O ports on master send, client receive, client send and master receive, and observe these with a scope. This should help you quickly pin down your delays and see if one or more of them is off in the weeds. Until you go through this exercise or equivalent, you are basically guessing at the problem source and hoping you fix it with a related change. You might get lucky, you might not. Debug is like detective work, some guys like to gather facts, others like to go with their gut. I think you can tell which camp I'm in!

I would certainly do that before I implemented COBS or any other non-trivial change and found out that it didn't help. I really think you have a firmware packet handling issue or driver issue somewhere, particularly if you get this 20 ms number for a PING message/response involving minimal processing and overhead.

Budget and resources depending, if you're looking for a low-cost oscilloscope you might consider the Rigol DS1054Z. It's a great scope for the money, and serial decode and other options are bundled with the scope. I've been very impressed with it, and love that I can connect it to my LAN and run a shell script on my PC to extract a screen image and save it to a file. The best price I found was tequipment.net where they give you a modest price break if you create an account. (I have no incentives or relationship with Rigol or tequipment.net, just sharing my experience.)

One additional change to your packet format you might consider is an embedded byte count after the two address bytes. This could be the total decoded packet length or just the data payload length. (Payload length is my choice, mapping values 0-255 to payload lengths 1-256, disallowing a 0-byte payload, allowing up to a 256 byte payload with an 8-bit byte count value.) This is a nice consistency check for the total decoded packet byte count.
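The suggested length mapping is a one-liner each way (illustrative names):

```c
#include <stdint.h>

/* Byte-count field values 0-255 map to payload lengths 1-256,
 * disallowing a 0-byte payload as described above. */
static uint16_t payload_len(uint8_t count_byte)  { return (uint16_t)count_byte + 1u; }
static uint8_t  count_byte(uint16_t payload_len) { return (uint8_t)(payload_len - 1u); }
```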

[ - ]

Ok, I'm showing my entire serial routine. The only things I didn't include are the netdata and serialmem addresses and the setup of the SCON, IE, and IP registers. Also, I set all the memory to zero before the routines start.

Is there any way I can improve it?

And another question... Since I'm using an AT89S52 with a 22.1184 MHz crystal, is it true that at 56 kbps I can execute up to 320 clock cycles worth of instructions before RI is raised, assuming data is continuously coming in at 56 kbps?

RCHARTIMEOUT equ 0FAh ;1 received character timeout - 750uS
RCHARTIMEOUTL equ 099h ;for modbus compliance
STARTPKT equ 0FEh ;start of packet ID. (high nibble. low nib=0)
CRCSEED equ 0h ;Base number (polynomial) for CRC calculations
SBANK equ 08h ;RS1:RS0=0:1
SIR0 equ SBANK
SIR1 equ SBANK+1
SIR4 equ SBANK+4
SIR6 equ SBANK+6
;Netdata and serialmem are defined elsewhere
;Cached network data. 32 bytes. can use extd memory
RXDAT equ NETDATA
TXDAT equ NETDATA+10h
;Serialmem must be local - 36 bytes ram.
SRX equ SERIALMEM
STX equ SERIALMEM+10h
SBU equ SERIALMEM+20h
CSUMR equ SBU  ;checksum recv val to compare
SPSW equ SBU+1h  ;backup
TXR bit SERBITS.0 ;packet transmitted
GOTANYBYTE bit SERBITS.3 ;1 byte received of any raw serial byte
PNIBBLET bit SERBITS.4 ;packet nibble xmit
PNIBBLE bit SERBITS.5 ;packet nibble
MYPACKET bit SERBITS.7 ;packet received for me
;Meaning of byte offsets in packet
SF_DP0 equ 2h  ;data 0
SF_DP1 equ 3h
SF_DP2 equ 4h
SF_DP3 equ 5h
SF_DP4 equ 6h
SF_DP5 equ 7h
SF_DP6 equ 8h
SF_DP7 equ 9h ;data 7
SF_CSUM equ 0Ah ;checksum
CSUMT equ STX+SF_CSUM ;checksum transmit value
;Timer routine start
org 0Bh
ljmp tmr0
;Serial routine start:
org 23h
mainserint:
jbc RI,rxdf
jbc TI,txdf
reti
;Previous character was sent at this point
txdf:
cpl PNIBBLET        ;<- Variable to define nibble to use for transmit
mov SPSW,PSW        ;<- Save old bank and switch
mov PSW,#SBANK
mov R2,A            ;<- Save old A
jnb PNIBBLET,ntxn
mov A,R1          ;Executes on odd nibbles
xrl A,#CSUMT
jnz ntxend        ;See if we are in checksum field
setb TXR        ;We are so we are done
clr GOTANYBYTE  ;Tell timer we have nothing
setb TF0        ;and timeout to make reset
setb IT0        ;enable timer
ajmp exitint2   ;restore used registers and exit
ntxend:
inc R1            ;we aren't in checksum field so advance
ntxn:
mov A,@R1           ;Get byte
jnb PNIBBLET,ntxn1  ;See which nibble it is
swap A              ;odd nibble: send the high nibble
ntxn1:
anl A,#0Fh          ;we only want nibble
mov SBUF,A          ;send 0x on the wire
cjne R1,#CSUMT,ntxend2 ;See if current byte is checksum field
ajmp exitint2        ;it is so just exit
ntxend2:
mov R5,DPH  ;Save DPTR
mov R7,DPL
xrl A,CSUMT ;Calculate checksum. CSUMT has rolling checksum value
mov DPTR,#CRCT  ;DPTR=lookup table
movc A,@A+DPTR  ;load value from lookup table
mov CSUMT,A     ;store updated value in CSUMT
ajmp exitint    ;Restore all registers and Exit
rxdf:
setb TF0 ;Run timer after this
jbc RB8,rcvcont ;See if last bit of 10-bit UART data is 1
clr GOTANYBYTE  ;It isn't so wireless is messed
sjmp exitint3   ;so exit
rcvcont:
setb GOTANYBYTE ;We got a byte.
jb SERPACKET,exitint3 ;See if we already have a packet
mov SPSW,PSW      ;We don't so save registers and A
mov PSW,#SBANK
mov R2,A
mov A,SBUF        ;Set A=R3=Serial input
mov R3,A
xrl A,#STARTPKT   ;See if the input is our starting packet value
jnz rxredo
clr GOTANYBYTE  ;It is so prepare for reset
sjmp exitint2   ;and restore used registers and exit
rxredo:
mov R5,DPH  ;Save DPTR
mov R7,DPL
mov A,R0      ;See if our slot is the checksum slot
xrl A,#CSUMRA
jz nocsr
mov A,R3    ;It isn't so calculate checksum
xrl A,CSUMR
mov DPTR,#CRCT
movc A,@A+DPTR
mov CSUMR,A ;and update checksum value
nocsr:
mov A,R3    ;Get low nibble of the incoming data
anl A,#0Fh
cpl PNIBBLE
jnb PNIBBLE,nrxn1 ;See where to put it
swap A            ;Put it as high nibble
mov @R0,A
sjmp exitint      ;and exit for now
nrxn1:
xch A,@R0   ;Put next incoming nibble as low nibble to complete byte.
orl A,@R0   ;These 3 instructions = XRL @R0,A but that isn't valid on 8051
mov @R0,A
mov A,R0
subb A,#SRX ;See what state we are at.
rl A
mov DPTR,#rtable
jmp @A+DPTR ;and go to the state
;Here the state requires increment
rxdfc:
mov A,@R0
inc R0
exitint:
mov DPH,R5 ;restore DPTR
mov DPL,R7
exitint2:
mov A,R2      ;restore A and PSW
mov PSW,SPSW
exitint3:
reti    ;and exit
;Byte 1 received. this function called
ajmp rxdfc
;Byte 2 received. this function called
mov A,@R0
anl A,#1Fh
cjne A,SIR4,rxdfc ;checks to see if data doesn't match bad data
ajmp exitint
;Last byte in packet. this is processed
rchksum:
mov A,@R0
clr GOTANYBYTE ;We make a reset
cjne A,CSUMR,exitint
;here checksum is valid
orl SERBITS,#MASKRCVD ;= setb RXR+setb SERPACKET+setb GOTANYBYTE
mov A,SRX
anl A,#1Fh
cjne A,SIR4,exitint ;see if the packet is for us
mov R0,#RXDAT ;local buffer to prevent split data
;copy data to local memory so it doesnt get overwritten
inc R0
mov @R0,SRX+SF_DP0
inc R0
mov @R0,SRX+SF_DP1
inc R0
mov @R0,SRX+SF_DP2
inc R0
mov @R0,SRX+SF_DP3
inc R0
mov @R0,SRX+SF_DP4
inc R0
mov @R0,SRX+SF_DP5
inc R0
mov @R0,SRX+SF_DP6
inc R0
mov @R0,SRX+SF_DP7
ajmp exitint
;The states (Byte positions)
rtable:
ajmp rxdfc ;3rd byte = data
ajmp rxdfc
ajmp rxdfc
ajmp rxdfc
ajmp rxdfc
ajmp rxdfc
ajmp rxdfc
ajmp rxdfc
ajmp rchksum ;last byte = checksum
ajmp resetrc ;Sanity check: any value too high=reset
ajmp resetrc
ajmp resetrc
ajmp resetrc
ajmp resetrc
resetrc:
clr GOTANYBYTE
ajmp exitint
;Start transmission
rawsxmitS:
jnb TXR,;stall until last transmission is done clr TXR clr IT0 ;Shut off timer clr PNIBBLET ;reset nibble position mov CSUMT,#CRCSEED ;reset checksum for transmitter mov SIR1,#STX-1 ;Set data offset to 1 before since transmit routine increments first mov SBUF,#STARTPKT ;Send special start byte ret ;Low priority timer. Serial has high priority ;This is called when timeout happens or when serial is called tmr0: mov TL0,#RCHARTIMEOUTL ;reset timeout value mov TH0,#RCHARTIMEOUT jbc GOTANYBYTE,notmo ;See if we got valid byte clr PNIBBLE ;We don't or end of packet is reached so reset receive nibble mov CSUMR,#CRCSEED ;and reset receive checksum seed mov SIR0,#SRX ;and reset receive pointer mov ACCR,A ;save accumulator ndp: jbc SERPACKET,dopacket mov A,ACCR ;restore accumulator notmo: reti dopacket: ;Stuff happens here when valid packet is received ljmp ndp  [ - ] Reply by January 2, 2019 Wow. Kudos for nutting this out in assembler and getting it to the point of functional even with some performance issues. It isn't obvious why you're coding in assembler instead of C. I have no experience with the device you're using but would think you could get information on a C tool chain from Microchip. A quick tour of their web site led me to a UART code example here for their 8051 family in both C and assembler. There is also the option of using SDCC, I've used it successfully with Silicon Labs' C8051 devices, it used to be more popular before Silicon Labs began offering a free version of the 8-bit Keil tools with no code size limits. Some times you have to write a little bit of assembler, but in general you'll be much more productive, code fewer bugs, and find things easier to maintain in C. I wrote a LOT of 2560 and Z80 assembler in the early 80's, and I don't miss it very much at all. You can usually do all the bit-twiddling you need to do in C. One comment about your code, it looks like you're doing all your receive processing in the ISR. 
In general you should keep your ISRs as short as possible, they're a classic less is more and more is less situation. A better approach is to define a circular buffer and have your receive interrupt simply add data to the buffer and return. You then have a function or subroutine, eg. rxprocess(), that you call repeatedly, for example from a main() super loop, that looks for a non-empty buffer and processes the data. You could have a getc() function or subroutine that gets a character from the buffer. The standard getc() is blocking which you absolutely don't want. You'll need a way to either test the RX buffer for empty condition, or for getc() to return a character and valid status. This applies to the transmit side as well, use a circular buffer and transmit interrupt. The transmit ISR disables the TX interrupt when it empties the buffer, and you could have a putc() function or subroutine that adds a character to the buffer and enables the TX interrupt when adding to an empty buffer. You can also construct a blocking writeln() routine that iterates using getc() to add multiple characters to the transmit buffer. Since the transmit interrupt is running, writeln() can safely block and wait for a buffer spot to free up (shouldn't take long), but you should size your TX buffer to hold a complete packet and avoid this case. It is important to make sure that the getc()/putc() routines pay attention if they access volatile variables (those that can be changed by the ISR.) Assume your circular buffer has a read and write pointer, the function that adds data to the buffer updates the write pointer, the function that removes data updates the read pointer. The putc() routine has to be very careful that it doesn't incorrectly decide it doesn't have to re-enable the TX interrupt because a transmit interrupt fired and emptied the buffer between the buffer empty test and the add character update. 
One way to guarantee this is to disable interrupts around test and add, a more clever method is to add the character first, then test to see if there is only one character in the buffer and re-enable the TX interrupt if so. There are good reasons for structuring your code this way. One is that your interrupt service time is as short as possible, ISRs should be brief and non-blocking. Another is that you have much more time to handle received data. You'll also find you can reuse this ISR and buffer method in other projects that use a UART. The scheme is general and doesn't care about data content, it just shovels bytes around. You can use it anywhere and it will serve you well. Your concern about amount of code you can execute per character time goes away with the circular buffer strategy. You no longer care about the worst-case path through that complex receive interrupt logic. All you have to do is get data from your circular buffer fast enough to prevent overflow, and you can increase the buffer size to give yourself more time. You now have a LOT more time to perform your RX processing without worrying about data loss. It is also easy to detect buffer overrun by setting a flag if your RX interrupt encounters a full buffer. This scheme doesn't apply to just UART handling, it applies in general to most low-level interrupt-driven TX/RX interfaces. I know I'm talking about a re-write and a new tools chain, you may well not want to go there on this project and I understand. [ - ] Reply by January 2, 2019 I put your text in italics and my relevant replies follow. Wow. Kudos for nutting this out in assembler and getting it to the point of functional even with some performance issues... And its the performance issues I need to nail down. It isn't obvious why you're coding in assembler instead of C. I could do hard-core machine coding by using hex codes but that's way worse. 
The thing with C is that you're relying on whoever made the functions to convert C code to assembly code to make the right opcodes for you. Sadly, computers crash year after year. See these videos: A quick tour of their web site led me to a UART code example here I'll take a look Some times you have to write a little bit of assembler, but in general you'll be much more productive, code fewer bugs, and find things easier to maintain in C. Ok C is easier to maintain but I want my code to be 100% bug free and 0% hackable so I'm going straight down to the hardware level. The way I presented my code is different from how I format it for myself. this is because I made myself a front-end compiler for my code in C and then that way I can group relevant code together on the same line. So I could do code like this: function: mov A,1h/mov B,2h/setb C/ret  instead of: function: mov A,1h mov B,2h setb C ret  Then my own compiler converts the code so the assembler can process it. in the above snippets, my compiler would convert the top snippet to the bottom snippet. this saves me from having 100,000 vertical lines of code. One comment about your code, it looks like you're doing all your receive processing in the ISR... Almost. I'm trying to make this code adaptable to both the master and slave. As soon as slave receives the packet, it must reply to the master asap. So I thought it would be wise that on each character receive that everything else gets interrupted while the character is loaded into its memory space with the checksum being calculated at the same time. If I waited until all the bytes were received first before calculating the checksum, wouldn't that be worse? because on the last byte, more processing is required? In general you should keep your ISRs as short as possible Thats my objective A better approach is to define a circular buffer and have your receive interrupt simply add data to the buffer and return. 
Considering I used up all the register banks, this may be a challenge to do. I gotta weigh this idea out first and see... There are good reasons for structuring your code this way. One is that your interrupt service time is as short as possible, ISRs should be brief and non-blocking. Another is that you have much more time to handle received data. I'm trying to understand the last line. Because in the interrupt I have now, the data is checked immediately, where as with your idea, the interrupt must end and the program must continue to run from where it left off until it reaches the point of where data is to be checked. For example: Your idea (in general): ISR: mov @R1,SBUF inc R1 retf mainline: lcall function1 lcall function2 lcall function3 lcall processcharacter lcall function4 ;** ljmp mainline  If in the above case, the interrupt happened inside function4, and I used your idea then character will be inserted into the buffer then the first three functions have to be processed before the character gets processed. The is my setup right now: My idea (in general): ISR: mov CHARACTER,SBUF lcall processcharacter retf mainline: lcall function1 lcall function2 lcall function3 lcall function4 ;** ljmp mainline  In my code the character is processed instantly. Your concern about amount of code you can execute per character time goes away with the circular buffer strategy. I can agree I can execute more code between character times but question is will the right code be executed in time? All you have to do is get data from your circular buffer fast enough to prevent overflow, and you can increase the buffer size to give yourself more time. Sad news is my clients are literally almost out of ram but the server has lots of ram. I'll be lucky if my clients have 8 or 16 bytes free. Maybe a big buffer is what I need after all. [ - ] Reply by January 2, 2019 Here's the argument for having more time to process RX data with a simple ISR and buffer. 
With an RX buffer of N character times and a simple ISR, you basically have N character times before you need to start pulling data from the buffer to avoid dropping data. If N holds a complete packet you have all the time you need. The function processing RX buffer data should loop and attempt to empty the buffer each time it is called, as opposed to processing only one character as in your ISR. The assumption here is that it's called frequently enough to keep up generally, and only in rare circumstances will it iterate over several characters. This also means you'll have most of the checksum computation done before the last byte arrives. Your latency from last byte to RX packet valid will be slightly worse, but the difference should be very small. You're not waiting to process the entire packet, you're still doing it on-the-fly as characters come in, you're just doing it from your main loop instead of from the ISR. UART RX hardware typically has a shift register and data register or FIFO, this can give you some slack but if your ISR per-character handling is not keeping up with the incoming character rate, you're likely going to get behind and drop data at some point. This may or may not be the nature of your performance issue, you should nail down your min/max/average system latencies. Again, I suggest a hardwired UART connection, PING message/response, pulse I/O ports on key events and gather data. You might also check your UART hardware for RX overrun indications and make this visible if it occurs. All functions you call from your main loop should be stateful and non-blocking. By this I mean that they should never spin waiting for an event. There should be a state associated with the event and in that state the function should test/exit, test/exit, ... until the event fires and you advance to the next state. Stateful and non-blocking. Sadly I cannot argue that any of this will fix your problem, these are general design practices that work well. 
With regard to a change you might make, you could consider sending the RX address first, using this as a unique SOM (start of message) in each receiver. The problem you have is that you may see SOM elsewhere in your data and be fooled. Assuming your transmit side doesn't produce dead time between characters, re-starting packet reception after a several-character-time gap in RX data should help with this. Basically you want to detect SOM after a gap in RX data, and ignore it in back-to-back RX character data. That is essentially what the MODBUS serial standard does, and it requires no gaps in the transmitted character stream.

You have some interesting rationalizations for not using a compiler in 2019! The Microsoft failures you mention have everything to do with product complexity and verification, and little if anything to do with use of a compiler. If Microsoft wrote everything in assembler they would be worse off in most every imaginable way: cost, schedule, quality, features, maintainability, you name it. If you tried to keep any one of cost, schedule or quality where it is now, the other two would suffer dramatically writing everything in assembler.

I'm not suggesting compiler bugs don't exist; they do. In 6 years I managed to trip over one SDCC bug: I fooled its dependency analysis by trying to be clever with assignment order to save a line of source code. Commercial offerings like Keil and IAR are typically going to be better in terms of quality and correctness of generated code, and I suspect GCC is closer to Keil and IAR.

I don't understand your comment about your assembler code being "0% hackable." Your attack surface is pretty much the same, particularly in an embedded device, whether your image is assembled and linked from hand-generated vs. compiler-generated instruction sequences.

If you chose the AT89S52 because it comes in a DIP package, I'm not sure what your other device options are.
If that is not a constraint, you might consider the Silicon Labs C8051 devices for future projects. They have a wide variety of offerings with lots of memory resource on die. Many variants offer 2k to 8k of XRAM (indirect addressing mode) in addition to the standard 256 bytes of RAM. Documentation is great, lots of example software, free tool chains (Keil, SDCC) and IDEs, and inexpensive development kits with power supply, programming adapter and USB cables included. It's nice not having your hands tied with resource and tool chain constraints.

[ - ] Reply by January 2, 2019

I did order some AT89S8253's online, which seem to have an X2 option (double speed), but I'll have to wait a while for them. In the meantime, could I get a speed boost if I put the above serial routine in a separate microcontroller and have it buffer only the data? It would then become the serial port that only exchanges data with the system. I'll try and see.

[ - ] Reply by January 2, 2019

In the midst of trying super compressed interrupts...

Ok, so I was beginning to modify my code to shorten time spent in interrupts, and after calculating things, I was able to shorten the time by half. Now if I continue with this interrupt approach that someone suggested, I probably do not have enough free memory left to process them all. This is because I need two pointers: one as the "head" and the other as the "tail". (That's what the guy at pjrc.com calls them: https://www.pjrc.com/tech/8051/uartintr.html.) Basically the head pointer advances to the next free memory space in the circular buffer after a character is loaded in, and the tail is advanced when the user routine wants the character. And for proper alignment (to allow me to use AND logic on pointers), I need to reserve 16 bytes for the incoming buffer plus 2 bytes for pointers. That's only for extracting the characters one at a time from the buffer as the non-priority function needs them. I also need 2 extra variables.
One to store the nibble and one to store which byte number is being picked up, so that the system knows if the checksum result should be calculated or not. Then I need another buffer of the packet size, filled with the raw characters as each one is processed. Then finally the checksum is calculated, and if it is correct the buffer gets copied over to the userspace buffer which the application can use for more processing. Without the userspace buffer, I'd lose characters on the fly. So in the end, I'm looking at needing 52 bytes of memory, but if I can somehow lower the memory requirement without corrupting data and without making the program slower, I'm open to ideas on that.

[ - ] Reply by January 8, 2019

A few things:

1) Matthewbarr is spot-on. He is saying everything I would tell you. This also includes CustomSarge - we are all heavily scarred veterans of the embedded wars.

2) You don't need pointers - indexes work on circular buffers. And if you put the buffer on the right boundary, with a power-of-2 length (say 16 bytes on a 16-byte boundary), you can do a modulo (mask) on the head and tail indexes to save code space.

3) You can process the buffer on the fly - reading the input buffer - and use the results only for well-formed packets. Basically, if the head & tail indexes are !=, you have a byte. No need for a separate buffer. This appears to be what you were doing in an ISR, but it gives you a lot more flexibility. Divide and conquer really helps and keeps your code simple.

4) You may be locked into an 8051 for this project. Seriously look for alternatives in future projects. I happen to be an Atmel freak - they have a lot of good tools && are competitive. Good peripherals. And Arduinos use them, so you can get hardware that makes prototyping much easier with built-in serial and USB. You can program in C (woowoo).

[ - ] Reply by January 8, 2019

Echoing @matthewbarr: you really need an oscope to be in this biz.
The two best tools I own are my Estwing 20oz framing hammer and my Tek scope. I highly recommend the Rigol DS1054Z https://www.amazon.com/Rigol-DS1054Z-Digital-Oscil... It has the same basic capability as my (10+ year old) Tek 2024B scope. I have used the Rigol && was very impressed. When my Tek gives out, or I decide I need a second scope, it is at the head of the queue.

Pro tip: always go for as many oscope inputs as you can - 4 is way better than 2.

You did not hear it from me, but there is a tool (ask Mr Google) that will convert the 50MHz model to 100MHz...

There is an old trick with a spare pin - assert it at the start of something, and deassert at the end && measure the time with the scope. Useful for debugging ISRs. Also, if you really need to optimize a chunk of code, you can toggle the pin for different sections to see the timing for each section.

(You should always have at least one spare pin, preferably several. Bring them out to berg headers.)
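The spare-pin trick above can be sketched like this - plain C with a variable standing in for the port register; on a real 8051 the macros would write an sbit such as P1.0, and all names here are made up for illustration:

```c
#include <stdint.h>

/* Spare-pin timing trick: drive a pin high on entry to the code
   being measured and low on exit, then read the high time on the
   scope. debug_port is a stand-in variable; on real hardware the
   macros would set/clear a port bit instead. */

static uint8_t debug_port;
#define DEBUG_PIN_HIGH() (debug_port |= 0x01)
#define DEBUG_PIN_LOW()  (debug_port &= (uint8_t)~0x01)

static int isr_work_done;

static void section_under_test(void)
{
    DEBUG_PIN_HIGH();        /* rising edge: section starts */
    isr_work_done = 1;       /* ...the code being timed...  */
    DEBUG_PIN_LOW();         /* falling edge: section ends  */
}
```

Toggling different pins (or different bits of one port) around different sections gives per-section timing on a multi-channel scope.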

Also - get a case with the foam cubes to store/carry the scope in. Roughly $50 for the low end, or go for the gusto && get a Pelican (or clone). Make sure you have enough room for the cables/probes/power cord/etc.

[ - ]

If I've understood the problem correctly:

The master can send out a command packet which instructs all the slaves to send data at specified delay times from the command packet. You can use a bit field to specify which of the slaves should speak (4 bytes for that) and a further byte to say how much delay per active slave.

This way you need about half the bandwidth.
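A sketch of how each slave could compute its response slot from such a bit field - illustrative only, assuming a 32-bit mask where bit N set means slave N should speak, and equal delay units per active slave:

```c
#include <stdint.h>

/* Each slave scans the broadcast mask: if its own bit is set, its
   response delay is (number of lower-numbered active slaves) times
   the per-slot delay carried in the command packet. */

static int response_slot(uint32_t mask, uint8_t my_id, uint8_t delay_per_slot)
{
    if (!(mask & (1UL << my_id)))
        return -1;                    /* not addressed this round */
    int slot = 0;
    for (uint8_t i = 0; i < my_id; i++)
        if (mask & (1UL << i))
            slot++;                   /* count earlier speakers  */
    return slot * delay_per_slot;
}
```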

MK

[ - ]

Couple things: You can compress the addresses into 1 byte - MSb is master, bits 6-0 are slave. Many protocols are possible, but 2 easy ones:

Proto 1: master->slave1, slave1->master, slave2 delay then slave2->master, slave3 delay then slave3->master etc. A poll->response round robin for slaves all having equal tasks and response frequency.

Proto 2: master->slaveN, slaveN->master etc. A poll->response directly addressed for slaves having different tasks requiring different polling frequencies.

Neither has any bearing on data length: short, long, a "type" or # in the header. Totally dependent on system behavior. But both require a packet and/or byte timeout with CRC check, and a "Nak" sent during the responder delay signalling that retransmission is required. The delay timeout without a Nak implies good transfer.
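The compressed address byte and the poll/Nak decision described above could look like this in C - names are illustrative, and the CRC values are whatever the packet already carries:

```c
#include <stdint.h>

/* One-byte address as suggested: MSb set = master, bits 6-0 = slave
   number, so 1 byte covers the master plus up to 128 slaves. */
#define ADDR_MASTER_BIT 0x80

static uint8_t addr_make(int is_master, uint8_t slave_id)
{
    return (uint8_t)((is_master ? ADDR_MASTER_BIT : 0) | (slave_id & 0x7F));
}

enum rx_action { RX_REPLY, RX_NAK };

/* Responder decision after a packet addressed to us completes:
   good CRC -> send the normal response; bad CRC -> send a Nak
   during the responder delay so the sender retransmits. */
static enum rx_action responder_action(uint8_t crc_rx, uint8_t crc_calc)
{
    return (crc_rx == crc_calc) ? RX_REPLY : RX_NAK;
}
```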

I built a system of lifts requiring syncing of heights in both up/down. It was like Proto 1 except the master/slave addressing was only at initialization; any unit could command a move thereafter. During a move, active units sent their position in unit address order, and all units kept track of all positions. Wired at first, then wireless using spread-spectrum FM, from 1 to 16 units. This was several years ago, but the radios worked fine, even with arc welders and other machinery nearby. Bit rate was 115.2k, but developed at 9.6k, then 28.8k.

You may find you need to streamline packet processing in assembler; the routines can be called functions or inlined. My system was all assembler, ~4900 lines.

[ - ]

After further research before arriving here again, I will try some ideas from the modbus protocol (http://modbus.org/specs.php) especially the use of a unique character as the start of stream flag.
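A sketch of that MODBUS-style idea - the unique character only counts as a start flag when it follows a quiet gap on the line - assuming the receiver timestamps each byte from a free-running timer (the SOM value and tick threshold here are illustrative):

```c
#include <stdint.h>

/* Gap-based start-of-message detection, MODBUS style. A byte equal
   to SOM only starts a new frame if it arrives after more than
   GAP_TICKS of line silence; the same value inside a back-to-back
   byte stream is treated as ordinary data. */

#define SOM       0x11  /* example address byte from the thread   */
#define GAP_TICKS 40    /* > a few character times; tune per baud */

static uint32_t last_tick;

/* Feed one received byte with its arrival time.
   Returns 1 if this byte begins a new frame. */
static int framer_feed(uint8_t b, uint32_t now)
{
    uint32_t silence = now - last_tick;
    last_tick = now;
    return (silence > GAP_TICKS) && (b == SOM);
}
```

This is essentially the MODBUS RTU rule: frames are delimited by silent intervals, so the transmitter must keep bytes back-to-back within a packet for the receiver's gap detection to be reliable.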

[ - ]