I'm trying to implement a simple protocol for a point-to-point full-duplex serial link. It could be a reliable link, such as a connection between two nearby MCUs on the same PCB, or a noisy link, such as an RF link.

The application layer should send and receive generic messages:
-> How are you?
<- I'm fine, and you?
-> That's ok here.
The above example is a half-duplex exchange, but the link is full-duplex and messages could be transmitted at any time.

I'd like to isolate the reliability feature in the lower protocol layers (transport, network, link), as TCP guarantees a connection-oriented session to the application layer.

Even with a very reliable connection, I have to face the possibility of errors during transmission. This means implementing a mechanism of acks and retransmissions: if the sender doesn't receive an ack within a certain timeout, it sends the packet again.

But I can't retransmit the naked packet as is, because it could be received twice (imagine what happens if the message is "charge the bank account for 1000USD"). So I looked at the sequence-number mechanism: every message is marked with a different number so the receiver can detect duplicate packets and, if so, ack them again (but not process them a second time).

One standard protocol with those features is HDLC in Asynchronous Balanced Mode (ABM). It defines a good asynchronous serial framing (similar to SLIP) *and* introduces tx and rx sequence numbers for every I-frame. Of course, it is very similar to TCP, which uses sequence and acknowledgment numbers.

Now the big question: how can the sender *always* be sure that the message has really arrived at (and been processed by) the receiver?

Of course, if the sender receives the ack for the frame of interest, it can be sure the message has arrived and been processed. But what happens if the sender doesn't receive the ack, because it was transmitted with errors?

It can send the message again without any problem, because the receiver will detect the duplicate message and send the ack for that frame again (without processing the message a second time). And what happens if the sender doesn't receive the second, third, ... ack? This could happen, for example, when an intermediate router/forwarder/hub has been powered off or the receiver has been physically disconnected from the link. Maybe the message arrived (and was processed) just before the connection trouble, so the ack will never reach the sender.

In this odd case, the sender can't be 100% sure whether its message has arrived. What is the solution for this situation?

This scenario is very similar to a real TCP/IP network. As said before, TCP is a protocol that implements acks, retransmissions and sequence numbers. After a TCP connection has been established, the two hosts start exchanging data. At some point something goes wrong and one host doesn't know whether the last message reached its destination.

I'm sure a simple solution exists, because TCP/IP is now used for many applications, even critical ones (online banking and similar things), but I can't find it myself.
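The sequence-number mechanism described above can be sketched on the receiver side as follows. This is a minimal stop-and-wait sketch in C, assuming only one frame is in flight at a time and that sequence numbers are 8 bits wide and wrap at 256; the names `rx_state` and `rx_on_frame` are hypothetical. A frame carrying the expected number is delivered exactly once; a repeat of the previous number is only acknowledged again, which is what keeps the "charge the bank account" message from being executed twice.

```c
#include <stdint.h>

/* Hypothetical receiver state for a stop-and-wait link: one frame in
   flight at a time, 8-bit sequence numbers that wrap from 255 to 0. */
typedef struct {
    uint8_t expected_seq;   /* sequence number of the next new frame */
    unsigned delivered;     /* frames handed to the application so far */
} rx_state;

typedef enum {
    RX_DELIVER,    /* new frame: pass the payload up, then ACK it */
    RX_DUPLICATE,  /* retransmission of the last frame: ACK again, don't deliver */
    RX_DROP        /* unexpected number: ignore it and let the sender time out */
} rx_action;

/* Decide what to do with a frame carrying sequence number 'seq'.
   Both RX_DELIVER and RX_DUPLICATE should be answered with an ACK
   that carries the same 'seq'. */
rx_action rx_on_frame(rx_state *rx, uint8_t seq)
{
    if (seq == rx->expected_seq) {
        rx->expected_seq++;                 /* wraps from 255 to 0 */
        rx->delivered++;
        return RX_DELIVER;
    }
    if (seq == (uint8_t)(rx->expected_seq - 1)) {
        return RX_DUPLICATE;                /* our previous ACK was probably lost */
    }
    return RX_DROP;
}
```

Both sides are assumed to start counting from 0 after a link reset (in HDLC ABM the SABM/UA exchange resets the sequence counters). Without that, a first frame numbered 255 would be mistaken for a duplicate.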
Simple BUT reliable serial protocol
Started on January 3, 2016
Reply on January 3, 2016
On 04/01/16 01:01, pozz wrote:
> Now the big question: how can the sender *always* be sure that the message has really arrived at (and been processed by) the receiver?

See https://en.wikipedia.org/wiki/Two_Generals%27_Problem
Reply on January 3, 2016
pozz wrote:
> I'm trying to implement a simple protocol for a point-to-point full-duplex serial link. It could be a reliable link, such as a connection between two nearby MCUs on the same PCB, or a noisy link, such as an RF link.
>
> The application layer should send and receive generic messages:
> -> How are you?
> <- I'm fine, and you?
> -> That's ok here.
> The above example is a half-duplex exchange, but the link is full-duplex and messages could be transmitted at any time.
>
> I'd like to isolate the reliability feature in the lower protocol layers (transport, network, link), as TCP guarantees a connection-oriented session to the application layer.
>

Does "as" mean "in the same manner that"?

Congratulations. You can use SLIP over a serial port and reuse TCP wholesale.

Then something like ":<message>,<crc>;" can be your application transport protocol, for ready parsing. Use \: , \, and \; for escaping those characters.

Or constrain messages to ASCII and use STX, ETX and other < 0x20 characters as delimiters. Or something else.

Or you could do without TCP and use the same basic mechanism.

> Even with a very reliable connection, I have to face the possibility of errors during transmission. This means implementing a mechanism of acks and retransmissions: if the sender doesn't receive an ack within a certain timeout, it sends the packet again.
>

Don't forget NACK, WACK (wait ACK) sequences... When you send a WACK, and the processing is done for it, send an ACK for that message. Or don't.

> But I can't retransmit the naked packet as is, because it could be received twice (imagine what happens if the message is "charge the bank account for 1000USD").
>

So now we're up to:

:<sequence number>!<message>,<crc>;

Or make it such that duplicate messages don't matter.

> So I looked at the sequence-number mechanism: every message is marked with a different number so the receiver can detect duplicate packets and, if so, ack them again (but not process them a second time).
>
> One standard protocol with those features is HDLC in Asynchronous Balanced Mode (ABM). It defines a good asynchronous serial framing (similar to SLIP) *and* introduces tx and rx sequence numbers for every I-frame. Of course, it is very similar to TCP, which uses sequence and acknowledgment numbers.
>
> Now the big question: how can the sender *always* be sure that the message has really arrived at (and been processed by) the receiver?
>

He can't. See also "The Two Generals Problem". He can, however, be assured that he'll receive an ACK (or NACK, or WACK) within <n> seconds of transmission.

> Of course, if the sender receives the ack for the frame of interest, it can be sure the message has arrived and been processed. But what happens if the sender doesn't receive the ack, because it was transmitted with errors?

Have the receiver send NAK sequences when an error is detected. It's your call whether to try to use a sequence number in NACK sequences.

> It can send the message again without any problem, because the receiver will detect the duplicate message and send the ack for that frame again (without processing the message a second time).

Indeed; or with processing if that's okay at the application level.

> And what happens if the sender doesn't receive the second, third, ... ack? This could happen, for example, when an intermediate router/forwarder/hub has been powered off or the receiver has been physically disconnected from the link.

It's often useful to have a state of "oh, the line is dead" based on the timeout.

> Maybe the message arrived (and was processed) just before the connection trouble, so the ack will never reach the sender.
>
> In this odd case, the sender can't be 100% sure whether its message has arrived. What is the solution for this situation?
>

Other than repealing the second law of thermodynamics, there isn't one.

> This scenario is very similar to a real TCP/IP network. As said before, TCP is a protocol that implements acks, retransmissions and sequence numbers. After a TCP connection has been established, the two hosts start exchanging data. At some point something goes wrong and one host doesn't know whether the last message reached its destination.
>

Correct.

> I'm sure a simple solution exists, because TCP/IP is now used for many applications, even critical ones (online banking and similar things), but I can't find it myself.

https://en.wikipedia.org/wiki/Two_Generals'_Problem

-- 
Les Cargill
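A possible encoder for the ":<sequence number>!<message>,<crc>;" framing suggested above, as a minimal C sketch. Several details are assumptions, not part of the suggestion: the CRC variant (CRC-16/CCITT-FALSE, polynomial 0x1021, initial value 0xFFFF, printed as four hex digits), computing it over the "<seq>!" field plus the unescaped message so that a corrupted sequence number is also caught, and escaping the backslash itself so that a message ending in '\' can't swallow the ',' delimiter. The names `crc16_ccitt_update` and `frame_encode` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* CRC-16/CCITT-FALSE step (poly 0x1021, MSB first); start with 0xFFFF. */
uint16_t crc16_ccitt_update(uint16_t crc, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* True if c must be preceded by '\' inside the message field. */
static int needs_escape(char c)
{
    return c == ':' || c == ',' || c == ';' || c == '\\';
}

/* Writes ":<seq>!<escaped msg>,<CRC as 4 hex digits>;" plus a NUL into out.
   Returns the frame length, or 0 if out_size is too small. */
size_t frame_encode(unsigned seq, const char *msg, size_t msg_len,
                    char *out, size_t out_size)
{
    int w = snprintf(out, out_size, ":%u!", seq);
    if (w < 0 || (size_t)w >= out_size)
        return 0;
    size_t n = (size_t)w;

    /* CRC covers "<seq>!" and the raw (unescaped) message bytes. */
    uint16_t crc = crc16_ccitt_update(0xFFFF, (const uint8_t *)out + 1, n - 1);
    crc = crc16_ccitt_update(crc, (const uint8_t *)msg, msg_len);

    for (size_t i = 0; i < msg_len; i++) {
        size_t need = needs_escape(msg[i]) ? 2 : 1;
        if (n + need >= out_size)          /* keep room for the NUL */
            return 0;
        if (need == 2)
            out[n++] = '\\';
        out[n++] = msg[i];
    }

    w = snprintf(out + n, out_size - n, ",%04X;", (unsigned)crc);
    if (w < 0 || (size_t)w >= out_size - n)
        return 0;
    return n + (size_t)w;
}
```

For example, sequence number 1 with the message "a;b" produces a frame starting with ":1!a\;b,". The receiver would undo the escaping, recompute the CRC over the same bytes and compare before looking at the sequence number.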
Reply on January 4, 2016
On Mon, 4 Jan 2016 01:12:59 +0000, Tom Gardner <spamjunk@blueyonder.co.uk> wrote:

>On 04/01/16 01:01, pozz wrote:
>> Now the big question: how can the sender *always* be sure that the message has really arrived at (and been processed by) the receiver?
>
>See https://en.wikipedia.org/wiki/Two_Generals%27_Problem

And a partial solution is something like the two-phase commit protocol (also well described in the obviously named Wikipedia article). It's not 100% (it may require administrator intervention to resolve certain failures), but that's how distributed databases maintain coherency over unreliable links.
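The core decision rule of two-phase commit, and the reason it can still need an administrator, fit in a few lines. This is a minimal C sketch of the textbook rule only (no logging, messaging or recovery); all type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { VOTE_YES, VOTE_NO, VOTE_TIMEOUT } tpc_vote;
typedef enum { TPC_COMMIT, TPC_ABORT } tpc_decision;
typedef enum { P_WORKING, P_PREPARED, P_COMMITTED, P_ABORTED } tpc_pstate;

/* Coordinator, end of phase 1: commit only on a unanimous YES.
   A NO, or a vote that never arrived (timeout), forces an abort. */
tpc_decision tpc_decide(const tpc_vote *votes, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (votes[i] != VOTE_YES)
            return TPC_ABORT;
    return TPC_COMMIT;
}

/* Participant: before voting it may still abort on its own, and after the
   decision it already knows the outcome. Once it has voted YES (PREPARED)
   it must wait for the coordinator; if the coordinator vanishes at that
   point the participant is blocked "in doubt", which is the failure case
   that may need administrator intervention. */
bool tpc_participant_can_decide_alone(tpc_pstate s)
{
    return s != P_PREPARED;
}
```

The PREPARED state is the same uncertainty as the missing final ACK in the original question, just moved to a place where it can be logged and resolved later.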
Reply on January 4, 2016
On 04/01/2016 03:25, Les Cargill wrote:
> pozz wrote:
>> I'm trying to implement a simple protocol for a point-to-point full-duplex serial link. It could be a reliable link, such as a connection between two nearby MCUs on the same PCB, or a noisy link, such as an RF link.
>>
>> The application layer should send and receive generic messages:
>> -> How are you?
>> <- I'm fine, and you?
>> -> That's ok here.
>> The above example is a half-duplex exchange, but the link is full-duplex and messages could be transmitted at any time.
>>
>> I'd like to isolate the reliability feature in the lower protocol layers (transport, network, link), as TCP guarantees a connection-oriented session to the application layer.
>>
>
> Does "as" mean "in the same manner that"?

Yes, sorry for my poor English.

> Congratulations. You can use SLIP over a serial port and reuse TCP wholesale.

I was thinking of using the full lwIP TCP/IP stack for this simple problem: a reliable full-duplex point-to-point protocol. lwIP comes with SLIP and TCP and is highly customizable. I gave it a try and, of course, it works well... but after that I think it is too complicated for my very simple task.

First of all, frames are at least 40-50 bytes long without any payload (TCP header + IP header + SLIP framing). IP packets have 32-bit source and destination addresses, and TCP segments have 16-bit source and destination ports; for a simple point-to-point link that seems too complicated. IP packets and TCP segments also carry their own 16-bit checksums. And lwIP needs a dynamic memory allocator, even if it could be a very simple one.

In the end I think I need another solution, similar to TCP in many aspects but without many of its features (port multiplexing, IP layer, ...).

> Then something like ":<message>,<crc>;" can be your application transport protocol, for ready parsing. Use \: , \, and \; for escaping those characters.
>
> Or constrain messages to ASCII and use STX, ETX and other < 0x20 characters as delimiters. Or something else.
>
> Or you could do without TCP and use the same basic mechanism.

Indeed, that is the way I think I will follow. It seems very strange to me that there isn't a ready-to-use or standard protocol to reuse.

>> Even with a very reliable connection, I have to face the possibility of errors during transmission. This means implementing a mechanism of acks and retransmissions: if the sender doesn't receive an ack within a certain timeout, it sends the packet again.
>
> Don't forget NACK, WACK (wait ACK) sequences... When you send a WACK, and the processing is done for it, send an ACK for that message. Or don't.

Are you talking about the application or the transport protocol? Here I'm interested in the *transport protocol* (which doesn't know anything about the application protocol). The transport SHOULD guarantee a reliable connection between two hosts. If a frame is received, the receiver MUST send an ACK to the sender (never a NACK). For me ACK means "ok, I received the frame" and not "I received the frame, but the message is syntactically wrong or the command received can't be processed". Those kinds of things belong to the application layer.

Anyway, NACK can be used in the transport protocol when a frame is received with a non-sequential sequence number, maybe because the previous message was lost. Here NACK can be used to speed up the retransmission of a lost frame. But the same result can be obtained without NACK: the receiver simply drops, silently, all the frames received with a non-sequential sequence number, and the sender retransmits the un-acked frames when a timeout expires.

Similarly for WACK. If it is used for the application ("hey, this command takes a while, please wait"), it should be moved to another protocol layer. Anyway, it can be used in the transport protocol for flow control ("hey, I received the last frame, but you are too fast for me. Please stop transmitting now, I'll tell you when you can continue again").

>> But I can't retransmit the naked packet as is, because it could be received twice (imagine what happens if the message is "charge the bank account for 1000USD").
>
> So now we're up to:
> :<sequence number>!<message>,<crc>;

Yes.

> Or make it such that duplicate messages don't matter.

Yes, it's possible to design an *application* protocol whose messages can be duplicated without any side effect (for example, if all the messages are "set the room temperature to 20°C" or "get the voltage measurement"). However, in that case you are designing the transport protocol with a specific application protocol in mind. Maybe tomorrow you will want to change the application protocol or use another one, and a conflict with the transport protocol could arise.

>> So I looked at the sequence-number mechanism: every message is marked with a different number so the receiver can detect duplicate packets and, if so, ack them again (but not process them a second time).
>>
>> One standard protocol with those features is HDLC in Asynchronous Balanced Mode (ABM). It defines a good asynchronous serial framing (similar to SLIP) *and* introduces tx and rx sequence numbers for every I-frame. Of course, it is very similar to TCP, which uses sequence and acknowledgment numbers.
>>
>> Now the big question: how can the sender *always* be sure that the message has really arrived at (and been processed by) the receiver?
>
> He can't. See also "The Two Generals Problem". He can, however, be assured that he'll receive an ACK (or NACK, or WACK) within <n> seconds of transmission.

Now I understand why I didn't find a solution to my problem... it doesn't have a real solution.

>> Of course, if the sender receives the ack for the frame of interest, it can be sure the message has arrived and been processed. But what happens if the sender doesn't receive the ack, because it was transmitted with errors?
>
> Have the receiver send NAK sequences when an error is detected. It's your call whether to try to use a sequence number in NACK sequences.

Here the problem is the sender not receiving the ack from the receiver; it isn't an error detected by the receiver.

>> It can send the message again without any problem, because the receiver will detect the duplicate message and send the ack for that frame again (without processing the message a second time).
>
> Indeed; or with processing if that's okay at the application level.
>
>> And what happens if the sender doesn't receive the second, third, ... ack? This could happen, for example, when an intermediate router/forwarder/hub has been powered off or the receiver has been physically disconnected from the link.
>
> It's often useful to have a state of "oh, the line is dead" based on the timeout.

Of course, yes. But what happened to my last message, just before the link died?

>> Maybe the message arrived (and was processed) just before the connection trouble, so the ack will never reach the sender.
>>
>> In this odd case, the sender can't be 100% sure whether its message has arrived. What is the solution for this situation?
>
> Other than repealing the second law of thermodynamics, there isn't one.
>
>> This scenario is very similar to a real TCP/IP network. As said before, TCP is a protocol that implements acks, retransmissions and sequence numbers. After a TCP connection has been established, the two hosts start exchanging data. At some point something goes wrong and one host doesn't know whether the last message reached its destination.
>
> Correct.
>
>> I'm sure a simple solution exists, because TCP/IP is now used for many applications, even critical ones (online banking and similar things), but I can't find it myself.
>
> https://en.wikipedia.org/wiki/Two_Generals'_Problem

Ok, thank you for your observations and suggestions. Now I know a good solution doesn't exist. In the end, I think I will try to implement a simple HDLC ABM protocol for my needs.
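Since the plan is a simple HDLC ABM implementation, here is a C sketch of two pieces discussed above: the basic (modulo-8) I-frame control field, and the receiver rule "accept only the next expected sequence number, silently drop the rest and let the sender time out". The bit layout follows basic-mode HDLC (bit 0, the first bit sent, is 0 for an I-frame; N(S) in bits 1-3; P/F in bit 4; N(R) in bits 5-7). The function and type names are hypothetical, and flags, byte stuffing, the FCS, S- and U-frames and windowing on the sender side are all left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Basic (modulo-8) HDLC I-frame control octet, bit 0 = first bit sent:
   bit 0 = 0 (I-frame), bits 1-3 = N(S), bit 4 = P/F, bits 5-7 = N(R). */
uint8_t hdlc_i_ctrl(uint8_t ns, uint8_t nr, bool pf)
{
    return (uint8_t)(((nr & 7u) << 5) | ((pf ? 1u : 0u) << 4) | ((ns & 7u) << 1));
}

bool hdlc_is_i_frame(uint8_t ctrl) { return (ctrl & 1u) == 0; }
uint8_t hdlc_ns(uint8_t ctrl) { return (uint8_t)((ctrl >> 1) & 7u); }
uint8_t hdlc_nr(uint8_t ctrl) { return (uint8_t)((ctrl >> 5) & 7u); }

/* Receive state variable V(R): the N(S) we expect next. Every frame we
   send carries V(R) in its N(R) field as a cumulative acknowledgment. */
typedef struct {
    uint8_t vr;
} hdlc_rx;

/* Accept an in-sequence I-frame and advance V(R) modulo 8. Anything else
   is discarded; the sender retransmits on timeout (a REJ supervisory frame
   could request the retransmission sooner, like the NACK discussed above). */
bool hdlc_rx_accept(hdlc_rx *rx, uint8_t ctrl)
{
    if (!hdlc_is_i_frame(ctrl) || hdlc_ns(ctrl) != rx->vr)
        return false;
    rx->vr = (uint8_t)((rx->vr + 1u) & 7u);
    return true;
}
```

For example, N(S) = 3, N(R) = 5 with P/F set encodes as 0xB6. Because N(R) acknowledges everything up to V(R) - 1, a lost ACK is repaired by any later frame from the same peer, which is also how piggy-backed acks work on a full-duplex link.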
Reply on January 4, 2016
On 04/01/2016 06:43, Robert Wessel wrote:
> On Mon, 4 Jan 2016 01:12:59 +0000, Tom Gardner <spamjunk@blueyonder.co.uk> wrote:
>
>> On 04/01/16 01:01, pozz wrote:
>>> Now the big question: how can the sender *always* be sure that the message has really arrived at (and been processed by) the receiver?
>>
>> See https://en.wikipedia.org/wiki/Two_Generals%27_Problem
>
> And a partial solution is something like the two-phase commit protocol (also well described in the obviously named Wikipedia article). It's not 100% (it may require administrator intervention to resolve certain failures), but that's how distributed databases maintain coherency over unreliable links.

Of course, it is too complicated for simple serial links, say between a PC and an embedded board.

Fortunately I don't work on medical or mission-critical applications. Anyway, I'm curious how those kinds of problems are solved in applications where *no* error is acceptable.
Reply on January 4, 2016
On 04/01/16 09:29, pozz wrote:
> On 04/01/2016 06:43, Robert Wessel wrote:
>> On Mon, 4 Jan 2016 01:12:59 +0000, Tom Gardner <spamjunk@blueyonder.co.uk> wrote:
>>
>>> On 04/01/16 01:01, pozz wrote:
>>>> Now the big question: how can the sender *always* be sure that the message has really arrived at (and been processed by) the receiver?
>>>
>>> See https://en.wikipedia.org/wiki/Two_Generals%27_Problem
>>
>> And a partial solution is something like the two-phase commit protocol (also well described in the obviously named Wikipedia article). It's not 100% (it may require administrator intervention to resolve certain failures), but that's how distributed databases maintain coherency over unreliable links.
>
> Of course, it is too complicated for simple serial links, say between a PC and an embedded board.
>
> Fortunately I don't work on medical or mission-critical applications. Anyway, I'm curious how those kinds of problems are solved in applications where *no* error is acceptable.

That's a good question and attitude :)

Not all problems can be solved. A classic example is synchroniser metastability. The best that can be achieved is to reduce the probability of failure to an acceptable level. That, of course, requires that all the implementation details are correct and that they match the design presumptions, which isn't easy.

Another example is two-phase transaction commit protocols, which also have a finite failure probability. That probability can be reduced by three-phase commit protocols, and I think you can guess the next step :)

In practice it would be wise to assume that failures will occur, and to have other mechanisms to detect and recover from them. Alternatively, architect the system so that failures are tolerated; such architectures often have other system advantages.

Always bear in mind the "eight fallacies of distributed computing", https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing, and that "a distributed system is one where your application can be broken by a failure in a computer that you didn't know existed".
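To put a number on "reduce the probability of failure to an acceptable level" for the ack/retransmit scheme discussed in this thread: under the simplistic assumption that each frame and each ACK is lost independently with probability p, one attempt fails with probability 1 - (1 - p)^2, and N attempts all fail with probability (1 - (1 - p)^2)^N. A small C helper (`p_no_ack` is a hypothetical name):

```c
/* Probability that a sender sees no ACK at all after 'attempts' tries,
   assuming (simplistically) that every frame and every ACK is lost
   independently with probability p. One try succeeds only if both the
   frame and its ACK get through, i.e. with probability (1 - p)^2. */
double p_no_ack(double p, unsigned attempts)
{
    double fail_once = 1.0 - (1.0 - p) * (1.0 - p);
    double r = 1.0;
    for (unsigned i = 0; i < attempts; i++)
        r *= fail_once;
    return r;
}
```

For example, p = 0.1 and three attempts give 0.19^3 ≈ 0.0069. The figure shrinks quickly with more retries but never reaches zero, and when a link really dies the losses are not independent at all (p is effectively 1), which is the case no retry count can cover.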
Reply on January 4, 2016
On 04/01/16 02:25, Les Cargill wrote:
> It's often useful to have a state of "oh, the line is dead" based on the timeout.

Yes. Especially if that results in the explicit design of an FSM, with a correspondingly simple, easily readable and easily modifiable implementation. The latter should strongly shape the implementation techniques, because it is all too common that "doing the simplest thing" at each modification leads to an unmaintainable ball of string.
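Following the suggestion of an explicit FSM, here is a minimal C sketch of a sender-side link state machine with an explicit "line is dead" state. The states, the events, the retry limit and `EV_PEER_HELLO` (standing in for whatever resynchronisation handshake the link uses) are all hypothetical, and ACK matching by sequence number is assumed to happen before `EV_ACK` is raised.

```c
typedef enum { LINK_IDLE, LINK_WAIT_ACK, LINK_DEAD } link_state;
typedef enum { EV_SEND, EV_ACK, EV_TIMEOUT, EV_PEER_HELLO } link_event;

typedef struct {
    link_state state;
    unsigned retries;      /* retransmissions of the pending frame so far */
    unsigned max_retries;  /* hypothetical limit before declaring the line dead */
} link_fsm;

/* The single transition function. Returns 1 if the caller should
   (re)transmit the pending frame now, 0 otherwise. */
int link_step(link_fsm *f, link_event ev)
{
    switch (f->state) {
    case LINK_IDLE:
        if (ev == EV_SEND) {
            f->retries = 0;
            f->state = LINK_WAIT_ACK;
            return 1;
        }
        break;
    case LINK_WAIT_ACK:
        if (ev == EV_ACK) {
            f->state = LINK_IDLE;              /* pending frame delivered */
        } else if (ev == EV_TIMEOUT) {
            if (f->retries < f->max_retries) {
                f->retries++;
                return 1;                      /* retransmit */
            }
            f->state = LINK_DEAD;              /* pending frame: outcome unknown */
        }
        break;
    case LINK_DEAD:
        if (ev == EV_PEER_HELLO)
            f->state = LINK_IDLE;              /* resynchronised with the peer */
        break;
    }
    return 0;
}
```

Keeping every transition in one switch makes it easy to see, for instance, that entering `LINK_DEAD` leaves the last frame's fate unknown, which is exactly the case discussed in this thread; whatever resolves it has to happen on the way out of that state.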
Reply on January 4, 2016
On Mon, 04 Jan 2016 02:01:37 +0100, pozz wrote:

> Now the big question: how can the sender *always* be sure that the message has really arrived at (and been processed by) the receiver?

By waiting for an ACK; forever if necessary.

> Of course, if the sender receives the ack for the frame of interest, it can be sure the message has arrived and been processed. But what happens if the sender doesn't receive the ack, because it was transmitted with errors?
> It can send the message again without any problem, because the receiver will detect the duplicate message and send the ack for that frame again (without processing the message a second time). And what happens if the sender doesn't receive the second, third, ... ack?

It just keeps sending.

> This could happen, for example, when an intermediate router/forwarder/hub has been powered off or the receiver has been physically disconnected from the link. Maybe the message arrived (and was processed) just before the connection trouble, so the ack will never reach the sender.
>
> In this odd case, the sender can't be 100% sure whether its message has arrived. What is the solution for this situation?

If an ACK arrives, the data was received. If no ACK arrives, you don't know whether it was received, so you assume that it wasn't and eventually resend.

You cannot be certain that data *wasn't* received; the lack of an ACK can be caused either by the data getting lost or by the ACK getting lost. If the remote system suddenly goes silent and stays silent, it's impossible to distinguish between received-but-not-ACK'd and not-received data.

> I'm sure a simple solution exists, because TCP/IP is now used for many applications, even critical ones (online banking and similar things), but I can't find it myself.

The simple solution is not to give up.

Note that TCP will give up eventually; this is required, e.g., for the situation where the remote computer gets unplugged, tossed in the dumpster, and never replaced. No protocol can fix that.
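The point that a missing ACK proves nothing can be built into the sender's API itself. In this minimal C sketch the result is either "delivered" or "unknown", never "not delivered", and `max_attempts == 0` models "forever if necessary". The callback `transmit_and_wait` and the simulated link used to exercise it are hypothetical.

```c
#include <stdbool.h>

typedef enum {
    SEND_DELIVERED,  /* an ACK arrived: the peer has the data */
    SEND_UNKNOWN     /* gave up: the peer may or may not have the data */
} send_result;

/* transmit_and_wait (hypothetical) sends the frame once and returns true
   if the matching ACK arrives before the timeout. max_attempts == 0 means
   "never give up". There is deliberately no SEND_NOT_DELIVERED result:
   a missing ACK cannot prove that the frame was lost. */
send_result send_reliably(bool (*transmit_and_wait)(void *ctx), void *ctx,
                          unsigned max_attempts)
{
    for (unsigned attempt = 0; max_attempts == 0 || attempt < max_attempts;
         attempt++) {
        if (transmit_and_wait(ctx))
            return SEND_DELIVERED;
    }
    return SEND_UNKNOWN;
}

/* Simulated link for experiments: the first 'losses' attempts lose either
   the frame or its ACK (from the sender's side the two look identical). */
typedef struct {
    unsigned losses;
    unsigned attempts;
} sim_link;

bool sim_transmit(void *ctx)
{
    sim_link *link = ctx;
    link->attempts++;
    if (link->losses > 0) {
        link->losses--;
        return false;
    }
    return true;
}
```

With a limit of 5 attempts and a link that loses the first two exchanges, `send_reliably` returns `SEND_DELIVERED` after the third transmission; with a link that never answers it returns `SEND_UNKNOWN`, and deciding what to do with an unknown outcome is left to a higher layer.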
Reply on January 4, 2016
On Monday, 4 January 2016 02:01:44 UTC+1, pozz wrote:
> I'm sure a simple solution exists, because TCP/IP is now used for many applications, even critical ones (online banking and similar things), but I can't find it myself.

If you can set one MCU as master and the other as slave, I find that MODBUS RTU is simple and robust enough for most of the communication that small MCUs need to do.

There are of course limits: one MCU is the master and the other(s) are slaves, which means that one asks and the other(s) answer. No communication can be initiated by the slaves. It also means that the communication is half-duplex (even if you use two lines, RX and TX).

Bye Jack
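For reference, a C sketch of what the master side of a MODBUS RTU request looks like: the CRC-16/MODBUS (reflected polynomial 0xA001, initial value 0xFFFF, appended low byte first) and a Read Holding Registers (function 0x03) request, whose register address and count are sent big-endian. The function names are hypothetical; the 3.5-character silent interval that delimits RTU frames, response timeouts and response parsing are not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/MODBUS: reflected polynomial 0xA001, initial value 0xFFFF. */
uint16_t modbus_crc16(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (uint16_t)((crc >> 1) ^ 0xA001)
                             : (uint16_t)(crc >> 1);
    }
    return crc;
}

/* Build a "Read Holding Registers" request from the master:
   slave address, function 0x03, first register (big-endian),
   register count (big-endian), CRC (low byte first). Returns 8. */
size_t modbus_read_holding_req(uint8_t slave, uint16_t first_reg,
                               uint16_t count, uint8_t out[8])
{
    out[0] = slave;
    out[1] = 0x03;
    out[2] = (uint8_t)(first_reg >> 8);
    out[3] = (uint8_t)(first_reg & 0xFF);
    out[4] = (uint8_t)(count >> 8);
    out[5] = (uint8_t)(count & 0xFF);
    uint16_t crc = modbus_crc16(out, 6);
    out[6] = (uint8_t)(crc & 0xFF);   /* CRC low byte is sent first */
    out[7] = (uint8_t)(crc >> 8);
    return 8;
}
```

A receiver can check a whole frame by running the same CRC over all of its bytes, CRC included: the result is 0 for an intact frame. Because only the master ever starts an exchange, "no answer within the timeout" is the only failure signal it gets, which is the same uncertainty discussed earlier in the thread.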