EmbeddedRelated.com
Forums
The 2026 Embedded Online Conference

Simple BUT reliable serial protocol

Started by pozz January 3, 2016
On 14/01/16 10:17, Vincent vB wrote:
> On 14-1-2016 at 10:06, Clifford Heath wrote:
>> On 14/01/16 19:53, Robert Wessel wrote:
>>>> It's quite simple to add a message counter. Each time a message is sent
>>>> with a counter value, the remote side has to ack this with the same
>>>> counter value. If no ack is received within a certain time, the message
>>>> is sent again.
>>
>> Or, like TCP, it can wait a short period to see if further
>> messages are closely following, and just ack the highest
>> sequence number that forms a continuous sequence. Fewer
>> ACKs are needed that way.
>>
>> TCP also piggy-backs bi-directional ACKs on outgoing data
>> frames, which saves frames.
>
> Yes, of course that is possible. I have actually implemented something similar,
> but the ACK piggy-backs the remote side.
>
>>
>>> Which doesn't solve the problem of the remote side receiving and
>>> processing the message, but going offline before the ACK can be sent
>>> back (perhaps the ACK was half-way out the network card at the instant
>>> the backhoe hit the cable). In that case the local node can
>>> retransmit all it wants, but it will presumably eventually give up and
>>> assume incorrectly that the remote side did *not* process the message.
>>
>> Why should it make that assumption? It might equally wrongly
>> make the opposite assumption. Until a mutual closing handshake
>> has been completed there is no certainty.
>
> There is no way to know. Message ordinals/acks and CRCs/checksums will make your
> connection quite reliable. The idea is to minimize the types and rate of errors
> which can occur, and to design the higher layer of the software such that it can
> deal with special circumstances.
>
>>
>> And even then, the change of state at one end might not have
>> been persisted properly, and lost during a reboot.
>>
>> It's important to design protocols based on "desired state"
>> where possible, and not on "state change" requests.
>
> That would do it. If messages are lost, and connection is regained, verify the
> state of the remote system and update where desired. It's all a matter of careful
> thinking.
If you think you have found a solution to the "two generals" problem, then you should publish it and you will be famous. I *strongly* recommend you read Leslie Lamport's seminal papers from the late 70s and 80s.

Many people have inadvertently re-invented TCP, poorly.

Many have tried to improve on TCP, a few have succeeded in special cases, but most have failed.

The onus is on you to demonstrate the advantages of any home-brew protocol over TCP - and state the limitations.
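The "desired state" versus "state change" point quoted above can be illustrated with a small sketch. The `Valve` class, its two handlers and the `deliver` helper are all hypothetical names invented for this example; the only thing it tries to show is why a retransmitted "set to X" request is harmless while a retransmitted "change by X" request is not.

```python
class Valve:
    """Hypothetical remote device holding one piece of state."""

    def __init__(self):
        self.position = 0

    def apply_delta(self, delta):
        # "State change" request: applying it twice changes the result.
        self.position += delta

    def apply_target(self, target):
        # "Desired state" request: applying it twice is harmless.
        self.position = target


def deliver(handler, arg, copies):
    """Simulate a message that arrives `copies` times due to retransmission."""
    for _ in range(copies):
        handler(arg)


a = Valve()
deliver(a.apply_delta, 10, copies=2)    # duplicate delivery of "open by 10"

b = Valve()
deliver(b.apply_target, 10, copies=2)   # duplicate delivery of "open to 10"

print(a.position, b.position)  # prints "20 10"
```

This does not remove the uncertainty about whether the far end acted; it only makes a blind retransmission safe once the link is back.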
On 14-1-2016 at 13:11, Tom Gardner wrote:
>
> If you think you have found a solution to the "two
> generals" problem, then you should publish it and you
> will be famous. I *strongly* recommend you read Leslie
> Lamport's seminal papers from the late 70s and 80s.
>
> Many people have inadvertently re-invented TCP, poorly.
>
> Many have tried to improve on TCP, a few have succeeded
> in special cases, but most have failed.
>
> The onus is on you to demonstrate the advantages of any
> home-brew protocol over TCP - and state the limitations.

I did not make such claims. I never said I implemented such a system instead of TCP/IP. If you need reliable bidirectional communication on a channel, which may not be an IP network at all, it is possible to do it like this. If there is a possibility of using TCP/IP, of course you should use it.
On 14/01/16 12:46, Vincent vB wrote:
> On 14-1-2016 at 13:11, Tom Gardner wrote:
>>
>> If you think you have found a solution to the "two
>> generals" problem, then you should publish it and you
>> will be famous. I *strongly* recommend you read Leslie
>> Lamport's seminal papers from the late 70s and 80s.
>>
>> Many people have inadvertently re-invented TCP, poorly.
>>
>> Many have tried to improve on TCP, a few have succeeded
>> in special cases, but most have failed.
>>
>> The onus is on you to demonstrate the advantages of any
>> home-brew protocol over TCP - and state the limitations.
>
> I did not make such claims.

Fair enough; my misapprehension.

> I never said I implemented such a system instead of
> TCP/IP. If you need a reliable bidirectional communication on a channel, which
> may not be an IP network at all, it is possible to do it like this.

The TCP protocol does not /require/ IP. There are examples of it running on Morse, ICMP, InfiniBand, IPX and probably avian carriers.
> If there is a possibility of using TCP/IP, of course you should use it.
Not everyone realises that :(
Robert Wessel wrote:
> On Thu, 14 Jan 2016 09:01:03 +0100, Vincent vB <embedded@spam.com>
> wrote:
>
>> On 4-1-2016 at 2:01, pozz wrote:
>>> I'm trying to implement a simple protocol for a point-to-point
>>> full-duplex serial link. It could be a reliable link, such as a
>>> connection between two near MCUs on the same PCB, or a noisy link, such
>>> as an RF link.
>>>
>>> The application layer should send and receive generic messages:
>>> -> How are you?
>>> <- I'm fine, and you?
>>> -> That's ok here.
>>> The above example is a half-duplex protocol, but the link is full-duplex
>>> and the messages could be transmitted anytime.
>>>
>>> I'd like to isolate the reliability feature to lower protocols
>>> (transport, network, link layers), as TCP guarantees a
>>> connection-oriented session to the application layer.
>>>
>>> Even with a very reliable connection, I have to face the event of some
>>> error during transmission. This leads me to implement the mechanism of
>>> acks and retransmissions. If the sender doesn't receive an ack within a
>>> certain timeout, it sends the packet again.
>>>
>>> But I can't retransmit the naked packet as is, because it could be
>>> received twice (imagine what happens if the message is "charge the bank
>>> account for 1000USD").
>>>
>>> So I looked at the sequence number mechanism: every message is marked with
>>> a different number so the receiver can detect duplicated packets and, in
>>> that case, ack them again (but not process them another time).
>>>
>>> One standard protocol with those features is HDLC in Asynchronous
>>> Balanced Mode (ABM). It defines a good asynchronous serial framing
>>> (similar to SLIP) *and* introduces tx and rx sequence numbers for every
>>> type-I frame.
>>> Of course, it is very similar to TCP, which uses sequence and
>>> acknowledgment numbers.
>>>
>>>
>>> Now the big question: how can the sender be *always* sure that the
>>> message has really arrived at (and then been processed by) the receiver?
>>>
>>> Of course, if the sender receives the ack for the frame of interest, it
>>> can be sure the message has arrived and been processed.
>>> But what happens if the sender doesn't receive the ack, because it is
>>> transmitted with errors?
>>> It can send the message again without problem; indeed the receiver will
>>> detect the duplicate message and send the ack for that frame again
>>> (without processing the message another time).
>>> And what happens if the sender doesn't receive the second, third, ... ack?
>>> This could happen for example when an intermediate router/forwarder/hub
>>> has been powered off or the receiver has been physically disconnected from
>>> the link.
>>> Maybe the message has arrived (and been processed) just before the connection
>>> trouble, so the ack will never arrive at the sender.
>>>
>>> In this odd case, the sender can't be 100% sure if its message has
>>> arrived. What is the solution for this situation?
>>>
>>> This scenario is very similar to an actual TCP/IP network. As said
>>> before, TCP is a protocol that implements acks, retransmissions and
>>> sequence numbers.
>>> After a TCP connection has been established, the two hosts start exchanging
>>> data. At some time, something goes wrong and one host doesn't know if
>>> the last message has arrived at its destination or not.
>>>
>>> I'm sure a simple solution exists, because TCP/IP is now used for many
>>> applications, even critical ones (on-line banking and similar things), but I
>>> can't find it myself.
>>
>> It's quite simple to add a message counter. Each time a message is sent
>> with a counter value, the remote side has to ack this with the same
>> counter value. If no ack is received within a certain time, the message
>> is sent again. The remote side keeps track of the last received counter
>> value. If that value is received again, the message is not processed,
>> but another ack is sent.
>>
>> I have implemented this scheme over UDP, but with framing and a
>> checksum/crc it could be applied to a serial communication stream as well.
>
>
> Which doesn't solve the problem of the remote side receiving and
> processing the message, but going offline before the ACK can be sent
> back (perhaps the ACK was half-way out the network card at the instant
> the backhoe hit the cable). In that case the local node can
> retransmit all it wants, but it will presumably eventually give up and
> assume incorrectly that the remote side did *not* process the message.
>
You have to be able to query the far node for state after the line drops and comes back up. Even better (if you have the bandwidth for it), the far node asynchronously sends state periodically when the line is up.

-- Les Cargill
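A minimal sketch of the counter-and-ack scheme quoted above, combined with the state query suggested in this reply. Everything here is an assumption made for illustration: the `Receiver` class, `send_reliably`, and the `acks_lost` set (which stands in for a lossy return path) are hypothetical names, and the "link" is just a direct function call rather than a real serial or UDP transport.

```python
class Receiver:
    """Far node: processes each counter value once, re-acks duplicates."""

    def __init__(self):
        self.last_seq = None   # counter value of the last processed message
        self.processed = []    # payloads handed to the application

    def on_message(self, seq, payload):
        if seq != self.last_seq:          # new message: process it
            self.processed.append(payload)
            self.last_seq = seq
        return seq                        # ACK carries the same counter value

    def query_state(self):
        """Answer a state query once the line is back up."""
        return self.last_seq


def send_reliably(receiver, seq, payload, acks_lost, max_tries=5):
    """Resend until an ACK gets through; acks_lost lists the failing tries."""
    for attempt in range(max_tries):
        ack = receiver.on_message(seq, payload)   # the message itself arrives
        if attempt in acks_lost:
            continue                              # ACK lost: time out, resend
        if ack == seq:
            return True
    return False


far = Receiver()

# The first two ACKs are lost; the third try succeeds, with no double processing.
first_ok = send_reliably(far, 1, "charge 1000 USD", acks_lost={0, 1})

# Every ACK is lost: the sender gives up, yet the far node did process it.
second_ok = send_reliably(far, 2, "how are you?", acks_lost={0, 1, 2, 3, 4})

print(first_ok, second_ok, far.processed, far.query_state())
```

In the second case the sender cannot tell what happened from the missing ACKs alone; only the later `query_state()` call reveals that message 2 was processed.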
Tom Gardner wrote:
> On 14/01/16 10:17, Vincent vB wrote:
>> [...]
>> That would do it. If messages are lost, and connection is regained,
>> verify the state of the remote system and update where desired. It's
>> all a matter of careful thinking.
>
> If you think you have found a solution to the "two
> generals" problem, then you should publish it and you
> will be famous. I *strongly* recommend you read Leslie
> Lamport's seminal papers from the late 70s and 80s.
>
> Many people have inadvertently re-invented TCP, poorly.
>
But you may actually be able to shape your own UDP-based (in color and shape) solution that will have advantages over TCP. Principally, this has to do with broken connections - TCP takes a very long time to give up. Minutes in some cases; at least I have never found a set of config flags for sockets and/or the stack that will make it give up sooner.

> Many have tried to improve on TCP, a few have succeeded
> in special cases, but most have failed.
>
> The onus is on you to demonstrate the advantages of any
> home-brew protocol over TCP - and state the limitations.

I'd say the nearly universal state of Web protocols over TCP damns it with very faint praise. It sort of flabbergasts me what people are willing to put up with.

This is "comp.arch.embedded" and the audience is highly likely to have run into cases where TCP did not perform.

It's a lousy metaphor, but I like saying "TCP just sells you life insurance; it does not guarantee immortality." You'll find very frequently that people fail to make this distinction.

-- Les Cargill
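The "give up sooner" idea can be sketched as a retransmit loop with an explicit deadline. This is only an illustration under stated assumptions: `send_with_deadline`, the `send` and `wait_ack` callbacks, and the timing values are all hypothetical, and the callbacks stand in for whatever transport is used (a UDP socket, a UART driver). As a side note, Linux does offer a `TCP_USER_TIMEOUT` socket option that bounds how long unacknowledged data may stay outstanding; whether something comparable exists on a given embedded stack is a separate question.

```python
import time


def send_with_deadline(send, wait_ack, seq, payload,
                       ack_timeout=0.05, give_up_after=0.2):
    """Retransmit until acked, or report "unknown" once the deadline passes.

    send(seq, payload) and wait_ack(seq, timeout) are hypothetical callbacks
    supplied by the transport layer.
    """
    deadline = time.monotonic() + give_up_after
    while time.monotonic() < deadline:
        send(seq, payload)
        if wait_ack(seq, ack_timeout):
            return "acked"
    # No ACK is not proof of non-delivery; the caller must treat it as unknown.
    return "unknown"


# A dead link: frames go out, no ACK ever comes back.
sent = []


def dead_send(seq, payload):
    sent.append(seq)


def dead_wait_ack(seq, timeout):
    time.sleep(timeout)
    return False


start = time.monotonic()
dead_result = send_with_deadline(dead_send, dead_wait_ack, 7, b"ping")
elapsed = time.monotonic() - start

# A healthy link: the first ACK arrives immediately.
good_result = send_with_deadline(lambda s, p: None, lambda s, t: True, 8, b"ping")

print(dead_result, good_result, len(sent), round(elapsed, 2))
```

The function deliberately returns "unknown" rather than "failed": a short give-up time tells the application quickly that the link is in trouble, but it says nothing about whether the last message was processed.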
On 1/14/2016 3:53 AM, Robert Wessel wrote:
> [...]
>
> Which doesn't solve the problem of the remote side receiving and
> processing the message, but going offline before the ACK can be sent
> back (perhaps the ACK was half-way out the network card at the instant
> the backhoe hit the cable). In that case the local node can
> retransmit all it wants, but it will presumably eventually give up and
> assume incorrectly that the remote side did *not* process the message.
Which is the other half of the two generals problem.

-- Rick
On Thu, 14 Jan 2016 12:45:47 -0600, Les Cargill
<lcargill99@comcast.com> wrote:

>Robert Wessel wrote:
>> [...]
>> Which doesn't solve the problem of the remote side receiving and
>> processing the message, but going offline before the ACK can be sent
>> back (perhaps the ACK was half-way out the network card at the instant
>> the backhoe hit the cable). In that case the local node can
>> retransmit all it wants, but it will presumably eventually give up and
>> assume incorrectly that the remote side did *not* process the message.
>
>You have to be able to query the far node for state after the line drops
>and comes back up. Even better (if you have the bandwidth for it), the
>far node asynchronously sends state periodically when the line is up
So the transaction is in an indeterminate state until the connection to the remote node comes back up? Which then begs the question of what state the remote node thinks the transaction is in once it's sent the ack? How will it know if or when the ack is processed by the local node?

Now if the remote node is simply a backup for the local node, and it does nothing until some (outside) intervention causes it to take over, the problem is simplified, but like the two generals problem, you often have two nodes controlling different things, and they may need the same view of the state of the overall system.
On 1/14/2016 9:06 PM, Robert Wessel wrote:
> On Thu, 14 Jan 2016 12:45:47 -0600, Les Cargill
> <lcargill99@comcast.com> wrote:
>
>> [...]
>> You have to be able to query the far node for state after the line drops
>> and comes back up. Even better (if you have the bandwidth for it), the
>> far node asynchronously sends state periodically when the line is up
>
> So the transaction is in an indeterminate state until the connection
> to the remote node comes back up? Which then begs the question of
> what state the remote node thinks the transaction is in once it's sent
> the ack? How will it know if or when the ack is processed by the
> local node?
You are asking about the two generals problem. There is always a possible state where the one end doesn't know what state the other is in if a message is lost because it can't know *which* message was lost.
> Now if the remote node is simply a backup for the local node, and it
> does nothing until some (outside) intervention causes it to take over,
> the problem is simplified, but like the two generals problem, you
> often have two nodes controlling different things, and they may need
> the same view of the state of the overall system.
>
-- Rick
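The point that the sender "can't know *which* message was lost" can be made concrete with a tiny model. The `exchange` function below is a hypothetical illustration, not a protocol: it plays one request/ack round over a link that may drop either frame, and returns what the sender observed alongside what actually happened at the receiver.

```python
def exchange(drop_message, drop_ack):
    """One request/ack round over a link that may drop either frame.

    Returns (ack_seen_by_sender, processed_by_receiver).
    """
    processed = False
    ack_seen = False
    if not drop_message:
        processed = True          # the receiver got the request and acted on it
        if not drop_ack:
            ack_seen = True       # the ACK made it back to the sender
    return ack_seen, processed


lost_request = exchange(drop_message=True, drop_ack=False)
lost_ack = exchange(drop_message=False, drop_ack=True)

# The sender's view is identical in both cases ...
print(lost_request[0], lost_ack[0])
# ... while the receiver's state differs.
print(lost_request[1], lost_ack[1])
```

From the sender's side both runs look the same (no ACK), so no amount of local reasoning can separate them; adding an ack-of-the-ack only moves the same ambiguity to the other end.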
Robert Wessel wrote:
> On Thu, 14 Jan 2016 12:45:47 -0600, Les Cargill
> <lcargill99@comcast.com> wrote:
>
>> [...]
>> You have to be able to query the far node for state after the line drops
>> and comes back up. Even better (if you have the bandwidth for it), the
>> far node asynchronously sends state periodically when the line is up
>
> So the transaction is in an indeterminate state until the connection
> to the remote node comes back up? Which then begs the question of
> what state the remote node thinks the transaction is in once it's sent
> the ack? How will it know if or when the ack is processed by the
> local node?
>
I would have both nodes query for the ID of the last completed transaction on reestablishment of the link. There are other strategies.
> Now if the remote node is simply a backup for the local node, and it
> does nothing until some (outside) intervention causes it to take over,
> the problem is simplified, but like the two generals problem, you
> often have two nodes controlling different things, and they may need
> the same view of the state of the overall system.
>

This gets relatively involved. It's roughly equivalent to transaction handling in databases.

-- Les Cargill
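One way the "query for the ID of the last completed transaction" strategy might look is sketched below. The `Node` class, its fields and the `resync` function are hypothetical, transactions are assumed to be numbered 1, 2, 3, ... and applied strictly in order, and only one direction of the exchange is shown; the reverse direction would be symmetrical.

```python
class Node:
    """Hypothetical endpoint that numbers its transactions 1, 2, 3, ..."""

    def __init__(self):
        self.last_completed = 0   # ID of the last transaction applied here
        self.unacked = {}         # txn_id -> payload sent but not yet acked
        self.log = []             # payloads applied, in order

    def send(self, txn_id, payload):
        self.unacked[txn_id] = payload

    def apply(self, txn_id, payload):
        """Apply the next transaction in sequence; ignore anything else."""
        if txn_id != self.last_completed + 1:
            return False
        self.log.append(payload)
        self.last_completed = txn_id
        return True


def resync(sender, receiver):
    """On link re-establishment, reconcile using the receiver's last ID."""
    done = receiver.last_completed            # the "query" step
    for txn_id in [t for t in sender.unacked if t <= done]:
        del sender.unacked[txn_id]            # completed; only the ACK was lost
    for txn_id in sorted(sender.unacked):     # resend the rest in order
        if receiver.apply(txn_id, sender.unacked[txn_id]):
            del sender.unacked[txn_id]


local, remote = Node(), Node()
for txn_id, payload in [(1, "a"), (2, "b"), (3, "c")]:
    local.send(txn_id, payload)

# Before the link dropped, the remote applied 1 and 2 but no ACK got back,
# and transaction 3 never arrived.
remote.apply(1, "a")
remote.apply(2, "b")

resync(local, remote)
print(remote.log, local.unacked)
```

For this to survive a reboot, `last_completed` would have to be persisted atomically with the effect of the transaction, which is where the comparison with database transaction handling comes in.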
On Monday, January 4, 2016 at 4:29:54 AM UTC-5, pozz wrote:
[]
> Of course, it is too complicated for simple serial links, maybe between
> a PC and an embedded board.
>
> Fortunately I don't work for medical and mission-critical applications.
> Anyway I'm curious how those kind of problems are solved in those
> applications where I can't accept *any* error.

In those cases, you don't use a simple serial link. You use real networks with multiple possible data and control paths. You have error recovery protocols up to and including the application level. And in the end, you have a supervisor/administrator who can manually override failed transactions, either rolling them back or correcting them before they are committed.

For your case, you have to decide what failure rate your application can tolerate and choose hardware and software/firmware that meets the requirements.

Bottom line, however, is that there are no systems that can meet the requirement for "applications where I can't accept *any* error."
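Deciding "what failure rate your application can tolerate" can be turned into rough arithmetic. The sketch below rests on a strong simplifying assumption that is not from the thread: each frame is lost independently with the same probability `p_loss` in both directions. Under that assumption one attempt is confirmed only if both the message and its ACK survive, so the chance that `tries` attempts all go unconfirmed is (1 - (1 - p_loss)^2)^tries. The function names are hypothetical.

```python
def residual_failure(p_loss, tries):
    """Probability that no attempt is confirmed, assuming independent losses."""
    per_try_ok = (1.0 - p_loss) ** 2      # message and ACK must both survive
    return (1.0 - per_try_ok) ** tries


def tries_needed(p_loss, target):
    """Smallest number of tries whose residual failure is at or below target."""
    if not 0.0 <= p_loss < 1.0:
        raise ValueError("p_loss must be in [0, 1)")
    if target <= 0.0:
        raise ValueError("target must be positive; zero is unreachable")
    tries = 1
    while residual_failure(p_loss, tries) > target:
        tries += 1
    return tries


# Example: 10% frame loss, and a tolerated residual failure of one in a million.
print(tries_needed(0.1, 1e-6))  # prints 9
```

For any nonzero loss rate the residual failure shrinks with every retry but never reaches zero, which is the quantitative version of the bottom line above. Real links often lose frames in bursts, so the independence assumption is optimistic.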