
Error detection rate with CRC-16 CCITT

Started by Shane williams March 27, 2011
On Mar 31, 1:31 am, ChrisQ <m...@devnull.com> wrote:
> Shane williams wrote:
>
>> I found out today ddcmp was used purely because it calculated the CRC
>> for us. All it does is the framing. All the state transition and
>> error recovery stuff is turned off. Using ddcmp was probably a
>> mistake because ccitt crc can be calculated quickly enough and soon
>> we'll be doing a new version of this device with a different micro
>> which will have to be compatible with the existing device so will
>> still have to ddcmp but without the hardware support.
>
> So you using some hardware device with internal crc hw ?. Just
> curious, which device ?.
Motorola 68302
>> I'm trying to improve the propagation delay of messages around the
>> ring without requiring the customer to fit twisted pair cable
>> everywhere and I'm also trying to improve the error monitoring so we
>> can signal when a connection isn't performing well enough without
>> creating nuisance faults, hence my interest in the error detection
>> capability of crc16-ccitt.
>
> Two unknowns: Max cable length between nodes and max baud rate ?. Assume
> that you are currently running unbalanced rs232 style cabling ?.
It's RS485 but apparently a variety of cable gets used, not always twisted pair.
> If you are limited on baud rate due to cable length, you might be able to
> compress the data. A recent project for led road signs was limited to 9600
> bauds, but the screen update requirement of 1 second max meant that we
> had no option but to use compression.
I don't know much about compression but it sounds too CPU intensive for the 68302? What micro are you using?
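For reference, the CRC-16/CCITT computation the thread is about is cheap even without hardware support. A minimal bit-serial sketch, assuming the common CCITT-FALSE parameters (polynomial 0x1021, initial value 0xFFFF, no reflection); this is an illustrative implementation, not the firmware under discussion:

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16/CCITT-FALSE: polynomial 0x1021, init 0xFFFF,
    MSB-first, no input/output reflection. Table-free single pass."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

The standard check value for this variant is `crc16_ccitt(b"123456789") == 0x29B1`; a 256-entry lookup table is the usual speed-up on small micros.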
Hi Shane,

On 3/30/2011 5:34 AM, Shane williams wrote:
> On Mar 30, 11:40 am, "robertwess...@yahoo.com"
> <robertwess...@yahoo.com> wrote:
>>
>> The traditional way to lower latency in a ring is to start
>> transmitting to the next node early - at least as soon as you see that
>> the address (hopefully at the front of the frame) isn't yours. If the
>> frame is bad, you can force that to be passed on by making sure the
>> ending delimiter is transmitted with an indication of error. If you
>> do it right, then in the worst case the bad frame will be pruned at
>> each node, so even if the address has been damaged* (or it was
>> addressed to a non-existent node), it'll get removed in a reasonable
>> amount of time.
>
> It's half duplex so we can't start transmitting early.
So, you have to buffer a message, verify its integrity and then push it on to the next node. This suggests it is either done *in* the Rx ISR *or* in an ASR running tightly coupled to it (else you risk adding processing delays to the propagation delay). I.e., the time a message takes to circumnavigate the ring is ~K * n where K reflects message size, baud rate and per node processing.
>> *And exactly where the frame is removed from the ring is a design
>> question. Often the sender removes frames it had sent, when they make
>> their way back around, in which case the critical item for removal is
>> the source address (and usually in that case the destination node sets
>> a "copied" bit in the trailer, thus verifying physical transmission of
>> the frame to the destination).
>
> In our case, there are two logical rings and each message placed on
> the ring is sent in both directions. When the two messages meet up
> approximately half-way round they annihilate each other - but if the
> annihilation fails, the sender removes them as well.
How does the sender *recognize* them as "his to annihilate"? I.e., if the data can be corrupted, so can the "sender ID"! The problem with a ring is that it has no "end" so things have the potential to go 'round and 'round and 'round and...

If you unilaterally drop any message found to be corrupted, then you have no way of knowing if it was received by its intended recipient (since you don't know who its recipient is -- or was). If you await acknowledgment (and retry until you receive it), then you run the risk of a message being processed more than once. etc.
Shane williams wrote:

> It's RS485 but apparently a variety of cable gets used, not always
> twisted pair.
Not wishing to offend, but this sounds like a legacy project that was originally ill thought out and a bit of a hack to start with. You used the ddcmp frame format, but didn't implement the full protocol. The system wiring is non-RS485-conforming, so susceptible to noise related errors and line drive problems. Data reliability is exactly what protocol definitions like ddcmp are designed to address. I think you will have to at least rewire with twisted pair before addressing any sw issues. If the hardware is bad, then no amount of software will fix the problem...
> I don't know much about compression but it sounds too CPU intensive
> for the 68302? What micro are you using?
The project used the Renesas 32C87 series from Hitachi. Not such an elegant arch as 68k, but almost certainly faster than a '302. Depending on the data, simple compression like huffman encoding can work quite well, but another way might be to simplify / reorganise the frame format or data within it, so you can send fewer bytes...

Regards,

Chris
Hi Shane,

On 3/30/2011 5:12 AM, Shane williams wrote:
> On Mar 30, 4:09 am, D Yuniskis<not.going.to...@seen.com> wrote:
>> Is this a synchronous protocol? Or, are you just using a pair
>> of UARTs on each device to implement the CW & CCW links?
>
> Asynchronous with a pair of uarts, one for clockwise, one for
> counterclockwise.
OK. Been there, done that, T-shirt to prove it...
>> If that's the case, you have to remember to include all the
>> "overhead bit(-time)s" in your evaluation of the error rate
>> and your performance thereunder.
>>
>> E.g., a start bit error is considerably different than a
>> *data* bit error (think about it).
>
> Hmm. I forgot about that. A start or stop bit error means the whole
> message is rejected which is good.
My point was that if you *miss* a start bit, then you have -- at the very least -- missed the "first" bit of the message (because, if it was MARKING, the UART just ignored it and, if it was SPACING, the UART thought *it* was the start bit).

If you are pushing bytes (characters) down the wire at the maximum data rate (minimal stop time between characters), then you run the risk of part of the *next* character being "shifted" into this "misaligned" first character. I.e., it gets really difficult to figure out *if* your code will be able to detect an error (because the received byte "looks wrong") or if, BY CHANCE, the bit patterns can conspire to look like a valid "something else".
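That misalignment is easy to simulate. A sketch (helper names are illustrative; 8N1 framing, data bits LSB-first) showing that a receiver which misses the first start bit re-frames everything downstream:

```python
def frame(byte):
    """8N1 framing, LSB first: start bit (0), 8 data bits, stop bit (1)."""
    return [0] + [(byte >> i) & 1 for i in range(8)] + [1]

def deframe(bits):
    """Greedy receiver: hunt for a 0 (start bit), shift in 8 data bits,
    then check the stop bit. Returns (byte, stop_bit_ok) pairs."""
    out, i = [], 0
    while i + 10 <= len(bits):
        if bits[i] == 1:          # marking line: keep hunting for a start bit
            i += 1
            continue
        data = sum(bits[i + 1 + k] << k for k in range(8))
        out.append((data, bits[i + 9] == 1))
        i += 10
    return out

sent = [0x12, 0x34, 0x56]
line = [b for byte in sent for b in frame(byte)]

# Correctly aligned, the receiver recovers the bytes exactly.
assert [d for d, ok in deframe(line)] == sent
# Miss the first start bit and the recovered bytes are wrong; depending on
# the data, the stop-bit check may or may not flag a framing error.
assert [d for d, ok in deframe(line[1:])] != sent
```

With back-to-back characters, whether the slip is *caught* depends entirely on where the 0 bits happen to fall, which is the "conspire to look valid" hazard above.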
>>> However we may end up with 3 ports per node making it a collection of
>>> rings or a mesh. The loading at the slowest baud rate is approx 10%
>>
>> [scratches head] then why are you worrying about running at
>> a higher rate?
>
> Because not all sites can wire as a mesh. The third port is optional
> but helps the propagation delay a lot.
Sorry, the subject wasn't clear in my question <:-( I mean, if you were to stick with the slowest rate, your "10%" number *suggests* you have lots of margin -- why push for a higher rate with the potential for more problems?
>> Latency might be a reason -- assuming you
>> don't circulate messages effectively as they pass *through*
>> a node. But, recall that you only have to pass through
>> 32 nodes, worst case, to get *a* copy of a message to any
>> other node...
>>
>>> for 64 nodes. If we decide to allow mixed baud rates, each node will
>>> have the ability to tell its adjacent nodes to slow down when its
>>> message queue gets to a certain level, allowing it to cope with a
>>> brief surge in messages.
>>
>> Depending on how you chose to allocate the Tx & Rx devices in each
>> link -- and, whether or not your baudrate generator allows
>> the Tx and Rx to run at different baudrates -- you have to:
>> * make sure your Tx FIFO (hardware and software) is empty before
>>   changing Tx baudrate
>> * make sure your "neighbor" isn't sending data to you when you
>>   change your Rx baudrate (!)
>
> This is assured. It's half duplex and the hardware sends a whole
> message at a time.
So, for each ring, you WON'T receive a message until you have transmitted any previous message? Alternatively, you won't transmit a message until your receiver is finished? What prevents two messages from being "in a ring" at the same time (by accident)?

I.e., without violating the above, it seems possible that node 18 can be sending to node 19 (while 19 is NOT sending to 20 and 17 is not sending to 18) at the same time that node 3 is sending to node 4 (while neither 2 nor 4 are actively transmitting). Since this *seems* possible, how can you be sure one message doesn't get delayed slightly so that the second message ends up catching up to it? (i.e., node 23 has no way of knowing that node 24 is transmitting to 25 so 23 *could* start sending a message to 24 that 24 fails to notice -- in whole or in part -- because 24 is preoccupied with its outbound message)
>> Consider that a link (a connection to *a* neighbor) that "gives you
>> problems" will probably (?) cause problems in all communications
>> with that neighbor (Tx & Rx). So, you probably want to tie the
>> Tx and Rx channels of *one* device to that neighbor (vs. splitting
>> the Rx with the upstream and Tx with the downstream IN A GIVEN RING)
>
> Not sure I follow but a single uart does both the tx and rx to the
> same neighbor.
>
>> [this may seem intuitive -- or not! For the *other* case, see end]
>>
>> Now, when you change the Rx baudrate for the upstream CW neighbor,
>> you are also (?) changing the Tx baudrate for the downstream CCW
>> neighbor (the "neighbor" is the same physical node in each case).
>
> Yes
>
>> Also, you have to consider if you will be changing the baudrate
>> for the "other" ring simultaneously (so you have to consider the
>> RTT in your switching calculations).
>
> What is RTT?
Round Trip Time (sorry :< ) I.e., you (each of your nodes) has to be aware of the time it takes a message to (hopefully) make it around the ring.
>> Chances are (bet dollars to donuts?), the two rings are in different
>> points of their message exchange (since the distance from message
>> originator to that particular node is different in the CW ring
>> vs. the CCW ring). I.e., this may be a convenient time to change
>> the baudrate (thereby INTERRUPTING the flow of data around the ring)
>> for the CW ring -- but, probably *not* for the CCW ring.
>
> I'm lost here.
Number the nodes 1 - 10 (sequentially). The CW ring has 1 sending to 2, 2 sending to 3, ... 10 sending to 1. The CCW ring has 10 sending to 9, 9 sending to 8, ... 1 sending to 10. The nodes operate concurrently.

So, assume 7 originates a message -- destined for 3. In the CW ring, it is routed as 7, 8, 9, 10, 1, 2, 3. In the CCW ring, it is routed (simultaneously) as 7, 6, 5, 4, 3. *If* it progresses node to node at the exact same rate in each ring (this isn't guaranteed but "close enough for gummit work"), then it arrives in 8 & 6 at the same time, 9 & 5, 10 & 4, 1 & 3, 2 & 2 (though in different "rings"), 3 & 1, etc. (note I have assumed, here, that it continues around until reaching its originator... but that's not important).

Now, at node 9, if the CW ring decides that the baudrate needs to be changed and it thinks "now is a good time to do so" (because it has *just* passed its CW message on to node 10), that action effectively interrupts any traffic in the CW ring (until the other nodes make the similar baudrate adjustment in the CW direction). But, there is a message circulating in the CCW ring -- it was just transmitted from node 5 to 4 (while 9 was sending to 10). It will eventually be routed to node 9 as it continues its way around the CCW ring. But, *it* is moving at the original baudrate (in the CCW ring) while node 9 is now operating at the *new* baudrate (in the CW ring). So, any new traffic in the CW ring will run around that ring at a different rate than the CCW traffic.

If you only allow one message to be active in each ring at any given time, then this will "resolve itself" one RTT later. But, if the "other" ring never decides to change baudrates... ? And, if it *does* change baudrates at the same time as the "first" ring, then you have to wait for the CW message to have been completely propagated *and* the CCW message as well before making the change.
I.e., you have to let both rings go idle before risking the switch (or, take considerable care to ensure that a switch doesn't happen DOWNstream of a circulating message)
>> [recall, changing baudrate is probably going to result in lots
>> of errors for the communications to/from the affected neighbor(s)]
>>
>> So, you really have to wait for the entire ring to become idle
>> before you change baudrates -- and then must have all nodes do
>> so more or less concurrently (for that ring). If you've split the
>> Tx and Rx like I described, then this must also happen on the
>> "other" ring at the same time.
>>
>> Regarding the "other" way to split the Tx & Rx... have the Tx
>> always talk to the downstream neighbor and Rx the upstream
>> IN THE SAME RING. In this case, changes to Tx+Rx baudrates
>> apply only to a certain ring. So, you can change baudrate
>> when it is convenient (temporally) for that *ring*.
>>
>> But, now the two rings are potentially operating at different
>> rates. So, the "other" ring will eventually ALSO have to
>> have its baudrate adjusted to match (or, pass different traffic)
>
> I think there must be a misunderstanding somewhere - not sure where.
You can wire two UARTs to give you two rings in TWO DIFFERENT WAYS. Look at a segment of the ring with three nodes:

  ------> 1 AAAA 1 --------> 1 BBBB 1 --------> 1 CCCC 1 ------->
            AAAA               BBBB               CCCC
  <------ 2 AAAA 2 <-------- 2 BBBB 2 <-------- 2 CCCC 2 <-------

vs.

  ------> 1 AAAA 2 --------> 1 BBBB 2 --------> 1 CCCC 2 ------->
            AAAA               BBBB               CCCC
  <------ 1 AAAA 2 <-------- 1 BBBB 2 <-------- 1 CCCC 2 <-------

where the numbers identify the UARTs associated with each signal. [assume tx and rx baudrates are driven by the same baudrate generator so there is a BRG1 and BRG2 in each node]

In the first case, when you change the baudrate of a UART at some particular node, the baudrate for that segment in *the* ring that the UART services (left-to-right ring vs right-to-left ring) changes. So, you must change *after* you have finished transmitting and you will no longer be able to receive until the node upstream from you also changes baudrate.

In the second case, when you change the baudrate of a UART at some particular node, the baudrate for all communications with that particular neighbor (to the left or to the right) changes. So, *both* rings are "broken" until that neighbor makes the comparable change.

Look at each scenario and its consequences while messages are circulating (in both rings!). Changing data rates can be a very disruptive thing as it forces the ring(s) to be emptied; some minimum guaranteed quiescent period to provide a safety factor (that no messages are still in transit); the actual change to be effected; a quiescent period to ensure all nodes are at the new speed; *then* you can start up again.

While it sounds "child-like", you might find it helpful to make a drawing and move some coins (tokens) around the rings as if they were messages. It helps to picture what changes to the rings' operation you can make and *when*. Either try NOT to change baudrates *or* change them at times that can be determined a priori.
Hi Shane,

On 3/30/2011 4:41 AM, Shane williams wrote:
> On Mar 30, 6:39 am, ChrisQ<m...@devnull.com> wrote:
>>
>> One could ask about the wisdom of using a ring topology, which will
>> always involve more latency than a multidrop network using some sort of
>> poll / select or request / response protocol. You must have more than one
>> comms link for redundancy, as any break in the ring isolates any node
>> past the fault. You need double the comms hardware, as each node needs
>> an rx and tx uart. In the presence of faults, a ring topology doesn't
>> degrade anything like as gracefully, as multidrop either. Finally, where
>> does ddcmp fit into the picture ?. Ddcmp is more than just a frame
>> format, it's a complete protocol spec with defined messages flows, state
>> transitions, error recovery etc...
>
> I found out today ddcmp was used purely because it calculated the CRC
> for us. All it does is the framing. All the state transition and
> error recovery stuff is turned off. Using ddcmp was probably a
> mistake because ccitt crc can be calculated quickly enough and soon
> we'll be doing a new version of this device with a different micro
> which will have to be compatible with the existing device so will
> still have to ddcmp but without the hardware support.
>
> I'm trying to improve the propagation delay of messages around the
What sort of times are you seeing, presently? At which baudrates? How much *better* do they need to be (or, would you *like* them to be)?
> ring without requiring the customer to fit twisted pair cable
> everywhere and I'm also trying to improve the error monitoring so we
> can signal when a connection isn't performing well enough without
> creating nuisance faults, hence my interest in the error detection
> capability of crc16-ccitt.
>
> We actually already do have an RS485 multi-drop version of this
> protocol but it's non-deterministic and doesn't work very well. I
> don't really want to go into that...
It's relatively easy to get deterministic behavior from a 485 deployment. And, depending on the *actual* operating conditions of the current ring implementation, could probably achieve lower latencies at lower baudrates.
On Mar 31, 3:33 am, D Yuniskis <not.going.to...@seen.com> wrote:
> Hi Shane,
>
> On 3/30/2011 5:34 AM, Shane williams wrote:
>
>> It's half duplex so we can't start transmitting early.
>
> So, you have to buffer a message, verify its integrity and then
> push it on to the next node. This suggests it is either done
> *in* the Rx ISR *or* an ASR running tightly coupled to it
> (else you risk adding processing delays to the propagation delay).
It's done in the Rx ISR. Actually, as well as being half duplex, the hardware sends and receives whole messages, though we could change that.
> I.e., the time a message takes to circumnavigate the ring is
> ~K * n where K reflects message size, baud rate and per node
> processing.
Yes.
>>> *And exactly where the frame is removed from the ring is a design
>>> question. Often the sender removes frames it had sent, when they make
>>> their way back around, in which case the critical item for removal is
>>> the source address (and usually in that case the destination node sets
>>> a "copied" bit in the trailer, thus verifying physical transmission of
>>> the frame to the destination).
>>
>> In our case, there are two logical rings and each message placed on
>> the ring is sent in both directions. When the two messages meet up
>> approximately half-way round they annihilate each other - but if the
>> annihilation fails, the sender removes them as well.
>
> How does the sender *recognize* them as "his to annihilate"?
> I.e., if the data can be corrupted, so can the "sender ID"!
> The problem with a ring is that it has no "end" so things have
> the potential to go 'round and 'round and 'round and...
The message would have to get corrupted undetected every time around the ring to go round forever. Each device that puts a message on the ring puts its own address at the start plus a one-byte incrementing sequence number. Each node keeps a list of address / sequence # / received time of the last X messages received. If it's seen the address/seq# before within a certain time, it removes the message from the ring.
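That duplicate-suppression cache can be sketched in a few lines. This is an illustrative model of the scheme just described, not the actual firmware; the class name, the time-window policy, and the injectable clock are all assumptions for the sketch:

```python
import time

class DuplicateFilter:
    """Per-node cache of (source address, sequence number) pairs.
    A repeat seen within `window` seconds is treated as already
    forwarded and should be removed from the ring."""

    def __init__(self, window=2.0, clock=time.monotonic):
        self.window = window
        self.clock = clock
        self.seen = {}                      # (source, seq) -> first-seen time

    def should_forward(self, source, seq):
        now = self.clock()
        # Purge stale entries (the real device's "last X messages" list
        # would bound memory by count instead of by age).
        self.seen = {k: t for k, t in self.seen.items()
                     if now - t < self.window}
        if (source, seq) in self.seen:
            return False                    # duplicate: strip it from the ring
        self.seen[(source, seq)] = now
        return True
```

The time window matters because the one-byte sequence number wraps: the same (address, seq#) pair legitimately reappears after 256 messages, so "seen before" must mean "seen *recently*".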
> If you unilaterally drop any message found to be corrupted, then
> you have no way of knowing if it was received by its intended
> recipient (since you don't know who its recipient is -- or was).
> If you await acknowledgment (and retry until you receive it),
> then you run the risk of a message being processed more than
> once. etc.
Every message a node transmits has an incrementing "ack" byte that the next node sends back in its next message. If the ack byte doesn't come back correctly the message is sent again. If the ack is lost and a retry is sent, the receiver throws the message away because he's already seen it.
On Mar 31, 6:49 am, D Yuniskis <not.going.to...@seen.com> wrote:
> Hi Shane,
>
>> I'm trying to improve the propagation delay of messages around the
>
> What sort of times are you seeing, presently? At which baudrates?
> How much *better* do they need to be (or, would you *like* them
> to be)?
We've limited the ring to 32 nodes up till now at 57600 baud. The nominal target now is 64 nodes for which I've calculated a worst case request response time of approx 1.8 seconds with no retries. I would like it to be about one second. 64 nodes max is a bit arbitrary so what I'm really trying to do is get the best performance that's reasonably achievable. Some sites have well over 64 devices but not all on the same ring.
>> ring without requiring the customer to fit twisted pair cable
>> everywhere and I'm also trying to improve the error monitoring so we
>> can signal when a connection isn't performing well enough without
>> creating nuisance faults, hence my interest in the error detection
>> capability of crc16-ccitt.
>>
>> We actually already do have an RS485 multi-drop version of this
>> protocol but it's non-deterministic and doesn't work very well. I
>> don't really want to go into that...
>
> It's relatively easy to get deterministic behavior from a 485
> deployment. And, depending on the *actual* operating conditions
> of the current ring implementation, could probably achieve
> lower latencies at lower baudrates.
Some of the devices that connect to the multi-drop network are old and low-powered and a token ring was too much overhead at the time. Also we require redundancy which needs 4 wires for multi-drop but only 2 wires for the ring.
On Mar 31, 6:14 am, D Yuniskis <not.going.to...@seen.com> wrote:
> Hi Shane,
>
> On 3/30/2011 5:12 AM, Shane williams wrote:
>
>> On Mar 30, 4:09 am, D Yuniskis<not.going.to...@seen.com> wrote:
>>> Is this a synchronous protocol? Or, are you just using a pair
>>> of UARTs on each device to implement the CW & CCW links?
>>
>> Asynchronous with a pair of uarts, one for clockwise, one for
>> counterclockwise.
>
> OK. Been there, done that, T-shirt to prove it...
Hi, I'm out of time today. I'll get back to this tomorrow. Thanks.
Hi Shane,

On 3/30/2011 3:56 PM, Shane williams wrote:

[8<]

>>> I'm trying to improve the propagation delay of messages around the
>>
>> What sort of times are you seeing, presently? At which baudrates?
>> How much *better* do they need to be (or, would you *like* them
>> to be)?
>
> We've limited the ring to 32 nodes up till now at 57600 baud. The
> nominal target now is 64 nodes for which I've calculated a worst case
> request response time of approx 1.8 seconds with no retries. I would
> like it to be about one second.
OK, back-of-napkin guesstimates...

Assume 10b character frames transmitted "flat out" (no time between end of stop bit and beginning of next start bit). So, 5760 characters per second is the data rate. If we assume N is the size of a packet (in "characters"), then RTT is 64 * [(N / 5760) + P] where P is the time spent processing the message packet on each node before passing it along.

1.8s / 64 = [(N / 5760) + P] = ~30ms

Guessing at a message size of ~100 bytes (characters) suggests the "transmission time" component of this is ~17ms -- leaving 13ms as a guess at the processing time, P. If you cut this to ~0, then you achieve your 1 sec goal (almost exactly). This suggests that eliminating/simplifying any error detection so that incoming messages can *easily* be propagated is a goal to pursue.

If you can improve the reliability of the comm link so that errors are the *exception* (i.e., unexpected), then you can simplify the effort required to "handle" those errors. Furthermore, if errors *are* The Exception, then you can consider running the interface(s) in full duplex mode and starting to pass a packet along to your successor *before* it is completely received. This effectively reduces the size of the message (N) in the above calculation. E.g., if you can hold just *10* bytes of the message before deciding to pass it along, then the "transmission time" component drops to 1.7ms. Your RTT is then 0.1 sec!

Alternatively, you can spend a few ms processing in each node and still beat your 1 sec goal -- *or*, drop the data rate by a factor of 5 or 6 and still hit the 1 sec goal!

[remember, this is a back-of-napkin calculation so I don't claim it accurately reflects *your* operating environment. rather, it puts some options in perspective...]
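The napkin arithmetic above reduces to one line; a sketch, where the 100-byte message size and ~13 ms per-node processing are the guesses from the text, not measurements:

```python
def ring_rtt(nodes=64, baud=57600, bits_per_char=10,
             msg_chars=100, per_node_proc=0.013):
    """Ring traversal time: each of `nodes` hops costs the wire time for
    the whole message plus the per-node processing time (seconds)."""
    char_time = bits_per_char / baud            # seconds per character
    return nodes * (msg_chars * char_time + per_node_proc)

print(round(ring_rtt(), 2))                               # 1.94 (baseline guess)
print(round(ring_rtt(per_node_proc=0.0), 2))              # 1.11 (processing ~0)
print(round(ring_rtt(msg_chars=10, per_node_proc=0.0), 2))  # 0.11 (10-byte cut-through)
```

The three cases mirror the argument: store-and-forward with ~13 ms of processing misses the 1 s target, eliminating the processing roughly meets it, and 10-byte cut-through forwarding beats it by an order of magnitude.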
> 64 nodes max is a bit arbitrary so
> what I'm really trying to do is get the best performance that's
> reasonably achievable. Some sites have well over 64 devices but not
> all on the same ring.
>
>>> ring without requiring the customer to fit twisted pair cable
>>> everywhere and I'm also trying to improve the error monitoring so we
>>> can signal when a connection isn't performing well enough without
>>> creating nuisance faults, hence my interest in the error detection
>>> capability of crc16-ccitt.
>>>
>>> We actually already do have an RS485 multi-drop version of this
>>> protocol but it's non-deterministic and doesn't work very well. I
>>> don't really want to go into that...
>>
>> It's relatively easy to get deterministic behavior from a 485
>> deployment. And, depending on the *actual* operating conditions
>> of the current ring implementation, could probably achieve
>> lower latencies at lower baudrates.
>
> Some of the devices that connect to the multi-drop network are old and
> low-powered and a token ring was too much overhead at the time. Also
> we require redundancy which needs 4 wires for multi-drop but only 2
> wires for the ring.
How do you mean "4 wires" vs. "2 wires"? You can run a 485 network with a single differential pair. A 232-ish approach requires a Tx and Rx conductor (for each ring). So, you could implement two 485 busses for the same conductor count as your dual UART rings.
Hi Shane,

On 3/30/2011 3:39 PM, Shane williams wrote:

>>>> *And exactly where the frame is removed from the ring is a design
>>>> question. Often the sender removes frames it had sent, when they make
>>>> their way back around, in which case the critical item for removal is
>>>> the source address (and usually in that case the destination node sets
>>>> a "copied" bit in the trailer, thus verifying physical transmission of
>>>> the frame to the destination).
>>
>>> In our case, there are two logical rings and each message placed on
>>> the ring is sent in both directions. When the two messages meet up
>>> approximately half-way round they annihilate each other - but if the
>>> annihilation fails, the sender removes them as well.
>>
>> How does the sender *recognize* them as "his to annihilate"?
>> I.e., if the data can be corrupted, so can the "sender ID"!
>> The problem with a ring is that it has no "end" so things have
>> the potential to go 'round and 'round and 'round and...
>
> The message would have to get corrupted undetected every time around
> the ring to go round forever.
No. Once a frame is corrupted, it is no longer recognizable as its original *intent*. (see below)
> Each device that puts a message on the ring puts his own address at
> the start plus a one byte incrementing sequence number. Each node
Right. So what happens if the address gets corrupted? Or the sequence number? Once it is corrupted, each successive node will pass along the corrupted version AS IF it was a regular message (best case, you can detect it as "corrupted" and remove it from the ring -- but you won't know how to decide *which* message was then "deleted")
> keeps a list of address / sequence # / received time of the last X
> messages received. If it's seen the address/seq# before within a
> certain time, it removes the message from the ring.
But you don't know how any of these things will be "corrupted". You can only opt to remove a message that you are "suspicious of". And that implies that your error detection scheme is robust enough that *all* errors (signs of corruption) are detectable. If you are setting out with the expectation that you *will* be operating with a real (non zero) error rate, what can you do to assure yourself that you are catching *all* errors?
>> If you unilaterally drop any message found to be corrupted, then
>> you have no way of knowing if it was received by its intended
>> recipient (since you don't know who its recipient is -- or was).
>> If you await acknowledgment (and retry until you receive it),
>> then you run the risk of a message being processed more than
>> once. etc.
>
> Every message a node transmits has an incrementing "ack" byte that the
> next node sends back in its next message. If the ack byte doesn't
> come back correctly the message is sent again.
Again, how do you know that the message isn't corrupted to distort the ACK and "whatever else"? I.e., so that the message is no longer recognizable in its original form -- yet looks like a valid message (or *not*). How do you know that this is not a case of the message arriving correctly but the ACK being corrupted?
> If the ack is lost and
> a retry is sent, the receiver throws the message away because he's
> already seen it.
All of these things are predicated on the assumption that errors are rare. So, the chance of a message (forward or ACK) being corrupted *and* a followup/reply being corrupted is "highly unlikely". If you're looking at error rates high enough that you are trying to detect errors as significant as "19 bytes out of 60", then how much confidence can you have in *any* of the messages?
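To put a rough number on that confidence: outside its guaranteed detection classes, a 16-bit CRC passes a random corruption with probability about 2^-16. A sketch with purely hypothetical traffic figures (the node count, frame rate, and corruption rate below are made up to illustrate the scale, not taken from the thread):

```python
# Guaranteed for CRC-16/CCITT (x^16 + x^12 + x^5 + 1): any single error
# burst of 16 bits or less, and any odd number of bit errors (the
# polynomial has (x + 1) as a factor). For corruption beyond those
# classes, a random frame passes the check with probability ~2**-16.
p_undetected = 2.0 ** -16

frames_per_day = 64 * 10 * 86400   # hypothetical: 64 nodes, 10 frames/s each
corrupt_fraction = 0.01            # hypothetical: 1% of frames arrive damaged
escapes_per_day = frames_per_day * corrupt_fraction * p_undetected
print(round(escapes_per_day, 1))   # ~8.4 undetected bad frames per day
```

The point of the arithmetic: at error rates high enough to corrupt a meaningful fraction of traffic, undetected escapes stop being a once-a-decade event, which is D's argument for fixing the physical layer rather than leaning harder on the CRC.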
