
Error detection rate with CRC-16 CCITT

Started by Shane williams March 27, 2011
On 3/27/2011 5:36 PM, Shane williams wrote:
> On Mar 28, 12:22 pm, D Yuniskis<not.going.to...@seen.com> wrote:
>> Hi Shane,
>>
>> On 3/27/2011 3:31 PM, Shane williams wrote:
>>
>>> Interesting points, thanks. The environment can be just about
>>> anything. I suspect we'll back off the baud rate fairly quickly once
>>> errors start occurring. I'm also thinking we could raise the security
>>> for some of the critical messages, like double transmissions perhaps.
>>
>> Consider carefully what sort of "encoding" you use. E.g.,
>> "double transmissions" might add lots of overhead for very
>> little gain in "reliability".
>>
>> You can [1] also consider dynamically varying the data rate in
>> a TDM sort of scheme -- so, in this timeslot, you run at a slow,
>> reliable rate transferring critical messages; then, in this other
>> timeslot, you run "flat out" pushing data that would be "nice to
>> have" but not critical to proper operation.
>>
>> Again, you really need to look hard at what you are likely to
>> encounter "in the field" before you can come to any expectations
>> regarding likely performance. I've seen (and have been guilty,
>> myself!) some pretty mangled patches to deployed systems "just
>> to get by until the FedEx replacement parts delivery arrives".
>> If you *might* be running on the bleeding edge in some configuration,
>> the last thing you want is a guy in the field to *think* things
>> are OK when, in fact, they are not.
>>
>> [e.g., you might want to add a switch that forces communications
>> to stay in the "degraded/secure" mode if you suspect you are not
>> catching all the communication errors in a particular installation...
>> because the tech made a cable out of "bell wire"]
>>
>> ----------------------------
>>
>> [1] Depends on what is on the other end of the link, of course.
>> But, if you can autobaud dynamically, then that suggests you have
>> some control over both ends of the link!
>
> Yep, it's the same device at both ends.
>
> Regarding double transmissions, what do you mean by "encoding"? We
> could complement all bits in the second transmission I guess.
You are sending 2*n bits to encode n bits of data. Yet, that encoding will only *detect* a single bit error. Won't *correct* ANY errors. Won't *see* (certain) two bit errors. etc. I.e., your choice of message encoding has lots of overhead (twice as many bits!) but doesn't give you a corresponding increase in "reliability". Without understanding what sorts of errors you are likely to encounter, it is hard to design a protocol and encoding scheme that will be resilient to *those* errors.
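To make that concrete, here is a minimal C sketch (frame layout and function names invented here, not from the thread) of the "send it twice, second copy complemented" idea: the receiver can only reject a mismatched frame, it can never repair one, and an error that flips the same bit position in both copies is never noticed at all.

#include <stdint.h>
#include <string.h>

/* Build a 2*n byte frame from n payload bytes: first copy verbatim,
 * second copy bit-complemented. */
void encode_double(const uint8_t *data, size_t n, uint8_t *frame)
{
    for (size_t i = 0; i < n; i++) {
        frame[i]     = data[i];             /* first copy        */
        frame[n + i] = (uint8_t)~data[i];   /* complemented copy */
    }
}

/* Returns 1 if the two copies agree (no *detected* error), else 0.
 * There is no way to tell which copy is bad, so nothing is corrected. */
int decode_double(const uint8_t *frame, size_t n, uint8_t *data)
{
    for (size_t i = 0; i < n; i++)
        if (frame[i] != (uint8_t)~frame[n + i])
            return 0;                        /* mismatch: reject frame */
    memcpy(data, frame, n);
    return 1;
}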
> TDM might not be viable and probably too much hassle I suspect. The
> baud rate behavior will be user configurable with probably a system
> wide switch to allow the faster baud rate.
You can also opt to run at the slower (more reliable) rate ALL THE TIME and encode command messages more robustly than "less important messages". I.e., so command messages have greater Hamming distances (require more bandwidth per bit, so to speak) while less important messages are *compressed* so there is more "data" per bit -- and less protection against corrupted transmission. As such, the compressed data appears to have a higher bandwidth -- at reduced reliability -- even though it is being sent over the same "bit rate" channel.
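As a toy illustration of trading bandwidth for Hamming distance on the command messages (the codewords and sizes below are invented for illustration, not from the thread): four commands carried in one byte, with any two codewords at least 4 bits apart, so a single flipped bit is correctable and any double flip is detected.

#include <stdint.h>

/* Four command codes chosen so any two differ in at least 4 bit
 * positions (Hamming distance >= 4) -- only 2 bits of real information
 * per transmitted byte, in exchange for the extra protection. */
static const uint8_t cmd_code[4] = { 0x00, 0x0F, 0xF0, 0xFF };

static int popcount8(uint8_t v)
{
    int n = 0;
    while (v) { n += v & 1; v >>= 1; }
    return n;
}

/* Map a received byte to the nearest codeword.  Returns the command
 * index (0..3), or -1 if the byte is at least 2 bit-flips away from
 * every codeword (a detected, uncorrectable error). */
int decode_cmd(uint8_t rx)
{
    for (int i = 0; i < 4; i++)
        if (popcount8((uint8_t)(rx ^ cmd_code[i])) <= 1)
            return i;
    return -1;
}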
On 3/27/2011 5:41 PM, Shane williams wrote:

> The system is a ring of devices with each connection point to point
> with one device at each end.
Do you *literally* mean a ring topology? I.e., (excuse the crappy ASCII art)

   AAAA ----> BBBB ----> CCCC ----> DDDD
   AAAA       BBBB       CCCC       DDDD
   AAAA <----------<----------<---- DDDD

So, for A to send to D, B and C act as intermediaries?

Now, hold that thought...

How does C send to A? I.e., is the "bottom" connection simply a pass-thru connection from the downstream node? Or, is it an active connection (like a second comm channel)? Asked another way, can C send to A *without* going through D (i.e., by going through B, instead)?

Regardless... consider that if you twiddle with the baud rate on any link, you will either need to make sure *all* links "simultaneously" update their baud rates (taking into consideration any packets "in the pipe")

-- or --

you have to provide an elastic store in each node and some smarts to decide what data that node can *drop* (since its outbound connection may not be at the same rate as its inbound connection).

[This last bit applies iff there is a real second channel in each node, like:

   AAAA ----> BBBB ----> CCCC ----> DDDD
   AAAA       BBBB       CCCC       DDDD
   AAAA <---- BBBB <---- CCCC <---- DDDD ]
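A minimal sketch of that "elastic store plus smarts about what to drop" idea. The queue depth, frame layout and priority rule here are all invented for illustration: a full queue drops an incoming non-critical frame, while an incoming critical frame evicts the oldest queued non-critical one.

#include <stdint.h>

#define Q_DEPTH 16

struct frame { uint8_t critical; uint8_t len; uint8_t data[50]; };

struct elastic_q {
    struct frame slot[Q_DEPTH];
    int count;
};

/* Returns 0 if the frame was queued, -1 if it had to be dropped. */
int q_push(struct elastic_q *q, const struct frame *f)
{
    if (q->count < Q_DEPTH) {
        q->slot[q->count++] = *f;
        return 0;
    }
    if (!f->critical)
        return -1;                           /* full: drop the newcomer */
    for (int i = 0; i < q->count; i++) {     /* evict oldest non-critical */
        if (!q->slot[i].critical) {
            for (int j = i; j < q->count - 1; j++)
                q->slot[j] = q->slot[j + 1];
            q->slot[q->count - 1] = *f;
            return 0;
        }
    }
    return -1;                               /* all critical: drop newcomer */
}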
On Mar 27, 5:31 pm, Shane williams <shane.2471...@gmail.com> wrote:
> On Mar 28, 8:29 am, D Yuniskis <not.going.to...@seen.com> wrote:
> > Hi Shane,
> >
> > On 3/27/2011 4:39 AM, Shane williams wrote:
> > > On Mar 27, 11:53 pm, Michael Karas<mka...@carousel-design.com> wrote:
> >
> > [8<]
> >
> > >> Are you getting some of the errors in your transmission path
> > >> due to distortion of the RS485 waveform due to non-equal propagation
> > >> delays through your logic on the "0"-->"1" transition versus the
> > >> one from "1"-->"0"? Common problem with certain optocouplers. ;-)
> >
> > And some devices degrade with age.
> >
> > > Thanks. I'm trying to figure out whether it's possible/viable to
> > > dynamically determine the fastest baud rate we can use by checking the
> > > error rate. The cable lengths and types of wire used when our systems
> > > are installed varies and I was hoping we could automatically work out
> > > what speed a particular connection can run at. The spec for the
> > > MOC5007 Optocoupler seems a bit vague so I was trying to find a better
> > > one.
> >
> > <frown> You might, instead, want to think of this from the
> > "engineering" standpoint -- what are the likely/expected
> > *sources* of your errors? I.e., how is the channel typically [1]
> > going to be corrupted.
> >
> > First, think of the medium by itself. With a given type of
> > cable (including "crap" that someone might fabricate on-the-spot),
> > how will your system likely behave (waveform distortions,
> > sampling skew in the receiver, component aging, etc.).
> >
> > Then, think of the likely noise sources that might interfere
> > with your signal. Is there some synchronous source nearby that
> > will periodically be bouncing your grounds or coupling directly
> > to your signals (i.e., will your cable be routed alongside
> > something noisy)? [this assumes you have identified any
> > sources of "noise" that your system imposes on *itself*! e.g.,
> > each time *you* command the VFD to engage the 10HP motor you
> > might notice glitches in your data...]
> >
> > Then, think of what aperiodic/transient/"random" disturbances
> > are likely to be encountered in your environment.
> >
> > In each case, think of the impact on the data stream AT ALL
> > THE DATA RATES YOU *MIGHT* BE LIKELY TO HAVE IN USE. Are
> > you likely to see lots of dispersed single bit errors? How
> > far apart (temporally) are they likely to be (far enough
> > that two different code words can cover them?) Or, will
> > you encounter a burst of consecutive errors? (if so, how
> > wide?)
> >
> > Finally, regarding your hinted algorithm: note that the
> > time constant you use in determining when/if to change rates
> > has to take into consideration these observations on the likely
> > environment. E.g., if errors are likely to creep in "slowly"
> > (beginning with low probability, low error rate), then you
> > can "notice" the errors and start anticipating more (?) and
> > back off on your data rate -- hopefully, quick enough that the
> > error rate doesn't grow to exceed your *continued* ability
> > for your CRC to remain effective.
> >
> > OTOH, if the error rate ever "grows" (instantaneously) faster
> > than your CRC is able to detect the increased error rate,
> > you run the risk of accepting bad data "as good". And, sitting
> > "fat, happy and glorious" all the while you are doing so!
> > (i.e., sort of like a PLL locking on a harmonic outside the
> > intended capture range).
> >
> > Can you, instead, figure out how to *ensure* a reliable channel?
> >
> > --------------------
> > [1] and *atypically*!
>
> Interesting points, thanks. The environment can be just about
> anything. I suspect we'll back off the baud rate fairly quickly once
> errors start occurring. I'm also thinking we could raise the security
> for some of the critical messages, like double transmissions perhaps.
Use a proper forward error correction scheme. You'll be able to monitor the increase in error rate while still getting most packets through. A Reed-Solomon code will allow you to (for example) add 20 bytes to a 235 byte message and correct any 10 bad bytes (and detect all bad messages with no more than 19 bad bytes). If you're getting a bit corrected every few dozen packets, it's probably safe to bump up the data rate. If it's a couple dozen bits in every packet, it's time to back off. In fact, this can substantially increase your effective data rate, as you can continue to run in the presence of a moderate number of errors (disk drives, for instance, run well into that region, and it's relatively rare these days that *any* sector actually reads "clean," and a very heavy duty ECC code is used to compensate).

You can also improve things by using a multi level scheme, which could be a simple duplication (think disk RAID-1), or some combined code over multiple packets (simple parity like RAID-5, or Reed-Solomon-ish like RAID-6), which would provide added recovery, at the expense of added latency (mainly in the presence of errors). Since you mentioned that you have at least two classes of data (critical and nice to have), apply the second level of FEC to just the critical data (after protecting each packet with an appropriate RS code), and even with a substantial spike in error rate, you're likely to get the critical stuff through.
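For the "simple parity like RAID-5" second level, a sketch in C (group size, packet length and function names invented here): one XOR parity packet per group lets a node rebuild a single packet that the per-packet check rejected; two or more bad packets in one group cannot be recovered this way.

#include <stdint.h>
#include <string.h>

#define GROUP_N  4
#define PKT_LEN  50

/* XOR all data packets of a group into one parity packet. */
void build_parity(const uint8_t pkt[GROUP_N][PKT_LEN], uint8_t parity[PKT_LEN])
{
    memset(parity, 0, PKT_LEN);
    for (int i = 0; i < GROUP_N; i++)
        for (int j = 0; j < PKT_LEN; j++)
            parity[j] ^= pkt[i][j];
}

/* Rebuild the single rejected packet 'lost' (index 0..GROUP_N-1) from
 * the surviving packets plus the parity packet. */
void rebuild_packet(uint8_t pkt[GROUP_N][PKT_LEN],
                    const uint8_t parity[PKT_LEN], int lost)
{
    memcpy(pkt[lost], parity, PKT_LEN);
    for (int i = 0; i < GROUP_N; i++)
        if (i != lost)
            for (int j = 0; j < PKT_LEN; j++)
                pkt[lost][j] ^= pkt[i][j];
}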
On Mar 28, 6:23 pm, D Yuniskis <not.going.to...@seen.com> wrote:
> On 3/27/2011 5:41 PM, Shane williams wrote:
> > The system is a ring of devices with each connection point to point
> > with one device at each end.
>
> Do you *literally* mean a ring topology? I.e., (excuse the
> crappy ASCII art)
>
>    AAAA ----> BBBB ----> CCCC ----> DDDD
>    AAAA       BBBB       CCCC       DDDD
>    AAAA <----------<----------<---- DDDD
>
> So, for A to send to D, B and C act as intermediaries?
>
> Now, hold that thought...
>
> How does C send to A? I.e., is the "bottom" connection
> simply a pass-thru connection from the downstream node?
> Or, is it an active connection (like a second comm channel)?
> Asked another way, can C send to A *without* going through
> D (i.e., by going through B, instead)?
>
> Regardless... consider that if you twiddle with the baud rate
> on any link, you will either need to make sure *all* links
> "simultaneously" update their baud rates (taking into
> consideration any packets "in the pipe")
>
> -- or --
>
> you have to provide an elastic store in each node and some
> smarts to decide what data that node can *drop* (since its
> outbound connection may not be at the same rate as its
> inbound connection)
>
> [this last bit applies iff there is a real second channel
> in each node like:
>
>    AAAA ----> BBBB ----> CCCC ----> DDDD
>    AAAA       BBBB       CCCC       DDDD
>    AAAA <---- BBBB <---- CCCC <---- DDDD ]
It's physically a 2 wire half duplex ring with messages going in both directions around the ring to provide redundancy. Say 8 nodes 1 to 8. Node 1 talks to nodes 2 and 8, node 2 talks to nodes 1 and 3 etc. However we may end up with 3 ports per node making it a collection of rings or a mesh. The loading at the slowest baud rate is approx 10% for 64 nodes. If we decide to allow mixed baud rates, each node will have the ability to tell its adjacent nodes to slow down when its message queue gets to a certain level, allowing it to cope with a brief surge in messages. Also to help the propagation delay, we might split long messages to a max of 50 bytes or so.
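A possible shape for that "split long messages to a max of 50 bytes or so" step (the fragment header and the emit callback are invented for illustration; it assumes a message of at most 255 fragments):

#include <stdint.h>
#include <stddef.h>

#define FRAG_PAYLOAD 50

/* Called once per fragment: fragment index, total count, payload. */
typedef void (*emit_fn)(uint8_t idx, uint8_t total,
                        const uint8_t *payload, uint8_t len);

void send_fragmented(const uint8_t *msg, size_t msg_len, emit_fn emit)
{
    uint8_t total = (uint8_t)((msg_len + FRAG_PAYLOAD - 1) / FRAG_PAYLOAD);

    for (uint8_t idx = 0; idx < total; idx++) {
        size_t off = (size_t)idx * FRAG_PAYLOAD;
        size_t n   = msg_len - off;
        if (n > FRAG_PAYLOAD)
            n = FRAG_PAYLOAD;
        emit(idx, total, msg + off, (uint8_t)n);  /* one wire frame each */
    }
}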
Shane williams wrote:

> It's physically a 2 wire half duplex ring with messages going in both
> directions around the ring to provide redundancy. Say 8 nodes 1 to
> 8. Node 1 talks to nodes 2 and 8, node 2 talks to nodes 1 and 3 etc.
>
> However we may end up with 3 ports per node making it a collection of
> rings or a mesh. The loading at the slowest baud rate is approx 10%
> for 64 nodes. If we decide to allow mixed baud rates, each node will
> have the ability to tell its adjacent nodes to slow down when its
> message queue gets to a certain level, allowing it to cope with a
> brief surge in messages. Also to help the propagation delay, we might
> split long messages to a max of 50 bytes or so.
I think ddcmp dates to the mid 70's and was originally designed by digital / dec for their decnet network, then updated later for ethernet. Fwir, it is a connection oriented protocol implemented as a multilayer stack, that provided reliable comms between nodes. It had error detection, retries etc much as tcp/ip does. It's a long time since I used decnet, but I know that there are ddcmp protocol specs and other docs out there which describe the whole stack. There is, I think, even a linux decnet protocol driver which might be a useful bit of code to look at, even if the complete stack is too much for the application...

Regards,

Chris

Shane williams wrote:
> On Mar 28, 6:51 am, Tim Wescott <t...@seemywebsite.com> wrote:
>> On 03/27/2011 07:21 AM, Vladimir Vassilevsky wrote:
>>
>>> Shane williams wrote:
>>
>>>> Thanks. I'm trying to figure out whether it's possible/viable to
>>>> dynamically determine the fastest baud rate we can use by checking the
>>>> error rate.
>>
>>> Yes. But:
>>
>>> 1) It is easier, faster and more reliable to evaluate the channel by
>>> transmitting a known pseudo-random test pattern rather than the actual
>>> data.
>>
>> I've done this -- and it is.
>>
>>> 2) If the baud rate is changed dynamically, how would the receivers know
>>> the baud rate of the transmitters?
>>
>> There's ways. Any good embedded programmer should be able to figure out
>> half a dozen before they even put pen to napkin.
>>
>>> 3) Since the system is intended to be operable even at the lowest baud,
>>> why not always use the lowest baud?
>>
>> If it's like ones that I've worked with, the data over the link is a
>> combination of high-priority "gotta haves" like operational data, and
>> lower-priority "dang this would be nice" things like diagnostics, faster
>> status updates, and that sort of thing.
>>
>> So the advantages of going up in speed are obvious. For that matter,
>> there may be advantages to being able to tell a maintenance guy what
>> not-quite-fast-enough speed can be achieved, so he can make an informed
>> choice about what faults to look for.
>
> Didn't think about that.
>
> You're exactly right about the need for speed. Background data is
> fine at the slower rate but when an operator is doing something on the
> system we want the response to be faster than the slowest rate gives
> us.
>
> Switching rates seems fairly easy to me. One end tells the other what
> rate they're switching to, the other acknowledges, if no ack then
> retry a couple of times. If one end switches and the other doesn't,
> after one second or so of no communication, they both switch back to
> the slowest rate.
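One possible shape for that switch-and-fall-back logic, as a polled state machine in C (state names, timing constants and the not-shown transmit/ack hooks are all invented here, not from the thread):

#include <stdint.h>

enum link_state { LS_RUNNING, LS_PROPOSING };

struct rate_ctl {
    enum link_state state;
    uint8_t  current_rate;   /* index into a table of baud rates     */
    uint8_t  retries;
    uint32_t last_rx_ms;     /* time of the last good received frame */
    uint32_t sent_ms;        /* when the rate proposal was (re)sent  */
};

#define SLOWEST_RATE   0
#define MAX_RETRIES    3
#define ACK_TIMEOUT_MS 100
#define LINK_DEAD_MS   1000

/* Call periodically; now_ms is a monotonic millisecond tick. */
void rate_ctl_poll(struct rate_ctl *rc, uint32_t now_ms)
{
    if (rc->state == LS_PROPOSING && now_ms - rc->sent_ms > ACK_TIMEOUT_MS) {
        if (rc->retries++ < MAX_RETRIES) {
            /* resend the rate proposal here (transmit hook not shown) */
            rc->sent_ms = now_ms;
        } else {
            rc->state = LS_RUNNING;       /* no ack: stay at current rate */
        }
    }

    /* Nothing heard for a second or so: assume the two ends disagree on
     * the rate and drop back to the slowest one; the far end does the same. */
    if (now_ms - rc->last_rx_ms > LINK_DEAD_MS) {
        rc->current_rate = SLOWEST_RATE;
        rc->state = LS_RUNNING;
    }
}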
Some people are just looking to find trouble for their ass. Perhaps, they are masochists; they like to be fucked. Good luck with that; there are almost limitless possibilities for the protocol malfunctioning. VLV
On 2011-03-28, ChrisQ <meru@devnull.com> wrote:
> I think ddcmp dates to the mid 70's and was originally designed by
> digital / dec for their decnet network, then updated later for ethernet.
Yes, it was before the VAX days. (VMS is a part of my day job, so I am familiar with DEC history.)
> Fwir, it is a connection oriented protocol implemented as a multilayer
> stack, that provided reliable comms between nodes. It had error
> detection, retries etc much as tcp/ip does. It's a long time since I
> used decnet, but I know that there are ddcmp protocol specs and other
> docs out there which describe the whole stack. There is, I think, even a
> linux decnet protocol driver which might be a useful bit of code to
> look at, even if the complete stack is too much for the application...
The Phase IV documents can be found at:

http://linux-decnet.sourceforge.net/docs/doc_index.html

I don't know what the current status of the DECnet code in Linux is however as I never use it.

Simon.

--
Simon Clubley, clubley@remove_me.eisner.decus.org-Earth.UFP
Microsoft: Bringing you 1980s technology to a 21st century world
Simon Clubley wrote:

> The Phase IV documents can be found at:
>
> http://linux-decnet.sourceforge.net/docs/doc_index.html
>
> I don't know what the current status of the DECnet code in Linux is however
> as I never use it.
>
> Simon.
A dec document describing the low level protocol, crc, retries and states etc can be found at:

http://decnet.ipv7.net/docs/dundas/aa-d599a-tc.pdf

I had a previous life working with dec kit and thought I recognised the name, perhaps from the vms group, but were you by any chance a contractor in the mid to late 80's ?...

Regards,

Chris
On Mar 28, 6:54 pm, "robertwess...@yahoo.com"
<robertwess...@yahoo.com> wrote:
> On Mar 27, 5:31 pm, Shane williams <shane.2471...@gmail.com> wrote:
> > Interesting points, thanks. The environment can be just about
> > anything. I suspect we'll back off the baud rate fairly quickly once
> > errors start occurring. I'm also thinking we could raise the security
> > for some of the critical messages, like double transmissions perhaps.
>
> Use a proper forward error correction scheme. You'll be able to
> monitor the increase in error rate while still getting most packets
> through. A Reed-Solomon code will allow you to (for example) add 20
> bytes to a 235 byte message and correct any 10 bad bytes (and detect
> all bad messages with no more than 19 bad bytes). If you're
> getting a bit corrected every few dozen packets, it's probably safe to
> bump up the data rate. If it's a couple dozen bits in every packet,
> it's time to back off. In fact, this can substantially increase your
> effective data rate, as you can continue to run in the presence of a
> moderate number of errors (disk drives, for instance, run well into
> that region, and it's relatively rare these days that *any* sector
> actually reads "clean," and a very heavy duty ECC code is used to
> compensate).
>
> You can also improve things by using a multi level scheme, which could
> be a simple duplication (think disk RAID-1), or some combined code
> over multiple packets (simple parity like RAID-5, or Reed-Solomon-ish
> like RAID-6), which would provide added recovery, at the expense of
> added latency (mainly in the presence of errors). Since you mentioned
> that you have at least two classes of data (critical and nice to
> have), apply the second level of FEC to just the critical data (after
> protecting each packet with an appropriate RS code), and even with a
> substantial spike in error rate, you're likely to get the critical
> stuff through.
Thanks. Error correction sounds like it would be too CPU intensive. I'd be happy just to detect errors. Do you have any idea how many bytes we would have to add to a 60 byte message to detect 19 bad bytes or less and how CPU intensive it is?
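For scale (not an answer to the Reed-Solomon question above): a plain CRC-16/CCITT check adds only 2 bytes to a 60 byte message and costs a few shifts and XORs per byte, or one table lookup per byte with a 512-byte table. Its detection guarantees are weaker, though: every error burst of 16 bits or less is caught, but other corruption, including an arbitrary 19 bad bytes, only with probability about 1 - 2^-16. A reference bitwise implementation, for what it's worth:

#include <stdint.h>
#include <stddef.h>

/* CRC-16/CCITT (poly 0x1021, init 0xFFFF), bit-at-a-time version. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;

    while (len--) {
        crc ^= (uint16_t)(*data++) << 8;
        for (int i = 0; i < 8; i++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}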
On Mar 29, 4:04 am, Vladimir Vassilevsky <nos...@nowhere.com> wrote:
> Shane williams wrote:
> > On Mar 28, 6:51 am, Tim Wescott <t...@seemywebsite.com> wrote:
>
> > Switching rates seems fairly easy to me. One end tells the other what
> > rate they're switching to, the other acknowledges, if no ack then
> > retry a couple of times. If one end switches and the other doesn't,
> > after one second or so of no communication, they both switch back to
> > the slowest rate.
>
> Some people are just looking to find trouble for their ass. Perhaps,
> they are masochists; they like to be fucked. Good luck with that; there
> are almost limitless possibilities for the protocol malfunctioning.
Can you describe just one possibility?
