EmbeddedRelated.com
Forums

Binary protocol design: TLV, LTV, or else?

Started by Aleksandar Kuktin January 8, 2014
Hi Aleksander,

On 1/10/2014 12:53 PM, Aleksandar Kuktin wrote:
> On Thu, 09 Jan 2014 16:56:49 -0700, Don Y wrote:
>
>> [The OP hasn't really indicated what sort of environment he expects to
>> operate within nor the intent of the device and the relative importance
>> (or lack thereof) of the comms therein]
>
> The device is a CNC robot, to be used in manufacture. Because of that, I
> can require and assume a fairly strict, secure and "proper" setup, with
> or without Stuxnet and its ilk.
Ah, well... there will be an extra service charge to have Stuxnet installed. Check with the sales office for more details. I think they are running a "2-for-1" promotion -- THIS MONTH ONLY! :>
> The protocol is supposed to support transfer of compiled G-code from the
> PC to a device (really a battery of devices), transfer of telemetry,
> configuration and perhaps a few other things I forgot to think of by now.
>
> Since its main purpose is transfer of G-code, the protocol is expected to
> be able to utilize fairly small packets, small enough that fragmentation
> is not expected to happen (60 octets should be enough).
Remember, UDP's "efficiency" (if you want to call it that) comes at a reasonably high cost!

There are no guarantees that a given datagram will be delivered (or received). The protocol that you develop *atop* UDP has to notice when stuff "goes missing" (e.g., require an explicit acknowledgement, sequence numbers, etc.)

There are no guarantees that datagram 1 will be received *before* datagram 2. "Turn off plasma cutter" "Move left" can be received as "Move left" "Turn off plasma cutter" (which might be "A Bad Thing" if there is something located off to the left that doesn't like being exposed to plasma! :> )

There is no sense of a "connection" between the source and destination beyond that of each *individual* datagram. Neither party is ever aware if the other party is "still there". (add keepalives if this is necessary)

There is no mechanism to moderate traffic as it increases (and, those increases can lead to more dropped/lost datagrams which leads to more retransmission *requests*, which leads to more traffic which leads to... keep in mind any other traffic on your network that could worsen this -- or, be endangered by it!)

Appliances other than switches can effectively block UDP connections. If you ever intend to support a physically distributed domain that exceeds what you can achieve using "maximum cable lengths" (one of the drawbacks about moving away from "orange hose" and its ilk was the drop in maximum cable length), you have to be careful in considering what any *other* "interconnect appliances" do to the traffic you intend to pass (and, if your protocol will be routed!)

[It's surprising how *short* "100m" is when it comes to cable lengths! Esp in a manufacturing setting where you might have to go *up* a considerable way -- and, leave a suitable service loop -- before you can even begin to go "over"! And, line-of-sight cable routing may be impractical. For example, here (residential), the *average* length of a network cable is close to 70 feet -- despite the fact that the (2D) diagonal of the house is just about that same length *and* all the drops are centrally terminated!]

When someone later decides it should be a piece of cake for your "engineering office" to directly converse with your controllers located in the "manufacturing facility", you'll find yourself explaining why that's not easily accomplished. "Why not? I can talk to Google's servers in another *state*/country... (damn consultants always trying to charge extra for stuff they should have done in the first place!)"

[N.B. Raw ethernet frames don't even give you the above (lack of) assurances :> ]
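To make that concrete: a protocol layered on UDP typically ends up carrying its own little header. A minimal sketch in C (field names, sizes and type codes are invented for illustration, not a recommendation):

    /* Sketch only: serialize these fields explicitly in a fixed byte order
     * on the wire -- don't rely on the compiler's struct layout. */
    #include <stdint.h>

    enum msg_type {
        MSG_CMD = 1,   /* command carrying G-code / motion data    */
        MSG_ACK = 2,   /* acknowledges a specific sequence number  */
        MSG_NAK = 3,   /* asks for a retransmission                */
        MSG_NOP = 4    /* keepalive; no side effects               */
    };

    struct msg_hdr {
        uint8_t  version;  /* protocol version, for future changes      */
        uint8_t  type;     /* one of enum msg_type                      */
        uint16_t seq;      /* sequence number: detects loss/reordering  */
        uint16_t ack;      /* last sequence number seen from the peer   */
        uint16_t length;   /* payload octets following this header      */
    };

Everything above (acks, retransmission, ordering, keepalives) hangs off fields like these.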
On Fri, 10 Jan 2014 15:30:28 -0700, Don Y <this@isnotme.com> wrote:

>Hi Aleksander,
>
>On 1/10/2014 12:53 PM, Aleksandar Kuktin wrote:
>> On Thu, 09 Jan 2014 16:56:49 -0700, Don Y wrote:
>>
>>> [The OP hasn't really indicated what sort of environment he expects to
>>> operate within nor the intent of the device and the relative importance
>>> (or lack thereof) of the comms therein]
>>
>> The device is a CNC robot, to be used in manufacture. Because of that, I
>> can require and assume a fairly strict, secure and "proper" setup, with
>> or without Stuxnet and its ilk.
Such applications have traditionally been handled with half-duplex RS-485 multidrop request/response protocols (such as Modbus) at speeds of 9600 or even 115200 bit/s. In a new implementation with Ethernet hardware, you get galvanic isolation, much higher gross throughput (at least 10 Mbit/s), bus arbitration, message framing and CRC checking for "free", i.e. in hardware. Where you previously had a multidrop bus, you can now communicate with each device in parallel, in full duplex, which more than makes up for the latency of simple half-duplex transactions between the master and a single slave.
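For reference, a classic Modbus RTU transaction really is tiny. A rough sketch of a request frame as it goes out on the RS-485 wire (the slave address and register numbers are arbitrary, and crc16_modbus() is a placeholder for the usual CRC-16 routine):

    #include <stdint.h>

    /* "Read holding registers" request: address, function, start, count, CRC. */
    extern uint16_t crc16_modbus(const uint8_t *data, int len);  /* placeholder */

    void build_request(uint8_t frame[8])
    {
        frame[0] = 0x11;                   /* slave address                      */
        frame[1] = 0x03;                   /* function 3: read holding registers */
        frame[2] = 0x00; frame[3] = 0x6B;  /* starting register                  */
        frame[4] = 0x00; frame[5] = 0x03;  /* read three registers               */

        uint16_t crc = crc16_modbus(frame, 6);
        frame[6] = (uint8_t)(crc & 0xFF);  /* CRC low byte transmitted first     */
        frame[7] = (uint8_t)(crc >> 8);
    }

Eight octets on the wire; the slave answers with a similarly small response, or stays silent and the master times out.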
>> The protocol is supposed to support transfer of compiled G-code from the
>> PC to a device (really a battery of devices), transfer of telemetry,
>> configuration and perhaps a few other things I forgot to think of by now.
>>
>> Since its main purpose is transfer of G-code, the protocol is expected to
>> be able to utilize fairly small packets, small enough that fragmentation
>> is not expected to happen (60 octets should be enough).
>
>Remember, UDP's "efficiency" (if you want to call it that) comes at
>a reasonably high cost!
>
>There are no guarantees that a given datagram will be delivered
>(or received). The protocol that you develop *atop* UDP has
>to notice when stuff "goes missing" (e.g., require an explicit
>acknowledgement, sequence numbers, etc.)
On traditional RS-4xx networks there are no such guarantees either, so request/response + timeout/retransmit protocols have been used for decades; why not use the same approach on raw Ethernet or UDP?
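The PC-side code for that is short, too. A rough sketch over a connected UDP socket (POSIX sockets, error handling trimmed; the names and the 200 ms timeout are mine):

    #include <stddef.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <sys/types.h>

    /* Send a request and wait for a reply, retransmitting on timeout.
     * The caller still has to match sequence numbers in the reply. */
    static int transact(int sock, const void *req, size_t req_len,
                        void *reply, size_t reply_len, int max_tries)
    {
        struct timeval tv = { .tv_sec = 0, .tv_usec = 200000 };  /* 200 ms */
        setsockopt(sock, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);

        for (int attempt = 0; attempt < max_tries; attempt++) {
            if (send(sock, req, req_len, 0) < 0)
                return -1;
            ssize_t n = recv(sock, reply, reply_len, 0);
            if (n > 0)
                return (int)n;      /* got something back               */
            /* timeout (or error): fall through and retransmit          */
        }
        return -1;                  /* slave never answered             */
    }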
>There are no guarantees that datagram 1 will be received *before*
>datagram 2. "Turn off plasma cutter" "Move left" can be received
>as "Move left" "Turn off plasma cutter" (which might be "A Bad Thing"
>if there is something located off to the left that doesn't like being
>exposed to plasma! :> )
Still using the traditional request/response model, you do not send the "Move left" before you receive the ack for the "Turn off" command. Better yet, use the longer messages available and put all the critical elements into a single transaction (eth/UDP frame). A frame could consist of "Move to X,Y", "Plasma on", "Move to A,B at speed z", "Plasma off". Until this full sequence has been acknowledged, the master should not send a new burn sequence.
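Which is where the TLV question from the subject line comes back in: a TLV encoding makes it easy to pack such a sequence into one frame. A rough sketch (opcodes, field sizes and the use of host byte order are purely illustrative):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Append one TLV record: 1-byte type, 1-byte length, 'len' value bytes. */
    static void tlv_put(uint8_t *buf, size_t *off,
                        uint8_t type, const void *val, uint8_t len)
    {
        buf[(*off)++] = type;
        buf[(*off)++] = len;
        if (len)
            memcpy(buf + *off, val, len);
        *off += len;
    }

    /* One "burn sequence" as a single frame: move, plasma on, move, plasma off. */
    size_t build_burn_frame(uint8_t *buf, int16_t x0, int16_t y0,
                            int16_t x1, int16_t y1, uint16_t speed)
    {
        size_t off = 0;
        int16_t xy0[2] = { x0, y0 };
        int16_t xy1[2] = { x1, y1 };

        tlv_put(buf, &off, 0x01, xy0, sizeof xy0);       /* move to X,Y      */
        tlv_put(buf, &off, 0x02, NULL, 0);               /* plasma on        */
        tlv_put(buf, &off, 0x01, xy1, sizeof xy1);       /* move to A,B ...  */
        tlv_put(buf, &off, 0x03, &speed, sizeof speed);  /* ... at speed z   */
        tlv_put(buf, &off, 0x04, NULL, 0);               /* plasma off       */
        return off;                                      /* octets to send   */
    }

The whole sequence then succeeds, or is retransmitted, as a unit.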
>There is no sense of a "connection" between the source and destination
>beyond that of each *individual* datagram. Neither party is ever
>aware if the other party is "still there". (add keepalives if this is
>necessary)
When the slave acknowledges the master request, this is easily handled.
>There is no mechanism to moderate traffic as it increases (and,
>those increases can lead to more dropped/lost datagrams which
>leads to more retransmission *requests*, which leads to more
>traffic which leads to... keep in mind any other traffic on
>your network that could worsen this -- or, be endangered by it!)
The amount of traffic in a network controlling a real (mechanical) device is limited by the mechanical movement (etc.) of that device, not by the network capacity. In old coaxial-based 10Base2/5 half-duplex networks, that might have been an issue. For switch-based (but not hub-based) 10xxxBaseT networks, this is not really an issue, since a typical industrial device does not need more than 10BaseT.
>Appliances other than switches can effectively block UDP connections.
>If you ever intend to support a physically distributed domain
>that exceeds what you can achieve using "maximum cable lengths"
>(one of the drawbacks about moving away from "orange hose" and its
>ilk was the drop in maximum cable length), you have to be careful
>in considering what any *other* "interconnect appliances" do to
>the traffic you intend to pass (and, if your protocol will be routed!)
I would not be so foolish as to do direct machine control over unpredictable nets such as the Internet, much less over wireless connections such as WLANs, microwave or satellite links. All the security issues must be handled with a wired connection and, in some cases, with certified LAN systems.
>[It's surprising how *short* "100m" is when it comes to cable lengths!
>Esp in a manufacturing setting where you might have to go *up* a
>considerable way -- and, leave a suitable service loop -- before you
>can even begin to go "over"! And, line-of-sight cable routing may be
>impractical. For example, here (residential), the *average* length of
>a network cable is close to 70 feet -- despite the fact that the
>(2D) diagonal of the house is just about that same length *and* all
>the drops are centrally terminated!]
100 m is the limit for a single 10xBaseT twisted-pair cable run; putting switches in between will solve that. For real heavy industry, copper wiring is a no-no anyway and you have to use fiber, with single-mode fibers giving well over 30 km of range without optical repeaters.
>When someone later decides it should be a piece of cake for your
>"engineering office" to directly converse with your controllers located
>in the "manufacturing facility", you'll find yourself explaining
>why that's not easily accomplished. "Why not? I can talk to
>Google's servers in another *state*/country... (damn consultants
>always trying to charge extra for stuff they should have done in
>the first place!)"
Isn't it a good thing that you can deny access from the outside world to a critical controller? If there is a specific need to let a qualified person at a remote site access the device directly, build a secured VPN connection to the PC controlling the devices and use all the required firewall etc. methods to restrict the access.
>[N.B. Raw ethernet frames don't even give you the above (lack of)
>assurances :> ]
This is a really good thing.
On Fri, 10 Jan 2014 19:21:24 +0000 (UTC), Aleksandar Kuktin
<akuktin@gmail.com> wrote:

>On Thu, 09 Jan 2014 07:05:33 -0700, Don Y wrote:
>
>> It must have been entertaining for the folks who came up with ethernet,
>> IP, etc. way back when to start with a clean slate and *guess* as to
>> what would work best! :>
>
>Actually, that's not how it happened at all. :)
>
>Just like in any evolutionary process, several possible solutions were
>produced and the ones that were "fittest" and most adapted to the
>environment were the ones that prevailed.
It is interesting to note that Ethernet was not a top contender in the beginning, due to the high cost. One of the first DIX Ethernet implementations for DECnet was the DEUNA card for PDP-11/VAX-11, requiring two Unibus cards (very expensive), which was connected to the RG-8 coaxial cable using vampire tap transceivers (very expensive) through the AUI interface (essentially RS-422 for Tx, Rx, Control and Collision detect). In fact the AUI interface is electrically and functionally quite similar to the 10BaseT interface (except for control and collision detect).

The cost of the vampire tap transceivers was so high that the first "Ethernet" network I designed and built was a network in a computer room between computers, using AUI cabling (with 15-pin connectors) through a DELNI "hub", later adding some long AUI cables for terminal servers at the other end of the office. Thus no coaxial cable was used.

10xxxBaseT based hubs and switches really made Ethernet a viable option.
On Fri, 10 Jan 2014 19:15:28 +0000 (UTC), Aleksandar Kuktin
<akuktin@gmail.com> wrote:

>On Wed, 08 Jan 2014 22:14:35 +0000, Grant Edwards wrote:
>> If you can, I'd recommend using UDP (which is fairly low overhead). The
>> PC end can then be written as a normal user-space application that
>> doesn't require admin privileges. You'll still have problems with some
>> routers and NAT firewalls, but way fewer problems than trying to use raw
>> Ethernet.
>
>UDP/IP is just an extension of IP. I considered using raw IP, but decided
>against it on grounds that I didn't want to implement IP, simple as it
>may be.
IP is different from raw Ethernet. IP requires handling the ARP issues; adding UDP on top of it is just the port numbers. The ARP handling requires about a page of code and UDP even less.
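As a rough sanity check on the "page of code" claim, answering an ARP request for your own IP is on that order. A sketch (structures abbreviated, no cache, the Ethernet header assumed to be handled elsewhere):

    #include <stdint.h>
    #include <string.h>

    /* Wire layout of an ARP packet for IPv4 over Ethernet (fields kept as
     * byte arrays so network byte order takes care of itself). */
    struct arp_pkt {
        uint8_t htype[2];  /* 0x0001 = Ethernet       */
        uint8_t ptype[2];  /* 0x0800 = IPv4           */
        uint8_t hlen;      /* 6                       */
        uint8_t plen;      /* 4                       */
        uint8_t oper[2];   /* 1 = request, 2 = reply  */
        uint8_t sha[6];    /* sender MAC              */
        uint8_t spa[4];    /* sender IP               */
        uint8_t tha[6];    /* target MAC              */
        uint8_t tpa[4];    /* target IP               */
    };

    /* Turn a received request for our IP into a reply, in place.  Returns 1
     * if a reply should be transmitted, 0 if the packet is to be ignored. */
    int arp_answer(struct arp_pkt *p, const uint8_t my_mac[6],
                   const uint8_t my_ip[4])
    {
        if (p->oper[0] != 0 || p->oper[1] != 1)   /* not a request        */
            return 0;
        if (memcmp(p->tpa, my_ip, 4) != 0)        /* not asking about us  */
            return 0;

        p->oper[1] = 2;                           /* opcode: reply        */
        memcpy(p->tha, p->sha, 6);                /* back to the asker    */
        memcpy(p->tpa, p->spa, 4);
        memcpy(p->sha, my_mac, 6);                /* we are the sender    */
        memcpy(p->spa, my_ip, 4);
        return 1;
    }

Issuing your own requests (and holding at least one learned binding) adds a bit more, but it stays in the same ballpark.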
On 2014-01-11, upsidedown@downunder.com <upsidedown@downunder.com> wrote:

> The cost of the vampire tap transceivers was so high that the first
> "Ethernet" network I designed and built was a network in a computer
> room between computers, using AUI cabling (with 15-pin connectors)
> through a DELNI "hub", later adding some long AUI cables for terminal
> servers at the other end of the office. Thus no coaxial cable was used.
The first place I worked where "Ethernet" was widely used, there was a thick Ethernet backbone, but the vast majority of the wiring was AUI cables and hubs. -- Grant
On 1/11/2014 12:50 AM, upsidedown@downunder.com wrote:
> On Fri, 10 Jan 2014 15:30:28 -0700, Don Y<this@isnotme.com> wrote:
>> On 1/10/2014 12:53 PM, Aleksandar Kuktin wrote:
>>> On Thu, 09 Jan 2014 16:56:49 -0700, Don Y wrote:
>>> The protocol is supposed to support transfer of compiled G-code from the
>>> PC to a device (really a battery of devices), transfer of telemetry,
>>> configuration and perhaps a few other things I forgot to think of by now.
>>>
>>> Since its main purpose is transfer of G-code, the protocol is expected to
>>> be able to utilize fairly small packets, small enough that fragmentation
>>> is not expected to happen (60 octets should be enough).
>>
>> Remember, UDP's "efficiency" (if you want to call it that) comes at
>> a reasonably high cost!
>>
>> There are no guarantees that a given datagram will be delivered
>> (or received). The protocol that you develop *atop* UDP has
>> to notice when stuff "goes missing" (e.g., require an explicit
>> acknowledgement, sequence numbers, etc.)
>
> On traditional RS-4xx networks there are no such guarantees either, so
> request/response + timeout/retransmit protocols have been used for
> decades; why not use the same approach on raw Ethernet or UDP?
You're making my point for me: the protocol that is layered atop UDP has to include these provisions. (e.g., using TCP handles much of this -- at an added expense).
>> There are no guarantees that datagram 1 will be received *before*
>> datagram 2. "Turn off plasma cutter" "Move left" can be received
>> as "Move left" "Turn off plasma cutter" (which might be "A Bad Thing"
>> if there is something located off to the left that doesn't like being
>> exposed to plasma! :> )
>
> Still using the traditional request/response model, you do not send
> the "Move left" before you receive the ack for the "Turn off" command.
> Better yet, use the longer messages available and put all the critical
> elements into a single transaction (eth/UDP frame). A frame could
> consist of "Move to X,Y", "Plasma on", "Move to A,B at speed z",
> "Plasma off". Until this full sequence has been acknowledged, the
> master should not send a new burn sequence.
What if the message gets fragmented (by a device along the way) and a fragment gets dropped? What if the message is VERY long (i.e., won't even fit in a jumbo frame -- assuming every device accepts jumbo frames) -- like a software update, CNC "program", etc.? Again: the protocol that is layered atop UDP has to include these provisions.
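One common way around both problems, if you don't want to lean on IP fragmentation, is to chunk long transfers at the application layer and acknowledge each chunk. A rough stop-and-wait sketch (send_chunk() and wait_ack() are placeholders for whatever transport sits underneath):

    #include <stdint.h>

    #define CHUNK_BYTES 48   /* payload per datagram; stays well under any MTU */

    extern int send_chunk(uint16_t seq, uint16_t total,
                          const uint8_t *data, uint16_t len);  /* placeholder */
    extern int wait_ack(uint16_t seq);        /* 0 on ack, -1 on timeout      */

    /* Push a long blob (CNC program, firmware image) as numbered chunks. */
    int send_blob(const uint8_t *blob, uint32_t size)
    {
        uint16_t total = (uint16_t)((size + CHUNK_BYTES - 1) / CHUNK_BYTES);

        for (uint16_t seq = 0; seq < total; seq++) {
            uint32_t offset    = (uint32_t)seq * CHUNK_BYTES;
            uint32_t remaining = size - offset;
            uint16_t len = (uint16_t)(remaining > CHUNK_BYTES ? CHUNK_BYTES
                                                              : remaining);
            int tries;
            for (tries = 0; tries < 5; tries++)   /* stop-and-wait, 5 retries */
                if (send_chunk(seq, total, blob + offset, len) == 0 &&
                    wait_ack(seq) == 0)
                    break;
            if (tries == 5)
                return -1;                        /* receiver never confirmed */
        }
        return 0;
    }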
>> There is no sense of a "connection" between the source and destination
>> beyond that of each *individual* datagram. Neither party is ever
>> aware if the other party is "still there". (add keepalives if this is
>> necessary)
>
> When the slave acknowledges the master request, this is easily
> handled.
There needs to be a "NoOp" request -- something that can be sent that has no other effects besides exercising the link. Again: the protocol that is layered atop UDP has to include these provisions.
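On the device side that can be as small as this (message type values reused from the illustrative header earlier in the thread -- i.e., invented):

    #include <stdint.h>

    enum { MSG_ACK = 2, MSG_NOP = 4 };            /* invented values          */

    struct msg { uint8_t type; uint16_t seq; uint16_t length; };

    extern void send_msg(const struct msg *m);    /* placeholder transmit     */

    /* A keepalive does nothing except prove that the link -- and the
     * firmware's message loop -- are still alive. */
    void handle_nop(const struct msg *request)
    {
        struct msg reply = { MSG_ACK, request->seq, 0 };  /* echo the seq so  */
        send_msg(&reply);                                 /* the peer can     */
    }                                                     /* match it up      */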
>> There is no mechanism to moderate traffic as it increases (and,
>> those increases can lead to more dropped/lost datagrams which
>> leads to more retransmission *requests*, which leads to more
>> traffic which leads to... keep in mind any other traffic on
>> your network that could worsen this -- or, be endangered by it!)
>
> The amount of traffic in a network controlling a real (mechanical)
> device is limited by the mechanical movement (etc.) of that device,
> not by the network capacity.
You are assuming a single device is sitting on the network. And, that all messages involve "control". E.g., any status updates (polling) consume bandwidth as well. And, traffic that fits into "none of the above" (e.g., firmware updates... or, do you require the plant to be shut down when you do these?)

You're also assuming it's 10/100BaseTX from start to finish with no lower bandwidth links along the way (or, virtual networks sharing *physical* networks).
> In old coaxial-based 10Base2/5 half-duplex networks, that might have
> been an issue. For switch-based (but not hub-based) 10xxxBaseT
> networks, this is not really an issue, since a typical industrial
> device does not need more than 10BaseT.
I designed an integrated "air handler" many years ago. It was easy to saturate a 10Base2 network controlling/monitoring just *one* such device. And, that's just "moving process air". I.e., you don't just say "turn on" and "turn off". Instead, you are querying sensors and controlling actuators to run the (sub)system in a particular way. It's foolish to think you're just going to tell a wire EDM machine: "here's your program. make me five of these." without also monitoring its progress, responding to alarms ("running low on wire"), etc. As with all resources, need grows to fit the resources available. Hence the appeal of moving up to a fatter pipe.
>> Appliances other than switches can effectively block UDP connections.
>> If you ever intend to support a physically distributed domain
>> that exceeds what you can achieve using "maximum cable lengths"
>> (one of the drawbacks about moving away from "orange hose" and its
>> ilk was the drop in maximum cable length), you have to be careful
>> in considering what any *other* "interconnect appliances" do to
>> the traffic you intend to pass (and, if your protocol will be routed!)
>
> I would not be so foolish as to do direct machine control over
You are again assuming the only use for the network is "direct machine control". Do you want the service technician to have to drive across town (or, to another state/province) to *interrogate* a failing device? Do you want the engineering staff that have designed the part to be machined to have to FedEx a USB drive with the program for the wire EDM machine to the manufacturing site? Firmware updates to require "on site" installation? Or, do you want to develop yet another protocol for these activities and a gateway *product* that ties the "secured" manufacturing network to an external network? [There's nothing wrong with exposing networks -- if you've taken measures to *protect* them while exposed! Otherwise, what's the value of a WAN?]
> unpredictable nets such as the Internet, much less over wireless
> connections such as WLANs, microwave or satellite links. All the
> security issues must be handled with a wired connection and, in some
> cases, with certified LAN systems.
>
>> [It's surprising how *short* "100m" is when it comes to cable lengths!
>> Esp in a manufacturing setting where you might have to go *up* a
>> considerable way -- and, leave a suitable service loop -- before you
>> can even begin to go "over"! And, line-of-sight cable routing may be
>> impractical. For example, here (residential), the *average* length of
>> a network cable is close to 70 feet -- despite the fact that the
>> (2D) diagonal of the house is just about that same length *and* all
>> the drops are centrally terminated!]
>
> 100 m is the limit for a single 10xBaseT twisted-pair cable run; putting
> switches in between will solve that.
Sure! Put a switch up in the metal rafters 20 ft above the manufacturing floor :> I've a friend who owns a *small* machine shop (mostly traditional Bridgeports, etc. but two or three wire EDM's). I suspect he would be hard pressed to cover the shop floor from a single switch -- probably 20m just to get up to the rafters and back down again.
> For real heavy industry, copper wiring is a no-no anyway and you have
> to use fiber, with single-mode fibers giving well over 30 km of range
> without optical repeaters.
Gee, a moment ago we were talking about CAN... now suddenly we're running optical fibre...
>> When someone later decides it should be a piece of cake for your
>> "engineering office" to directly converse with your controllers located
>> in the "manufacturing facility", you'll find yourself explaining
>> why that's not easily accomplished. "Why not? I can talk to
>> Google's servers in another *state*/country... (damn consultants
>> always trying to charge extra for stuff they should have done in
>> the first place!)"
>
> Isn't it a good thing that you can deny access from the outside
> world to a critical controller?
If the technology *supports* remote communication, you can *still* "deny access from the outside world to a critical controller"... by CUTTING THE CABLE in a physical or virtual sense. OTOH, if the technology *can't* get beyond your four walls, you can't just "stretch" the cable!
> If there is a specific need to let a qualified person at a remote
> site access the device directly, build a secured VPN connection to the
> PC controlling the devices and use all the required firewall etc.
> methods to restrict the access.
Another product. What if the remote site is elsewhere on the "campus"? Or, just on the other end of the building? E.g., most factories that I've been in have an "office space" at one end of the building with the factory floor "out back". E.g., one of the places I worked had all the engineering offices up front -- and the "factory" hiding behind a single door at the back of the office. Other buildings were within a mile of the main offices (most buildings were entire city blocks). The (old) Burr Brown campus, here, (now TI) is a similar layout -- you'd need a motorized cart to get around the facility but I'm sure they wouldn't want to have to use sneakernet to move files/data to/from the factory floor.
>> [N.B. Raw ethernet frames don't even give you the above (lack of)
>> assurances :> ]
>
> This is a really good thing.
On 2014-01-11, Don Y <this@isnotme.com> wrote:

> What if the message is VERY long (i.e., won't even fit in a jumbo
> frame -- assuming every device accepts jumbo frames) -- like a
> software update, CNC "program", etc.?
>
> Again: the protocol that is layered atop UDP has to include these
> provisions.
UDP/IP handles fragmentation and reassembly of datagrams (messages) up to 64KB in length. While you have to deal with UDP datagrams that get dropped, you don't have to worry about fragmentation. -- Grant
Hi Grant,

On 1/11/2014 10:58 AM, Grant Edwards wrote:
> On 2014-01-11, Don Y<this@isnotme.com> wrote:
>
>> What if the message is VERY long (i.e., won't even fit in a jumbo
>> frame -- assuming every device accepts jumbo frames) -- like a
>> software update, CNC "program", etc.?
>>
>> Again: the protocol that is layered atop UDP has to include these
>> provisions.
>
> UDP/IP handles fragmentation and reassembly of datagrams (messages) up
> to 64KB in length. While you have to deal with UDP datagrams that get
> dropped, you don't have to worry about fragmentation.
Yes -- the OP would have to make a *full* UDP implementation instead of "cheating" (if all your messages can be forced to fit in ~500 bytes, then you've just satisfied the low end for the requirements). Given that the OP claims memory to be a concern, it's unlikely he's going to want to have a *big* datagram buffer *and* be willing to track (potentially) lots of fragments.

Recall, my comments are geared towards pointing out issues that will affect the OP's design of *his* protocol, based on what he is willing to accept beneath it. E.g., adding sequence numbers to packets/messages; implementing timers so you know *when* you can reuse sequence numbers (which will obviously have constraints on their widths), etc.

There is a reason TCP is so "heavy" -- it takes care of *lots* of these messy details for you! OTOH, Aleksander has expressly ruled it out. So, he has to be aware of all those little details and their analogs in *his* protocol.
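For the sequence-number bookkeeping, the usual trick is a wraparound-safe comparison; something like this (sketch, 16-bit numbers assumed):

    #include <stdint.h>

    /* True if sequence number 'a' is logically newer than 'b', treating the
     * 16-bit space as circular (so 65535 -> 0 wraparound still compares
     * correctly). */
    static int seq_newer(uint16_t a, uint16_t b)
    {
        return (int16_t)(uint16_t)(a - b) > 0;
    }

which is exactly why the width of the field and the time before a number may be reused have to be chosen together.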
Grant Edwards <invalid@invalid.invalid> writes:

> On 2014-01-11, upsidedown@downunder.com <upsidedown@downunder.com> wrote:
>
>> The cost of the vampire tap transceivers was so high that the first
>> "Ethernet" network I designed and built was a network in a computer
>> room between computers, using AUI cabling (with 15-pin connectors)
>> through a DELNI "hub", later adding some long AUI cables for terminal
>> servers at the other end of the office. Thus no coaxial cable was used.
>
> The first place I worked where "Ethernet" was widely used, there was a
> thick Ethernet backbone, but the vast majority of the wiring was AUI
> cables and hubs.
We had "cheapernet"(?) where, when a single one of the dodgy homemade BNCs went flakey, the whole network went down. Fun times. -- John Devereux
On 1/11/2014 1:36 AM, upsidedown@downunder.com wrote:
> On Fri, 10 Jan 2014 19:15:28 +0000 (UTC), Aleksandar Kuktin
> <akuktin@gmail.com> wrote:
>
>> On Wed, 08 Jan 2014 22:14:35 +0000, Grant Edwards wrote:
>
>>> If you can, I'd recommend using UDP (which is fairly low overhead). The
>>> PC end can then be written as a normal user-space application that
>>> doesn't require admin privileges. You'll still have problems with some
>>> routers and NAT firewalls, but way fewer problems than trying to use raw
>>> Ethernet.
>>
>> UDP/IP is just an extension of IP. I considered using raw IP, but decided
>> against it on grounds that I didn't want to implement IP, simple as it
>> may be.
>
> IP is different from raw Ethernet.
>
> IP requires handling the ARP issues; adding UDP on top of it is just
> the port numbers.
There's a fair bit of hand-waving in that statement! Dealing with raw ethernet frames means MAC addrs and little else. No concept of "networks"... just "my (MAC) address and your (MAC) address".

Bring in IP and now you have to map IP to MAC (and vice versa). Another header (with its checksum, etc). The possibility of fragmentation. TTL. Protocol demultiplexing. Routing options. Timestamps. Etc.

You can design an application where neither ARP nor RARP are *required*. But, in practice, you need to implement *both* (unless you hard-code IP addresses in each node).

[And, the obvious followup question: do you want to support ICMP? Or, move its features into *your* protocol??]

UDP sits atop IP and "just" adds port numbers (src,dest) and an optional *datagram* checksum (IP doesn't checksum the payload).
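For scale, the whole UDP header is eight octets (sketch; a real implementation serializes the fields explicitly in network byte order rather than relying on struct layout):

    #include <stdint.h>

    /* The entire UDP header -- everything UDP adds on top of IP. */
    struct udp_hdr {
        uint16_t src_port;   /* demultiplexing ...                           */
        uint16_t dst_port;   /* ... to the right application                 */
        uint16_t length;     /* header + payload, in octets                  */
        uint16_t checksum;   /* over a pseudo-header and the payload         */
    };                       /* (may be 0 = unused over IPv4)                */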
> ARP issues require about a page of code and UDP even less.
A naive ARP implementation can be small (I think more than a page as there are two sides to ARP -- issuing and answering requests). But, if you are relying on IP addrs as an authentication mechanism, you probably want to think more carefully about just how naively you make such an implementation!

E.g., OP claims memory constraints. So, probably don't want a sizeable ARP cache. OTOH, probably don't want to issue an ARP request for each message! In a synchronous protocol, you could harvest the (IP,MAC) from the inbound "request" and use it in the reply (eliminating the need for a cache and cache lookup). But, then each "message handler" has to cache this information for its message (unless you have a single threaded message handler). And, your network stack has to pass this information "up" to the application layer.

The OP has to decide if comms are one-to-one or many-to-one (or many-to-many) to decide how many (IP,MAC) bindings need to be tracked -- and how best to do so depending on whether multiple messages can be "active" at any given time (i.e., can you initiate a command and wait for its completion while another message makes inquiries as to its progress?)

In my recent projects, I have a separate (secure) protocol that carries (IP,MAC) bindings to/from nodes. One of the sanity tests I apply to packets is to verify these fields agree with the "authoritative" bindings that I've previously received. I.e., if you want to attack my network, the *first* thing you have to do is spoof a valid MAC address and its current IP binding. The presence of any "bogus" packets tells a node that the network has been compromised (or, is under attack).

[There are many other such hurdles/sanity checks before you can work around my security/integrity]

UDP can be small -- *if* it can rely on IP's services beneath it! Too often (in resource constrained implementations), IP and UDP get merged into a single "crippled" layer. This can work well *if* you make provisions for any "foreign traffic" that your "stack" (stackette? :> ) simply can't accommodate. (Recall, if you don't have ICMP, then you have to sort out how you are going to handle these "exceptions" -- which, in their eyes, *aren't* exceptions!)
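For what it's worth, the "harvest the (IP,MAC) from the inbound request" approach needs very little state per transaction; a sketch (names invented, receive-path parsing assumed to happen elsewhere):

    #include <stdint.h>
    #include <string.h>

    /* Per-request binding harvested from the inbound frame, carried up to
     * the message handler and reused for the reply -- no ARP cache, no
     * lookup.  One instance per in-flight message. */
    struct peer_binding {
        uint8_t  mac[6];
        uint8_t  ip[4];
        uint16_t udp_port;
    };

    /* Called from the receive path with the already-parsed headers. */
    void harvest_binding(struct peer_binding *b,
                         const uint8_t src_mac[6], const uint8_t src_ip[4],
                         uint16_t src_port)
    {
        memcpy(b->mac, src_mac, 6);
        memcpy(b->ip,  src_ip,  4);
        b->udp_port = src_port;
    }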