Binary protocol design: TLV, LTV, or else?

Started by Aleksandar Kuktin January 8, 2014
Hi all.

I'm making a protocol for communication between a PC and a peripheral 
device. The protocol is expected to, at first, run on raw Ethernet but I 
am also supposed to not make any blunders that would make it impossible 
to later use the exact same protocol on things like IP and friends.

Since I saw these kinds of things in many Internet protocols (DNS, DHCP, 
TCP options, off the top of my head - but note that these may have a 
different order of fields), I have decided to make it an array of type-
length-value triplets encapsulated in the packet frame (no header). The 
commands would fill the "type" field, "length" would specify the length 
of data ("value") following the length field, and "value" would contain 
the data for the command.
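
To make that layout concrete, here is a minimal sketch in C of building
and walking such a TLV array, assuming one-octet type and length fields
(the field widths are an assumption, not something fixed above):

#include <stddef.h>
#include <stdint.h>

/* One TLV triplet: one-octet type, one-octet length, then `len`
 * octets of value. Returns bytes written, or 0 if it won't fit. */
size_t tlv_append(uint8_t *buf, size_t cap, uint8_t type,
                  const uint8_t *value, uint8_t len)
{
    if (cap < (size_t)len + 2)
        return 0;
    buf[0] = type;
    buf[1] = len;
    for (uint8_t i = 0; i < len; i++)
        buf[2 + i] = value[i];
    return (size_t)len + 2;
}

/* Walk TLVs packed back to back in one frame, stopping cleanly on a
 * truncated triplet. */
void tlv_walk(const uint8_t *frame, size_t frame_len)
{
    size_t off = 0;
    while (off + 2 <= frame_len) {
        uint8_t type = frame[off];
        uint8_t len  = frame[off + 1];
        if (off + 2 + (size_t)len > frame_len)
            break;                  /* truncated: don't misparse */
        /* dispatch_command(type, &frame[off + 2], len); */
        (void)type;
        off += 2 + (size_t)len;
    }
}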

But I would like to hear other (read: opposing) opinions. Particularly so 
since I am self-taught; there may be considerations obvious to 
graduated engineers that I am oblivious to.

BTW, the peripheral on the other end is autonomous and rather 
intelligent, but very resource-constrained. Really, the periphery's 
resource constraints are my main problem here.


Some interesting questions:
Is omitting a packet header a good idea? In the long run?

If I put a packet header, what do I put in it? Since addressing and error 
detection and "recovery" are supposed to be done by underlying protocols, 
the only thing I can think of putting into the header is the total-length 
field, and maybe, maybe, maybe a packet-id or transaction-id field. But I 
really don't need any of these.

My reasoning with packet-id and transaction-id (and protocol-version, 
really) is that I don't need them now, so I can omit them, and if I ever 
do need them, I can just add a command which implements them. In doing 
this, am I setting myself up for a very nasty problem in the future?

Is using flexible packets like this (as opposed to the contents of, 
say, the IP header, which has strictly defined fields) a good idea, or 
am I better off rigidifying my packets?

Is there a special preference or reason as to why some protocols do TLV 
and others do LTV? (Note that I am not trying to ignite a holy war, I'm 
just asking.)

Is it good practice to require aligning the beginning of a TLV with a 
boundary, say a 16-bit word boundary?
On 2014-01-08, Aleksandar Kuktin <akuktin@gmail.com> wrote:

> I'm making a protocol for communication between a PC and a peripheral
> device. The protocol is expected to, at first, run on raw Ethernet
I've been supporting a protocol like that for many years. Doing raw
Ethernet on Windows hosts is becoming increasingly problematic due to
attempts by Microsoft to fix security issues. We anticipate it will
soon no longer be feasible and we'll be forced to switch to UDP.

I'm not the Windows guy, but as I understand it you'll have to write a
Windows kernel-mode driver to support your protocol, and users will
require admin privileges. Even then you'll have problems with various
firewall setups and anti-virus software.

If the PC is running Linux, raw Ethernet isn't nearly as problematic
as it is on Windows, but it does still require either root privileges
or special security capabilities.

If you can, I'd recommend using UDP (which is fairly low overhead).
The PC end can then be written as a normal user-space application that
doesn't require admin privileges. You'll still have problems with some
routers and NAT firewalls, but far fewer than trying to use raw
Ethernet. Using TCP will allow the easiest deployment, but TCP
requires quite a bit more overhead than UDP.

--
Grant Edwards               grant.b.edwards        Yow! HAIR TONICS, please!!
                                  at
                              gmail.com
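
A minimal sketch of the user-space PC side Grant describes, using
plain POSIX/BSD sockets; the port number and device address are
placeholders (on Windows the same calls work under Winsock after
WSAStartup()):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send one protocol message to the device over UDP. Runs as an
 * ordinary unprivileged process. Address and port are placeholders. */
int send_command(const void *msg, size_t len)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);    /* no root needed */
    if (s < 0)
        return -1;

    struct sockaddr_in dev;
    memset(&dev, 0, sizeof dev);
    dev.sin_family = AF_INET;
    dev.sin_port   = htons(4950);              /* hypothetical port */
    inet_pton(AF_INET, "192.168.1.50", &dev.sin_addr);

    ssize_t n = sendto(s, msg, len, 0,
                       (struct sockaddr *)&dev, sizeof dev);
    close(s);
    return n < 0 ? -1 : 0;
}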
Hi Aleksandar,

On 1/8/2014 2:30 PM, Aleksandar Kuktin wrote:

> I'm making a protocol for communication between a PC and a peripheral
Here there be dragons...
> device. The protocol is expected to, at first, run on raw Ethernet but I
> am also supposed to not make any blunders that would make it impossible
> to later use the exact same protocol on things like IP and friends.
>
> Since I saw these kinds of things in many Internet protocols (DNS, DHCP,
> TCP options, off the top of my head - but note that these may have a
> different order of fields), I have decided to make it an array of type-
> length-value triplets encapsulated in the packet frame (no header). The
> commands would fill the "type" field, "length" would specify the length
> of data ("value") following the length field, and "value" would contain
> the data for the command.
Are you sure you have enough variety to merit the extra overhead (in
the packet *and* in the parsing of the packet)? Can you, instead,
create a single packet format whose contents are indicated by a
"packet type" specified in the header? Even if this means leaving
space for values/parameters that might not be required in every packet
type? For example:

   <header> <field1> <field2> <field3> <field4>

Where certain fields may not be used in certain packet types (their
contents then being "don't care").

Alternatively, a packet type that implicitly *defines* the format of
the balance of the packet. For example:

   type1: <header1> <fieldA> <fieldB>
   type2: <header2> <fieldA>
   type3: <header3> <fieldA> <fieldB> <fieldC> <fieldD>

(where the format of each field may vary significantly between message
types)

It seems like you are headed in the direction of:

   <header> <fields>

where the number of fields can vary, as can their individual formats.
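
A sketch of this fixed-format alternative, where the packet-type octet
implicitly selects the layout; the types, field widths, and offsets
below are illustrative only:

#include <stddef.h>
#include <stdint.h>

/* The packet-type octet in the header implicitly selects the layout
 * of the rest of the packet. Fields are read at fixed offsets (not
 * via struct casts) to avoid padding and alignment surprises. */
void parse_packet(const uint8_t *pkt, size_t len)
{
    if (len < 1)
        return;
    switch (pkt[0]) {
    case 1:                    /* type1: <header1> <fieldA> <fieldB> */
        if (len >= 5) {
            uint16_t a = (uint16_t)((pkt[1] << 8) | pkt[2]);
            uint16_t b = (uint16_t)((pkt[3] << 8) | pkt[4]);
            (void)a; (void)b;  /* hand off to the application */
        }
        break;
    case 2:                    /* type2: <header2> <fieldA> */
        if (len >= 3) {
            uint16_t a = (uint16_t)((pkt[1] << 8) | pkt[2]);
            (void)a;
        }
        break;
    default:                   /* unknown type: drop (or NAK) */
        break;
    }
}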
> But I would like to hear other (read: opposing) opinions. Particularly so
> since I am self-taught; there may be considerations obvious to
> graduated engineers that I am oblivious to.
>
> BTW, the peripheral on the other end is autonomous and rather
> intelligent, but very resource-constrained. Really, the periphery's
> resource constraints are my main problem here.
So, the less "thinking" (i.e., handling of variations) that the remote device has to handle, the better. Of course, this can be done in a variety of different ways! E.g., you could adopt a format where each field consists of: <parameterNumber> <parameterValue> and the receiving device can blindly parse the parameterNumber and plug the corresponding parameterValue into a "slot" in an array of parameters that your algorithms use. Alternatively, you could write a parser that expects an entire message to have a fixed format and plug the parameters it discovers into predefined locations in your app.
> Some interesting questions:
> Is omitting a packet header a good idea? In the long run?
Headers (and, where necessary, trailers) are intended to pass specific
data (e.g., message type) in a way that is invariant of the content of
the balance of the message. Like saying, "What follows is ...".

They also help to improve the reliability of the message, as they can
carry information that helps verify its integrity. E.g., a checksum.
Or, simply the definition of "What follows is..." allows the recipient
to perform some tests on that which follows! So, if you are claiming
that "what follows is an email address", the recipient can expect
<alphanumeric>@<domain>. Anything that doesn't fit this template
suggests something is broken -- you are claiming this is an email
address yet it doesn't conform to the template for an email address!
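
As one example of such integrity data, here is the 16-bit
ones'-complement checksum used by IP and UDP (RFC 1071), which a
header could carry over the message body:

#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum: sum 16-bit big-endian words, fold the
 * carries back in, and complement the result. */
uint16_t checksum16(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    while (len > 1) {
        sum += (uint32_t)((data[0] << 8) | data[1]);
        data += 2;
        len -= 2;
    }
    if (len)                       /* odd trailing byte */
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16)              /* fold carries */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}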
> If I put a packet header, what do I put in it? Since addressing and error
> detection and "recovery" are supposed to be done by underlying protocols,
Will that ALWAYS be the case for you? What if you later decide to run
your protocol over EIA232? Will you then require inserting another
protocol *beneath* it to provide those guarantees?

Will your underlying protocol guarantee that messages are delivered IN
ORDER? *Always*? Do you expect the underlying protocol to guarantee
delivery? At most once? At least once?
> the only thing I can think of putting into the header is the total-length
> field, and maybe, maybe, maybe a packet-id or transaction-id field. But I
> really don't need any of these.
>
> My reasoning with packet-id and transaction-id (and protocol-version,
> really) is that I don't need them now, so I can omit them, and if I ever
> do need them, I can just add a command which implements them. In doing
> this, am I setting myself up for a very nasty problem in the future?
>
> Is using flexible packets like this (as opposed to the contents of, say,
> the IP header, which has strictly defined fields) a good idea, or am I
> better off rigidifying my packets?
That depends on what you expect in the future -- in terms of additions to the protocol as well as the conveyance by which your data gets to/from the device. Simpler tends to be better.
> Is there a special preference or reason as to why some protocols do TLV
> and others do LTV? (Note that I am not trying to ignite a holy war, I'm
> just asking.)
>
> Is it good practice to require aligning the beginning of a TLV with a
> boundary, say a 16-bit word boundary?
Depends on how you are processing the byte stream. E.g., for ethernet,
if you try to deal with any types bigger than single octets, you need
to resolve byte-ordering issues (so-called network byte order). If you
design your protocol to deal exclusively with octets, then you can
sidestep this (by specifying an explicit byte ordering) but then force
the receiving (and sending) tasks to demangle/mangle the data types
out of/into these forms.
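
A sketch of that octet-only approach: the protocol pins down
big-endian ("network") order explicitly, and both ends (de)serialize
multi-byte values one octet at a time, so host endianness never enters
into it:

#include <stdint.h>

/* Serialize/deserialize a 16-bit value as two explicit octets; no
 * htons()-style calls and no dependence on host byte order. */
void put_u16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);      /* most significant octet first */
    p[1] = (uint8_t)(v & 0xFF);
}

uint16_t get_u16(const uint8_t *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}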
On Wed, 08 Jan 2014 21:30:09 +0000, Aleksandar Kuktin wrote:

> Hi all.
>
> I'm making a protocol for communication between a PC and a peripheral
> device. The protocol is expected to, at first, run on raw Ethernet but I
> am also supposed to not make any blunders that would make it impossible
> to later use the exact same protocol on things like IP and friends.
> [...]
Read the Radius protocol RFCs and see how they deal with UDP. There is
a boatload of parsing code out there in the various Radius server and
client implementations. If you start with UDP you can even cobble
together a test system using many of the scripting languages like
Perl, Python, Ruby, etc.

--
Chisolm
Republic of Texas
Aleksandar Kuktin <akuktin@gmail.com> wrote in
news:lakg10$kri$1@speranza.aioe.org: 

> Hi all.
>
> I'm making a protocol for communication between a PC and a peripheral
> device. The protocol is expected to, at first, run on raw Ethernet but
> I am also supposed to not make any blunders that would make it
> impossible to later use the exact same protocol on things like IP and
> friends.
> [...]
Hello,

I originated a product that used TLV packets back in the 90s and it is
still in use today without any problems. It was similar to a
configuration file that contained various parameters for applications
that shared data. There was a root packet header. This allowed
transmission across TCP, serial, queued pipes, and file storage.

We enforced a 4-byte alignment on fields due to the machines being
used to parse the data - we had Windows, Linux, and embedded devices
reading the data. Just be sure to define the byte order. We wrote and
maintained an RFC-like document.

One rule we followed that may help you is that once a tag is defined
it is never redefined. That prevented issues migrating forward and
backward. Tags could be removed from use, but were always supported.

One issue we had with TLV was with one of the developers taking
shortcuts. The TLVs were built in a tree, so any V started with a TL
until you got down to the lowest-level item being communicated. Anyway,
the developer in question would read the T and presume he could bypass
reading the lower-level tags because the order was fixed - it was not.
Upgraded protocols added fields (a low-level TLV) that caused read
issues. Easy to find, but frustrating that we had to re-release one of
the node devices.

The only other error you are likely to get with TLVs like this is when
the entire message isn't delivered: the follow-on data becomes part of
the previous message. That is why some encapsulation might be wise. If
you are using UDP and there is no need for multiple packets per
message (ever), that might be your encapsulation method.

Good luck,
David
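
A sketch of walking such nested TLVs with the 4-byte alignment rule
David describes, assuming two-octet big-endian tag and length fields
(the widths are assumptions). Note that the loop reads every
sub-TLV's length instead of assuming a fixed order -- the shortcut
that bit David's team:

#include <stddef.h>
#include <stdint.h>

static size_t align4(size_t n) { return (n + 3u) & ~(size_t)3u; }

/* Walk one level of TLVs; container tags recurse into their value.
 * New tags added by a later protocol revision are skipped gracefully
 * because their lengths are honored, never guessed. */
void walk_tlv(const uint8_t *buf, size_t len, int depth)
{
    size_t off = 0;
    while (off + 4 <= len) {
        uint16_t tag  = (uint16_t)((buf[off]     << 8) | buf[off + 1]);
        uint16_t tlen = (uint16_t)((buf[off + 2] << 8) | buf[off + 3]);
        if (off + 4 + (size_t)tlen > len)
            break;                 /* truncated: stop, don't misparse */
        /* If `tag` marks a container, its value is itself TLVs:
         * walk_tlv(&buf[off + 4], tlen, depth + 1);              */
        (void)tag; (void)depth;
        off = align4(off + 4 + (size_t)tlen);  /* next field 4-aligned */
    }
}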
On Wed, 8 Jan 2014 22:14:35 +0000 (UTC), Grant Edwards
<invalid@invalid.invalid> wrote:

> On 2014-01-08, Aleksandar Kuktin <akuktin@gmail.com> wrote:
>
>> I'm making a protocol for communication between a PC and a peripheral
>> device. The protocol is expected to, at first, run on raw Ethernet
>
> I've been supporting a protocol like that for many years. Doing raw
> Ethernet on Windows hosts is becoming increasingly problematic due to
> attempts by Microsoft to fix security issues. We anticipate it will
> soon no longer be feasible and we'll be forced to switch to UDP.
UDP adds very little compared to raw ethernet, some more or less
stable header bytes and a small ARP protocol (much less than a page of
code). There are a lot of tools to display the various IP and UDP
headers and standard socket drivers should work OK.

If you are using raw ethernet on a big host, you most likely would
have to put the ethernet adapter into promiscuous mode, which might be
a security / permission issue.
On Thursday, January 9, 2014 8:59:25 AM UTC+2, upsid...@downunder.com wrote:
> On Wed, 8 Jan 2014 22:14:35 +0000 (UTC), Grant Edwards
> <invalid@invalid.invalid> wrote:
>
>> On 2014-01-08, Aleksandar Kuktin <akuktin@gmail.com> wrote:
>>
>>> I'm making a protocol for communication between a PC and a peripheral
>>> device. The protocol is expected to, at first, run on raw Ethernet
>>
>> I've been supporting a protocol like that for many years. Doing raw
>> Ethernet on Windows hosts is becoming increasingly problematic due to
>> attempts by Microsoft to fix security issues. We anticipate it will
>> soon no longer be feasible and we'll be forced to switch to UDP.
>
> UDP adds very little compared to raw ethernet, some more or less
> stable header bytes and a small ARP protocol (much less than a page of
> code). There are a lot of tools to display the various IP and UDP
> headers and standard socket drivers should work OK.
I would also advocate using UDP rather than raw Ethernet. Implementing
IP can be pretty simple if one does not intend (as in this case) to
connect the device to the internet, fragment/defragment out-of-order
datagrams, etc. UDP on top of that is almost negligible. I can't see
which MCU would have an Ethernet MAC and yet lack the resources for
such an "almost IP" implementation.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI             http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
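
A rough sketch (mine, not from the thread) of what such an "almost IP"
transmit path can look like on the device: fixed IPv4 and UDP headers
filled in ahead of the payload, no options, no fragmentation, and the
UDP checksum left at zero (which is legal over IPv4). The addresses
and port are caller-supplied placeholders; ARP and receive-side
filtering are omitted:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Ones'-complement sum over the fixed 20-byte IPv4 header. */
static uint16_t hdr_csum(const uint8_t *h)
{
    uint32_t s = 0;
    for (int i = 0; i < 20; i += 2)
        s += (uint32_t)((h[i] << 8) | h[i + 1]);
    while (s >> 16)
        s = (s & 0xFFFF) + (s >> 16);          /* fold carries */
    return (uint16_t)~s;
}

/* Build IPv4+UDP in front of `payload`. `pkt` must have room for
 * 28 + plen octets; returns total length for the Ethernet frame. */
size_t build_ipv4_udp(uint8_t *pkt, const uint8_t *src_ip,
                      const uint8_t *dst_ip, uint16_t port,
                      const uint8_t *payload, uint16_t plen)
{
    uint16_t udp_len = 8 + plen;
    uint16_t ip_len  = 20 + udp_len;

    memset(pkt, 0, 28);
    pkt[0] = 0x45;                 /* IPv4, 20-byte header, no options */
    pkt[2] = (uint8_t)(ip_len >> 8);
    pkt[3] = (uint8_t)ip_len;
    pkt[8] = 64;                   /* TTL */
    pkt[9] = 17;                   /* protocol = UDP */
    memcpy(&pkt[12], src_ip, 4);
    memcpy(&pkt[16], dst_ip, 4);
    uint16_t c = hdr_csum(pkt);    /* checksum field still zero here */
    pkt[10] = (uint8_t)(c >> 8);
    pkt[11] = (uint8_t)c;

    uint8_t *udp = pkt + 20;
    udp[0] = (uint8_t)(port >> 8); /* source port (placeholder) */
    udp[1] = (uint8_t)port;
    udp[2] = (uint8_t)(port >> 8); /* destination port (placeholder) */
    udp[3] = (uint8_t)port;
    udp[4] = (uint8_t)(udp_len >> 8);
    udp[5] = (uint8_t)udp_len;
    /* udp[6..7] stay 0: zero UDP checksum means "none" over IPv4 */
    memcpy(udp + 8, payload, plen);

    return ip_len;
}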
Hi Dimiter,

On 1/9/2014 12:37 AM, dp wrote:
> On Thursday, January 9, 2014 8:59:25 AM UTC+2, upsid...@downunder.com wrote:
>> On Wed, 8 Jan 2014 22:14:35 +0000 (UTC), Grant Edwards
>> <invalid@invalid.invalid> wrote:
>>
>>> On 2014-01-08, Aleksandar Kuktin<akuktin@gmail.com> wrote:
>>>
>>>> I'm making a protocol for communication between a PC and a peripheral
>>>> device. The protocol is expected to, at first, run on raw Ethernet
>>>
>>> I've been supporting a protocol like that for many years. Doing raw
>>> Ethernet on Windows hosts is becoming increasingly problematic due to
>>> attempts by Microsoft to fix security issues. We anticipate it will
>>> soon no longer be feasible and we'll be forced to switch to UDP.
>>
>> UDP adds very little compared to raw ethernet, some more or less
>> stable header bytes and a small ARP protocol (much less than a page of
>> code). There are a lot of tools to display the various IP and UDP
>> headers and standard socket drivers should work OK.
>
> I would also advocate using UDP rather than raw Ethernet. Implementing
> IP can be pretty simple if one does not intend (as in this case) to
> connect the device to the internet, fragment/defragment out-of-order
> datagrams, etc. UDP on top of that is almost negligible. I can't see
> which MCU would have an Ethernet MAC and yet lack the resources for
> such an "almost IP" implementation.
UDP tends to hit the "sweet spot" between "bare iron" and the bloat of TCP/IP. The implementer has probably the most leeway in deciding what he *wants* to implement vs. what he *must* implement (once you climb up into TCP, most of the "options" go away). Having said that, the OP still has a fair number of decisions to make if he chooses to layer his protocol atop UDP. MTU, ARP/RARP implementation, checksum support (I'd advocate doing this in *his* protocol if he ever intends to run it over a leaner protocol where *he* has to provide this reliability), etc. I've (we've?) been assuming he can cram an entire message into a tiny "no-fragment" packet -- that may not be the case! (Or, may prove to be a problem when run over protocols with smaller MTU's)
On Thursday, January 9, 2014 10:28:21 AM UTC+2, Don Y wrote:
> ...
> I've (we've?) been assuming he can cram an entire message into a tiny
> "no-fragment" packet -- that may not be the case! (Or, it may prove to
> be a problem when run over protocols with smaller MTUs.)
Hi Don,

UDP does not add any fragmentation overhead compared to his raw
Ethernet anyway (that is, if he stays with UDP packets fitting in
approx. 1500 bytes he will be no worse off than without UDP).

IP does add fragmentation overhead - if it is a real IP. The sender
may choose its MTU (likely a full-size Ethernet packet) but a receiver
must be ready to get that same packet fragmented into a few pieces,
out of order, and be able to defragment it.

But since he is OK with raw Ethernet he does not need a true IP
implementation, so he can just act as if everybody is fine with a
full-sized Ethernet MTU and get on with it as you suggest. He will
lose a few bytes for encapsulation, but if losing 100 bytes out of
1500 is an issue, chances are there will be a lot of other, real
problems :-).

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI             http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/sets/72157600228621276/
Hi Dimiter,

On 1/9/2014 1:53 AM, dp wrote:
> On Thursday, January 9, 2014 10:28:21 AM UTC+2, Don Y wrote:
>> ...
>> I've (we've?) been assuming he can cram an entire message into a tiny
>> "no-fragment" packet -- that may not be the case! (Or, it may prove to
>> be a problem when run over protocols with smaller MTUs.)
>
> UDP does not add any fragmentation overhead compared to his raw
> Ethernet anyway (that is, if he stays with UDP packets fitting in
> approx. 1500 bytes he will be no worse off than without UDP).
I'm thinking more in terms of any other media (protocols) he may
eventually use for transport. If he doesn't want to add support for
packet reassembly in *his* protocol, then he would be wise to pick a
message format that fits in the smallest MTU "imaginable". For
ethernet, I think that is ~60+ octets (i.e., just bigger than the
frame header). I'm a big fan of ~500 byte messages (the minimum that
any node *must* be able to accommodate).

I think you have to consider any other media that may get injected
along the path from source to destination (i.e., if it is not purely
"ethernet" from end to end). IIRC, a PPP link drops the MTU to the
200-300 range.
> IP does add fragmentation overhead - if it is a real IP. The sender
> may choose its MTU (likely a full-size Ethernet packet) but a receiver
> must be ready to get that same packet fragmented into a few pieces,
> out of order, and be able to defragment it.
As above, I think if you truly want to avoid dealing with fragments, you have to be able to operate with an MTU that is little more than the header (plus 4? or 8?? octets). Even a ~500 byte message could, conceivably, appear as *100* little fragments! :-/ (and the receiving node had better be equipped to handle all 500 bytes as they trickle in!)
> But since he is OK with raw Ethernet he does not need a true IP
> implementation, so he can just act as if everybody is fine with a
> full-sized Ethernet MTU and get on with it as you suggest. He will
> lose a few bytes for encapsulation, but if losing 100 bytes out of
> 1500 is an issue, chances are there will be a lot of other, real
> problems :-).
OP hasn't really indicated how complex/big his messages need to be.
Nor what the ultimate fabric might look like.

E.g., here, I've tried really hard to keep messages *ultra* tiny by
thinking about exactly what *needs* to fit in the message and how best
to encode it. So, for example, I can build an ethernet-CAN bridge in a
heartbeat and not have to worry about trading latency and
responsiveness for packet size on the CAN bus (those nodes can have
super tiny input buffers and still handle complete messages without
having to worry about fragmentation, etc.)

It must have been entertaining for the folks who came up with
ethernet, IP, etc. way back when to start with a clean slate and
*guess* as to what would work best! :>