EmbeddedRelated.com
Forums

Do you use serialization formats for communication?

Started by pozz October 19, 2016
On 20/10/2016 12:30, kalvin.news@gmail.com wrote:
 > [...]
> The FRAME-PAYLOAD is a sequence of PAYLOAD-ITEMs:
>
> PAYLOAD-ITEM:
>     ITEM-ID
>     ITEM-LENGTH
>     ITEM-PAYLOAD
>
> The ITEM-ID tells what data item this is, the ITEM-LENGTH tells
> the length of the item payload and the ITEM-PAYLOAD contains the byte
> information for the item.
>
> Pretty simple message format and easy to parse. If the parser
> doesn't recognize the item id, it knows how many bytes to skip for the next
> item.

Yes, of course this *is* a "self-made" serialization format. There are many others that are similar but bring additional advantages.

For example, consider MessagePack[1] (I don't work for MessagePack). You can encode many types of data. Your serialization format can be encoded as a MessagePack map: a sequence of key-value pairs (a dictionary in Python).

As in your format, MessagePack encodes the key (your ITEM-ID), the value type (similar to your ITEM-LENGTH) and the value (your ITEM-PAYLOAD). The value type automatically specifies both the length and the object type (integer, boolean, array...).

For example, the dictionary { 3: 5 } (only one item, with 3 as ID and 5 as value) is encoded as three bytes { 0x81, 0x03, 0x05 }. If the value is much higher, for example { 3: 1000000 }, the encoded stream is 7 bytes long { 0x81, 0x03, 0xCE, 0x00, 0x0F, 0x42, 0x40 }.

The decoder automatically understands what the key is (type, length and value) and what the value is (type, length and value).

If you want, you can use a string for the keys (all keys or some of them). Or you can encode an array instead of a map/dictionary.

[1] http://msgpack.org/
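To make the two encodings above concrete, here is a minimal sketch of an encoder for a one-entry map with unsigned integer key and value. The function names `mp_put_uint` and `mp_put_map1` are invented for this illustration and do not belong to any MessagePack library; only the type bytes (fixmap, positive fixint, uint 8/16/32) come from the MessagePack format itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write an unsigned integer in the shortest MessagePack form.
   Returns the number of bytes written (at most 5). */
size_t mp_put_uint(uint8_t *out, uint32_t v)
{
    if (v <= 0x7F) {            /* positive fixint: the byte is the value */
        out[0] = (uint8_t)v;
        return 1;
    }
    if (v <= 0xFF) {            /* uint 8 */
        out[0] = 0xCC;
        out[1] = (uint8_t)v;
        return 2;
    }
    if (v <= 0xFFFF) {          /* uint 16, big-endian */
        out[0] = 0xCD;
        out[1] = (uint8_t)(v >> 8);
        out[2] = (uint8_t)v;
        return 3;
    }
    out[0] = 0xCE;              /* uint 32, big-endian */
    out[1] = (uint8_t)(v >> 24);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 8);
    out[4] = (uint8_t)v;
    return 5;
}

/* Encode the one-entry map { key: value }.
   'out' must have room for 11 bytes (1 header + 5 + 5). */
size_t mp_put_map1(uint8_t *out, uint32_t key, uint32_t value)
{
    size_t n = 0;

    assert(out != NULL);
    out[n++] = 0x81;            /* fixmap header: 0x80 | number of entries */
    n += mp_put_uint(&out[n], key);
    n += mp_put_uint(&out[n], value);
    return n;
}
```

Calling `mp_put_map1(buf, 3, 5)` produces the three-byte stream from the post, and `mp_put_map1(buf, 3, 1000000)` the seven-byte one. No dynamic memory is involved, which matches the constraint stated at the start of the thread.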
On 20/10/16 18:06, pozz wrote:
> On 20/10/2016 15:45, David Brown wrote:
>> On 20/10/16 13:52, pozz wrote:
>>> On 20/10/2016 09:40, David Brown wrote:
>>>> On 20/10/16 00:22, pozz wrote:
>>>>> I often have the need to exchange some data between two or more
>>>>> MCUs. I usually use I2C or UART as physical layers.
>>>>>
>>>>> Normally I design a simple protocol between the MCUs: one framing
>>>>> mechanism (Start Of Frame, End Of Frame), one integrity check
>>>>> mechanism (CRC), and so on.
>>>>>
>>>>> The payload is statically defined between the two MCUs:
>>>>> - first byte is the version
>>>>> - second byte is the voltage monitoring level
>>>>> - third and fourth bytes are some flags
>>>>> - ... and so on
>>>>>
>>>>> As you can understand, both MCUs *must* know and agree about that
>>>>> protocol format. However during the lifetime of the product, I need to
>>>>> add some functionality or fix some bugs and those activities can
>>>>> lead to a review of the protocol format (maybe I need two bytes for
>>>>> the voltage level). Sometimes, the two MCUs have a different version
>>>>> with a different protocol format implementation. In order to avoid
>>>>> protocol incompatibility, they all know about the protocol formats
>>>>> used before, so they can adapt the parsing function to the real
>>>>> current protocol format.
>>>>> As you can understand, it could be a trouble.
>>>>>
>>>>> So I'm thinking to use a "self-descriptive" serializer protocol
>>>>> format, such as Protobuf, Message Pack, BSON and so on.
>>>>>
>>>>> Do you use one serialization format? Which one?
>>>>>
>>>>> Of course, it should be simple to implement (in transmission/encoding
>>>>> and reception/decoding) in a small embedded MCU in C language, without
>>>>> dynamic memory support.
>>>>
>>>> It depends on how flexible you want to be.  Self-descriptive or tagged
>>>> formats, like JSON, BSON, etc., are very future-proof - but they are
>>>> also much more effort in development time and run time.
>>>>
>>>> You can come a /long/ way with just a little more than the system you
>>>> have.  Keep the same framing mechanism, but make sure you have a field
>>>> for "length of payload".  In the payload, you have "type of telegram"
>>>> and "version of telegram format".  Then when you need to change the
>>>> formats, you add new data to the old structure.
>>>>
>>>> So format version 1 might be:
>>>>
>>>> typedef struct {
>>>>     uint8_t programVersion;
>>>>     uint8_t voltageMonitor;
>>>>     uint16_t flags;
>>>> } format1payload;
>>>> static_assert(sizeof(format1payload) == 4);
>>>>
>>>> Format version 2, with voltage now in millivolts, will be:
>>>>
>>>> typedef struct {
>>>>     uint8_t programVersion;
>>>>     uint8_t voltageMonitor;
>>>>     uint16_t flags;
>>>>     // Start of version 2
>>>>     uint16_t voltageMonitorMillivolts;
>>>> } format2payload;
>>>> static_assert(sizeof(format2payload) == 6);
>>>>
>>>> A transmitter always sends with the latest version it knows, and will
>>>> fill in both the voltageMonitor and voltageMonitorMillivolts fields.  A
>>>> receiver interprets as much as it can based on the latest version it
>>>> knows and the version it receives - any excess data beyond its
>>>> understanding can safely be ignored.
>>>>
>>>> Your encoder and decoders are now nothing more than casts between char*
>>>> pointers and struct pointers.
>>>
>>> So you cast your struct pointers to char pointers and send it as is?
>>> I used this very simple technique in the past, but I don't use it
>>> anymore.  Because the two MCUs could be different, could use a different
>>> endianness, could use a different compiler that places padding in
>>> different places, and so on.
>>>
>>
>> It is not a problem if the MCUs are different.  It would matter if they
>> had different encodings for signed integers or padding bits in their
>> types, but let's assume you are not communicating with a mainframe from
>> the 60's.
>>
>> Padding is not a problem if you design your structs carefully.  Make
>> sure everything is naturally aligned - 16-bit data is 16-bit aligned,
>> 32-bit data is 32-bit aligned, 64-bit data is 64-bit aligned.  Use your
>> tools to check this - "-Wpadded" for gcc, and static_asserts to check
>> that the sizes of your structs match what you expect.
>>
>> That just leaves endianness.  Most microcontrollers are little-endian,
>> as are PC's, so that is the endianness I normally use.  The only
>> exception would be if I were transferring data between two big-endian
>> devices, I would probably use big-endian ordering.
>>
>> So if I have a networked system with different endians on different
>> microcontrollers, then I need to do endian swaps on the structs at one
>> end.  Some compilers support this, letting you annotate your structs
>> with the endianness (gcc 6 has this, though I haven't tried the feature
>> yet).  Otherwise it must be done manually when receiving or transmitting
>> the struct.  But still, it is a fraction of the effort (in development
>> time and run time) of decoding more general protocol formats.
>
> I knew all your arguments. As I wrote, I used in the past exactly this
> trick. However I don't like it. In certain cases, you have to change the
> order of the fields in a struct (an order that appears logical), only
> because you have to avoid padding bytes.
>
> Moreover, if you need to encode some complex structs, understanding if
> the compiler will introduce padding in-between is not trivial.
>
> send(&struct1, sizeof(struct1));
> send(&struct2, sizeof(struct2));
>
> sizeof(struct1) could consider some extra padding bytes at the end of
> the struct. The receiver should know about it.
>
> One time I had to communicate with a Visual Basic application. In that
> case, managing padding bytes was a mess.
>
It really is not hard at all. /No/ compiler, for any sane processor, adds padding or extra alignment requirements beyond the natural size of the fundamental types. You only have to be concerned with padding if you try to mix and match in other ways. And if you have a "uint8_t" field which should logically be followed by another field that happens to be "uint16_t", just add an explicit "uint8_t" padding field. Don't let the compiler add its own padding - use compiler warnings where possible to ensure it, and use static assertions to confirm that everything is correct.
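As a small illustration of the advice above, the sketch below lays out a record with an explicit padding field and then checks the layout at compile time. The struct `wire_record`, its field names and the helper `make_record` are invented for this example; the static assertions use the C11 form with a message string.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical wire record: every field sits at a multiple of its own
   size, so the compiler has no reason to insert hidden padding. */
typedef struct {
    uint8_t  id;
    uint8_t  reserved0;   /* explicit padding so 'level' is 16-bit aligned */
    uint16_t level;
    uint32_t timestamp;
} wire_record;

/* Fail the build, not the product, if the layout is not what we expect. */
static_assert(offsetof(wire_record, level) == 2, "level must be at offset 2");
static_assert(offsetof(wire_record, timestamp) == 4, "timestamp must be at offset 4");
static_assert(sizeof(wire_record) == 8, "compiler inserted hidden padding");

/* Build a record with the padding byte set to a known value, so that no
   uninitialised memory ends up on the wire. */
wire_record make_record(uint8_t id, uint16_t level, uint32_t timestamp)
{
    wire_record r = {
        .id = id, .reserved0 = 0, .level = level, .timestamp = timestamp
    };
    return r;
}
```

With gcc, compiling with `-Wpadded` additionally warns whenever the compiler pads a struct on its own, as mentioned earlier in the thread.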
On Thu, 20 Oct 2016 23:39:24 +0200, David Brown
<david.brown@hesbynett.no> wrote:

> ... /No/ compiler, for any sane processor,
>adds padding or extra alignment requirements beyond the natural size of
>the fundamental types.

There are plenty of chips (and compilers for them) that are not sane by your definition. The many "word-oriented" chips come to mind ...

George
On 21/10/16 03:01, George Neuner wrote:
> On Thu, 20 Oct 2016 23:39:24 +0200, David Brown
> <david.brown@hesbynett.no> wrote:
>
>> ... /No/ compiler, for any sane processor,
>> adds padding or extra alignment requirements beyond the natural size of
>> the fundamental types.
>
> There are plenty of chips (and compilers for them) that are not sane
> by your definition.  The many "word-oriented" chips come to mind ...
>
You are thinking of things like the TMS320F dsps (16-bit char) or the SHARC (32-bit char) ?

First off, these are not MCU's, and are unlikely (not impossible, but unlikely) to be the kind of chip involved in this sort of communication. You pick your solution based on what is practical for real-life cases - not on what is necessary for the most awkward situations that you perhaps might meet.

Secondly, these chips and their tools also do not add any padding or alignment requirements beyond the natural size of their fundamental types - they are perfectly "sane" in this sense. The difference is that they do not have types uint8_t or int8_t (and perhaps not the 16-bit types if they have 32-bit chars). If your structs have 8-bit fields, then these won't compile directly. But it is not a big problem - after all, since you have explicitly added any padding needed to keep alignment for any bigger fields, you can always group your 8-bit fields in pairs (or make groups of 4 bytes if you have 32-bit chars). The most you might have to do is add a few extra explicit padding bytes at the end of the struct.

So with 16-bit char and 32-bit char architectures dealt with, are there any other problem or "non-sane" devices that come to mind? I know there are a few 24-bit architectures (eTPU, and some audio DSP's) - such devices are so specialised that you would make a solution specifically for those chips if you need them.

Over the years, I have worked with quite a range of microcontrollers - but there are vast numbers out there that I have never heard of, never mind used. So if you have examples of awkward (or "insane"!) devices, I would like to hear of them - even if I don't use them, it is interesting to think about the challenges they would pose.
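One way to read the suggestion about grouping 8-bit fields in pairs, sketched for a target where `uint8_t` does not exist: the first two fields of the format1payload struct quoted earlier in the thread share a single 16-bit word and are reached through small accessors. The struct name `format1_words` and the accessor names are invented here, and placing the first field in the low octet is an assumption that has to match how the receiving driver packs incoming octets into words.

```c
#include <assert.h>
#include <stdint.h>

/* The two logical 8-bit fields share one 16-bit word, low octet first.
   That matches a little-endian byte stream from a peer with 8-bit chars,
   provided the link driver packs octets the same way (an assumption). */
typedef struct {
    uint16_t version_and_voltage;  /* bits 0..7 programVersion, 8..15 voltageMonitor */
    uint16_t flags;
} format1_words;

static_assert(sizeof(format1_words) == 2 * sizeof(uint16_t), "unexpected padding");

unsigned get_program_version(const format1_words *p)
{
    return p->version_and_voltage & 0xFFu;
}

unsigned get_voltage_monitor(const format1_words *p)
{
    return (p->version_and_voltage >> 8) & 0xFFu;
}

void set_version_and_voltage(format1_words *p, unsigned version, unsigned voltage)
{
    p->version_and_voltage =
        (uint16_t)((version & 0xFFu) | ((voltage & 0xFFu) << 8));
}
```

The same source compiles on an ordinary 8-bit-char MCU, so both ends can share the header; only the end with native `uint8_t` has the option of using the byte-wise struct instead.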
On 20/10/2016 23:39, David Brown wrote:
> [...]
Oh yes, it isn't hard, but it can be error-prone and it isn't versatile.

I stopped using this method when I had to replace one MCU with a PC running an application written in a different, high-level language (Visual Basic, Python, ...). At first there were two small MCUs, so communicating "raw structs" was sufficient.

With high-level languages it isn't so straightforward. You have to consider padding explicitly. When the structs are long or nested, or contain arrays, it's not so simple.

Even staying in the MCU world, you need to write something similar:

    void send_frame(void)
    {
        struct {
            uint8_t  id;
            uint8_t  padding1;
            uint16_t salary;
            char     name[9];
            uint8_t  padding2;
        } frame_data[2];
        static_assert(sizeof(frame_data[0]) == 14, "unexpected padding");

        frame_data[0].id = get_id(1);
        frame_data[0].salary = get_salary(1);
        strcpy(frame_data[0].name, get_name(1));

        frame_data[1].id = get_id(2);
        frame_data[1].salary = get_salary(2);
        strcpy(frame_data[1].name, get_name(2));

        uart_send((uint8_t *)frame_data, sizeof(frame_data));
    }

If you use a serialization format, the code is not so different, but you gain some versatility:

    void send_frame(void)
    {
        uint8_t frame_data[32];
        size_t i = 0;

        i += serialize_u8(&frame_data[i], get_id(1));
        i += serialize_u16(&frame_data[i], get_salary(1));
        i += serialize_str(&frame_data[i], get_name(1));

        i += serialize_u8(&frame_data[i], get_id(2));
        i += serialize_u16(&frame_data[i], get_salary(2));
        i += serialize_str(&frame_data[i], get_name(2));

        uart_send(frame_data, i);
    }
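The post does not show the serialize_* helpers, so the following is only one possible shape for them, matching the way they are called above. The wire choices are assumptions made for this sketch: 16-bit values go out little-endian, and a string is sent as one length byte followed by its characters without a terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Each helper writes at 'out' and returns the number of bytes written,
   so the caller can advance its index with "i += ...". */

size_t serialize_u8(uint8_t *out, uint8_t v)
{
    out[0] = v;
    return 1;
}

/* Byte order is fixed by the shifts, not by the CPU: the low byte always
   goes first, whatever the endianness of the sending MCU. */
size_t serialize_u16(uint8_t *out, uint16_t v)
{
    out[0] = (uint8_t)(v & 0xFF);
    out[1] = (uint8_t)(v >> 8);
    return 2;
}

/* Length-prefixed string: one length byte, then the characters. */
size_t serialize_str(uint8_t *out, const char *s)
{
    size_t len = strlen(s);

    assert(len <= 255);          /* the length must fit in one byte */
    out[0] = (uint8_t)len;
    memcpy(&out[1], s, len);
    return 1 + len;
}
```

Because every field is written byte by byte, neither struct padding nor CPU endianness affects what appears on the wire. These helpers do no bounds checking; a real version would probably also take the remaining buffer capacity and report an error instead of overrunning.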
On Fri, 21 Oct 2016 08:56:07 +0200, David Brown
<david.brown@hesbynett.no> wrote:

>On 21/10/16 03:01, George Neuner wrote:
>> On Thu, 20 Oct 2016 23:39:24 +0200, David Brown
>> <david.brown@hesbynett.no> wrote:
>>
>>> ... /No/ compiler, for any sane processor,
>>> adds padding or extra alignment requirements beyond the natural size of
>>> the fundamental types.
>>
>> There are plenty of chips (and compilers for them) that are not sane
>> by your definition.  The many "word-oriented" chips come to mind ...
>
>You are thinking of things like the TMS320F dsps (16-bit char) or the
>SHARC (32-bit char) ?  First off, these are not MCU's, and are unlikely
>(not impossible, but unlikely) to be the kind of chip involved in this
>sort of communication.  You pick your solution based on what is
>practical for real-life cases - not on what is necessary for the most
>awkward situations that you perhaps might meet.

You're wrong if you think DSPs don't get used as MCUs.

DSPs (relatively) are expensive, so a system that really needs a DSP in the first place will tend to use it as the main processor rather than as a peripheral to something else. There are OS kernels and communication stacks available for many DSP families that encourage such extended use.

>Secondly, these chips and their tools also do not add any padding or
>alignment requirements beyond the natural size of their fundamental
>types - they are perfectly "sane" in this sense.

Yes, but ...

>The difference is that
>they do not have types uint8_t or int8_t (and perhaps not the 16-bit
>types if they have 32-bit chars).  If your structs have 8-bit fields,
>then these won't compile directly.

Many floating point DSPs don't have IEEE-754 compatible types. Many trade range for precision in their basic "single-precision" type, and some also have extended precision types with odd lengths. Binary transfers to/from other systems require [sometimes non-trivial] data conversion.

There also are many DSPs that support odd length integer types that require care when/if transferring between systems.

>But it is not a big problem - after
>all, since you have explicitly added any padding needed to keep
>alignment for any bigger fields, you can always group your 8-bit fields
>in pairs (or make groups of 4 bytes if you have 32-bit chars).  The most
>you might have to do is add a few extra explicit padding bytes at the
>end of the struct.

There are simple workarounds for most data except non-IEEE floating point types.

George.
On 20/10/2016 23:39, David Brown wrote:
> [...]

There is another issue that can arise with the "casting" approach.

Over the wire everything is bytes, even though you know that a given block
of bytes represents a C struct.  While they are bytes you can move them
with memcpy() and similar functions, but nothing guarantees that the block
ends up at an address suitably aligned for the struct.

In that case the cast can fail, and whether it does depends on the processor.

I ran into exactly this problem when I ported some code from an MCU that
allowed unaligned access (at the cost of additional clock cycles) to
another MCU that did not allow it.  The code that worked on the first MCU
didn't work on the second.
I had used the cast approach, and that was the reason for the failure.
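A common way around the alignment problem is to copy the received bytes into a properly declared struct instead of casting the buffer pointer. The sketch below does this for the format2payload struct quoted earlier in the thread; the function name `decode_payload` is invented for this example, and the code still assumes that both ends use the same byte order, as the raw-struct approach does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  programVersion;
    uint8_t  voltageMonitor;
    uint16_t flags;
    uint16_t voltageMonitorMillivolts;   /* added in format version 2 */
} format2payload;

static_assert(sizeof(format2payload) == 6, "unexpected padding");

/* Copy a received payload into an aligned struct.
   'rx' may point anywhere in the receive buffer, including odd addresses.
   Returns the number of payload bytes actually used. */
size_t decode_payload(format2payload *dst, const uint8_t *rx, size_t rx_len)
{
    /* A newer sender may transmit more than we understand; an older one
       may transmit less.  Take only what fits. */
    size_t n = rx_len < sizeof *dst ? rx_len : sizeof *dst;

    memset(dst, 0, sizeof *dst);   /* fields the sender did not provide read as zero */
    memcpy(dst, rx, n);            /* byte-wise copy: valid for any alignment of rx */
    return n;
}
```

The destination is an ordinary object, so the compiler places it at a suitable address and every later field access is aligned. Clamping the copy length also gives the version tolerance described earlier in the thread: a version 1 frame fills only the first four bytes, and any excess from a newer format is ignored.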


On 23/10/16 13:05, George Neuner wrote:
> On Fri, 21 Oct 2016 08:56:07 +0200, David Brown
> <david.brown@hesbynett.no> wrote:
>
>> On 21/10/16 03:01, George Neuner wrote:
>>> On Thu, 20 Oct 2016 23:39:24 +0200, David Brown
>>> <david.brown@hesbynett.no> wrote:
>>>
>>>> ... /No/ compiler, for any sane processor,
>>>> adds padding or extra alignment requirements beyond the natural size of
>>>> the fundamental types.
>>>
>>> There are plenty of chips (and compilers for them) that are not sane
>>> by your definition.  The many "word-oriented" chips come to mind ...
>>
>> You are thinking of things like the TMS320F dsps (16-bit char) or the
>> SHARC (32-bit char) ?  First off, these are not MCU's, and are unlikely
>> (not impossible, but unlikely) to be the kind of chip involved in this
>> sort of communication.  You pick your solution based on what is
>> practical for real-life cases - not on what is necessary for the most
>> awkward situations that you perhaps might meet.
>
> You're wrong if you think DSPs don't get used as MCUs.
>
> DSPs (relatively) are expensive, so a system that really needs a DSP
> in the first place will tend to use it as the main processor rather
> than as a peripheral to something else.  There are OS kernels and
> communication stacks available for many DSP families that encourage
> such extended use.

DSP's get used for some MCU uses, but they are a relatively minor player. In the solid majority of cases of intercommunication between two devices on a board or two boards in a system, they will be devices with 8-bit chars. And as noted below, it is quite possible to use the same technique for 16-bit and 32-bit char architectures.

(Also note that in high-end DSPs, there is a trend of including a "normal" core such as an M3/M4 alongside the DSP core, so that you can let the DSP concentrate on the stuff it is good at, and let the MCU do the stuff the DSP core is bad at. I haven't used such devices myself, merely heard this from distributors.)

>> Secondly, these chips and their tools also do not add any padding or
>> alignment requirements beyond the natural size of their fundamental
>> types - they are perfectly "sane" in this sense.
>
> Yes, but ...
>
>> The difference is that
>> they do not have types uint8_t or int8_t (and perhaps not the 16-bit
>> types if they have 32-bit chars).  If your structs have 8-bit fields,
>> then these won't compile directly.
>
> Many floating point DSPs don't have IEEE-754 compatible types.  Many
> trade range for precision in their basic "single-precision" type, and
> some also have extended precision types with odd lengths.  Binary
> transfers to/from other systems require [sometimes non-trivial] data
> conversion.
Clearly if you are going to use odd, non-standard floating point formats then you can only use binary protocols to communicate between devices that also support these weird formats. If you want to communicate with something else, you have to convert to standard formats and/or ASCII formats.
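As one illustration of converting to a standard format before transmission, the sketch below sends a value as a scaled 32-bit integer instead of the native float, so the wire format no longer depends on how either side represents floating point. This is fixed-point scaling, swapped in here as a simple alternative to producing an IEEE-754 bit pattern or ASCII text; the function names, the scale of 1/1000 and the little-endian byte order are all assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Encode 'value' as signed thousandths in four little-endian octets.
   Each array element holds one octet, so the code does not rely on
   uint8_t existing on the target. */
void put_millis_le(unsigned char out[4], float value)
{
    int32_t scaled;
    uint32_t u;

    assert(value > -2.0e6f && value < 2.0e6f);   /* must fit in int32 after scaling */

    /* Round to nearest by adding half a unit before truncating. */
    scaled = (int32_t)(value * 1000.0f + (value >= 0.0f ? 0.5f : -0.5f));
    u = (uint32_t)scaled;                         /* well-defined modulo 2^32 */

    out[0] = (unsigned char)(u & 0xFF);
    out[1] = (unsigned char)((u >> 8) & 0xFF);
    out[2] = (unsigned char)((u >> 16) & 0xFF);
    out[3] = (unsigned char)((u >> 24) & 0xFF);
}

/* Decode four little-endian octets back into the native float type. */
float get_millis_le(const unsigned char in[4])
{
    uint32_t u = (uint32_t)in[0]
               | ((uint32_t)in[1] << 8)
               | ((uint32_t)in[2] << 16)
               | ((uint32_t)in[3] << 24);

    /* Rebuild the signed value without relying on implementation-defined
       unsigned-to-signed conversion. */
    int32_t scaled = (u <= 0x7FFFFFFFu)
                   ? (int32_t)u
                   : -(int32_t)(0xFFFFFFFFu - u) - 1;

    return (float)scaled / 1000.0f;
}
```

The price is a fixed range and resolution that both ends must agree on in advance, which suits measured quantities such as voltages better than general-purpose floating point data.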
> > There also are many DSPs that support odd length integer types that > require care when/if transferring between systems.
There are such devices, yes. The question is, are they common enough to be relevant? The challenge is not to find a communication format that will work for /everything/, programmed in every conceivable language, and running on every conceivable device past, present and future. The challenge is to find a method of communicating that is easy to develop, efficient at run-time, has efficient bandwidth usage, is flexible and expandable, and works on the solid majority of realistic systems. It's okay to say you need something different if you are working with that 12-bit DSP from the dark ages. Optimise for the common case, with an understanding of any limitations that might have - don't worry about devices that are not relevant.
>
>
>> But it is not a big problem - after
>> all, since you have explicitly added any padding needed to keep
>> alignment for any bigger fields, you can always group your 8-bit fields
>> in pairs (or make groups of 4 bytes if you have 32-bit chars). The most
>> you might have to do is add a few extra explicit padding bytes at the
>> end of the struct.
>
> There are simple workarounds for most data except non-IEEE floating
> point types.
>
> George.
>
On 23/10/16 16:07, pozz wrote:
> On 20/10/2016 23:39, David Brown wrote:
> > [...]
> > It really is not hard at all. /No/ compiler, for any sane processor,
> > adds padding or extra alignment requirements beyond the natural size of
> > the fundamental types. You only have to be concerned with padding if
> > you try to mix and match in other ways. And if you have a "uint8_t"
> > field which should logically be followed by another field that happens
> > to be "uint16_t", just add an explicit "uint8_t" padding field. Don't
> > let the compiler add its own padding - use compiler warnings where
> > possible to ensure it, and use static assertions to confirm that
> > everything is correct.
>
> There is another issue that can happen when you use "casting" approach.
>
> Over the wire they are all bytes, but you know a block of bytes are a C
> struct. When they are bytes, you can use memcpy() and similar
> functions, but they don't guarantee your struct remains aligned.
>
Certainly you need to be aware of alignment issues. If you receive your message as a char* pointer into a block of data with unknown alignment, then you cannot cast it to a struct pointer without taking alignment into account. Either you arrange things so that your incoming data goes directly into a properly aligned area (that is often quite easy to achieve), or you will need to memcpy() from your buffer into your struct area.
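To make the memcpy() option concrete, here is a minimal sketch. The telegram_t layout and the telegram_from_bytes() name are invented for illustration and are not from anyone's real protocol; the sketch also assumes both ends use the same endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical telegram layout: every field is naturally aligned, so the
 * compiler has no reason to insert padding. */
typedef struct {
    uint8_t  telegramType;
    uint8_t  formatVersion;
    uint16_t flags;
    uint32_t timestamp;
} telegram_t;
static_assert(sizeof(telegram_t) == 8, "telegram_t must not contain padding");

/* Copy a telegram out of a receive buffer whose alignment is unknown.
 * Returns 0 on success, -1 if fewer than sizeof(telegram_t) bytes are
 * available. memcpy() works byte by byte, so buf may start at any address. */
int telegram_from_bytes(telegram_t *out, const uint8_t *buf, size_t len)
{
    if (len < sizeof *out) {
        return -1;
    }
    memcpy(out, buf, sizeof *out);
    return 0;
}
```

The caller then works on its own properly aligned telegram_t, and the raw buffer can sit at any offset inside a larger frame.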
> In this case, the cast could fail and this may depend on the processor.
>
> I had an experience of this kind of problem when I ported some code from
> one MCU where not-aligned access was possible (with additional clock
> ticks) to another MCU that didn't let the not-aligned access. The code
> that worked on the first MCU, didn't work on the second.
> I used cast approach and this was the reason of failure.
>
The compiler should give you a warning on such casts if you do them blindly - it is worth listening to such warnings. Rather than a simple cast, it is often useful to use a union:

    union {
        uint64_t dummyForAlignment;
        struct {
            uint8_t telegramType;
            ...
        };
        uint8_t rawBuffer[64];
    }
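A filled-in version of that union might look roughly like this. Only telegramType comes from the fragment above; the other header fields, the type names and rx_frame_fill() are placeholders made up for the sketch, and it is meant as C (reading a union member other than the one last written is allowed in C, but not in C++).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header; fields after telegramType are assumptions. */
typedef struct {
    uint8_t  telegramType;
    uint8_t  formatVersion;
    uint16_t payloadLength;
} rx_header_t;
static_assert(sizeof(rx_header_t) == 4, "rx_header_t must not contain padding");

/* Receive area. The uint64_t member gives the whole union the alignment of
 * uint64_t, so rawBuffer and header share one suitably aligned address. */
typedef union {
    uint64_t    dummyForAlignment;
    rx_header_t header;
    uint8_t     rawBuffer[64];
} rx_frame_t;

/* Stand-in for a UART/I2C driver: stores at most sizeof rawBuffer bytes in
 * the frame and returns how many were stored. */
size_t rx_frame_fill(rx_frame_t *frame, const uint8_t *bytes, size_t len)
{
    if (len > sizeof frame->rawBuffer) {
        len = sizeof frame->rawBuffer;
    }
    memcpy(frame->rawBuffer, bytes, len);
    return len;
}
```

Once the driver has written into rawBuffer, frame.header can be read directly with no pointer cast and no second copy.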
On Thursday, 20 October 2016 at 10:40:44 UTC+3, David Brown wrote:
> <snip>
> You can come a /long/ way with just a little more than the system you
> have. Keep the same framing mechanism, but make sure you have a field
> for "length of payload". In the payload, you have "type of telegram"
> and "version of telegram format". Then when you need to change the
> formats, you add new data to the old structure.
>
> So format version 1 might be:
>
> typedef struct {
>     uint8_t programVersion;
>     uint8_t voltageMonitor;
>     uint16_t flags;
> } format1payload;
> static_assert(sizeof(format1payload) == 4);
>
> Format version 2, with voltage now in millivolts, will be:
>
> typedef struct {
>     uint8_t programVersion;
>     uint8_t voltageMonitor;
>     uint16_t flags;
>     // Start of version 2
>     uint16_t voltageMonitorMillivolts;
> } format2payload;
> static_assert(sizeof(format2payload) == 6);
>
> A transmitter always sends with the latest version it knows, and will
> fill in both the voltageMonitor and voltageMonitorMillivolts fields. A
> receiver interprets as much as it can based on the latest version it
> knows and the version it receives - any excess data beyond its
> understanding can safely be ignored.
>
> Your encoder and decoders are now nothing more than casts between char*
> pointers and struct pointers.
Typically uint8_t is just unsigned char, but a char may be wider than one octet (8 bits). So static_assert(sizeof(format1payload) == 4) can hold while, depending on the target architecture, the structure occupies more than 4 octets. When you pass the payload structure to the transmit function, it will send 4 or more octets, depending on how many octets the structure contains. I wouldn't call this method robust or portable at all.

A better way would be to create a transmit buffer and add the structure fields to it one at a time. There should be a separate function for each data type (char, uint8, uint16, int, long int, etc.) which takes care of the proper size matching. When all the items of the structure have been added to the buffer, the transmitter sends the buffer.

I know this is not for lazy people, but it is a portable and more robust way of doing things. When you port the application to a new platform, you just need to tweak the functions that take care of the actual data size matching (char, uint8, uint16, int, long int, etc.).

I know this method requires more initial work, but it is the way to do it in a portable manner.

Br, Kalvin
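A rough sketch of what I mean, using the format1payload fields from the quoted post. The tx_buffer_t type and the put_u8(), put_u16() and encode_format1() names are made up for illustration, and the little-endian wire order is simply an assumption for this example. On a target with wider chars, only the bodies of the put_ helpers would need changing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transmit buffer: one octet per element, filled front to back. */
typedef struct {
    uint8_t *data;
    size_t   capacity;
    size_t   length;
} tx_buffer_t;

/* Append one octet. Returns 0 on success, -1 if the buffer is full. */
int put_u8(tx_buffer_t *b, uint8_t value)
{
    assert(b != NULL && b->data != NULL);
    if (b->capacity - b->length < 1) {
        return -1;
    }
    b->data[b->length++] = value;
    return 0;
}

/* Append a 16-bit value as two octets, low octet first (assumed wire order). */
int put_u16(tx_buffer_t *b, uint16_t value)
{
    assert(b != NULL && b->data != NULL);
    if (b->capacity - b->length < 2) {
        return -1;
    }
    b->data[b->length++] = (uint8_t)(value & 0xFFu);
    b->data[b->length++] = (uint8_t)(value >> 8);
    return 0;
}

typedef struct {
    uint8_t  programVersion;
    uint8_t  voltageMonitor;
    uint16_t flags;
} format1payload;

/* Serialize field by field; the struct's in-memory layout never reaches
 * the wire, so padding and host byte order do not matter. */
int encode_format1(tx_buffer_t *b, const format1payload *p)
{
    if (put_u8(b, p->programVersion) != 0) return -1;
    if (put_u8(b, p->voltageMonitor) != 0) return -1;
    return put_u16(b, p->flags);
}
```

The receiver would use matching get_ helpers in the same field order, so the wire format is fixed by the helpers rather than by whatever the compiler does with the struct.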