
Octets with non-8 bit bytes...

Started by Alex Sanks June 10, 2004
OK, I'm sure this has been beaten to death, but Google, etc. found a
lot of descriptions of the problem but none of a portable solution.

I'm working with some firmware drivers which are intended to be as
portable as possible.  Data moves thru a switchable 8- or 16-bit data
bus chip (a USB device controller specifically).  Performance is
critical so 16-bit is pretty much necessary.  Following that example,
let's look at the USB mass storage class.  You get commands from the
host in 31 octet command wrappers that look like this (endian issues
aside...):

typedef struct
{
	u32    Signature;
	u32    Tag;
	u32    TransferLength;
	u8     Flags;
	u8     Lun;
	u8     CommandLength;
	u8     Command[16];
} Cbw;

If I have 8 bit data types that's easy enough to get and deal with. 
But right now I'm working with a TMS320C55x variant with nothing
smaller than 16-bit data types.  So naturally the 8 bit types get all
mixed up when I read them and when I send back similar data every
other octet is garbage.  Some responses are filled at runtime, a few
are global constants.  I can pack things early, but then I need to
unpack, modify, and repack.  Or I can pack before transmission, but
that'd take a bite out of performance.  Or I can break things down:

typedef struct
{
	BYTE	Signature0;
	BYTE	Signature1;
	BYTE	Signature2;
	BYTE	Signature3;
	BYTE	Tag0;
	BYTE	Tag1;
	BYTE	Tag2;
	BYTE	Tag3;
	BYTE	TransferLength0;
	BYTE	TransferLength1;
	BYTE	TransferLength2;
	BYTE	TransferLength3;
	BYTE	Flags;
	BYTE	Lun;
	BYTE	CommandLength;
	BYTE	Command[16];
} Cbw;

Ugly.  I'd really like to avoid that...
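One way to avoid the byte-per-field struct is to keep a native struct for the firmware to work with and describe the wire format only as octet offsets used by a decode routine. The sketch below assumes the 31 octets have already been moved into an array holding one octet per element, and that the multi-octet fields are little-endian, as the USB Mass Storage Bulk-Only Transport spec defines them. Note the command block is 16 octets (4 + 4 + 4 + 1 + 1 + 1 + 16 = 31). The names `get_le32`, `cbw_decode`, and the `CBW_*` offsets are hypothetical.

```c
#include <stdint.h>

/* Octet offsets of the 31-octet Command Block Wrapper on the wire:
 * 4 + 4 + 4 + 1 + 1 + 1 + 16 = 31. */
enum {
    CBW_SIGNATURE      = 0,
    CBW_TAG            = 4,
    CBW_TRANSFERLENGTH = 8,
    CBW_FLAGS          = 12,
    CBW_LUN            = 13,
    CBW_COMMANDLENGTH  = 14,
    CBW_COMMAND        = 15,   /* 16-octet command block, octets 15..30 */
    CBW_OCTETS         = 31
};

/* Native form for the firmware. It is never sent as-is, so padding and
 * the actual width of uint_least8_t do not matter. */
typedef struct {
    uint_least32_t Signature;
    uint_least32_t Tag;
    uint_least32_t TransferLength;
    uint_least8_t  Flags;
    uint_least8_t  Lun;
    uint_least8_t  CommandLength;
    uint_least8_t  Command[16];
} Cbw;

/* Assemble a little-endian 32-bit value from four consecutive octets.
 * Masking with 0xFF keeps only the octet even if an element is wider. */
static uint_least32_t get_le32(const uint_least8_t *p)
{
    return  (uint_least32_t)(p[0] & 0xFFu)
         | ((uint_least32_t)(p[1] & 0xFFu) << 8)
         | ((uint_least32_t)(p[2] & 0xFFu) << 16)
         | ((uint_least32_t)(p[3] & 0xFFu) << 24);
}

/* Decode a CBW held one octet per array element. */
static void cbw_decode(const uint_least8_t octets[CBW_OCTETS], Cbw *cbw)
{
    unsigned i;

    cbw->Signature      = get_le32(&octets[CBW_SIGNATURE]);
    cbw->Tag            = get_le32(&octets[CBW_TAG]);
    cbw->TransferLength = get_le32(&octets[CBW_TRANSFERLENGTH]);
    cbw->Flags          = (uint_least8_t)(octets[CBW_FLAGS] & 0xFFu);
    cbw->Lun            = (uint_least8_t)(octets[CBW_LUN] & 0xFFu);
    cbw->CommandLength  = (uint_least8_t)(octets[CBW_COMMANDLENGTH] & 0xFFu);
    for (i = 0; i < 16u; i++)
        cbw->Command[i] = (uint_least8_t)(octets[CBW_COMMAND + i] & 0xFFu);
}
```

The wire layout then lives in one enum, and the rest of the code reads `cbw.Tag` instead of `Tag0`..`Tag3`. Checking `Signature` against 0x43425355 ("USBC") is left to the caller.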

Now, I see this problem described countless times (yes, yes,
sizeof(char)==sizeof(int)==1, 16 bit byte is 100% ok by the standard),
but what's the best portable solution to dealing with this?  Or at
least *mostly* portable.  All the messages I see say "don't store
binary data and don't worry about how many bits are in anything". 
Great, but that embedded command field being sent from my host
computer 5 meters away is 16 octets whether I like it or not.  I don't
care if everything's stored locally inefficiently so long as
performance is reasonable (and it's clear!  Other people *will* be
dealing with this code!)

I'm making progress getting things to work, but it's getting ugly so I
was curious how people deal with this in real life.

Thanks for whatever guidance you can provide,
alex

"Octets" and "Bytes" are always 8 bits.  The term you want is "Words."

Guy Macon wrote:

> "Octets" and "Bytes" are always 8 bits. ...
Not in C. A C byte is the smallest of

1) a character used by the system,
2) the smallest memory chunk that can be individually addressed, or
3) eight bits.

On most DSPs, a C compiler considers a byte to contain 16 or 32 bits.

sizeof(char) is always 1. sizeof() returns storage size in bytes. On
most DSPs, sizeof(int) is 1. On many, sizeof(long) is also 1. Try it.

Jerry
-- 
Engineering is the art of making what you want from things you can get.
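Jerry's "Try it" takes only a few lines of standard C. The function name `report_type_sizes` is hypothetical; casting to unsigned and printing with %u avoids relying on C99's %zu, which an older embedded C library may not support. On a typical desktop compiler CHAR_BIT is 8; on the TI parts discussed in this thread, the posters report 16-bit chars.

```c
#include <limits.h>
#include <stdio.h>

/* Print how this implementation sizes its types. sizeof counts in C
 * bytes (chars); CHAR_BIT says how many bits one of those bytes holds. */
static void report_type_sizes(void)
{
    printf("CHAR_BIT      = %d\n", CHAR_BIT);
    printf("sizeof(char)  = %u\n", (unsigned)sizeof(char));
    printf("sizeof(int)   = %u\n", (unsigned)sizeof(int));
    printf("sizeof(long)  = %u\n", (unsigned)sizeof(long));
    printf("bits in long  = %u\n", (unsigned)(sizeof(long) * CHAR_BIT));
}
```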
On 10 Jun 2004 16:59:49 -0700, usenet1@sanks.net (Alex Sanks) wrote in
comp.arch.embedded:

> But right now I'm working with a TMS320C55x variant with nothing
> smaller than 16-bit data types.  So naturally the 8 bit types get all
> mixed up when I read them and when I send back similar data every
> other octet is garbage.
I ran across something similar in parsing and formatting CAN packets
for the TI 2812 DSP, which likewise has 16-bit chars and ints.  A CAN
packet may contain between 0 and 8 octets in the data field of the
frame.  In our interface, any octet may be part of an 8-bit, 16-bit,
or 32-bit value.

I wrote two low-level routines that split the data field into an array
of eight words, one octet per word, and reassemble it.  When compiled
with full optimization they are quite short and fast, at least on the
2812, which has a C-friendly architecture compared to some older DSPs.
The result was good enough that I had no need to write them in assembly
language.  In fact, one of my colleagues who wrote the other side of the
interface on an ARM used the code unchanged.

You might be able to adapt something from them:

    #include <stdint.h>   /* hand-written for the 2812, see note below */

    #define OCTET_MASK 0xFFU

    static void split_frame(const uint16_t words[4], uint_least8_t *split)
    {
        /* can't just walk a pointer to unsigned char through the octets of the */
        /* data frame because unsigned char is 16 bits on the 2812 DSP! */
        split[0] = words[0] & OCTET_MASK;
        split[1] = (words[0] >> 8) & OCTET_MASK;
        split[2] = words[1] & OCTET_MASK;
        split[3] = (words[1] >> 8) & OCTET_MASK;
        split[4] = words[2] & OCTET_MASK;
        split[5] = (words[2] >> 8) & OCTET_MASK;
        split[6] = words[3] & OCTET_MASK;
        split[7] = (words[3] >> 8) & OCTET_MASK;
    }

    static void assemble_frame(const uint_least8_t *split, uint16_t *words)
    {
        /* can't just walk a pointer to unsigned char through the octets of the */
        /* data frame because unsigned char is 16 bits on the 2812 DSP! */
        words[0] = ((uint16_t)split[1] << 8) | split[0];
        words[1] = ((uint16_t)split[3] << 8) | split[2];
        words[2] = ((uint16_t)split[5] << 8) | split[4];
        words[3] = ((uint16_t)split[7] << 8) | split[6];
    }

Note that TI doesn't supply a C99 <stdint.h> header with Code Composer
Studio for the 2812, so I had to write my own.  In mine for the TI, the
C99 type uint_least8_t is a typedef for unsigned int.  On the ARM
compiler, which does supply a <stdint.h>, it is unsigned char.

These things can be done in C in a portable way; it just takes a little
thought.

-- 
Jack Klein
Home: http://JK-Technology.Com
FAQs for
comp.lang.c http://www.eskimo.com/~scs/C-faq/top.html
comp.lang.c++ http://www.parashift.com/c++-faq-lite/
alt.comp.lang.learn.c-c++
http://www.contrib.andrew.cmu.edu/~ajo/docs/FAQ-acllc.html
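Jack's split_frame and assemble_frame are hard-wired to an 8-octet CAN payload. A hedged generalization to any octet count, including an odd one such as the 31-octet CBW, might look like this. It keeps his low-octet-first ordering within each 16-bit word; whether that matches a given USB controller is an assumption to check. The names `unpack_octets` and `pack_octets` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define OCTET_MASK 0xFFu

/* Unpack n_octets octets from 16-bit words (low octet of each word
 * first) into out[], one octet per element. */
static void unpack_octets(const uint16_t *words, size_t n_octets,
                          uint_least8_t *out)
{
    size_t i;
    for (i = 0; i < n_octets; i++) {
        unsigned shift = (i % 2u) ? 8u : 0u;
        out[i] = (uint_least8_t)((words[i / 2u] >> shift) & OCTET_MASK);
    }
}

/* Inverse: pack one-octet-per-element data into 16-bit words. For an
 * odd count, the high octet of the last word is set to zero. */
static void pack_octets(const uint_least8_t *in, size_t n_octets,
                        uint16_t *words)
{
    size_t i;
    for (i = 0; i < n_octets; i += 2u) {
        unsigned lo = in[i] & OCTET_MASK;
        unsigned hi = (i + 1u < n_octets) ? (in[i + 1u] & OCTET_MASK) : 0u;
        words[i / 2u] = (uint16_t)((hi << 8) | lo);
    }
}
```

For n octets the word buffer needs (n + 1) / 2 elements, so a 31-octet CBW occupies 16 words.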
Jerry Avins <jya@ieee.org> says...
>
>Guy Macon wrote:
>
>> "Octets" and "Bytes" are always 8 bits. ...
>
>Not in C. A C byte is the smallest of
>
>1) a character used by the system,
>2) the smallest memory chunk that can be individually addressed, or
>3) eight bits.
Yup. One more reason to hate C.
Guy Macon wrote:
> "Octets" and "Bytes" are always 8 bits.  The term you want is "Words."
In C, octets yes, but bytes contain CHAR_BIT bits, as defined in
<limits.h>.

-- 
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
   <http://cbfalconer.home.att.net>  USE worldnet address!
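Chuck's point suggests letting the preprocessor check CHAR_BIT rather than assuming 8. A hedged sketch: it refuses to build where a C byte is not a whole number of octets and picks a code path otherwise. The macro names `OCTETS_PER_CHAR` and `WIRE_BYTES_ARE_OCTETS` are hypothetical.

```c
#include <limits.h>

/* Refuse to build where a C byte is not a whole number of octets;
 * the octet-marshalling code would need rethinking there. */
#if CHAR_BIT % 8 != 0
#error "CHAR_BIT is not a multiple of 8; octet marshalling needs review"
#endif

/* How many octets one C byte can hold: 1 for 8-bit chars, 2 for
 * 16-bit chars, 4 for 32-bit chars. */
#define OCTETS_PER_CHAR (CHAR_BIT / 8)

#if CHAR_BIT == 8
/* unsigned char is exactly one octet: a wire buffer can be used directly */
#define WIRE_BYTES_ARE_OCTETS 1
#else
/* each char holds several octets: pack/unpack with masks and shifts */
#define WIRE_BYTES_ARE_OCTETS 0
#endif
```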
Sorry, I can't offer any "whatever guidance" myself.
But I have a question for you:
Can you tell me where to get information about the USB mass storage class?

                                        Thanks, Wolfgang


In comp.arch.embedded Alex Sanks <usenet1@sanks.net> wrote:

> I'm working with some firmware drivers which are intended to be as
> portable as possible.  Data moves thru a switchable 8- or 16-bit
> data bus chip (a USB device controller specifically).
What the data bus of that chip is should be pretty much irrelevant. What you need to know is what size the registers are. Or more generally, how that 16-bit layout actually works. The makers of that USB controller *must* be aware of this problem, so check them for app notes.
> But right now I'm working with a TMS320C55x variant with nothing
> smaller than 16-bit data types.  So naturally the 8 bit types get all
> mixed up when I read them and when I send back similar data every
> other octet is garbage.
So don't do that.  Marshal your incoming data into something your CPU
can use (e.g. one 16-bit word for each octet, let 32-bit words stay
32-bit words, and forget about possible waste), right at the interface
between the USB controller and the DSP.
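On the outgoing side, marshalling at the interface can also mean building a response directly in the 16-bit words the controller's FIFO takes. The sketch below uses the Bulk-Only Command Status Wrapper, which the thread does not discuss: 13 octets made of the signature "USBS" (0x53425355), the tag, the data residue, and a status octet, all little-endian. It assumes, hypothetically, that the controller transmits the low octet of each word first and can be told to send 13 octets so the padding octet in the last word is never transmitted; both need checking against the controller's data sheet. `csw_pack_words` is a made-up name.

```c
#include <stdint.h>

#define CSW_OCTETS 13u                         /* 4 + 4 + 4 + 1 */
#define CSW_WORDS  ((CSW_OCTETS + 1u) / 2u)    /* 7 16-bit FIFO words */

/* Build a Command Status Wrapper straight into 16-bit FIFO words,
 * assuming the low octet of each word goes out first. */
static void csw_pack_words(uint_least32_t tag, uint_least32_t residue,
                           unsigned status, uint16_t words[CSW_WORDS])
{
    const uint_least32_t signature = 0x53425355UL;   /* "USBS" */

    /* Each little-endian 32-bit field starts at an even octet offset,
     * so its low half fills one word and its high half the next. */
    words[0] = (uint16_t)(signature & 0xFFFFu);
    words[1] = (uint16_t)((signature >> 16) & 0xFFFFu);
    words[2] = (uint16_t)(tag & 0xFFFFu);
    words[3] = (uint16_t)((tag >> 16) & 0xFFFFu);
    words[4] = (uint16_t)(residue & 0xFFFFu);
    words[5] = (uint16_t)((residue >> 16) & 0xFFFFu);
    words[6] = (uint16_t)(status & 0xFFu);           /* high octet: padding */
}
```

This shortcut works only because every 32-bit field of the CSW starts at an even octet offset; a field at an odd offset would straddle two words and need per-octet shifts instead.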
> Ugly. I'd really like to avoid that...
You won't manage to avoid all the ugliness; you've manoeuvred yourself
into too ugly a situation for that.
> but what's the best portable solution to dealing with this?
Essentially the same one you use to work with single bits in a C byte:
masks and shifts.  Or, only if you know your compiler will _never_
change its behaviour in that aspect, bit-fields.

-- 
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
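The masks-and-shifts approach can also work in place, reading and writing individual octets of a buffer of 16-bit words without unpacking it first. A minimal sketch, assuming low-octet-first ordering within each word; `get_octet` and `set_octet` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Read octet i of a buffer of 16-bit words, low octet of each word
 * first (assumed ordering: match your USB controller). */
static unsigned get_octet(const uint16_t *words, size_t i)
{
    return (words[i / 2u] >> ((i % 2u) * 8u)) & 0xFFu;
}

/* Overwrite octet i, leaving the other octet of the same word intact.
 * Only the low 8 bits of value are used. */
static void set_octet(uint16_t *words, size_t i, unsigned value)
{
    unsigned shift = (unsigned)((i % 2u) * 8u);
    uint16_t keep  = (uint16_t)~(0xFFu << shift);

    words[i / 2u] = (uint16_t)((words[i / 2u] & keep)
                               | ((value & 0xFFu) << shift));
}
```

Because the layout is spelled out in shifts rather than left to the compiler, this avoids the bit-field portability caveat above.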
On Fri, 11 Jun 2004 08:26:45 +0200, Wolfgang <never@nowhere.com> wrote:

> Can you tell me where to get information about the USB mass storage
> class?
USB.org has a good collection of documents, including class specs.
http://www.usb.org/developers/devclass/

HTH,
Vadim
On Thu, 10 Jun 2004 21:42:56 -0400, Jerry Avins <jya@ieee.org> wrote:

>Guy Macon wrote:
>
>> "Octets" and "Bytes" are always 8 bits. ...
>
>Not in C. A C byte is the smallest of
>
>1) a character used by the system,
>2) the smallest memory chunk that can be individually addressed, or
>3) eight bits.
Close. It's the smallest addressable unit which will hold a character.
From the standard:

    byte
    addressable unit of data storage large enough to hold any member of
    the basic character set of the execution environment

    NOTE 1 It is possible to express the address of each individual byte
    of an object uniquely.

    NOTE 2 A byte is composed of a contiguous sequence of bits, the number
    of which is implementation-defined. The least significant bit is
    called the low-order bit; the most significant bit is called the
    high-order bit.
>
>On most DSPs, a C compiler considers a byte to contain 16 or 32 bits.
>
>sizeof(char) is always 1. sizeof() returns storage size in bytes. On
>most DSPs, sizeof(int) is 1. On many, sizeof(long) is also 1. Try it.
>
>Jerry
-- 
Al Balmer
Balmer Consulting
removebalmerconsultingthis@att.net