EmbeddedRelated.com
The 2024 Embedded Online Conference

Octets with non-8 bit bytes...

Started by Alex Sanks June 10, 2004
Alan Balmer <albalmer@att.net> wrote:
> On Thu, 10 Jun 2004 21:42:56 -0400, Jerry Avins <jya@ieee.org> wrote:
> >Guy Macon wrote:
> >
> >> "Octets" and "Bytes" are always 8 bits. ...
> >
> >Not in C. A C byte is the smallest of
> >
> >1) a character used by the system,
> >2) the smallest memory chunk that can be individually addressed, or
> >3) eight bits.
> Close. It's the smallest addressable unit which will hold a character.
Closer, but still no cigar. It must be addressable, and must be able to represent each character distinctly. But by no means does it *have* to be the _smallest_ addressable unit fulfilling those requirements. E.g. a C translation system targeting a 32-bit x86 PC yet using 19-bit chars, although obviously a total perversion, is quite certainly allowed by the C standard.

-- 
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
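For readers following the C side of this: in C, "byte" means the storage of one `char`. Its width is `CHAR_BIT` from `<limits.h>`, which must be at least 8, and `sizeof` counts in those bytes, so `sizeof(char)` is 1 however wide a `char` is. A minimal sketch, using a hypothetical helper `octets_needed`, of converting a size in C bytes into 8-bit octets, e.g. when sizing a buffer for a protocol that is defined in octets:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* The C standard guarantees a byte (one char) has at least 8 bits. */
static_assert(CHAR_BIT >= 8, "C requires CHAR_BIT >= 8");

/* sizeof counts C bytes, so sizeof(char) is 1 no matter how wide a char is. */
static_assert(sizeof(char) == 1, "sizeof(char) is always 1");

/* Hypothetical helper: how many 8-bit octets are needed to carry an
   object that occupies `nbytes` C bytes of CHAR_BIT bits each. */
static size_t octets_needed(size_t nbytes)
{
    size_t bits = nbytes * CHAR_BIT;
    return (bits + 7) / 8;   /* round up to whole octets */
}
```

On a host with 8-bit chars this simply returns its argument; with a compiler that uses 16-bit `char` (as some DSP toolchains do), `octets_needed(1)` is 2.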
In article <10chvgq33qr2v00@corp.supernews.com>, Guy Macon
<http@?.guymacon.com> writes
> >"Octets" and "Bytes" are always 8 bits. The term you want is "Words."
Octets, yes, but not bytes, so I am told. Back in the depths of computing history, bytes could be other than 8 bits, hence the use of "octet".

-- 
Chris Hills, Staffs, England
chris@phaedsys.org
www.phaedsys.org
Hans-Bernhard Broeker wrote:
> > [...]
> > Close. It's the smallest addressable unit which will hold a character.
>
> Closer, but still no cigar. It must be addressable, and must be able
> to represent each character distinctly. But by no means does it
> *have* to be the _smallest_ addressable unit fulfilling those
> requirements. E.g. a C translation system targeting a 32-bit x86 PC
> yet using 19-bit chars, although obviously a total perversion, is
> quite certainly allowed by the C standard.
So how large is a byte on such a machine? I'm not clear whether this is 24 or 32 bits. I guess it would be 32 bits, since 24 bits would not be "directly" addressable. How is that different from what Alan said?

-- 
Rick "rickman" Collins
rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY removed.
Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design
URL http://www.arius.com
4 King Ave, Frederick, MD 21701-3110
301-682-7772 Voice
301-682-7666 FAX
rickman wrote:
> [...]
> So how large is a byte on such a machine? I'm not clear if this is 24
> or 32 bits. I guess this would be 32 bits since 24 bits would not be
> "directly" addressable. How is that different from what Alan said?
Just to stir the pot a little: the PDP-8 was a 12-bit machine, and 12 bits was common usage for a byte. The machine addressed data in 12-bit bytes, but it was also common practice to store and manipulate characters as 6-bit nibbles packed two per byte. This was back in the days of TTYs, when letter case was irrelevant.
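Packing two 6-bit characters into one 12-bit word can be sketched on a modern host by holding the 12-bit word in a `uint16_t` and masking. Putting the first character in the high six bits, and the names `pack12`, `unpack12_hi` and `unpack12_lo`, are assumptions for illustration rather than a description of any particular PDP-8 software:

```c
#include <assert.h>
#include <stdint.h>

#define WORD12_MASK 07777u  /* 12 bits (octal 7777) */
#define CHAR6_MASK  077u    /* 6 bits (octal 77) */

/* Pack two 6-bit codes into one 12-bit word; assumed layout puts the
   first code in the high six bits. */
static uint16_t pack12(unsigned first, unsigned second)
{
    assert(first <= CHAR6_MASK && second <= CHAR6_MASK);
    return (uint16_t)(((first << 6) | second) & WORD12_MASK);
}

/* Recover the high and low 6-bit codes from a 12-bit word. */
static unsigned unpack12_hi(uint16_t word) { return (word >> 6) & CHAR6_MASK; }
static unsigned unpack12_lo(uint16_t word) { return word & CHAR6_MASK; }
```

Octal is convenient here because each octal digit is three bits, so a 6-bit character is exactly two octal digits: `pack12(041, 002)` yields `04102`.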
On 11 Jun 2004 15:51:48 GMT, Hans-Bernhard Broeker
<broeker@physik.rwth-aachen.de> wrote:

> E.g. a C translation system targeting a 32-bit x86 PC
> yet using 19-bit chars, although obviously a total perversion, is
> quite certainly allowed by the C standard.
Provided that the 32-bit words are addressable in 19-bit chunks? My head hurts.

-- 
Al Balmer
Balmer Consulting
removebalmerconsultingthis@att.net
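Al's question has a software answer, at least in principle: an implementation could emulate 19-bit units by treating a character's address as a bit offset into an array of 32-bit words, at a large cost in speed. A minimal sketch under that assumption, with bits numbered from the least significant end of each word; `EMU_CHAR_WIDTH`, `get_char19` and `put_char19` are hypothetical names, not taken from any real compiler:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EMU_CHAR_WIDTH 19u  /* hypothetical char width from the example */
#define EMU_CHAR_MASK  ((1u << EMU_CHAR_WIDTH) - 1)

/* Read emulated char number i from a packed array of 32-bit words. */
static uint32_t get_char19(const uint32_t *words, size_t i)
{
    size_t bit = i * EMU_CHAR_WIDTH;
    size_t w = bit / 32, off = bit % 32;
    uint64_t pair = words[w];
    if (off + EMU_CHAR_WIDTH > 32)          /* field straddles two words */
        pair |= (uint64_t)words[w + 1] << 32;
    return (uint32_t)((pair >> off) & EMU_CHAR_MASK);
}

/* Write emulated char number i, leaving neighbouring chars untouched. */
static void put_char19(uint32_t *words, size_t i, uint32_t c)
{
    assert(c <= EMU_CHAR_MASK);
    size_t bit = i * EMU_CHAR_WIDTH;
    size_t w = bit / 32, off = bit % 32;
    int spans = off + EMU_CHAR_WIDTH > 32;
    uint64_t mask = (uint64_t)EMU_CHAR_MASK << off;
    uint64_t pair = words[w];
    if (spans)
        pair |= (uint64_t)words[w + 1] << 32;
    pair = (pair & ~mask) | ((uint64_t)c << off);
    words[w] = (uint32_t)pair;
    if (spans)
        words[w + 1] = (uint32_t)(pair >> 32);
}
```

The caller must size the array so that `words[w + 1]` exists whenever a field straddles a word boundary; six 19-bit chars (114 bits) fit in four words.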
On Fri, 11 Jun 2004 08:38:36 -0700, Alan Balmer <albalmer@att.net>
wrote:

>>
>>Not in C. A C byte is the smallest of
>>
>>1) a character used by the system,
>>2) the smallest memory chunk that can be individually addressed, or
>>3) eight bits.
>
>Close. It's the smallest addressable unit which will hold a character.
Now the question is, what is a character? I can think of character sets based on 5, 6, 7, 8, (9), 16, 21 and 31 (32) bits.

Paul
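Several of Paul's widths follow from the largest code a character set needs: a set whose highest code is N needs enough bits to represent N. For example, 5-bit teleprinter codes such as ITA2 use codes 0 through 31, 7-bit ASCII ends at 0x7F, and the highest Unicode code point, U+10FFFF, needs 21 bits. A small sketch (the helper name `bits_needed` is hypothetical):

```c
#include <stdint.h>

/* Smallest number of bits that can hold every code from 0 to max_code. */
static unsigned bits_needed(uint32_t max_code)
{
    unsigned bits = 1;
    /* Stop at 32 so we never shift a uint32_t by its full width. */
    while (bits < 32 && (max_code >> bits) != 0)
        bits++;
    return bits;
}
```

For example, `bits_needed(0x7F)` is 7 and `bits_needed(0x10FFFF)` is 21.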
Jim Stewart <jstewart@jkmicro.com> says...

>Just to stir the pot a little, a pdp8 was a
>12-bit machine and 12 bits was common usage
>for a byte. The machine addressed data in
>12-bit bytes, but also common usage was to
>store and manipulate chars as 6-bit nibbles
>packed 2 per byte. This was back on TTY's
>where upper case was irrelevant.
Just to stir the pot a little more...

http://www.catb.org/~esr/jargon/html/B/byte.html

byte: /bi:t/, n. [techspeak] A unit of memory or data equal to the amount used to represent one character; on modern architectures this is invariably 8 bits. Some older architectures used byte for quantities of 6, 7, or (especially) 9 bits, and the PDP-10 supported bytes that were actually bitfields of 1 to 36 bits! These usages are now obsolete, killed off by universal adoption of power-of-2 word sizes.

Historical note: The term was coined by Werner Buchholz in 1956 during the early design phase for the IBM Stretch computer; originally it was described as 1 to 6 bits (typical I/O equipment of the period used 6-bit chunks of information). The move to an 8-bit byte happened in late 1956, and this size was later adopted and promulgated as a standard by the System/360. The word was coined by mutating the word "bite" so it would not be accidentally misspelled as bit. See also nybble.

http://www.catb.org/~esr/jargon/html/C/chawmp.html

chawmp: n. [University of Florida] 16 or 18 bits (half of a machine word). This term was used by FORTH hackers during the late 1970s/early 1980s; it is said to have been archaic then, and may now be obsolete. It was coined in revolt against the promiscuous use of "word" for anything between 16 and 32 bits; "word" has an additional special meaning for FORTH hacks that made the overloading intolerable. For similar reasons, /gaw´bl/ (spelled "gawble" or possibly "gawbul") was in use as a term for 32 or 48 bits (presumably a full machine word, but our sources are unclear on this). These terms are more easily understood if one thinks of them as faithful phonetic spellings of "chomp" and "gobble" pronounced in a Florida or other Southern U.S. dialect.
For general discussion of similar terms, see nybble.

nybble: /nib´l/, nibble, n. [from v. nibble by analogy with "bite" → "byte"] Four bits; one hex digit; a half-byte. Though "byte" is now techspeak, this useful relative is still jargon. Compare byte; see also bit. The more mundane spelling "nibble" is also commonly used. Apparently the "nybble" spelling is uncommon in Commonwealth Hackish, as British orthography would suggest the pronunciation /ni:´bl/.

Following "bit", "byte" and "nybble" there have been quite a few analogical attempts to construct unambiguous terms for bit blocks of other sizes. All of these are strictly jargon, not techspeak, and not very common jargon at that (most hackers would recognize them in context but not use them spontaneously). We collect them here for reference together with the ambiguous techspeak terms "word", "half-word", "double word", and "quad" or quad word; some (indicated) have substantial information separate entries.

2 bits: crumb, quad, quarter, tayste, tydbit, morsel
4 bits: nybble
5 bits: nickle
10 bits: deckle
16 bits: playte, chawmp (on a 32-bit machine), word (on a 16-bit machine), half-word (on a 32-bit machine)
18 bits: chawmp (on a 36-bit machine), half-word (on a 36-bit machine)
32 bits: dynner, gawble (on a 32-bit machine), word (on a 32-bit machine), longword (on a 16-bit machine)
36 bits: word (on a 36-bit machine)
48 bits: gawble (under circumstances that remain obscure)
64 bits: double word (on a 32-bit machine), quad (on a 16-bit machine)
128 bits: quad (on a 32-bit machine)

The fundamental motivation for most of these jargon terms (aside from the normal hackerly enjoyment of punning wordplay) is the extreme ambiguity of the term word and its derivatives.

Also see:
http://www.catb.org/~esr/jargon/html/P/playte.html
http://www.catb.org/~esr/jargon/html/T/tayste.html
http://www.catb.org/~esr/jargon/html/Q/quarter.html
http://www.catb.org/~esr/jargon/html/B/bit.html
http://www.catb.org/~esr/jargon/
http://www.catb.org/~esr/jargon/jargoogle.html

Comment by Guy Macon: Concerning the statement "on modern architectures this is invariably 8 bits", in my opinion C has no resemblance to anything that can reasonably be called "modern." See [ http://cm.bell-labs.com/cm/cs/who/dmr/chist.html ].

-- 
Guy Macon, Electronics Engineer & Project Manager for hire.
Remember Doc Brown from the _Back to the Future_ movies? Do you have an "impossible" engineering project that only someone like Doc Brown can solve?
My resume is at http://www.guymacon.com/
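The PDP-10 bytes mentioned in the jargon entry were arbitrary bit fields described by a byte pointer holding a position P (the number of bits to the right of the byte) and a size S, and accessed with the LDB (load byte) and DPB (deposit byte) instructions. A rough C emulation, holding each 36-bit word in a `uint64_t`; the function names mirror the instructions, but this is only an illustrative sketch, not a faithful model of the hardware (which also had incrementing variants such as ILDB):

```c
#include <assert.h>
#include <stdint.h>

#define WORD36_MASK ((UINT64_C(1) << 36) - 1)

/* Load a byte of `size` bits that has `pos` bits to its right
   (the P and S fields of a byte pointer) from a 36-bit word. */
static uint64_t ldb(uint64_t word, unsigned pos, unsigned size)
{
    assert(size >= 1 && size <= 36 && pos + size <= 36);
    return ((word & WORD36_MASK) >> pos) & ((UINT64_C(1) << size) - 1);
}

/* Deposit the low `size` bits of `value` into the same field and
   return the updated 36-bit word. */
static uint64_t dpb(uint64_t word, uint64_t value, unsigned pos, unsigned size)
{
    assert(size >= 1 && size <= 36 && pos + size <= 36);
    uint64_t field = ((UINT64_C(1) << size) - 1) << pos;
    return ((word & ~field) | ((value << pos) & field)) & WORD36_MASK;
}
```

The usual PDP-10 text layout packed five 7-bit ASCII characters per word at P = 29, 22, 15, 8 and 1, leaving the lowest bit unused, so `dpb(0, 'H', 29, 7)` stores an `H` in the first character position.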
On Fri, 11 Jun 2004 12:17:15 -0700, Guy Macon
<http://www.guymacon.com> wrote:

[...]
>http://www.catb.org/~esr/jargon/html/C/chawmp.html
>chawmp: n.
>[University of Florida] 16 or 18 bits (half of a machine word).
>This term was used by FORTH hackers during the late 1970s/early
>1980s; it is said to have been archaic then, and may now be
>obsolete. It was coined in revolt against the promiscuous use
I first used Forth in the late '70s and early '80s (though I've never been a "Forth hacker"), and I've never seen this term before. The term I've always heard used is "cell," which is the size of a single entry on the data stack, and at least 16 bits wide in ANSI-standard Forth. A "cell pair" holds "double-cell" values. A "character" is allowed (but not required) to be narrower than a "cell."
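Dave's "cell" and "cell pair" map onto C types fairly directly. A minimal sketch assuming 16-bit cells (the ANS Forth minimum) and therefore 32-bit double-cell values; the names `cell`, `dcell`, `d_split` and `d_join` are hypothetical. In ANS Forth, the cell holding the most significant half of a double-cell number sits above the least significant one on the data stack.

```c
#include <stdint.h>

typedef uint16_t cell;   /* assumption: 16-bit cells */
typedef uint32_t dcell;  /* a double-cell value is two cells wide */

/* Split a double-cell value into its low and high cells. */
static void d_split(dcell d, cell *lo, cell *hi)
{
    *lo = (cell)(d & 0xFFFFu);
    *hi = (cell)(d >> 16);
}

/* Rebuild a double-cell value from a cell pair. */
static dcell d_join(cell lo, cell hi)
{
    return ((dcell)hi << 16) | lo;
}
```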
>of "word" for anything between 16 and 32 bits; "word" has an
>additional special meaning for FORTH hacks that made the
>overloading intolerable. For similar reasons, /gaw´bl/ (spelled
FWIW, a "word" in Forth is what you might call an "operator" or a "function" in C. Actually, it's a little more generic than that. Almost everything in a Forth program is a word.
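To make the "everything is a word" point concrete for C programmers, here is a toy outer interpreter, not modelled on any real Forth system: each word is a named C function that operates on a shared data stack, a token found in the dictionary is executed, and any other token is treated as a number. The names `ds`, `depth`, `dictionary` and `interpret` are made up for this sketch, and it does no error checking:

```c
#include <stdlib.h>
#include <string.h>

/* Data stack shared by all words (no overflow/underflow checks). */
static long ds[64];
static int depth = 0;

static void push(long v) { ds[depth++] = v; }
static long pop(void)    { return ds[--depth]; }

/* Each "word" is just a C function acting on the stack. */
static void w_add(void) { long b = pop(), a = pop(); push(a + b); }
static void w_mul(void) { long b = pop(), a = pop(); push(a * b); }
static void w_dup(void) { long a = pop(); push(a); push(a); }

struct word { const char *name; void (*code)(void); };
static const struct word dictionary[] = {
    { "+", w_add }, { "*", w_mul }, { "DUP", w_dup },
};

/* Execute tokens found in the dictionary; treat anything else as a number. */
static void interpret(const char *line)
{
    char buf[128];
    strncpy(buf, line, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    for (char *tok = strtok(buf, " "); tok; tok = strtok(NULL, " ")) {
        size_t i, n = sizeof dictionary / sizeof dictionary[0];
        for (i = 0; i < n && strcmp(tok, dictionary[i].name) != 0; i++)
            ;
        if (i < n)
            dictionary[i].code();
        else
            push(strtol(tok, NULL, 10));
    }
}
```

For example, `interpret("2 3 + DUP *")` leaves a single value, 25, on the stack.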
>"gawble" or possibly "gawbul") was in use as a term for 32 or
>48 bits (presumably a full machine word, but our sources are
Never heard of that one either... [...]
>2 bits: [...] quarter,
Shave and a haircut...

Regards,
   -=Dave
-- 
Change is inevitable, progress is not.
Dave Hansen wrote:
> On Fri, 11 Jun 2004 12:17:15 -0700, Guy Macon
> <http://www.guymacon.com> wrote:
>
> [...]
> >http://www.catb.org/~esr/jargon/html/C/chawmp.html
>
> >2 bits: [...] quarter,
>
> Shave and a haircut...
Would that make 8 bits a dollar? I've never liked calling 8 bits an octet; it sounds like an overgrown musical group... Let's see... a dollar buys an ASCII char, two dollars gets you signed numbers from -32768 to 32767, and four dollars can buy... well, you get the idea. :)

I am building a 50-cent CPU! Cool, that's what I'll call it: FiftyCents.

-- 
Rick "rickman" Collins
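The arithmetic behind the joke: n bits of two's complement cover -(2^(n-1)) through 2^(n-1) - 1, so sixteen bits ("two dollars") span -32768 to 32767, and the four bits of a fifty-cent CPU span -8 to 7 signed, or 0 to 15 unsigned. A sketch with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Smallest value of an n-bit two's-complement integer: -(2^(n-1)). */
static int64_t min_signed(unsigned n)
{
    assert(n >= 1 && n <= 63);
    return -(INT64_C(1) << (n - 1));
}

/* Largest value of an n-bit two's-complement integer: 2^(n-1) - 1. */
static int64_t max_signed(unsigned n)
{
    assert(n >= 1 && n <= 63);
    return (INT64_C(1) << (n - 1)) - 1;
}
```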
On 11-Jun-04, rickman said:

> Would that make 8 bits a dollar?
Yes, and that comes from the practice of cutting a silver dollar into 8 "bits" to make change.

-- 
Bill
