Reply by Dave Thompson April 2, 2006
On Tue, 21 Mar 2006 20:42:46 -0500 (EST), "Arthur J. O'Dwyer"
<ajonospam@andrew.cmu.edu> wrote:

> I believe Andrew means
>
> void foo(int & x)
<snip>
> void bar()
> {
>     register int a;
>     foo(a); /* Will C++ accept this? */
> }
>
> I don't know whether standard C++ would accept the above code, or whether
> it would, like standard C, insist that the programmer can't take the
> address of a 'register' variable, even implicitly.
<snip>

The former. C++ goes the other way: if you take the address of a
'register' variable, the 'register' is silently overridden.
(Silently in that the standard does not require a diagnostic;
implementors, in both C++ and C, are _allowed_ to diagnose
anything they want.)

- David.Thompson1 at worldnet.att.net
Reply by Keith Thompson March 27, 2006
Jordan Abel <random832@gmail.com> writes:
> On 2006-03-27, Ben Pfaff <blp@cs.stanford.edu> wrote:
>> Jordan Abel <random832@gmail.com> writes:
>>> maybe not that in particular, but *p-- past 0 is no less idiomatic than
>>> *p++ past the end.
>>
>> Really? It's not in *my* idiom, because I like to write code
>> that doesn't gratuitously invoke undefined behavior.
>
> a circular argument when you are defending the decision to leave it
> undefined.

The C standard, as it exists, makes decrementing a pointer past the
beginning of an array undefined behavior. Most of us avoid doing
this, not because we think the standard *should* make it undefined,
but because the standard *does* make it undefined. Code that does
this is not idiomatic, because careful programmers don't write such
code. There's nothing circular about that.

Note that you can run into similar problems if you use indices rather
than pointers, if the index type is unsigned. The behavior when you
decrement past 0 is well-defined, but it's likely to cause problems if
the unsigned value is being used as an array index (except that,
unlike for pointers, 0-1+1 is guaranteed to be 0).

--
Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.
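
The point about unsigned indices can be illustrated with a small sketch. The function names `sum_backward` and `wrap_round_trip` are made up for this example and do not come from the thread; the code only assumes standard C behavior for `size_t`.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array back to front using an unsigned index.
   The test "i-- > 0" compares before it decrements, so the loop body
   never sees a wrapped index. After the loop exits, i has wrapped to
   the maximum size_t value, which is well defined for unsigned types
   but must not be used as an array index. */
long sum_backward(const int *a, size_t n)
{
    long total = 0;
    size_t i = n;

    assert(a != NULL || n == 0);
    while (i-- > 0) {
        total += a[i];
    }
    return total;
}

/* Unsigned arithmetic wraps modulo 2^k, so 0 - 1 + 1 is guaranteed
   to be 0. The same round trip on a pointer to the first element of
   an array would be undefined behavior. */
size_t wrap_round_trip(void)
{
    size_t i = 0;

    i = i - 1;   /* well defined: i is now the maximum size_t value */
    i = i + 1;   /* wraps back to 0 */
    return i;
}
```

A naive `for (i = n - 1; i >= 0; --i)` with an unsigned `i` never terminates, because the condition is always true; that is the kind of problem the paragraph above warns about.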
Reply by Chris Torek March 27, 2006
In article <pan.2006.03.27.11.39.43.167688@areilly.bpc-users.org>
Andrew Reilly  <andrew-newspost@areilly.bpc-users.org> wrote:
>How much undefined behaviour can you stand?
Quite a bit, *provided* that this "undefined" is only in terms of
the C standard.

As I have noted elsewhere, doing something like:

    #include <graphics.h>

invokes undefined behavior. I have no problem with including such
a file, though, where the behavior defined by some *other* document
is required.

What I try to avoid is:

 - depending on behavior that is not only not defined by the C
   standard, but also not defined by anything else, and merely
   "happens to work today";

 - making use of implementation(s)-specific behavior when there
   is a well-defined variant of the code that also meets whatever
   specifications are in use.

The latter covers things like doing arithmetic in "int" that
deliberately overflows temporarily, assumes that the overflow does
not trap, and then "un-overflows" back into range. If one codes
this in "unsigned int" arithmetic instead, one gets guaranteed
mod-2-sup-k behavior, and the code is just as small and fast as
the not-guaranteed version.
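
A minimal sketch of the "unsigned int" variant described above, assuming the caller knows the final result lies in the range 0 to INT_MAX. The function name `add_then_subtract` is hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <limits.h>

/* Compute a + b - c when the final result is known to fit in
   [0, INT_MAX], even though the intermediate a + b might not.
   Doing the arithmetic in unsigned int means any intermediate
   wraparound is defined (modulo 2^k) instead of being signed
   overflow, which the C standard leaves undefined. */
int add_then_subtract(int a, int b, int c)
{
    unsigned int r = (unsigned int)a + (unsigned int)b - (unsigned int)c;

    /* Assumption of this sketch: the true result is non-negative and
       in range, so converting back to int is value-preserving. */
    assert(r <= (unsigned int)INT_MAX);
    return (int)r;
}
```

For example, `add_then_subtract(INT_MAX, 10, 20)` yields `INT_MAX - 10` without ever evaluating `INT_MAX + 10` in signed arithmetic. If the true result could be negative, the conversion back to `int` would need separate handling, since converting an out-of-range unsigned value to `int` is implementation-defined.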
>That's one of the main things that I like about assembly language, btw: it
>might be all kinds of painful to express an algorithm (although generally
>not really all that bad), but the instruction descriptions in
>the data books tell you *precisely* what each one will do, and
>you can compose your code with no doubts about how it will perform.

Actually, there are a number of instruction sets (for various
machines) that tell you to avoid particular situations with
particular instructions. Consider the VAX's "movtuc" ("move
translated until character") instruction, which takes a
source-and-source-length, destination (and destination-length?),
and translation-table. The manual says that the effect of the
instruction is unpredictable if the translation table overlaps
with the source (and/or destination?).

Someone put a comment into a piece of assembly code in 4.1BSD
that read "# comet sucks". I wondered what this was about.

It turns out that whoever implemented the printf engine for the
VAX used "movtuc" to find '%' and '\0' characters, and did the
movtuc with the source string having "infinite" length (actually
65535 bytes, the length being restricted to 16 bits) so that it
often overlapped the translation table. On the VAX-11/780, this
"worked right" (as in, did what he wanted it to). On the
VAX-11/750 -- known internally as the "Comet" -- it did not behave
the way he wanted. The result was that printf() misbehaved for
various programs, because the assembly code depended on undefined
behavior.

(The "fix" applied, along with the comment, was to limit the
length of the source so as not to overlap the table. Of course,
when we rewrote the printf engine in C for portability and C89
support, we stopped using movtuc entirely.)

--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
Reply by Jordan Abel March 27, 2006
On 2006-03-27, Ben Pfaff <blp@cs.stanford.edu> wrote:
> Jordan Abel <random832@gmail.com> writes:
>
>> maybe not that in particular, but *p-- past 0 is no less idiomatic than
>> *p++ past the end.
>
> Really? It's not in *my* idiom, because I like to write code
> that doesn't gratuitously invoke undefined behavior.

a circular argument when you are defending the decision to leave it undefined.
Reply by Ben Pfaff March 27, 2006
Jordan Abel <random832@gmail.com> writes:

> maybe not that in particular, but *p-- past 0 is no less idiomatic than
> *p++ past the end.

Really? It's not in *my* idiom, because I like to write code
that doesn't gratuitously invoke undefined behavior.

--
Ben Pfaff
email: blp@cs.stanford.edu
web: http://benpfaff.org
Reply by Jordan Abel March 27, 2006
On 2006-03-27, Arthur J. O'Dwyer <ajonospam@andrew.cmu.edu> wrote:
> That's why precious few C implementations /do/ pointer validity
> checking in the first place. As I understand it, not even the AS/400's
> compiler did pointer checking in software; it just did whatever the
> hardware forced it to. And the hardware check presumably /would/ have
> gone off at each dereference.

according to others in this thread, apparently not, hence why it checks on load.
Reply by Jordan Abel March 27, 2006
On 2006-03-27, Richard Bos <rlb@hoekstra-uitgeverij.nl> wrote:
> Andrew Reilly <andrew-newspost@areilly.bpc-users.org> wrote:
>
>> On Fri, 24 Mar 2006 08:20:12 +0000, David Holland wrote:
>> > Because p -= 2, when performed on the pointer 1234:4, tries to deduct
>> > 8 from the offset field. This underflows and traps.
>>
>> And this is the behaviour that is at odds with idiomatic C.
>
> _Whose_ idiom? No programmer I'd respect writes such code intentionally.
>
> Richard

maybe not that in particular, but *p-- past 0 is no less idiomatic than *p++ past the end.
Reply by Arthur J. O'Dwyer March 27, 2006
On Mon, 27 Mar 2006, CBFalconer wrote:
> Andrew Reilly wrote:
>> On Mon, 27 Mar 2006 03:07:28 +0000, Dik T. Winter wrote:
>>> Jordan Abel <random832@gmail.com> writes:
>>>> On 2006-03-26, Stephen Sprunk <stephen@sprunk.org> wrote:
>>>>>
>>>>> It simply doesn't make sense to do things that way since the
>>>>> only purpose is to allow violations of the processor's memory
>>>>> protection model. Work with the model, not against it.

(FWIW, I agree with Stephen's sentiment. C's memory model seems
consistent to me: pointers point at objects, or are NULL, or are
garbage, with one special-case exception for pointers that point
"one past" objects. Extending the model to allow pointers that point
"one before" objects, or "ten past," doesn't seem useful enough to be
worth the hassle of defining all the behaviors on overflow, or what
happens if 'x' is "ten past" 'y' in memory, and so on. Just don't write
code that loops backward in an unsafe manner.)

[Proposing a different, flat-memory model for C.]
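
One way to loop backward without ever forming a "one before" pointer is sketched below. The function `copy_reversed` is a hypothetical helper written for this illustration, not code from the thread; it relies only on the "one past" exception described above.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n ints from src into dst in reverse order.
   The walking pointer starts at src + n, which is the permitted
   "one past the end" value, and is decremented *before* each
   dereference. It therefore stops at src itself and never takes
   the value src - 1, which the C standard leaves undefined. */
void copy_reversed(int *dst, const int *src, size_t n)
{
    const int *p;

    assert(dst != NULL && src != NULL);
    p = src + n;
    while (p != src) {
        --p;
        *dst++ = *p;
    }
}
```

By contrast, a loop written as `for (p = src + n - 1; p >= src; --p)` computes `src - 1` on its final decrement, and also computes it immediately when `n` is 0.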
>> The trap isn't ignored. There is no trap: the platform's "sane C
>> memory model" compiler and run-time system updated p.array_index
>> to -1 and p.array_base to a.array_base at the third line, as
>> expected. The trap would be left enabled, so that it would
>> actually hit if/when a real pointer was formed from
>> &p.array_base[p.C_pointer_index] if/when *p was ever referenced
>> in the subsequent code.
>>
>> Consequently, the above code leaves p == a, as expected, and no
>> trap is encountered. Neat, huh?
>
> Nope. Consider some code such as:
>
>     for (...; ...; ++p) {
>         for (...; ...; ++q) {
>             dothingswith(*p, *q);
>             /* qchecktime */
>         }
>         /* pchecktime */
>     }
>
> With the normal check at pointer creation time, p is checked once
> per iteration of the outer for. Your way, it is checked at every
> use of *p, which will probably be far more often. Thus slowing
> down the whole system and bringing snarlers_against_runtime_checks
> out of every crack in the walls.

Straw man. Every decent compiler does hoisting of loop invariants, making both checks equivalent. (And if your compiler doesn't hoist invariants, then you have no business talking about runtime efficiency in the first place.)
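
A hand-written sketch of what hoisting a loop-invariant check amounts to. Everything here is hypothetical: `checked_load` stands in for a validity check at dereference time, and the counter exists only to make the difference visible. Whether a particular compiler performs this transformation depends on it proving that `p` and the checked object do not change inside the inner loop.

```c
#include <assert.h>
#include <stddef.h>

static unsigned long checks;   /* counts calls to the stand-in check */

/* Hypothetical stand-in for "validate the pointer, then load". */
static int checked_load(const int *p)
{
    ++checks;
    assert(p != NULL);
    return *p;
}

/* Check at every use: a[i] is validated once per inner iteration. */
long dot_all_naive(const int *a, size_t na, const int *b, size_t nb)
{
    long total = 0;
    size_t i, j;

    for (i = 0; i < na; ++i)
        for (j = 0; j < nb; ++j)
            total += (long)checked_load(&a[i]) * b[j];
    return total;
}

/* Hoisted form: a[i] does not change in the inner loop, so the
   checked load is done once per outer iteration instead. */
long dot_all_hoisted(const int *a, size_t na, const int *b, size_t nb)
{
    long total = 0;
    size_t i, j;

    for (i = 0; i < na; ++i) {
        long ai = checked_load(&a[i]);
        for (j = 0; j < nb; ++j)
            total += ai * b[j];
    }
    return total;
}

unsigned long check_count(void) { return checks; }
void reset_checks(void) { checks = 0; }
```

Both functions return the same total; only the number of checks differs (one per inner iteration versus one per outer iteration), which is the equivalence the hoisting argument relies on.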
> Checking pointer validity can be an involved process, depending on
> architecture. It should be avoided, similar to casts, which at
> least are obvious because the programmer writes them in.

Obviously. That's why precious few C implementations /do/ pointer
validity checking in the first place. As I understand it, not even the
AS/400's compiler did pointer checking in software; it just did whatever
the hardware forced it to. And the hardware check presumably /would/
have gone off at each dereference.

-Arthur
Reply by Chris Dollin March 27, 2006
Andrew Reilly wrote:

> On Mon, 27 Mar 2006 12:59:59 +0100, Chris Dollin wrote:
>> The C standard don't /outlaw/ forming illegal pointer values; they
>> just say that if you do that, they don't say anything more about the
>> behaviour of your code, so if you want defined behaviour, you have
>> to look elsewhere for the definition.
>
> How much undefined behaviour can you stand?

No more than what's covered by the defined behaviour on the platforms
I'm prepared to support, where `defined` isn't limited to the C standard
but over-enthusiastic use of random other definitions isn't desired.
> Sure, your code works OK this
> year, but what if next year's super-optimizer switch takes a different
> reading on some behaviour that you've coded to, because it was
> "universally" supported, but nevertheless undefined. Want to chase down
> those bugs?

Were I actively writing C - which at the moment I'm not - I'd have tests to check behaviour, for this reason among others.
> How many substantial applications do you suppose are written, that *only*
> use defined behaviours? I suspect that the answer is very close to none.

That only use behaviour defined by the C standard? Few. That only use behaviour defined by their intended platforms? Rather more.
>> If you're writing code that has, for whatever reason, to rely on
>> non-C-standard definitions, well then, rely on them. I've written code
>> that relies on non-C-standard behaviour, too - but I didn't expect it to
>> port everywhere, and I didn't expect such use to be a requirement on
>> future standardisation to support it, much as I might like to; the
>> leaves-it-undefined /allows/ the code to work where it works.
>
> I like C. A lot.
>
> I think that it could do to have a few fewer undefined behaviours, and a
> few more defined (obvious) behaviours that you could rely on to describe
> your algorithms.

Well, me too. But that doesn't stop me thinking that the standard seems to be a reasonable compromise between the different requirements, as things stand.
> That's one of the main things that I like about assembly language, btw: it
> might be all kinds of painful to express an algorithm (although generally
> not really all that bad), but the instruction descriptions in
> the data books tell you *precisely* what each one will do, and
> you can compose your code with no doubts about how it will perform.

The first half is the reason I'd typically stay away from assembly
language, and I'm not convinced about the second unless one goes
into the amount of detail I'd happily leave to the compiler-writer.

--
Chris "x.f(y) == f(x, y) == (x, y).f" Dollin
The shortcuts are all full of people using them.
Reply by Andrew Reilly March 27, 2006
On Mon, 27 Mar 2006 12:59:59 +0100, Chris Dollin wrote:
> The C standard don't /outlaw/ forming illegal pointer values; they
> just say that if you do that, they don't say anything more about the
> behaviour of your code, so if you want defined behaviour, you have
> to look elsewhere for the definition.

How much undefined behaviour can you stand? Sure, your code works OK this
year, but what if next year's super-optimizer switch takes a different
reading on some behaviour that you've coded to, because it was
"universally" supported, but nevertheless undefined? Want to chase down
those bugs?

How many substantial applications do you suppose are written, that *only*
use defined behaviours? I suspect that the answer is very close to none.

> If you're writing code that has, for whatever reason, to rely on
> non-C-standard definitions, well then, rely on them. I've written code
> that relies on non-C-standard behaviour, too - but I didn't expect it to
> port everywhere, and I didn't expect such use to be a requirement on
> future standardisation to support it, much as I might like to; the
> leaves-it-undefined /allows/ the code to work where it works.

I like C. A lot.

I think that it could do to have a few fewer undefined behaviours, and a
few more defined (obvious) behaviours that you could rely on to describe
your algorithms.

That's one of the main things that I like about assembly language, btw: it
might be all kinds of painful to express an algorithm (although generally
not really all that bad), but the instruction descriptions in
the data books tell you *precisely* what each one will do, and
you can compose your code with no doubts about how it will perform.

[I don't read comp.lang.c, so if you want me to see any replies (hah! :-),
you won't take comp.arch.embedded out of the Newsgroups. Of course, I can
imagine that just about everyone doesn't care, at this stage...]

--
Andrew