On Tue, 21 Mar 2006 20:42:46 -0500 (EST), "Arthur J. O'Dwyer"
<ajonospam@andrew.cmu.edu> wrote:

> I believe Andrew means
> 
>        void foo(int & x)
<snip>
>        void bar()
>        {
>           register int a;
>           foo(a);         /* Will C++ accept this? */
>        }
> 
> I don't know whether standard C++ would accept the above code, or whether 
> it would, like standard C, insist that the programmer can't take the 
> address of a 'register' variable, even implicitly. <snip>

The former. C++ goes the other way: if you take the address of a
'register' variable, the 'register' is silently overridden. (Silently
in that the standard does not require a diagnostic; implementors, in
both C++ and C, are _allowed_ to diagnose anything they want.)

- David.Thompson1 at worldnet.att.net

Jordan Abel <random832@gmail.com> writes:
> On 2006-03-27, Ben Pfaff <blp@cs.stanford.edu> wrote:
>> Jordan Abel <random832@gmail.com> writes:
>>> maybe not that in particular, but *p-- past 0 is no less idiomatic than 
>>> *p++ past the end.
>>
>> Really?  It's not in *my* idiom, because I like to write code
>> that doesn't gratuitously invoke undefined behavior.
>
> a circular argument when you are defending the decision to leave it 
> undefined.

The C standard, as it exists, makes decrementing a pointer past the
beginning of an array undefined behavior.  Most of us avoid doing
this, not because we think the standard *should* make it undefined,
but because the standard *does* make it undefined.  Code that does
this is not idiomatic, because careful programmers don't write such
code.  There's nothing circular about that.

Note that you can run into similar problems if you use indices rather
than pointers, if the index type is unsigned.  The behavior when you
decrement past 0 is well-defined, but it's likely to cause problems if
the unsigned value is being used as an array index (except that,
unlike for pointers, 0-1+1 is guaranteed to be 0).
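
A minimal sketch of the unsigned-index case, assuming a plain array of
int. The names sum_backward and wrap_round_trip are made up for
illustration and do not come from any post in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the thread): sum a[0..n-1], last element first. */
long sum_backward(const int *a, size_t n)
{
    long total = 0;
    size_t i;

    assert(a != NULL || n == 0);

    /* Count i down from n to 1 and index with i - 1, so i itself never
       goes below 0.  The tempting "for (i = n - 1; i >= 0; i--)" never
       stops on its own, because i >= 0 is always true for an unsigned type. */
    for (i = n; i > 0; i--)
        total += a[i - 1];
    return total;
}

/* The arithmetic itself is well-defined: 0 - 1 + 1 is 0 again. */
size_t wrap_round_trip(void)
{
    size_t i = 0;
    i--;    /* i is now the largest size_t value, (size_t)-1 */
    i++;    /* wraps back to 0 */
    return i;
}
```

The wrap in wrap_round_trip is harmless only because i is never used as
an index while it holds the huge intermediate value.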

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
San Diego Supercomputer Center             <*>  <http://users.sdsc.edu/~kst>
We must do something.  This is something.  Therefore, we must do this.

In article <pan.2006.03.27.11.39.43.167688@areilly.bpc-users.org>
Andrew Reilly  <andrew-newspost@areilly.bpc-users.org> wrote:
>How much undefined behaviour can you stand?

Quite a bit, *provided* that this "undefined" is only in terms of
the C standard.

As I have noted elsewhere, doing something like:

    #include <graphics.h>

invokes undefined behavior.  I have no problem with including such
a file, though, where the behavior defined by some *other* document
is required.

What I try to avoid is:

 - depending on behavior that is not only not defined by the C
   standard, but also not defined by anything else, and merely
   "happens to work today";

 - making use of implementation(s)-specific behavior when there is
   a well-defined variant of the code that also meets whatever
   specifications are in use.

The latter covers things like doing arithmetic in "int" that
deliberately overflows temporarily, assumes that the overflow does
not trap, and then "un-overflows" back into range.  If one codes
this in "unsigned int" arithmetic instead, one gets guaranteed
mod-2^k behavior, and the code is just as small and fast as
the not-guaranteed version.
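
A rough sketch of the "unsigned int" variant, under the assumption that
the caller knows the final result fits in 0..INT_MAX even though the
intermediate sum may not. The function name add_then_subtract is
hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: returns a + b - c.  The caller guarantees that the
   true result lies in 0..INT_MAX, although a + b alone may exceed INT_MAX. */
int add_then_subtract(int a, int b, int c)
{
    /* Each conversion to unsigned int and each unsigned operation is
       defined modulo UINT_MAX + 1, so no step here can overflow. */
    unsigned int r = (unsigned int)a + (unsigned int)b - (unsigned int)c;

    /* Check the caller's guarantee; a value above INT_MAX would make the
       conversion back to int implementation-defined. */
    assert(r <= (unsigned int)INT_MAX);
    return (int)r;
}
```

For example, add_then_subtract(INT_MAX, 10, 20) yields INT_MAX - 10,
whereas writing INT_MAX + 10 - 20 directly in "int" overflows on the
way there.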

>That's one of the main things that I like about assembly language, btw: it
>might be all kinds of painful to express an algorithm (although generally
>not really all that bad), but the instruction descriptions in
>the data books tell you *precisely* what each one will do, and
>you can compose your code with no doubts about how it will perform.

Actually, there are a number of instruction sets (for various
machines) that tell you to avoid particular situations with particular
instructions.  Consider the VAX's "movtuc" ("move translated until
character") instruction, which takes a source-and-source-length,
destination (and destination-length?), and translation-table.  The
manual says that the effect of the instruction is unpredictable if
the translation table overlaps with the source (and/or destination?).

Someone put a comment into a piece of assembly code in 4.1BSD that
read "# comet sucks".  I wondered what this was about.

It turns out that whoever implemented the printf engine for the
VAX used "movtuc" to find '%' and '\0' characters, and did the
movtuc with the source string having "infinite" length (actually
65535 bytes, the length being restricted to 16 bits) so that
it often overlapped the translation table.  On the VAX-11/780,
this "worked right" (as in, did what he wanted it to).  On the
VAX-11/750 -- known internally as the "Comet" -- it did not behave
the way he wanted.  The result was that printf() misbehaved for
various programs, because the assembly code depended on
undefined behavior.

(The "fix" applied, along with the comment, was to limit the length
of the source so as not to overlap the table.  Of course, when we
rewrote the printf engine in C for portability and C89 support, we
stopped using movtuc entirely.)
-- 
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W)  +1 801 277 2603
email: forget about it   http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

On 2006-03-27, Ben Pfaff <blp@cs.stanford.edu> wrote:
> Jordan Abel <random832@gmail.com> writes:
>
>> maybe not that in particular, but *p-- past 0 is no less idiomatic than 
>> *p++ past the end.
>
> Really?  It's not in *my* idiom, because I like to write code
> that doesn't gratuitously invoke undefined behavior.

a circular argument when you are defending the decision to leave it 
undefined.

Jordan Abel <random832@gmail.com> writes:

> maybe not that in particular, but *p-- past 0 is no less idiomatic than 
> *p++ past the end.

Really?  It's not in *my* idiom, because I like to write code
that doesn't gratuitously invoke undefined behavior.
-- 
Ben Pfaff 
email: blp@cs.stanford.edu
web: http://benpfaff.org

On 2006-03-27, Arthur J. O'Dwyer <ajonospam@andrew.cmu.edu> wrote:
> That's why precious few C implementations /do/ pointer validity 
> checking in the first place. As I understand it, not even the AS/400's 
> compiler did pointer checking in software; it just did whatever the 
> hardware forced it to. And the hardware check presumably /would/ have 
> gone off at each dereference.

according to others in this thread, apparently not, hence why it checks 
on load.

On 2006-03-27, Richard Bos <rlb@hoekstra-uitgeverij.nl> wrote:
> Andrew Reilly <andrew-newspost@areilly.bpc-users.org> wrote:
>
>> On Fri, 24 Mar 2006 08:20:12 +0000, David Holland wrote:
>> > Because p -= 2, when performed on the pointer 1234:4, tries to deduct
>> > 8 from the offset field. This underflows and traps.
>> 
>> And this is the behaviour that is at odds with idiomatic C.
>
> _Whose_ idiom? No programmer I'd respect writes such code intentionally.
>
> Richard

maybe not that in particular, but *p-- past 0 is no less idiomatic than 
*p++ past the end.

On Mon, 27 Mar 2006, CBFalconer wrote:
> Andrew Reilly wrote:
>> On Mon, 27 Mar 2006 03:07:28 +0000, Dik T. Winter wrote:
>>> Jordan Abel <random832@gmail.com> writes:
>>>> On 2006-03-26, Stephen Sprunk <stephen@sprunk.org> wrote:
>>>>>
>>>>> It simply doesn't make sense to do things that way since the
>>>>> only purpose is to allow violations of the processor's memory
>>>>> protection model.  Work with the model, not against it.

   (FWIW, I agree with Stephen's sentiment. C's memory model seems 
consistent to me: pointers point at objects, or are NULL, or are
garbage, with one special-case exception for pointers that point
"one past" objects. Extending the model to allow pointers that point
"one before" objects, or "ten past," doesn't seem useful enough to
be worth the hassle of defining all the behaviors on overflow, or
what happens if 'x' is "ten past" 'y' in memory, and so on. Just don't
write code that loops backward in an unsafe manner.)
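
   One way to loop backward without ever forming a "one before"
pointer is sketched here. It assumes both arrays hold at least n
elements and do not overlap; the name copy_reversed is invented for
this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copy src[0..n-1] into dst in reverse order. */
void copy_reversed(int *dst, const int *src, size_t n)
{
    const int *p;

    assert(dst != NULL && src != NULL);

    /* Start at the "one past" pointer, which is valid to form but not
       to dereference. */
    p = src + n;

    /* Decrement before each dereference and stop when p reaches src,
       so p never moves below the start of the array. */
    while (p != src)
        *dst++ = *--p;
}
```

   The loop exits with p == src, so the "one before" value is never
computed.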

[Proposing a different, flat-memory model for C.]
>> The trap isn't ignored.  There is no trap: the platform's "sane C
>> memory model" compiler and run-time system updated p.array_index
>> to -1 and p.array_base to a.array_base at the third line, as
>> expected. The trap would be left enabled, so that it would
>> actually hit if/when a real pointer was formed from
>> &p.array_base[p.C_pointer_index] if/when *p was ever referenced
>> in the subsequent code.
>>
>> Consequently, the above code leaves p == a, as expected, and no
>> trap is encountered.  Neat, huh?
>
> Nope.  Consider some code such as:
>
>      for (...; ...; ++p) {
>         for (...; ...; ++q) {
>            dothingswith(*p, *q);
>            /* qchecktime */
>         }
>         /* pchecktime */
>      }
>
> With the normal check at pointer creation time, p is checked once
> per iteration of the outer for.  Your way, it is checked at every
> use of *p, which will probably be far more often.  Thus slowing
> down the whole system and bringing snarlers_against_runtime_checks
> out of every crack in the walls.

   Straw man. Every decent compiler does hoisting of loop invariants,
making both checks equivalent. (And if your compiler doesn't hoist
invariants, then you have no business talking about runtime efficiency
in the first place.)

> Checking pointer validity can be an involved process, depending on
> architecture.  It should be avoided, similar to casts, which at
> least are obvious because the programmer writes them in.

   Obviously. That's why precious few C implementations /do/ pointer
validity checking in the first place. As I understand it, not even
the AS/400's compiler did pointer checking in software; it just did
whatever the hardware forced it to. And the hardware check presumably
/would/ have gone off at each dereference.

-Arthur

Andrew Reilly wrote:

> On Mon, 27 Mar 2006 12:59:59 +0100, Chris Dollin wrote:
>> The C standards don't /outlaw/ forming illegal pointer values; they
>> just say that if you do that, they don't say anything more about the
>> behaviour of your code, so if you want defined behaviour, you have
>> to look elsewhere for the definition.
> 
> How much undefined behaviour can you stand?

No more than what's covered by the defined behaviour on the platforms
I'm prepared to support, where `defined` isn't limited to the C
standard but over-enthusiastic use of random other definitions isn't
desired.

> Sure, your code works OK this
> year, but what if next year's super-optimizer switch takes a different
> reading on some behaviour that you've coded to, because it was
> "universally" supported, but nevertheless undefined.  Want to chase down
> those bugs?

Were I actively writing C - which at the moment I'm not - I'd have
tests to check behaviour, for this reason among others. 

> How many substantial applications do you suppose are written, that *only*
> use defined behaviours?  I suspect that the answer is very close to none.

That only use behaviour defined by the C standard? Few. That only
use behaviour defined by their intended platforms? Rather more.

>> If you're writing code that has, for whatever reason, to rely on
>> non-C-standard definitions, well then, rely on them. I've written code
>> that relies on non-C-standard behaviour, too - but I didn't expect it to
>> port everywhere, and I didn't expect such use to be a requirement on
>> future standardisation to support it, much as I might like to; the
>> leaves-it-undefined /allows/ the code to work where it works.
> 
> I like C.  A lot.
> 
> I think that it could do to have a few fewer undefined behaviours, and a
> few more defined (obvious) behaviours that you could rely on to describe
> your algorithms.

Well, me too. But that doesn't stop me thinking that the standard seems
to be a reasonable compromise between the different requirements, as 
things stand. 

> That's one of the main things that I like about assembly language, btw: it
> might be all kinds of painful to express an algorithm (although generally
> not really all that bad), but the instruction descriptions in
> the data books tell you *precisely* what each one will do, and
> you can compose your code with no doubts about how it will perform.

The first half is the reason I'd typically stay away from assembly
language, and I'm not convinced about the second unless one goes
into the amount of detail I'd happily leave to the compiler-writer.

-- 
Chris "x.f(y) == f(x, y) == (x, y).f" Dollin
The shortcuts are all full of people using them.

On Mon, 27 Mar 2006 12:59:59 +0100, Chris Dollin wrote:
> The C standards don't /outlaw/ forming illegal pointer values; they
> just say that if you do that, they don't say anything more about the
> behaviour of your code, so if you want defined behaviour, you have
> to look elsewhere for the definition.

How much undefined behaviour can you stand?  Sure, your code works OK this
year, but what if next year's super-optimizer switch takes a different
reading on some behaviour that you've coded to, because it was
"universally" supported, but nevertheless undefined.  Want to chase down
those bugs?

How many substantial applications do you suppose are written, that *only*
use defined behaviours?  I suspect that the answer is very close to none. 

> If you're writing code that has, for whatever reason, to rely on
> non-C-standard definitions, well then, rely on them. I've written code
> that relies on non-C-standard behaviour, too - but I didn't expect it to
> port everywhere, and I didn't expect such use to be a requirement on
> future standardisation to support it, much as I might like to; the
> leaves-it-undefined /allows/ the code to work where it works.

I like C.  A lot.

I think that it could do to have a few fewer undefined behaviours, and a
few more defined (obvious) behaviours that you could rely on to describe
your algorithms.

That's one of the main things that I like about assembly language, btw: it
might be all kinds of painful to express an algorithm (although generally
not really all that bad), but the instruction descriptions in
the data books tell you *precisely* what each one will do, and
you can compose your code with no doubts about how it will perform.

[I don't read comp.lang.c, so if you want me to see any replies (hah! :-),
you won't take comp.arch.embedded out of the Newsgroups.  Of course, I can
imagine that just about everyone doesn't care, at this stage...]

-- 
Andrew