Making Fatal Hidden Assumptions

Reply by Andrew Reilly ● March 23, 2006

On Thu, 23 Mar 2006 22:00:14 +0000, Keith Thompson wrote:
> Because it catches errors as early as possible.

No, it doesn't.  It breaks idioms where no error or illegal access would
occur.

> Why do you want to create an address that you're not allowed to
> dereference?

Because the ability to do so is implied by the syntax of pointer
arithmetic.

This restriction (undefined semantics IS a restriction) makes
pointer-walking versions of algorithms second-class citizens to otherwise
equivalent indexed versions of algorithms.

void
foo(int *p, int n)
{
	for (; --n >= 0;)
		p[n] = n;
}

is "legal" and "defined" on all architectures, but the equivalent with a
pointer cursor isn't:

void
foo(int *p, int n)
{
	p += n-1;
	for (; --n >= 0;)
		*p-- = n;
}

I suppose that this is the root of my surprise and annoyance on
discovering what the standard says.  These versions *should* be
equivalent, and equivalently well-defined.

> The C standard doesn't require a trap when you create an invalid
> address.  It merely declines to define the semantics.  If you think the
> AS/400 architecture is stupid, that's your privilege.  If you want to
> write code that creates addresses outside the bounds of an array, nobody
> is going to stop you; you'll just have to look somewhere other than the
> C standard for guarantees that your code will behave the way you want it
> to.

Fine.  Where can I find that?  Can we make a sub-standard that includes
the common semantics for all "normal-looking" architectures, so that our
code can rely on them, please?

> Your argument, I suppose, is that the C standard *should* guarantee the
> behavior.  That may or may not be a good idea -- but I think we can
> guarantee that nothing is going to change until and unless someone comes
> up with a concrete proposal.  I'm not going to do it myself, because I'm
> satisfied with the standard as it is (at least in this area).  If you do
> so, you can expect a great deal of argument about what behavior should
> be guaranteed and what should be left implementation-defined (or even
> undefined).

Yeah, but I expect most of that argument to go away, as all of the AS/400
wanna-be C coders drift off to use Java or .NET instead, leaving C a niche
language, doing the low-level systems programming that it was designed for.

-- 
Andrew

Reply by Andrew Reilly ● March 23, 2006

On Thu, 23 Mar 2006 14:16:41 +0000, Dik T. Winter wrote:

> In article <pan.2006.03.23.12.23.55.181121@areilly.bpc-users.org> Andrew Reilly <andrew-newspost@areilly.bpc-users.org> writes:
>  > On Thu, 23 Mar 2006 12:33:51 +0000, Dik T. Winter wrote:
> ...
>  > How about:
>  >      int a[10];
>  >      foo(a + 1);
>  > where
>  > 
>  > foo(int *p)
>  > {
>  > 	p -= 1;
>  > 	/* do something with p[0]..p[9] */
>  > }
> ...
>  > Does p -= 1 still trap, in the first line of foo, given the way that it's
>  > called in the main routine?
> 
> Why should it?

It shouldn't.  But if the comment (and the code) went off to do something
with p[1]..p[10], and the main line passed a, rather than a+1, you're
saying that it would trap.  The coder, therefore, can't use a perfectly
reasonable idiom (base shifting), even though the syntax, and the
semantics implied by that syntax, allow it.
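
A minimal sketch of one way to get one-based access without shifting the
base pointer, for comparison with the idiom above.  The names AT1 and
sum_one_based are hypothetical, not from the thread; the shift is applied
to the index, so a pointer below the start of the array is never formed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: one-based element access.  The subtraction is
   done on the index, so no pointer below base is ever formed. */
#define AT1(base, i) ((base)[(i) - 1])

/* Sums elements 1..n (one-based) of an array holding n elements. */
int sum_one_based(const int *a, size_t n)
{
    int total = 0;
    size_t i;

    assert(a != NULL || n == 0);
    for (i = 1; i <= n; i++)
        total += AT1(a, i);
    return total;
}
```

The cost is an extra subtraction per access unless the compiler folds it
away, which is presumably part of why the base-shifting form is attractive.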

>  > If not, how could foo() be compiled in a separate unit, in the AS/400
>  > scenario that you described earlier?
> 
> It was not me who described it, but I see no reason why that should be
> impossible.  Consider a pointer as a combination of the indication of a
> region and an index into that region.

I *was* considering a pointer as a combination of the indication of a
region and an index into that region.  In C, that index is a *signed*
integer.  If the hardware has a problem with that, and causes a trap if
the index component is set to a negative value, then the implementation
should go to some lengths to preserve the impression that it works anyway.
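
As an illustration only, the region-plus-index view can be modelled in
portable C.  The struct fatptr and the function fat_add below are
hypothetical names, not a description of the AS/400 or of any real
implementation.  The sketch assumes a checking machine would refuse any
index outside 0..len, where len itself is the one-past-the-end position.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a "region + index" pointer.  Illustrative only;
   this is not how any particular machine lays out its pointers. */
struct fatptr {
    int      *region;  /* start of the object */
    ptrdiff_t len;     /* number of elements in the object */
    ptrdiff_t idx;     /* signed index; valid values are 0..len */
};

/* Adds delta to the index.  Returns 1 and updates *fp when the result
   stays within 0..len (one past the end is allowed); returns 0 and
   leaves *fp unchanged where a checking machine would trap. */
int fat_add(struct fatptr *fp, ptrdiff_t delta)
{
    assert(fp != NULL);
    assert(fp->idx >= 0 && fp->idx <= fp->len);

    /* Compare against the remaining room so idx + delta is never
       evaluated when it would fall outside the region. */
    if (delta < -fp->idx || delta > fp->len - fp->idx)
        return 0;
    fp->idx += delta;
    return 1;
}
```

With a ten-element region and idx starting at 1 (the foo(a + 1) case),
fat_add(&p, -1) succeeds and a second fat_add(&p, -1) is refused.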

>  > If it does trap, why?  It's not forming an "illegal" pointer, even
>  > for the AS/400 world.
> 
> It does not trap.

But the author of the foo() function (as modified above) can't know that.

>  > If it doesn't trap, why should p -= 1 succeed, but p -= 2 fail?
> 
> Because the latter creates an invalid pointer.

But the author of foo() can't know that.  This argument started because it
was pointed out that p -= 1 produced undefined behaviour.  It is clear
that the behaviour *could* be very well defined.  That it isn't is the
root of the discussion.

>  > What if my algorithm's natural expression is to refer to p[0]..p[-9],
>  > and expects to be handed a pointer to the last element of a[]?
> 
> No problem.

It is a problem if those elements are accessed with a walking pointer,
rather than with an array index; something that the syntax of C and
most of its idioms and historical code imply are equivalent.

-- 
Andrew

Reply by Keith Thompson ● March 23, 2006

Andrew Reilly <andrew-newspost@areilly.bpc-users.org> writes:
> On Thu, 23 Mar 2006 22:00:14 +0000, Keith Thompson wrote:
>> Because it catches errors as early as possible.
>
> No, it doesn't.  It breaks idioms where no error or illegal access would
> occur.
>
>> Why do you want to create an address that you're not allowed to
>> dereference?
>
> Because the ability to do so is implied by the syntax of pointer
> arithmetic.

No, it's not implied by the syntax, any more than the ability to
compute INT_MAX + 1 is implied by the syntax of addition.
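
To make the comparison concrete, here is a minimal sketch of how portable
code avoids evaluating an overflowing sum in the first place.  The function
name checked_add is hypothetical; the point is only that a + b is accepted
by the syntax for any two ints but has a defined result only when the sum
is representable.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: stores a + b in *out and returns 1 when the sum
   is representable as an int; returns 0 without ever evaluating an
   overflowing addition. */
int checked_add(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *out = a + b;
    return 1;
}
```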

[snip]

>> The C standard doesn't require a trap when you create an invalid
>> address.  It merely declines to define the semantics.  If you think the
>> AS/400 architecture is stupid, that's your privilege.  If you want to
>> write code that creates addresses outside the bounds of an array, nobody
>> is going to stop you; you'll just have to look somewhere other than the
>> C standard for guarantees that your code will behave the way you want it
>> to.
>
> Fine.  Where can I find that?  Can we make a sub-standard that includes
> the common semantics for all "normal-looking" architectures, so that our
> code can rely on them, please?

Sure, go ahead.

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
San Diego Supercomputer Center             <*>  <http://users.sdsc.edu/~kst>
We must do something.  This is something.  Therefore, we must do this.

Reply by Stephen Sprunk ● March 23, 2006

"Andrew Reilly" <andrew-newspost@areilly.bpc-users.org> wrote in message 
news:pan.2006.03.23.23.05.17.816513@areilly.bpc-users.org...
> On Thu, 23 Mar 2006 22:00:14 +0000, Keith Thompson wrote:
>> The C standard doesn't require a trap when you create an invalid
>> address.  It merely declines to define the semantics.  If you think the
>> AS/400 architecture is stupid, that's your privilege.  If you want to
>> write code that creates addresses outside the bounds of an array, nobody
>> is going to stop you; you'll just have to look somewhere other than the
>> C standard for guarantees that your code will behave the way you want it
>> to.
>
> Fine.  Where can I find that?  Can we make a sub-standard that includes
> the common semantics for all "normal-looking" architectures, so that our
> code can rely on them, please?

The problem is that there's really no such thing as a "normal-looking" 
architecture.  Every implementation differs in at least a few fundamental 
things you'd find it useful to nail down, so to provide enough detail to be 
meaningful your sub-standard would basically be defining the behavior of a 
particular implementation.

Just about the only thing that all modern machines agree on is CHAR_BIT == 8 
(and I bet someone will dispute that).  Ints, pointers, address space 
semantics, etc. are all up for grabs, and that's a good thing -- it allows 
systems to evolve in useful ways instead of locking us into something that, 
while appearing optimal today, is not likely to be tomorrow.

If you disagree, please list all of the various currently-undefined 
behaviors you want to define and what implementations conform to your spec. 
Who knows, ISO might adopt it...

S

-- 
Stephen Sprunk        "Stupid people surround themselves with smart
CCIE #3723           people.  Smart people surround themselves with
K5SSS         smart people who disagree with them."  --Aaron Sorkin 


Reply by Jordan Abel ● March 23, 2006

On 2006-03-24, Stephen Sprunk <stephen@sprunk.org> wrote:
> "Andrew Reilly" <andrew-newspost@areilly.bpc-users.org> wrote in message 
> news:pan.2006.03.23.23.05.17.816513@areilly.bpc-users.org...
>> On Thu, 23 Mar 2006 22:00:14 +0000, Keith Thompson wrote:
>>> The C standard doesn't require a trap when you create an invalid
>>> address.  It merely declines to define the semantics.  If you think the
>>> AS/400 architecture is stupid, that's your privilege.  If you want to
>>> write code that creates addresses outside the bounds of an array, nobody
>>> is going to stop you; you'll just have to look somewhere other than the
>>> C standard for guarantees that your code will behave the way you want it
>>> to.
>>
>> Fine.  Where can I find that?  Can we make a sub-standard that includes
>> the common semantics for all "normal-looking" architectures, so that our
>> code can rely on them, please?
>
> The problem is that there's really no such thing as a "normal-looking" 
> architecture.  Every implementation differs in at least a few 
> fundamental things you'd find it useful to nail down, so to provide 
> enough detail to be meaningful your sub-standard would basically be 
> defining the behavior of a particular implementation.
>
> Just about the only thing that all modern machines agree on is 
> CHAR_BIT == 8 (and I bet someone will dispute that).  Ints, pointers, 
> address space semantics, etc. are all up for grabs, and that's a good 
> thing -- it allows systems to evolve in useful ways instead of locking 
> us into something that, while appearing optimal today, is not likely 
> to be tomorrow.
>
> If you disagree, please list all of the various currently-undefined 
> behaviors you want to define and what implementations conform to your 
> spec. Who knows, ISO might adopt it.

How about defining the behavior of passing a positive signed int to
printf's %x, %u, or %o, or an unsigned int < INT_MAX to %d? That doesn't
seem too unreasonable. It works fine on every existing platform, as far
as I know, and it is currently required to work for user-created
variadic functions.
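
For reference, a sketch of what strictly conforming code does today: cast
the argument so that its type matches the conversion specifier.  The helper
names format_hex and format_dec are hypothetical, and snprintf assumes a
C99 library.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: formats a non-negative int as hexadecimal.
   The cast makes the argument type match what %x expects. */
int format_hex(char *buf, size_t size, int value)
{
    assert(value >= 0);
    return snprintf(buf, size, "%x", (unsigned int)value);
}

/* Hypothetical helper: formats an unsigned value that fits in an int
   using %d, again with an explicit cast. */
int format_dec(char *buf, size_t size, unsigned int value)
{
    assert(value <= (unsigned int)INT_MAX);
    return snprintf(buf, size, "%d", (int)value);
}
```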

Reply by Keith Thompson ● March 23, 2006

"Stephen Sprunk" <stephen@sprunk.org> writes:
[...]
> Just about the only thing that all modern machines agree on is
> CHAR_BIT == 8 (and I bet someone will dispute that).  Ints, pointers,
> address space semantics, etc. are all up for grabs, and that's a good
> thing -- it allows systems to evolve in useful ways instead of locking
> us into something that, while appearing optimal today, is not likely
> to be tomorrow.

I don't know of any modern hosted implementations with CHAR_BIT > 8
(though there have certainly been such systems in the past (though I
don't know whether any of them had C compilers)), but CHAR_BIT values
of 16 and 32 are common on DSPs (Digital Signal Processors).
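
A short sketch of code that takes the byte width from <limits.h> instead
of assuming 8.  The helper names uint_storage_bits and bytes_for_bits are
hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: storage width of unsigned int in bits (including
   any padding bits), computed from CHAR_BIT rather than an assumed 8. */
size_t uint_storage_bits(void)
{
    return sizeof(unsigned int) * CHAR_BIT;
}

/* Hypothetical helper: number of bytes (in the C sense) needed to hold
   nbits bits on this implementation. */
size_t bytes_for_bits(size_t nbits)
{
    /* Guard the rounding step against size_t wraparound. */
    assert(nbits <= (size_t)-1 - (CHAR_BIT - 1));
    return (nbits + CHAR_BIT - 1) / CHAR_BIT;
}
```

On a DSP with CHAR_BIT == 16, bytes_for_bits(32) would give 2 rather than
the 4 that code hard-wired to 8-bit bytes expects.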

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
San Diego Supercomputer Center             <*>  <http://users.sdsc.edu/~kst>
We must do something.  This is something.  Therefore, we must do this.

Reply by Dik T. Winter ● March 23, 2006

In article <pan.2006.03.23.23.05.17.816513@areilly.bpc-users.org> Andrew Reilly <andrew-newspost@areilly.bpc-users.org> writes:
...
 > This restriction (undefined semantics IS a restriction) makes
 > pointer-walking versions of algorithms second-class citizens to otherwise
 > equivalent indexed versions of algorithms.
 > 
 > void
 > foo(int *p, int n)
 > {
 > 	for (; --n >= 0;)
 > 		p[n] = n;
 > }
 > 
 > is "legal" and "defined" on all architectures, but the equivalent with a
 > pointer cursor isn't:
 >
 > void
 > foo(int *p, int n)
 > {
 > 	p += n-1;
 > 	for (; --n >= 0;)
 > 		*p-- = n;
 > }

I have no idea on what you base your assertion.  When the first is valid,
the second is valid, and the reverse.  In your first example your first
assignment is to p[n-1] (using the initial value of n), the same for the
second version.  But it is worse:
   void
   foo(int *p, int n)
   {
        p += n;
        for(; --n >= 0;)
              *--p = n;
   }
is just as valid.

 > I suppose that this is the root of my surprise and annoyance on
 > discovering what the standard says.  These versions *should* be
 > equivalent, and equivalently well-defined.

They are.
-- 
dik t. winter, cwi, kruislaan 413, 1098 sj  amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn  amsterdam, nederland; http://www.cwi.nl/~dik/

Reply by Andrew Reilly ● March 24, 2006

On Fri, 24 Mar 2006 04:53:24 +0000, Dik T. Winter wrote:

> In article <pan.2006.03.23.23.05.17.816513@areilly.bpc-users.org> Andrew Reilly <andrew-newspost@areilly.bpc-users.org> writes:
> ...
>  > This restriction (undefined semantics IS a restriction) makes
>  > pointer-walking versions of algorithms second-class citizens to otherwise
>  > equivalent indexed versions of algorithms.
>  > 
>  > void
>  > foo(int *p, int n)
>  > {
>  > 	for (; --n >= 0;)
>  > 		p[n] = n;
>  > }
>  > 
>  > is "legal" and "defined" on all architectures, but the equivalent with a
>  > pointer cursor isn't:
>  >
>  > void
>  > foo(int *p, int n)
>  > {
>  > 	p += n-1;
>  > 	for (; --n >= 0;)
>  > 		*p-- = n;
>  > }
> 
> I have no idea on what you base your assertion.  When the first is valid,
> the second is valid, and the reverse.  In your first example your first
> assignment is to p[n-1] (using the initial value of n), the same for the
> second version.

But, the second version *finishes* with p pointing to the -1st element of
the array, which (we now know) is undefined, and guaranteed to break an
AS/400.  The first version only finishes with the integer n == -1, and the
pointer p is still "valid".  This is the discrepancy that irks me.
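
The discrepancy can be traced without touching a real pointer.  The sketch
below replaces the cursor with an integer offset from the start of the
array (the function name final_cursor_offset is hypothetical) and returns
where the cursor would be left when the loop exits.  It assumes n >= 1.

```c
#include <assert.h>

/* Hypothetical trace: mirrors the cursor loop using an integer offset
   in place of the pointer, so it can be run anywhere without forming
   an out-of-range pointer.  Returns the offset the cursor would hold
   when the loop exits. */
int final_cursor_offset(int n)
{
    int offset;

    assert(n >= 1);
    offset = n - 1;          /* stands in for: p += n-1 */
    for (; --n >= 0;)
        offset--;            /* stands in for the p-- in *p-- = n */
    return offset;
}
```

For any n >= 1 the function returns -1, which corresponds to the pointer
one element before the array.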

>  But it is worse:
>    void
>    foo(int *p, int n)
>    {
>         p += n;
>         for(; --n >= 0;)
>               *--p = n;
>    }
> is just as valid.

Yes, that one is certainly going to fly, even on the AS/400, as p doesn't
ever point to p(initial) - 1.  But it is (IMO) less idiomatic than the
other construction.  Certainly, different people's experience will differ,
there, and certainly, different processor architectures often have better
or worse support for one form or the other.  In my experience,
post-modification is more common (or, rather, more often fast, where both
are available), but quite a few processors have no specific support for
address register increment or decrement addressing modes.

>  > I suppose that this is the root of my surprise and annoyance on
>  > discovering what the standard says.  These versions *should* be
>  > equivalent, and equivalently well-defined.
> 
> They are.

Come again?  This is the whole point that people (well, me, anyway) have
been arguing about!  If they were truly equivalent (and the
non-unit-stride cousins), I'd go home happy.

-- 
Andrew

Reply by Andrew Reilly ● March 24, 2006

On Thu, 23 Mar 2006 21:41:59 -0600, Stephen Sprunk wrote:
> The problem is that there's really no such thing as a "normal-looking" 
> architecture.  Every implementation differs in at least a few fundamental 
> things you'd find it useful to nail down, so to provide enough detail to be 
> meaningful your sub-standard would basically be defining the behavior of a 
> particular implementation.

Sure there is.  All the world's a VAX (but with IEEE float), with
plausible exceptions for pointers of a different length than int.  I'd also
wear alignment restrictions pretty happily, as long as they're reasonable.
Either-endian word significance is fine, too.  Show me a "normal-looking"
modern architecture that doesn't fit that description, in a material
sense.  Even most of the DSPs developed in the last ten years fit that
mould.  Mostly, so that they can run existing C code well.  [The few that
have been developed in that time frame, which *don't* fit that mould, are
not intended to be programmed in C, and there's no reason to expect that
they will be.]

> Just about the only thing that all modern machines agree on is CHAR_BIT
> == 8 (and I bet someone will dispute that).  Ints, pointers, address
> space semantics, etc. are all up for grabs, and that's a good thing --
> it allows systems to evolve in useful ways instead of locking us into
> something that, while appearing optimal today, is not likely to be
> tomorrow.
> 
> If you disagree, please list all of the various currently-undefined
> behaviors you want to define and what implementations conform to your
> spec. Who knows, ISO might adopt it...

I'd like the pointer memory model to be "flat" in the sense that for p, a
pointer to some object, (p += i, p -= i) == p for any int i.  (In a fixed
word-length, non-saturating arithmetic, the "flat" geometry is really
circular, or modulo.  That's as it should be.)
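
A sketch of the property as it can be expressed today, under the
assumption that the implementation provides uintptr_t (optional in C99).
The function name round_trip is hypothetical.  The arithmetic is done on
the integer image of the pointer, where unsigned wraparound is defined,
and the step is in bytes rather than scaled by the element size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: performs the (p += i, p -= i) round trip on the
   integer image of the pointer.  uintptr_t arithmetic is unsigned, so
   the intermediate value wraps instead of being undefined.  Assumes
   the implementation provides uintptr_t. */
void *round_trip(void *p, int i)
{
    uintptr_t u;

    assert(p != NULL);       /* p is meant to point to some object */
    u = (uintptr_t)p;
    u += (uintptr_t)i;       /* may wrap; that is well defined */
    u -= (uintptr_t)i;       /* undoes the step exactly */
    return (void *)u;
}
```

The result compares equal to p because the integer returns to its
original value before it is converted back.  What the standard does not
promise is the same thing for arithmetic performed on the pointer itself.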

[I'm not interested in arguing multi-processor or multi-thread consistency
semantics here.  That's traditionally been outside the realm of C, and
that's probably an OK thing too, IMO.]

Cheers,

-- 
Andrew

Reply by Jordan Abel ● March 24, 2006

On 2006-03-24, Andrew Reilly <andrew-newspost@areilly.bpc-users.org> wrote:
> On Thu, 23 Mar 2006 21:41:59 -0600, Stephen Sprunk wrote:
>> The problem is that there's really no such thing as a "normal-looking" 
>> architecture.  Every implementation differs in at least a few fundamental 
>> things you'd find it useful to nail down, so to provide enough detail to be 
>> meaningful your sub-standard would basically be defining the behavior of a 
>> particular implementation.
>
> Sure there is.  All the world's a VAX (but with IEEE float), 

Ironic, considering VAXen don't have IEEE float. Why not just say all 
the world's a 386? Oh, wait, 386 has that segmented-addressing 
silliness.