Making Fatal Hidden Assumptions

Started by CBFalconer March 6, 2006
On Thu, 23 Mar 2006 22:00:14 +0000, Keith Thompson wrote:
> Because it catches errors as early as possible.
No, it doesn't. It breaks idioms where no error or illegal access would occur.
> Why do you want to create an address that you're not allowed to > dereference?
Because the ability to do so is implied by the syntax of pointer arithmetic. This restriction (undefined semantics IS a restriction) makes pointer-walking versions of algorithms second-class citizens to otherwise equivalent indexed versions of algorithms.

   void
   foo(int *p, int n)
   {
        for (; --n >= 0;)
              p[n] = n;
   }

is "legal" and "defined" on all architectures, but the equivalent with a pointer cursor isn't:

   void
   foo(int *p, int n)
   {
        p += n-1;
        for (; --n >= 0;)
              *p-- = n;
   }

I suppose that this is the root of my surprise and annoyance on discovering what the standard says. These versions *should* be equivalent, and equivalently well-defined.
> The C standard doesn't require a trap when you create an invalid > address. It merely declines to define the semantics. If you think the > AS/400 architecture is stupid, that's your privilege. If you want to > write code that creates addresses outside the bounds of an array, nobody > is going to stop you; you'll just have to look somewhere other than the > C standard for guarantees that your code will behave the way you want it > to.
Fine. Where can I find that? Can we make a sub-standard that includes the common semantics for all "normal-looking" architectures, so that our code can rely on them, please?
> Your argument, I suppose, is that the C standard *should* guarantee the > behavior. That may or may not be a good idea -- but I think we can > guarantee that nothing is going to change until and unless someone comes > up with a concrete proposal. I'm not going to do it myself, because I'm > satisfied with the standard as it is (at least in this area). If you do > so, you can expect a great deal of argument about what behavior should > be guaranteed and what should be left implementation-defined (or even > undefined).
Yeah, but I expect most of that argument to go away, as all of the AS/400 wanna-be C coders drift off to use Java or .NET instead, leaving C a niche language, doing the low-level systems programming that it was designed for. -- Andrew
On Thu, 23 Mar 2006 14:16:41 +0000, Dik T. Winter wrote:

> In article <pan.2006.03.23.12.23.55.181121@areilly.bpc-users.org> Andrew Reilly <andrew-newspost@areilly.bpc-users.org> writes:
> > On Thu, 23 Mar 2006 12:33:51 +0000, Dik T. Winter wrote:
> ...
> > How about:
> >    int a[10];
> >    foo(a + 1);
> > where
> >
> >    foo(int *p)
> >    {
> >       p -= 1;
> >       /* do something with p[0]..p[9] */
> >    }
> ...
> > Does p -= 1 still trap, in the first line of foo, given the way that it's
> > called in the main routine?
>
> Why should it?
It shouldn't. But if the comment (and the code) went off to do something with p[1]..p[10], and the main line passed a, rather than a+1, you're saying that it would trap. The coder, therefore, can't use a perfectly reasonable idiom (base shifting), even though the syntax, and the semantics implied by that syntax, allow it.
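For readers who want the 1-based indexing that base shifting provides without ever forming a pointer before the start of the array, here is a minimal sketch. The names `at1` and `fill_one_based` are hypothetical and not from the thread; the assumption is simply that the shift can be folded into the integer index instead of the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: 1-based access to a 0-based array.
 * The subtraction is done on the integer index, so no pointer
 * before the start of the array is ever formed. */
static int *at1(int *base, size_t i)
{
    assert(i >= 1);          /* 1-based: index 0 is out of range */
    return &base[i - 1];
}

/* Fill elements 1..n (1-based) with their own index. */
static void fill_one_based(int *base, size_t n)
{
    for (size_t i = 1; i <= n; i++)
        *at1(base, i) = (int)i;
}
```

Whether a compiler turns this into the same machine code as the shifted-base version is not something the sketch can promise; it only shows one way to stay inside what the standard defines.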
> > If not, how could foo() be compiled in a separate unit, in the AS/400 > > scenario that you described earlier? > > It was not me who described it, but I see no reason why that should be > impossible. Consider a pointer as a combination of the indication of a > region and an index into that region.
I *was* considering a pointer as a combination of the indication of a region and an index into that region. In C, that index is a *signed* integer. If the hardware has a problem with that, and causes a trap if the index component is set to a negative value, then the implementation should go to some lengths to preserve the impression that it works anyway.
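The "region plus signed index" view can also be written down directly in portable C. The following is only a sketch under that assumption; `struct cursor`, `cursor_valid` and `fill_backwards` are hypothetical names. The index is an ordinary signed integer that may leave the region, and a real pointer is formed only while the index is in range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "region + index" cursor: the index is a signed
 * integer and may go out of range freely; an element is only
 * addressed when the index is inside the region. */
struct cursor {
    int      *base;   /* start of the region */
    ptrdiff_t len;    /* number of elements in the region */
    ptrdiff_t idx;    /* signed position, may be < 0 or >= len */
};

static int cursor_valid(const struct cursor *c)
{
    return c->idx >= 0 && c->idx < c->len;
}

/* Walk backwards over the region, storing each element's index.
 * The loop ends with idx == -1, which is only an integer. */
static void fill_backwards(int *p, ptrdiff_t n)
{
    struct cursor c = { p, n, n - 1 };
    while (cursor_valid(&c)) {
        c.base[c.idx] = (int)c.idx;
        c.idx--;
    }
}
```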
> > If it does trap, why? It's not forming an "illegal" pointer, even > > for the AS/400 world. > > It does not trap.
But the author of the foo() function (as modified above) can't know that.
> > If it doesn't trap, why should p -= 1 succeed, but p -= 2 fail? > > Because the latter creates an invalid pointer.
But the author of foo() can't know that. This argument started because it was pointed out that p -= 1 produced undefined behaviour. It is clear that the behaviour *could* be very well defined. That it isn't is the root of the discussion.
> > What if my algorithm's natural expression is to refer to p[0]..p[-9], > > and expects to be handed a pointer to the last element of a[]? > > No problem.
It is a problem if those elements are accessed with a walking pointer, rather than with an array index; something that the syntax of C and most of its idioms and historical code implies are equivalent. -- Andrew
Andrew Reilly <andrew-newspost@areilly.bpc-users.org> writes:
> On Thu, 23 Mar 2006 22:00:14 +0000, Keith Thompson wrote: >> Because it catches errors as early as possible. > > No, it doesn't. It breaks idioms where no error or illegal access would > occur. > >> Why do you want to create an address that you're not allowed to >> dereference? > > Because the ability to do so is implied by the syntax of pointer > arithmetic.
No, it's not implied by the syntax, any more than the ability to compute INT_MAX + 1 is implied by the syntax of addition. [snip]
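The integer side of that analogy is easy to illustrate. As a sketch only (the helper name `checked_add` is hypothetical), the usual way to stay within defined behaviour is to test against the limits from `<limits.h>` before adding, much as a pointer has to be tested against the array bounds before it is moved:

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: add two ints, reporting overflow instead of
 * evaluating an expression whose behaviour is undefined.
 * Returns 1 and stores the sum on success, 0 on overflow. */
static int checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        return 0;
    *sum = a + b;
    return 1;
}
```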
>> The C standard doesn't require a trap when you create an invalid >> address. It merely declines to define the semantics. If you think the >> AS/400 architecture is stupid, that's your privilege. If you want to >> write code that creates addresses outside the bounds of an array, nobody >> is going to stop you; you'll just have to look somewhere other than the >> C standard for guarantees that your code will behave the way you want it >> to. > > Fine. Where can I find that? Can we make a sub-standard that includes > the common semantics for all "normal-looking" architectures, so that our > code can rely on them, please?
Sure, go ahead. -- Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst> San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst> We must do something. This is something. Therefore, we must do this.
"Andrew Reilly" <andrew-newspost@areilly.bpc-users.org> wrote in message 
news:pan.2006.03.23.23.05.17.816513@areilly.bpc-users.org...
> On Thu, 23 Mar 2006 22:00:14 +0000, Keith Thompson wrote: >> The C standard doesn't require a trap when you create an invalid >> address. It merely declines to define the semantics. If you think the >> AS/400 architecture is stupid, that's your privilege. If you want to >> write code that creates addresses outside the bounds of an array, nobody >> is going to stop you; you'll just have to look somewhere other than the >> C standard for guarantees that your code will behave the way you want it >> to. > > Fine. Where can I find that? Can we make a sub-standard that includes > the common semantics for all "normal-looking" architectures, so that our > code can rely on them, please?
The problem is that there's really no such thing as a "normal-looking" architecture. Every implementation differs in at least a few fundamental things you'd find it useful to nail down, so to provide enough detail to be meaningful your sub-standard would basically be defining the behavior of a particular implementation.

Just about the only thing that all modern machines agree on is CHAR_BIT == 8 (and I bet someone will dispute that). Ints, pointers, address space semantics, etc. are all up for grabs, and that's a good thing -- it allows systems to evolve in useful ways instead of locking us into something that, while appearing optimal today, is not likely to be tomorrow.

If you disagree, please list all of the various currently-undefined behaviors you want to define and what implementations conform to your spec. Who knows, ISO might adopt it...

S

-- Stephen Sprunk, CCIE #3723, K5SSS
"Stupid people surround themselves with smart people. Smart people surround themselves with smart people who disagree with them." --Aaron Sorkin
On 2006-03-24, Stephen Sprunk <stephen@sprunk.org> wrote:
> "Andrew Reilly" <andrew-newspost@areilly.bpc-users.org> wrote in message > news:pan.2006.03.23.23.05.17.816513@areilly.bpc-users.org... >> On Thu, 23 Mar 2006 22:00:14 +0000, Keith Thompson wrote: >>> The C standard doesn't require a trap when you create an invalid >>> address. It merely declines to define the semantics. If you think the >>> AS/400 architecture is stupid, that's your privilege. If you want to >>> write code that creates addresses outside the bounds of an array, nobody >>> is going to stop you; you'll just have to look somewhere other than the >>> C standard for guarantees that your code will behave the way you want it >>> to. >> >> Fine. Where can I find that? Can we make a sub-standard that includes >> the common semantics for all "normal-looking" architectures, so that our >> code can rely on them, please? > > The problem is that there's really no such thing as a "normal-looking" > architecture. Every implementation differs in at least a few > fundamental things you'd find it useful to nail down, so to provide > enough detail to be meaningful your sub-standard would basically be > defining the behavior of a particular implementation. > > Just about the only thing that all modern machines agree on is > CHAR_BIT == 8 (and I bet someone will dispute that). Ints, pointers, > address space semantics, etc. are all up for grabs, and that's a good > thing -- it allows systems to evolve in useful ways instead of locking > us into something that, while appearing optimal today, is not likely > to be tomorrow. > > If you disagree, please list all of the various currently-undefined > behaviors you want to define and what implementations conform to your > spec. Who knows, ISO might adopt it.
How about defining the result of passing a positive signed int to printf with %x, %u, or %o, or an unsigned int < INT_MAX with %d? That doesn't seem too unreasonable. It works fine on every existing platform, as far as I know, and it is currently required to work for user-created variadic functions.
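Until something like that is guaranteed for printf itself, the cautious route is an explicit conversion at the call site. The sketch below assumes a C99 library with `snprintf`; the wrapper names `format_hex` and `format_dec` are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Format a non-negative int as hexadecimal. %x expects an unsigned
 * int, so the value is converted explicitly rather than relying on
 * the two types being passed the same way. */
static int format_hex(char *buf, size_t size, int value)
{
    assert(value >= 0);
    return snprintf(buf, size, "%x", (unsigned int)value);
}

/* Format an unsigned value that is known to fit in an int with %d. */
static int format_dec(char *buf, size_t size, unsigned int value)
{
    assert(value <= INT_MAX);
    return snprintf(buf, size, "%d", (int)value);
}
```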
"Stephen Sprunk" <stephen@sprunk.org> writes:
[...]
> Just about the only thing that all modern machines agree on is > CHAR_BIT == 8 (and I bet someone will dispute that). Ints, pointers, > address space semantics, etc. are all up for grabs, and that's a good > thing -- it allows systems to evolve in useful ways instead of locking > us into something that, while appearing optimal today, is not likely > to be tomorrow.
I don't know of any modern hosted implementations with CHAR_BIT > 8 (though there have certainly been such systems in the past (though I don't know whether any of them had C compilers)), but CHAR_BIT values of 16 and 32 are common on DSPs (Digital Signal Processors). -- Keith Thompson (The_Other_Keith) kst-u@mib.org <http://www.ghoti.net/~kst> San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst> We must do something. This is something. Therefore, we must do this.
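Code that does depend on a particular byte width can at least say so. The following is a sketch, not a recommendation for every project: it takes `CHAR_BIT` from `<limits.h>`, and the function name `int_width_bits` and the compile-time guard are illustrative assumptions.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of bits in the object representation of an int
 * (value bits plus any padding bits). */
static size_t int_width_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}

/* Hypothetical guard for code that really does assume 8-bit bytes:
 * refuse to compile elsewhere instead of misbehaving silently. */
#if CHAR_BIT != 8
#error "this code assumes CHAR_BIT == 8"
#endif
```

On a DSP with CHAR_BIT of 16 or 32 the guard would stop the build, which turns the hidden assumption into a visible one.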
In article <pan.2006.03.23.23.05.17.816513@areilly.bpc-users.org> Andrew Reilly <andrew-newspost@areilly.bpc-users.org> writes:
...
 > This restriction (undefined semantics IS a restriction) makes
 > pointer-walking versions of algorithms second-class citizens to otherwise
 > equivalent indexed versions of algorithms.
 > 
 > void
 > foo(int *p, int n)
 > {
 > 	for (; --n >= 0;)
 > 		p[n] = n;
 > }
 > 
 > is "legal" and "defined" on all architectures, but the equivalent with a
 > pointer cursor isn't:
 >
 > void
 > foo(int *p, int n)
 > {
 > 	p += n-1;
 > 	for (; --n >= 0;)
 > 		*p-- = n;
 > }

I have no idea on what you base your assertion.  When the first is valid,
the second is valid, and the reverse.  In your first example your first
assignment is to p[n-1] (using the initial value of n), the same for the
second version.  But it is worse:
   void
   foo(int *p, int n)
   {
        p += n;
        for(; --n >= 0;)
              *--p = n;
   }
is just as valid.

 > I suppose that this is the root of my surprise and annoyance on
 > discovering what the standard says.  These versions *should* be
 > equivalent, and equivalently well-defined.

They are.
-- 
dik t. winter, cwi, kruislaan 413, 1098 sj  amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn  amsterdam, nederland; http://www.cwi.nl/~dik/
On Fri, 24 Mar 2006 04:53:24 +0000, Dik T. Winter wrote:

> In article <pan.2006.03.23.23.05.17.816513@areilly.bpc-users.org> Andrew Reilly <andrew-newspost@areilly.bpc-users.org> writes:
> ...
> > This restriction (undefined semantics IS a restriction) makes
> > pointer-walking versions of algorithms second-class citizens to otherwise
> > equivalent indexed versions of algorithms.
> >
> > void
> > foo(int *p, int n)
> > {
> > 	for (; --n >= 0;)
> > 		p[n] = n;
> > }
> >
> > is "legal" and "defined" on all architectures, but the equivalent with a
> > pointer cursor isn't:
> >
> > void
> > foo(int *p, int n)
> > {
> > 	p += n-1;
> > 	for (; --n >= 0;)
> > 		*p-- = n;
> > }
>
> I have no idea on what you base your assertion. When the first is valid,
> the second is valid, and the reverse. In your first example your first
> assignment is to p[n-1] (using the initial value of n), the same for the
> second version.
But, the second version *finishes* with p pointing to the -1st element of the array, which (we now know) is undefined, and guaranteed to break an AS/400. The first version only finishes with the integer n == -1, and the pointer p is still "valid". This is the discrepancy that irks me.
> But it is worse:
>    void
>    foo(int *p, int n)
>    {
>         p += n;
>         for(; --n >= 0;)
>               *--p = n;
>    }
> is just as valid.
Yes, that one is certainly going to fly, even on the AS/400, as p doesn't ever point to p(initial) - 1. But it is (IMO) less idiomatic than the other construction. Certainly, different people's experience will differ, there, and certainly, different processor architectures often have better or worse support for one form or the other. In my experience, post-modification is more common (or, rather, more often fast, where both are available), but quite a few processors have no specific support for address register increment or decrement addressing modes.
> > I suppose that this is the root of my surprise and annoyance on > > discovering what the standard says. These versions *should* be > > equivalent, and equivalently well-defined. > > They are.
Come again? This is the whole point that people (well, me, anyway) have been arguing about! If they were truly equivalent (and the non-unit-stride cousins), I'd go home happy. -- Andrew
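To make the disagreement concrete, here is a sketch of three variants that all store the same values and never form a pointer before the first element. The function names are hypothetical, and the third variant is one possible restructuring of the post-decrement loop, not something proposed in the thread.

```c
#include <assert.h>

/* Indexed version from the thread. */
static void fill_indexed(int *p, int n)
{
    for (; --n >= 0;)
        p[n] = n;
}

/* Pre-decrement cursor: starts one past the end (a pointer the
 * standard allows forming) and never moves below p's initial value. */
static void fill_predec(int *p, int n)
{
    assert(n >= 0);          /* p += n needs a non-negative count */
    p += n;
    for (; --n >= 0;)
        *--p = n;
}

/* Post-decrement style cursor, restructured so that the final
 * decrement that would step before the array is skipped. */
static void fill_postdec(int *p, int n)
{
    if (n <= 0)
        return;
    int *first = p;
    p += n - 1;
    for (;;) {
        *p = (int)(p - first);   /* same value as the index */
        if (p == first)
            break;
        p--;
    }
}
```

The extra comparison in the third variant is the price of keeping the post-decrement shape while staying inside the array, which is roughly the cost being complained about here.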
On Thu, 23 Mar 2006 21:41:59 -0600, Stephen Sprunk wrote:
> The problem is that there's really no such thing as a "normal-looking" > architecture. Every implementation differs in at least a few fundamental > things you'd find it useful to nail down, so to provide enough detail to be > meaningful your sub-standard would basically be defining the behavior of a > particular implementation.
Sure there is. All the world's a VAX (but with IEEE float), with plausible exceptions for pointers different length than int. I'd also wear alignment restrictions pretty happily, as long as they're reasonable. Either-endian word significance is fine, too. Show me a "normal-looking" modern architecture that doesn't fit that description, in a material sense. Even most of the DSPs developed in the last ten years fit that mould. Mostly, so that they can run existing C code well. [The few that have been developed in that time frame, which *don't* fit that mould, are not intended to be programmed in C, and there's no reason to expect that they will be.]
> Just about the only thing that all modern machines agree on is CHAR_BIT > == 8 (and I bet someone will dispute that). Ints, pointers, address > space semantics, etc. are all up for grabs, and that's a good thing -- > it allows systems to evolve in useful ways instead of locking us into > something that, while appearing optimal today, is not likely to be > tomorrow. > > If you disagree, please list all of the various currently-undefined > behaviors you want to define and what implementations conform to your > spec. Who knows, ISO might adopt it...
I'd like the pointer memory model to be "flat" in the sense that for p, a pointer to some object, (p += i, p -= i) == p for any int i. (In a fixed word-length, non-saturating arithmetic, the "flat" geometry is really circular, or modulo. That's as it should be.) [I'm not interested in arguing multi-processor or multi-thread consistency semantics here. That's traditionally been outside the realm of C, and that's probably an OK thing too, IMO.] Cheers, -- Andrew
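Standard C already provides that circular round trip for unsigned integers, just not for pointers. As a sketch under that observation (the names `round_trip` and `elem_at` are hypothetical), an algorithm can carry a `size_t` offset through any amount of wrapping arithmetic and only turn it into a pointer once it is back in range:

```c
#include <assert.h>
#include <stddef.h>

/* Unsigned arithmetic is defined to wrap modulo 2^N, so an offset
 * can take the "circular" round trip that pointers are not
 * guaranteed to survive. */
static size_t round_trip(size_t off, size_t i)
{
    off += i;   /* may wrap */
    off -= i;   /* wraps back */
    return off;
}

/* Hypothetical accessor: a pointer is formed only when the offset
 * is inside the array; otherwise NULL is returned. */
static int *elem_at(int *base, size_t len, size_t off)
{
    return off < len ? &base[off] : NULL;
}
```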
On 2006-03-24, Andrew Reilly <andrew-newspost@areilly.bpc-users.org> wrote:
> On Thu, 23 Mar 2006 21:41:59 -0600, Stephen Sprunk wrote: >> The problem is that there's really no such thing as a "normal-looking" >> architecture. Every implementation differs in at least a few fundamental >> things you'd find it useful to nail down, so to provide enough detail to be >> meaningful your sub-standard would basically be defining the behavior of a >> particular implementation. > > Sure there is. All the world's a VAX (but with IEEE float),
Ironic, considering VAXen don't have IEEE float. Why not just say all the world's a 386? Oh, wait, 386 has that segmented-addressing silliness.