Making Fatal Hidden Assumptions

Started by CBFalconer March 6, 2006
On Wed, 22 Mar 2006, Andrew Reilly wrote:
> On Wed, 22 Mar 2006 03:48:19 +0000, Keith Thompson wrote:
>> Andrew Reilly <andrew-newspost@areilly.bpc-users.org> writes:
>>> On Wed, 22 Mar 2006 03:03:47 +0000, Dik T. Winter wrote:
>>>> "Arthur J. O'Dwyer" <ajonospam@andrew.cmu.edu> writes:
>> [...]
>>>> > I believe Andrew means
>>>> >
>>>> >    void foo(int & x)
>>>>
>>>> I thought we were talking about C.
>>>
>>> We were.  Sure, you can put a pointer into a register argument.
>>
>> "void foo(int & x)" is a syntax error in C.
>
> I didn't use that syntax.  I believe that Arthur was probably confused
> by the use of the term "pass by reference"

Well, I certainly wasn't confused by it! :)  As far as I could tell, you hadn't been talking about C since several posts back --- in my post, I quoted context referring to "multiple return values" and reference parameters --- the latter present in C++ but not C, and the former present in very few C-like languages.

In case you hadn't noticed, this thread has for a long time been crossposted to two groups in which C++ is topical, and one group in which its apparent subject (language design) is topical. If you guys want to talk about standard C, why don't you do that, and forget all about pass-by-reference, multiple return values, AS/400 machine code, and whatever other topics this thread has drifted through on its way here?

I've removed c.p and c.a.e from the crosspost list. Feel free to go back to discussing standard C now. ;)

-Arthur

CBFalconer wrote:

<snip>

> #define hasNulByte(x) ((x - 0x01010101) & ~x & 0x80808080)
> #define SW (sizeof (int) / sizeof (char))
>
> int xstrlen (const char *s) {
>    const char *p;                         /* 5 */
>    int d;
>
>    p = s - 1;
>    do {
>       p++;                                /* 10 */
>       if ((((int) p) & (SW - 1)) == 0) {
>          do {
>             d = *((int *) p);
>             p += SW;
>          } while (!hasNulByte (d));       /* 15 */
>          p -= SW;
>       }
>    } while (*p != 0);
>    return p - s;
> }                                         /* 20 */
>
> Let us start with line 1!  The constants appear to require that
> sizeof(int) be 4, and that CHAR_BIT be precisely 8.  I haven't
> really looked too closely, and it is possible that the ~x term
> allows for larger sizeof(int), but nothing allows for larger
> CHAR_BIT.  A further hidden assumption is that there are no trap
> values in the representation of an int.  Its functioning is
> doubtful when sizeof(int) is less than 4.  At the least it will
> force promotion to long, which will seriously affect the speed.
>
> This is an ingenious and speedy way of detecting a zero byte within
> an int, provided the preconditions are met.  There is nothing wrong
> with it, PROVIDED we know when it is valid.

<snip>

Just in case it hasn't been mentioned [a rather long thread to check!], and might be useful, Google has an interesting summary on finding a nul in a word by one Scott Douglass - posted to c.l.c back in 1993.

"Here's the summary of the responses I got to my query asking for the trick of finding a nul in a long word and other bit tricks."

http://tinyurl.com/m7uw9

-- 
==============
Not a pedant
==============

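
For readers following along, here is a minimal sketch of the same zero-byte test with the assumptions listed in the quoted post written down rather than hidden. It is not CBFalconer's code and not the 1993 summary: the names `has_nul_byte32` and `find_nul` are made up for illustration, the fixed-width type `uint32_t` stands in for `int`, and the search runs over a buffer of known length so that nothing is read beyond the object.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption made explicit: the constants below only make sense for 8-bit bytes. */
#if CHAR_BIT != 8
#error "this sketch assumes CHAR_BIT == 8"
#endif

/* Nonzero if and only if at least one of the four bytes of x is zero.
   uint32_t has no padding bits, so there are no trap representations. */
static int has_nul_byte32(uint32_t x)
{
    return ((x - UINT32_C(0x01010101)) & ~x & UINT32_C(0x80808080)) != 0;
}

/* Hypothetical helper: index of the first zero byte in buf[0..len),
   or len if there is none. */
size_t find_nul(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    /* Scan whole words; memcpy sidesteps alignment and aliasing assumptions. */
    while (len - i >= sizeof(uint32_t)) {
        uint32_t w;
        memcpy(&w, buf + i, sizeof w);
        if (has_nul_byte32(w))
            break;                /* a zero byte lies somewhere in this word */
        i += sizeof w;
    }

    /* Finish byte by byte; this also covers a tail shorter than one word. */
    while (i < len && buf[i] != 0)
        i++;
    return i;
}
```

Because the length is passed in, the word loop never touches memory past the buffer, which a strlen-style loop built on this trick cannot promise in standard C.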
In article <slrne21dct.17jl.random832@random.yi.org>, Jordan Abel <random832@gmail.com> writes:
> On 2006-03-22, Dik T. Winter <Dik.Winter@cwi.nl> wrote:
>> In article <slrne21c2l.17jl.random832@random.yi.org> Jordan Abel <random832@gmail.com> writes:
>>> On 2006-03-22, Keith Thompson <kst-u@mib.org> wrote:
>>>> Andrew Reilly <andrew-newspost@areilly.bpc-users.org> writes:
>>>>> And I still say that constraining C for everyone so that it could fit the
>>>>> AS/400, rather than making C-on-AS/400 jump through a few more hoops to
>>>>> match traditional C behaviour, was the wrong trade-off.
I must have missed the bit in the C Rationale where the committee wrote, "We did this for the AS/400". They probably thought it was obvious, since no other architecture could ever have the same requirements and support C.
>>>> It is.  The C standard wouldn't just have to forbid an implementation
>>>> from trapping when it loads an invalid address; it would have to
>>>> define the behavior of any program that uses such an address.
>>>
>>> Why? It's not that difficult to define the behavior of a program that
>>> "uses" such an address other than by dereferencing, and no problem to
>>> leave the behavior undefined for dereferencing
OK, define the behavior of all non-dereferencing accesses on invalid pointers. Be sure to account for systems with non-linear address spaces, since nothing else in the C standard excludes them.
>> But that would have locked out machines that strictly separate pointers
>> and non-pointers, in the sense that you can not load a pointer in a
>> non-pointer register and the other way around.  Note also that on the
>> AS/400 a pointer is longer than any integer, so doing arithmetic on them
>> in integer registers would require quite a lot.
Yup. The AS/400 has a set of opcodes for manipulating integers, and a different set for manipulating pointers. Nothing in C currently requires it to treat the latter like the former, and I don't see any reason why it should. (Indeed, I admit to being mystified by Andrew Reilly's position; what would be gained by requiring that C implementations have defined behavior for invalid pointers? How is leaving invalid pointer access undefined by the standard "constraining" C?)
> Surely there's some way to catch and ignore the trap from loading an
> invalid pointer, though.
No, there is not. The "trap" (a machine check, actually) can be caught, and it can be responded to, by application code; but ignoring it is not one of the options.

On the AS/400, only LIC (Licensed Internal Code) can bypass memory protection, and the C implementation is not LIC.

The AS/400 uses a Single-Level Store. It has *one* large virtual address space for all user-mode objects in the system: all jobs (the equivalent of processes), all files, all resources of whatever sort. It enforces access restrictions not by giving each process its own virtual address space, but by dynamically granting jobs access to "materialized" subspaces. (This doesn't apply to processes running under PACE, AIUI, but that's a special case.)
> I mean, it stops _somewhere_ even as it is now,
Yes, it stops: if the machine check isn't handled by the application, the job is paused and a message is sent to the appropriate message queue, where a user or operator can respond to it.

That happens under LIC control. The C implementation can't override it; if it could, it'd be violating the system's security model.

Of course, the C implementation could emulate some other machine with less-vigilant pointer handling by generating some intermediate representation and interpreting it at runtime. That would have made the early AS/400s unusably slow, rather than just annoyingly slow, for C programs.

But in any case a favorite maxim of comp.lang.c applies here: what the AS/400, or any other extant implementation, does *does not matter* to the C standard. If we decommissioned all AS/400s today, there might be a new architecture tomorrow with some other good reason for disallowing operations on invalid pointers in C.

-- 
Michael Wojcik                  michael.wojcik@microfocus.com

The lecturer was detailing a proof on the blackboard. He started to say, "From the above it is obvious that ...". Then he stepped back and thought deeply for a while. Then he left the room. We waited. Five minutes later he returned smiling and said, "Yes, it is obvious", and continued to outline the proof.  -- John O'Gorman

In article <pan.2006.03.22.04.53.18.119882@areilly.bpc-users.org>, Andrew Reilly <andrew-newspost@areilly.bpc-users.org> writes:
>
> You don't have to do that at all.  As you said, AS/400 uses long,
> decorative pointers that are longer than integers.  So no one's going
> to notice if what your C compiler calls a pointer is actually a (base,
> index) tuple, underneath.  Being object/capability machines, these
> tuples point to whole arrays, not just individual bytes or words.  The
> compilers could quite easily have managed all of C's pointer arithmetic as
> actual arithmetic, using integers and indices, and only used or formed
> real AS/400 pointers when the code did memory references (as base[index]).
That would break inter-language calls, which were an absolute necessity in early AS/400 C implementations (notably EPM C), as they were unable to use some system facilities (such as communications) directly.

Prior to the ILE environment, there was no "linker" as such for most (all?) AS/400 application programming languages. Source files were compiled into separate program objects (*PGM objects) in the filesystem. Calls with external linkage were resolved dynamically. (This is closer to the external-call model COBOL uses, actually, so it made sense for the 400's primary audience.)

It would have been a real mess if the C implementation had to figure out, on every external call passing a pointer, whether the target was C (and so could use special fake C pointers) or not (and so needed real AS/400 pointers). Putting this burden on the C programmer would not have improved the situation.

And, of course, pointers in aggregate data types would pose a real problem. If a C program wanted to define a struct that corresponded to a COBOL group item, that would've been a right pain. Obviously, it's an implementation-specific task anyway, but on most implementations it's pretty straightforward provided the COBOL item doesn't use any of COBOL's oddball data types.

That doesn't mean it couldn't have been done, of course, but it would have made C - already not a member of the popular crowd on the '400 playground - too cumbersome for all but the most determined fans.

As the Rationale notes, one of the guiding principles behind C is to do things the way the machine wants to do them. That introduces many incompatibilities between implementations, but has rewards of its own. Since C is rather unusual among HLLs in this respect, why not let it stick to its guns rather than asking it to ape all those other languages by hiding the machine behind its own set-dressing?

-- 
Michael Wojcik                  michael.wojcik@microfocus.com

Aw, shucks. And I was just trying to be rude.  -- P.J. Plauger

On 2006-03-23, Michael Wojcik <mwojcik@newsguy.com> wrote:
>
> In article <slrne21dct.17jl.random832@random.yi.org>, Jordan Abel <random832@gmail.com> writes:
>> On 2006-03-22, Dik T. Winter <Dik.Winter@cwi.nl> wrote:
>>> In article <slrne21c2l.17jl.random832@random.yi.org> Jordan Abel <random832@gmail.com> writes:
>>>> On 2006-03-22, Keith Thompson <kst-u@mib.org> wrote:
>>>>> Andrew Reilly <andrew-newspost@areilly.bpc-users.org> writes:
>>>>>> And I still say that constraining C for everyone so that it could
>>>>>> fit the AS/400, rather than making C-on-AS/400 jump through a few
>>>>>> more hoops to match traditional C behaviour, was the wrong
>>>>>> trade-off.
>
> I must have missed the bit in the C Rationale where the committee
> wrote, "We did this for the AS/400".  They probably thought it was
> obvious, since no other architecture could ever have the same
> requirements and support C.
>
>>>>> It is.  The C standard wouldn't just have to forbid an
>>>>> implementation from trapping when it loads an invalid address; it
>>>>> would have to define the behavior of any program that uses such an
>>>>> address.
>>>>
>>>> Why? It's not that difficult to define the behavior of a program
>>>> that "uses" such an address other than by dereferencing, and no
>>>> problem to leave the behavior undefined for dereferencing
>
> OK, define the behavior of all non-dereferencing accesses on invalid
> pointers.  Be sure to account for systems with non-linear address
> spaces, since nothing else in the C standard excludes them.
unspecified result, implementation-defined, compares equal, unspecified result, unspecified result. There, that was easy.
>>> But that would have locked out machines that strictly separate
>>> pointers and non-pointers, in the sense that you can not load a
>>> pointer in a non-pointer register and the other way around.  Note
>>> also that on the AS/400 a pointer is longer than any integer, so
>>> doing arithmetic on them in integer registers would require quite a
>>> lot.
>
> Yup.  The AS/400 has a set of opcodes for manipulating integers, and a
> different set for manipulating pointers.  Nothing in C currently
> requires it to treat the latter like the former, and I don't see any
> reason why it should.  (Indeed, I admit to being mystified by Andrew
> Reilly's position; what would be gained by requiring that C
> implementations have defined behavior for invalid pointers?  How is
> leaving invalid pointer access undefined by the standard
> "constraining" C?)
It constrains code, in a way. Existing code is more important than existing implementations, right?
>
>> Surely there's some way to catch and ignore the trap from loading an
>> invalid pointer, though.
>
> No, there is not.  The "trap" (a machine check, actually) can be
> caught, and it can be responded to, by application code; but ignoring
> it is not one of the options.
You can't "catch it and do nothing"? What are you expected to _do_ about an invalid or protected address being loaded [not dereferenced], anyway? What _can_ you do, having caught the machine check? What responses are typical?
> On the AS/400, only LIC (Licensed Internal Code) can bypass memory
> protection, and the C implementation is not LIC.
>
> The AS/400 uses a Single-Level Store.  It has *one* large virtual
> address space for all user-mode objects in the system: all jobs (the
> equivalent of processes), all files, all resources of whatever sort.
> It enforces access restrictions not by giving each process its own
> virtual address space, but by dynamically granting jobs access to
> "materialized" subspaces.  (This doesn't apply to processes running
> under PACE, AIUI, but that's a special case.)
And why is anything but a dereference an "access" to the protected address?
>
>> I mean, it stops _somewhere_ even as it is now,
>
> Yes, it stops: if the machine check isn't handled by the application,
What can the application do in the handler? Why couldn't a C implementation cause all C programs to have a handler that does something reasonable?
> the job is paused and a message is sent to the appropriate message
> queue, where a user or operator can respond to it.
>
> That happens under LIC control.  The C implementation can't override
> it; if it could, it'd be violating the system's security model.
I didn't say override. I said ignore. Since it's not a dereference, no harm actually done. Why does loading a protected address into a register violate security?
In article <slrne24d8u.9cv.random832@random.yi.org> Jordan Abel <random832@gmail.com> writes:
 > On 2006-03-23, Michael Wojcik <mwojcik@newsguy.com> wrote:
...
 > > No, there is not. The "trap" (a machine check, actually) can be 
 > > caught, and it can be responded to, by application code; but ignoring 
 > > it is not one of the options.
 > 
 > You can't "catch it and do nothing"? What are you expected to _do_ about 
 > an invalid or protected address being loaded [not dereferenced], anyway? 
 > What _can_ you do, having caught the machine check? What responses are 
 > typical?

Consider:
    int a[10], *p;

    p = a - 1;
    p = p + 1;

The first line of code traps, you want to ignore that trap, so what
is p after that line of code?  Nothing useful, because nothing was
assigned to it.  Well, the second line also traps, but what is the
sense in doing nothing here?  If you do nothing p is still just as
undefined as before that line.
-- 
dik t. winter, cwi, kruislaan 413, 1098 sj  amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn  amsterdam, nederland; http://www.cwi.nl/~dik/
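
As an aside on the example above: code that walks an array from the back does not need to form `a - 1` at all. The sketch below is an illustration only (the function name `sum_backwards` is hypothetical); it starts at the one-past-the-end pointer, which the standard allows to be formed but not dereferenced, and decrements before each use.

```c
#include <stddef.h>

/* Sum the n elements of a[], visiting them last to first,
   without ever computing a pointer below a. */
long sum_backwards(const int *a, size_t n)
{
    long total = 0;
    const int *p = a + n;   /* one past the end: valid to form, not to dereference */

    while (p != a) {
        --p;                /* step back first, so p always stays within [a, a + n) here */
        total += *p;
    }
    return total;
}
```

The loop condition compares against `a` itself, so the pointer never leaves the range `a` to `a + n` and the behavior is defined on any conforming implementation, trapping or not.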
On Thu, 23 Mar 2006 12:33:51 +0000, Dik T. Winter wrote:
> Consider:
>     int a[10], *p;
>
>     p = a - 1;
>     p = p + 1;

How about:
     int a[10];
     foo(a + 1);
where

foo(int *p)
{
	p -= 1;
	/* do something with p[0]..p[9] */
}

> The first line of code traps, you want to ignore that trap, so what
> is p after that line of code?  Nothing useful, because nothing was
> assigned to it.  Well, the second line also traps, but what is the
> sense in doing nothing here?  If you do nothing p is still just as
> undefined as before that line.

Does p -= 1 still trap, in the first line of foo, given the way that it's called in the main routine?

If not, how could foo() be compiled in a separate unit, in the AS/400 scenario that you described earlier?

If it does trap, why? It's not forming an "illegal" pointer, even for the AS/400 world.

If it doesn't trap, why should p -= 1 succeed, but p -= 2 fail?

What if my algorithm's natural expression is to refer to p[0]..p[-9], and expects to be handed a pointer to the last element of a[]?

The significant difference of C, to other languages (besides the assembly language of most architectures) is that you can form, store, and use as arguments pointers into the middle of "objects". Given that difference, the memory model is obvious, and the constraint imposed by the "undefined" elements of the standard (laboured in this thread) unreasonably onerous. IMO. YMMV.

Cheers,

-- 
Andrew

In article <pan.2006.03.23.12.23.55.181121@areilly.bpc-users.org> Andrew Reilly <andrew-newspost@areilly.bpc-users.org> writes:
 > On Thu, 23 Mar 2006 12:33:51 +0000, Dik T. Winter wrote:
...
 > How about:
 >      int a[10];
 >      foo(a + 1);
 > where
 > 
 > foo(int *p)
 > {
 > 	p -= 1;
 > 	/* do something with p[0]..p[9] */
 > }
...
 > Does p -= 1 still trap, in the first line of foo, given the way that it's
 > called in the main routine?

Why should it?

 > If not, how could foo() be compiled in a separate unit, in the AS/400
 > scenario that you described earlier?

It was not me who described it, but I see no reason why that should be
impossible.  Consider a pointer as a combination of the indication of a
region and an index into that region.

 > If it does trap, why?  It's not forming an "illegal" pointer, even for the
 > AS/400 world.

It does not trap.

 > If it doesn't trap, why should p -= 1 succeed, but p -= 2 fail?

Because the latter creates an invalid pointer.

 > What if my algorithm's natural expression is to refer to p[0]..p[-9], and
 > expects to be handed a pointer to the last element of a[]?

No problem.

 > The significant difference of C, to other languages (besides the
 > assembly language of most architectures) is that you can form, store, and
 > use as arguments pointers into the middle of "objects".

That was also possible in Algol 68.  But I see no problem with it on
a machine like the AS/400.

(As an example from Algol 68:
    'int' a[1:10, 1:10, 1:10];
    'ref' 'int' aa = a[2:6, 3:7,4];
the latter points to a two-dimensional slice...)
-- 
dik t. winter, cwi, kruislaan 413, 1098 sj  amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn  amsterdam, nederland; http://www.cwi.nl/~dik/
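
To make the "region plus index" picture in the post above concrete, here is a small model in portable C. It is a sketch under stated assumptions, not a description of how any AS/400 compiler represents pointers: the struct `fat_ptr` and the function `fat_ptr_add` are hypothetical names, and a `false` return stands in for the machine check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "fat pointer": a region plus an index into that region. */
struct fat_ptr {
    int   *base;    /* start of the region (the whole array) */
    size_t count;   /* number of elements in the region */
    size_t index;   /* current position, 0..count (count means one past the end) */
};

/* Move p by delta elements.  If the result would fall outside
   [0, count], report failure and leave *p unchanged, modelling a
   trap that fires before any new value is stored. */
bool fat_ptr_add(struct fat_ptr *p, ptrdiff_t delta)
{
    if (delta < 0) {
        /* Magnitude of delta, computed without negating PTRDIFF_MIN. */
        size_t back = (size_t)(-(delta + 1)) + 1;
        if (back > p->index)
            return false;           /* would point before the region */
        p->index -= back;
    } else {
        if ((size_t)delta > p->count - p->index)
            return false;           /* would point beyond one past the end */
        p->index += (size_t)delta;
    }
    return true;
}
```

In this model a pointer that arrives as `a + 1` carries the region with it, so stepping back by one succeeds in a separately compiled function, while stepping back by two is rejected and the old value survives.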
On 2006-03-23, Dik T. Winter <Dik.Winter@cwi.nl> wrote:
> In article <slrne24d8u.9cv.random832@random.yi.org> Jordan Abel <random832@gmail.com> writes:
> > On 2006-03-23, Michael Wojcik <mwojcik@newsguy.com> wrote:
> ...
> > > No, there is not.  The "trap" (a machine check, actually) can be
> > > caught, and it can be responded to, by application code; but ignoring
> > > it is not one of the options.
> >
> > You can't "catch it and do nothing"? What are you expected to _do_ about
> > an invalid or protected address being loaded [not dereferenced], anyway?
> > What _can_ you do, having caught the machine check? What responses are
> > typical?
>
> Consider:
>     int a[10], *p;
>
>     p = a - 1;
>     p = p + 1;
>
> The first line of code traps, you want to ignore that trap, so what is
> p after that line of code?  Nothing useful, because nothing was
> assigned to it.

Why not?

I guess my point is, why are you not allowed to hold an address in a register regardless of whether you would be allowed to access the memory at that address? That seems like a stupid architecture in the first place. It's not "security", it's bad design masquerading as security. If the goal is to protect an area of memory from being accessed, block programs from _accessing_ it, not from talking about it.

> Well, the second line also traps, but what is the sense in doing
> nothing here?  If you do nothing p is still just as undefined as
> before that line.
Only under the current standard. Using the fact that it's undefined to justify it being undefined is begging the question.
Jordan Abel <random832@gmail.com> writes:
> On 2006-03-23, Dik T. Winter <Dik.Winter@cwi.nl> wrote:
[...]
>> Consider:
>>     int a[10], *p;
>>
>>     p = a - 1;
>>     p = p + 1;
>>
>> The first line of code traps, you want to ignore that trap, so what is
>> p after that line of code?  Nothing useful, because nothing was
>> assigned to it.
>
> Why not?
Because the trap occurred before the value was assigned to p.
> I guess my point is, why are you not allowed to hold an address in a
> register regardless of whether you would be allowed to access the memory
> at that address?
Because it catches errors as early as possible. Why do you want to create an address that you're not allowed to dereference?
> That seems like a stupid architecture in the first
> place. It's not "security", it's bad design masquerading as security. If
> the goal is to protect an area of memory from being accessed, block
> programs from _accessing_ it, not from talking about it.
Presumably it does both.

The C standard doesn't require a trap when you create an invalid address. It merely declines to define the semantics. If you think the AS/400 architecture is stupid, that's your privilege. If you want to write code that creates addresses outside the bounds of an array, nobody is going to stop you; you'll just have to look somewhere other than the C standard for guarantees that your code will behave the way you want it to.

Your argument, I suppose, is that the C standard *should* guarantee the behavior. That may or may not be a good idea -- but I think we can guarantee that nothing is going to change until and unless someone comes up with a concrete proposal. I'm not going to do it myself, because I'm satisfied with the standard as it is (at least in this area). If you do so, you can expect a great deal of argument about what behavior should be guaranteed and what should be left implementation-defined (or even undefined).

-- 
Keith Thompson (The_Other_Keith) kst-u@mib.org  <http://www.ghoti.net/~kst>
San Diego Supercomputer Center             <*>  <http://users.sdsc.edu/~kst>
We must do something.  This is something.  Therefore, we must do this.