On 14-12-06 17:07 , David Brown wrote:
> On 05/12/14 20:31, Niklas Holsti wrote:
>> On 14-12-05 20:45 , Ed Prochak wrote:
>>> I always viewed C as the universal assembly language.
>>
>> It may have been that, in the past, before the standardisation and
>> before the compilers became ambitious about optimisation and code speed.
>> Nowadays, standard C has more "gotchas" and hard-to-remember rules than
>> a typical real assembly language (reference: recent discussions on
>> comp.arch about gcc "miscompiling" typical C programs, because gcc
>> assumes that C code with undefined behaviour, per the standard, can do
>> anything.)
>
> I haven't seen the thread in comp.arch, but do you have any particular
> situations in mind?

The subject of the thread was "If It Were Easy...".

IIRC, some of the things discussed in the thread were the strict aliasing rules and other pointer-punning and type-conversion issues -- apparently memcpy() is the only well-defined way, and the traditional "union" trick is not. But surprise! gcc may optimise away a memcpy() call, possibly just reusing the source data in situ.

Also discussed was one case in which code in the Linux kernel first dereferenced a pointer, and then tested whether the pointer was null -- gcc omitted the test, because the dereference would cause undefined behaviour for a null pointer, so only the non-null case needed to be compiled. The programmer apparently knew that, in the kernel context, a null pointer can be dereferenced without harm. But what was done after the test failed badly if the pointer was null.

I think there was also mention of gcc making loops eternal, or deleting them entirely (I don't remember which), if loop termination depends on signed-integer overflow.

It was a long and bitterly argued thread, in which the "traditionalist" C-is-a-portable-assembler advocates essentially claimed that the C standard committees and gcc maintainers are pushing C to become too much of a high-level language and are destroying its predictability for low-level programming, except in the hands of very careful programmers.

> And what do you suggest gcc /should/ do about undefined behaviour? Make
> wild guesses about what it thinks the user actually intended?

I don't have much of an opinion (I avoid using C when I can).

I think I see the point on both sides of the argument. The traditionalists want C compilers that emit machine code that "does the same thing" as the source code: if there is a pointer dereference in the source, there should be an indirect load/store in the machine code, even if the pointer might be null; if the source tests for a null pointer, there should be a comparison instruction and conditional branch in the code, even if earlier code has done something that makes the behaviour undefined when the pointer is null.

The modernists want C to have a well-defined standard and application portability where possible, which unfortunately (given what C is like) means that many things one can write in C, and even compile, will have undefined behaviour across implementations -- but whether the behaviour is defined or undefined (or something in between, such as implementation-defined) often depends on run-time dynamic things, so the compiler cannot simply reject such code.

What IMO is doubtful is for the compiler to latch on to the possibility of undefined behaviour in some part of the code, under some circumstances, and eagerly assume that in those circumstances it does not matter what the code does, in that part or in following parts. The modernists claimed on comp.arch that this compiler behaviour follows from the general code-optimisation methods, and that it would be hard to report, as warnings, the optimisations that depend on the "no-undefined-behaviour" assumption. I'm not quite convinced of that.

> In some cases, the compiler will allow the user to define the behaviour
> - such as by compiler flags that make signed integers overflow as two's
> complement (even though code will almost never use such a "feature", and
> changing it reduces some optimisation opportunities in good code). In
> many other cases, undefined behaviour is fairly obvious if the
> programmer thinks about it (and programmers /should/ think!) - dividing
> by zero is undefined, so the compiler can assume that you don't care
> what will happen if you try it.

But in some assembler programs, specific run-time errors such as divide-by-zero are sometimes triggered on purpose. Those who see C as a portable assembler would like the expression "1/0" to generate a division causing this error, even if the behaviour is undefined in the C standard.

I agree that, formally speaking, C was never a "portable assembler". It was just the simple compilers that made it appear so.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
. @ .
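[Editor's note: for illustration, a minimal C sketch of the kernel pattern described above. The names and types here are hypothetical, not the actual kernel code.]

    #include <stddef.h>

    struct device { unsigned int flags; };

    unsigned int get_flags(struct device *dev)
    {
        unsigned int flags = dev->flags;   /* dereference happens first */

        if (dev == NULL)    /* gcc may delete this test: after the */
            return 0;       /* dereference, dev is assumed non-null */

        return flags;
    }

Compiled with optimisation, gcc is entitled to drop the null test, because a null dev would already have caused undefined behaviour on the line above.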
Modern debuggers cause bad code quality
Started by ●December 2, 2014
Reply by ●December 7, 2014
On 06.12.2014 г. 21:59, Les Cargill wrote:
> Dimiter_Popoff wrote:
>> On 05.12.2014 г. 20:40, Les Cargill wrote:
>>> Dimiter_Popoff wrote:
>>>> On 05.12.2014 г. 15:41, Les Cargill wrote:
>>>>> Dimiter_Popoff wrote:
>>>>>> On 04.12.2014 г. 12:51, Oliver Betz wrote:
>>>>>>> Paul E Bennett wrote:
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>>>> Could it be that today's sophisticated tools lead to more "trial
>>>>>>>>> and error", less thinking before doing?
>>>>>>>>
>>>>>>>> Talk about cats amongst pigeons.
>>>>>>>
>>>>>>> causing the foreseeable defensiveness.
>>>>>>>
>>>>>>> [...]
>>>>>>>
>>>>>>>> Errors that creep into projects are quite language and technology
>>>>>>>> agnostic.
>>>>>>>
>>>>>>> Ganssle presented numbers: 50..100 errors/KLOC in C, 5..10 in Ada,
>>>>>>> zero with SPARK.
>>>>>>
>>>>>> It is the language, not the rest of the toolchain. "C" is the major
>>>>>> contributor to the decline in software quality (where there was some
>>>>>> quality to decline of course).
>>>>>
>>>>> That's odd, since 'C' has been there since... well, the start. How
>>>>> can a thing-that-has-not-changed be the cause of decline? Some
>>>>> massive lag? Changes in the populations of practitioners?
>>>>
>>>> It is the popularity growth, not the birth date. Then C does not
>>>> prevent one from writing decent software, it only makes it more
>>>> difficult -
>>>
>>> I don't think that's ... demonstrable in any reasonable fashion.
>>
>> It is obvious enough for me. The fact is that C tries to be a
>> "universal assembler" as some people see it, and it does it poorly
>> (too abstracted from any machine model). There are a lot more details
>> about my VPA which allow me to do things people just can't do in C,
>> which are way too lengthy for me to explain to myself let alone other
>> people from the trade, so I won't go into it, neither would any sane
>> person want me to :-).
>
> That's just not been my experience. Since the mid 80s, I can count the
> number of times I've felt like going to assembly on one hand.

I imagine I would have felt the same had my main assembly experience been x86 or similar. 68k assembly - the language itself - has been an excellent foundation to build on - (un?)fortunately I am the only person busy doing that, I suppose :-).

> <snip>
>>> Then I am not sure what to tell you - the idioms of 'C' are
>>> a pretty lengthy thing. I have committed many of the patterns to
>>> memory over 25 years but not all of them.
>>
>> Exactly. This is the basic flaw of high level languages. Instead of
>> dealing with text they deal with hieroglyphs - which is much less
>> efficient than just using an alphabet and designing your words en
>> route, to evolve the language to fit the whims of life.
>
> But there really is a problem using "English like" words. COBOL
> went that way and, while not exactly deprecated, isn't widely used
> outside of, say, banks.
>
> Seems like punctuation marks are pretty useful.

You have misunderstood me. I am not advocating any "close to natural language" thing, why would we want that. We want a language which makes our brains more efficient at programming. So my analogy with natural language is of the sort "when you use a low level language you deal with words, and when you use a HLL you deal with predefined sentences". Thus high level languages deprive you of a most basic feature of languages - the ability to design your own sentences. Hence the eternal "C is an assembler" vs. "it is not one" thing: language users just do need the low level, to design the higher one themselves according to what they want to say. Predefined sentences are just there so the general public, which does not author much in writing, can also communicate :-).

>> The basic flaw of any (too) high level language is its lack of
>> flexibility to adapt to an ever changing world. Sure, changes are
>> made and the phrasebooks get rewritten - but how is this comparable
>> with adding just the new words to the dictionary and twisting the
>> language without needing any "official" approval. Before the change
>> happens, years will have passed and gigatons of poor software will
>> have been written (poor simply because the language was not up to date
>> with reality).
>
> I personally do not find this an impractical limitation.
>
>>>>>> The thing is, their novels get sold simply because the general
>>>>>> public can't even use a phrasebook. And this happened mainly
>>>>>> because x86 entered the scene widely, made assembly programming
>>>>>> impractical with its messy programming model etc.
>>>>>
>>>>> I wrote more assembly language in x86 than in any other architecture.
>>>>> You want something to wreck things? Try assembly.
>>>>
>>>> This explains why you see assembly as something impractical.
>>>
>>> I don't.
>>
>> OK, your previous post left me with the impression you did, I must
>> have misunderstood you.
>
> I feel like 'C' is a better choice. The set of programmers for it is
> larger and it's modestly more expressive.

I agree that the set of programmers is larger, of course, and if by "more expressive" you mean "packs more info into less text" I'd have to work hard to check whether I agree or not. But this does not make it better in many cases; for example, for me it would mean I would be more like the rest of the world, but it would degrade my efficiency to a fraction of what it is now, which simply would not work - I'd not survive the way I do now (being unable to offer what I have on offer now).

>>>> There is no such thing as "assembly" language really, there
>>>> are worlds of difference between this or that "assembly".
>>>
>>> They're all essentially the same. There is a narcissism of small
>>> differences.
>>
>> Well if this is "the same" the way all human languages are "the same",
>> I could agree. Only if so.
>
> Ah - well, it takes some digging and you have to be prepared to ignore
> differences that are smaller :) but all human languages can be arranged
> in a tree structure. Turns out there might be more in common than in
> difference. Differences tend to be things added after a population
> moved to a different place and the language evolved.

Well, like I said, at that level of "same" I agree with you, of course :-).

>>>> And then there is my VPA (virtual processor assembly) which
>>>> makes me more efficient by at least an order of magnitude than
>>>> anyone who uses C when it comes to projects which take more
>>>> than a month to program (before you ask: my code is in the
>>>> millions of lines, >50M of sources over the past 20 years).
>>>
>>> Those projects are arguably too large. An old saying is "by the time
>>> you get N = a million lines of FORTRAN to compile, you no longer
>>> care what it was supposed to do."
>>
>> If a project which takes over a month of programming is "too large"
>> in your book then OK, I will agree with you that copying this and
>> that and putting something together in a week or two is better done
>> using a high level language, yes.
>
> You won't get to a million lines in a month. Ten times
> what you get in a month won't take ten months; it'll likely
> take more - complexity is arguably O(n^2) or O(n*log(n)) in the number
> of lines - using the term "complexity" to approximate cost.

My average output is around 150 kilobytes of source text per month. I have thrown away very little of what I have written over the past 20 years, and of course the entire thing is subdivided into separate "projects". E.g. when I added a tcp/ip stack to DPS, it took me about 6 months to get to basic tcp connect functionality and another 2-3 months to do the basic higher-level things I needed: DNS, ftp, smtp etc. I am giving these figures just to make the picture of what we are talking about clearer; it sounded too general otherwise.

Dimiter

------------------------------------------------------
Dimiter Popoff, TGI http://www.tgi-sci.com
------------------------------------------------------
http://www.flickr.com/photos/didi_tgi/
Reply by ●December 7, 2014
On 07/12/14 20:51, Niklas Holsti wrote:
> On 14-12-06 17:07 , David Brown wrote:
>> On 05/12/14 20:31, Niklas Holsti wrote:
>>> On 14-12-05 20:45 , Ed Prochak wrote:
>>>> I always viewed C as the universal assembly language.
>>>
>>> It may have been that, in the past, before the standardisation and
>>> before the compilers became ambitious about optimisation and code
>>> speed. Nowadays, standard C has more "gotchas" and hard-to-remember
>>> rules than a typical real assembly language (reference: recent
>>> discussions on comp.arch about gcc "miscompiling" typical C programs,
>>> because gcc assumes that C code with undefined behaviour, per the
>>> standard, can do anything.)
>>
>> I haven't seen the thread in comp.arch, but do you have any particular
>> situations in mind?
>
> The subject of the thread was "If It Were Easy...".
>
> IIRC, some of the things discussed in the thread were strict aliasing
> rules and other pointer-punning and type conversion issues -- apparently
> memcpy() is the only well-defined way, and the traditional "union" trick
> is not. But surprise! gcc may optimise away a memcpy() call, possibly
> just reusing the source data in situ.

Type-punning through unions is defined in the standards, and works as expected to my knowledge (though earlier C standards were not entirely clear about this). memcpy() will always assume that the source and destination pointers may alias other areas (but they may not overlap each other). But the compiler does not have to generate a call to a memcpy function - it can generate the "copy" inline, and it is free to make as many or as few copies as it wants, as long as the behaviour is /as if/ it called memcpy(). If you rely on memcpy() code to do something else - such as assuming it is a memory barrier, or has a visible effect in a multi-threaded environment - then the fault is with these assumptions, not with the way the compiler handles the memcpy().

> Also discussed was one case in which code in the Linux kernel first
> dereferenced a pointer, and then tested if the pointer was null -- gcc
> omitted the test, because the dereference would cause undefined
> behaviour for a null pointer, so only the non-null case needed to be
> compiled.

The behaviour of the compiler was correct - the bug was in the kernel code. It annoyed the kernel developers, but the mistake was in the source code.

> The programmer apparently knew that, in the kernel context, a
> null pointer can be dereferenced without harm. But what was done after
> the test failed badly if the pointer was null.

Dereferencing null pointers is undefined behaviour in C - this is well known. The compiler can therefore remove checks for null pointers that are run /after/ accesses through the pointer. The compiler's behaviour here is correct, and it can occasionally lead to improvements in the generated code. But it is not particularly helpful for testing or debugging. It is therefore a good idea either to disable this "-fdelete-null-pointer-checks" optimisation, or to change the environment (such as by avoiding mapping a real page to address 0). The ideal answer, of course, is to correct the error in the source code.

> I think there was also mention of gcc making loops eternal, or deleting
> them entirely (I don't remember which) if loop termination depends on
> signed-integer overflow.

Again, the C standards are quite clear and well known - signed-integer overflow is undefined behaviour, and you cannot rely on signed values wrapping around as two's complement. And there are situations where the compiler can take good, correct code and generate smaller and faster object code by "knowing" that signed integer arithmetic does not wrap. gcc's warnings are usually quite good at telling you about these things - /if/ you use them properly.

> It was a long and bitterly argued thread, where the "traditionalist"
> C-is-a-portable-assembler advocates essentially claimed that the C
> standard committees and gcc maintainers are pushing C to become too much
> of a high-level language and are destroying its predictability for
> low-level programming, except in the hands of very careful programmers.

It is certainly arguable that the skills needed to make sure that C code works as expected have changed over the years - I know I have written code over the years that would not work when compiled with a modern compiler and heavy optimisation enabled. It is also certainly the case that the C standards committees and the compiler maintainers don't always seem to live in the same world as the people actually /using/ the tools.

>> And what do you suggest gcc /should/ do about undefined behaviour? Make
>> wild guesses about what it thinks the user actually intended?
>
> I don't have much of an opinion (I avoid using C when I can).
>
> I think I see the point on both sides of the argument. The
> traditionalists want C compilers that emit machine code that "does the
> same thing" as the source code: if there is a pointer dereference in the
> source, there should be an indirect load/store in machine code, even if
> the pointer might be null; if the source tests for a null pointer, there
> should be a comparison instruction and conditional branch in the code,
> even if earlier code has done something that makes the behaviour
> undefined if the pointer is null.

Such people are not looking for the C programming language - they are looking for the language they think C should be. To my knowledge, no such language actually exists - so they use C as the nearest they can get, and complain when it is not /their/ ideal language. They could get on quite well if they learned how to use "volatile" appropriately.

I am not claiming C is an ideal language here - there are many things in it that I would change if I could. But we use it as the nearest practical choice, and write code to suit the language rather than expecting the language to suit our code.

> The modernists want C to have a well-defined standard and application
> portability where possible, which unfortunately (given what C is like)
> means that many things one can write in C, and even compile, will have
> undefined behaviour across implementations -- but whether the behaviour
> is defined or undefined (or something in between, such as implementation
> defined) often depends on run-time dynamic things, so the compiler
> cannot just reject such code.

The differences between undefined behaviour, unspecified behaviour, and implementation-defined behaviour are subtle but important.

"Undefined behaviour" means that there is no meaningful interpretation for the code; the compiler can optimise based on the assumption that such behaviour will never happen, and also that /if/ such behaviour happens, the programmer doesn't care about the result. Running off the end of an array is undefined behaviour, so the compiler can assume it won't happen.

"Unspecified behaviour" means that the standards don't say what will happen, nor does the compiler have to define the behaviour. The order of evaluation of function arguments is unspecified - the compiler can evaluate them in different orders at different times.

"Implementation-defined behaviour" is supposed to be documented and consistent for a given compiler. The size of "int", and its storage format, is implementation-defined behaviour.

(See Annex J of the C11 standard - or document N1570, which is the last freely available draft and is easily found on the web.)

> What IMO is doubtful is for the compiler to latch on to the possibility
> of undefined behaviour of some part of the code, under some
> circumstances, and eagerly assume that in those circumstances it does
> not matter what the code does, in that part or following parts. The
> modernists claimed on comp.arch that this compiler behaviour follows
> from the general code optimisation methods, and that it would be hard to
> report, as warnings, optimisations that depend on the
> "no-undefined-behaviour" assumption. I'm not quite convinced about that.

Would you think that the comparison (x + 1) > y is the same as x >= y, where x and y are ints? Mathematically, they are clearly the same thing - and converting to the second comparison will mean smaller and faster code than the first expression. But the conversion is only valid if the compiler can assume that integer overflow will never occur. If it is possible for "x" to get so big that "x + 1" overflows, then the programmer will have made a mistake here. So the compiler can assume that the programmer is competent, and generate smaller and faster code assuming that the undefined behaviour never occurs (or that the programmer doesn't care about the results if it /does/ occur).

The same is true for a lot of undefined behaviour. Compilers don't go out of their way to spot possible undefined behaviour and then maliciously generate garbage to spite you. They assume the programmer knows what he is doing and has written correct code (with plenty of warnings available if enabled), and that the programmer wants the result to be as small and fast as possible with the specified behaviour.

>> In some cases, the compiler will allow the user to define the behaviour
>> - such as by compiler flags that make signed integers overflow as two's
>> complement (even though code will almost never use such a "feature", and
>> changing it reduces some optimisation opportunities in good code). In
>> many other cases, undefined behaviour is fairly obvious if the
>> programmer thinks about it (and programmers /should/ think!) - dividing
>> by zero is undefined, so the compiler can assume that you don't care
>> what will happen if you try it.
>
> But in some assembler programs, specific run-time errors such as
> divide-by-zero are sometimes triggered on purpose. Those who see C as a
> portable assembler would like the expression "1/0" to generate a
> division causing this error, even if the behaviour is undefined in the C
> standard.

C does not support that sort of thing. If they want behaviour like that, they need to write in assembly (or inline assembly). You can't just make up your own rules about how you think C ought to behave.

> I agree that, formally speaking, C was never a "portable assembler". It
> was just the simple compilers that made it appear so.
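[Editor's note: to make the comparison example concrete, an illustrative sketch of the two forms discussed above, not taken from any compiler's sources.]

    int cmp_a(int x, int y)
    {
        /* May be folded to the same code as cmp_b: the fold is valid
           only under the assumption that x + 1 never overflows. */
        return (x + 1) > y;
    }

    int cmp_b(int x, int y)
    {
        return x >= y;
    }

Absent overflow, (x + 1) > y and x >= y are the same predicate over the integers, so a compiler that may assume no signed overflow can compile both functions to a single compare.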
Reply by ●December 8, 2014
Dimiter_Popoff wrote:
> On 06.12.2014 г. 21:59, Les Cargill wrote:
>> Dimiter_Popoff wrote:
>>> On 05.12.2014 г. 20:40, Les Cargill wrote:

<snip>

>> That's just not been my experience. Since the mid 80s, I can count the
>> number of times I've felt like going to assembly on one hand.
>
> I imagine I would have felt the same had my main assembly experience
> been x86 or similar. 68k assembly - the language itself - has been
> an excellent foundation to build on - (un?)fortunately I am the
> only person busy doing that, I suppose :-).

I mean 68k, too. Nothing has driven me towards assembly in a considerable amount of time.

<snip>

>> Seems like punctuation marks are pretty useful.
>
> You have misunderstood me. I am not advocating any "close to natural
> language" thing, why would we want that. We want a language which
> makes our brains more efficient at programming.

To be fair, that's a pretty broad target. If we knew what that meant, we'd be more likely to have it.

> So my analogy with natural language is of the sort "when you use a low
> level language you deal with words, and when you use a HLL you deal with
> predefined sentences". [...]

I feel like I know less about your goal than I did before :)

<snip>

>> I feel like 'C' is a better choice. The set of programmers for it is
>> larger and it's modestly more expressive.
>
> I agree that the set of programmers is larger, of course, and if by
> "more expressive" you mean "packs more info into less text" I'd
> have to work hard to check whether I agree or not.

"More expressive" to my mind means it's inherently easier to read once you get the hang of it. I am biased by a (probably misunderstood) blurb from "Gödel, Escher, Bach" where he claims, without exposing the proof, that "there is no higher level language than FLOOP" - FLOOP being a cute metaphor for Algol, of which 'C' is a descendant.

> But this does not make it better in many cases, for example for
> me it would mean I would be more like the rest of the world but
> it would degrade my efficiency to a fraction of what it is now
> which simply would not work, I'd not survive the way I do now
> (being unable to offer what I have on offer now).

Understood.

<snip>

>> Ah - well, it takes some digging and you have to be prepared to ignore
>> differences that are smaller :) but all human languages can be arranged
>> in a tree structure. Turns out there might be more in common than in
>> difference. Differences tend to be things added after a population
>> moved to a different place and the language evolved.
>
> Well, like I said, at that level of "same" I agree with you,
> of course :-).

There's just a lot we don't know here.

<snip>

> My average output is around 150 kilobytes of source text per month. [...]

--
Les Cargill
Reply by ●December 8, 2014
On 14-12-08 01:53 , David Brown wrote:
> On 07/12/14 20:51, Niklas Holsti wrote:
>> On 14-12-06 17:07 , David Brown wrote:
[...]
>> IIRC, some of the things discussed in the thread were strict aliasing
>> rules and other pointer-punning and type conversion issues -- apparently
>> memcpy() is the only well-defined way, and the traditional "union" trick
>> is not. [...]
>
> Type-punning through unions is defined in the standards, and works as
> expected to my knowledge (though earlier C standards were not entirely
> clear about this).

In the comp.arch thread, there was considerable discussion about what the C standards really say, and whether or not they are internally consistent and clear. The impression I got from that discussion was that union punning does not always work in the standard. I'm not going to argue these points -- my understanding of the C standard is too poor for that. (In fact I found it frustrating that the comp.arch contributors, who seem to me quite competent and even expert, could not agree in that discussion.)

> But the compiler does not have to generate a call to a memcpy function -
> it can generate the "copy" inline, and it is free to make as many or as
> few copies as it wants, as long as the behaviour is /as if/ it called
> memcpy().

In the comp.arch discussion, the point was that the traditionalists felt that having to use memcpy() instead of their traditional pointer-casting method would be too inefficient, because they thought that memcpy() would copy data, even if inlined. The reply from the modernists was that the C compiler can treat a memcpy() call quite abstractly, as saying only that the name of the destination variable afterwards refers to the same string of byte values as the name of the source variable; ergo, if subsequent data dependencies do not force a copy, the compiler need not implement a copy by any means, and can just internally (knowing it is safe) use the source data, in situ, where the source code specifies using the destination (copied) data. I guess this falls under the general /as if/ rule, but it is a rather wider interpretation of that rule than the traditionalists expected.

>> Also discussed was one case in which code in the Linux kernel first
>> dereferenced a pointer, and then tested if the pointer was null [...]
>
> The behaviour of the compiler was correct - the bug was in the kernel
> code. It annoyed the kernel developers, but the mistake was in the
> source code.

This, and your later replies, show that you agree with what I have called the "modernist" camp. I'm not saying that this camp is wrong; indeed I think this camp is correct in its interpretation of the C standard; but there are also the "traditionalists" who don't like the current C standard and its effect on current C compilers -- in particular, making C much less of a "portable assembler".

>> What IMO is doubtful is for the compiler to latch on to the possibility
>> of undefined behaviour of some part of the code, under some
>> circumstances, and eagerly assume that in those circumstances it does
>> not matter what the code does, in that part or following parts. [...]
>
> Would you think that the comparison (x + 1) > y is the same as x >= y,
> where x and y are ints? [...] So the compiler can assume
> that the programmer is competent, and generate smaller and faster code
> assuming that the undefined behaviour never occurs (or that the
> programmer doesn't care about the results if it /does/ occur).

You are of course entirely right, from the formal point of view, but again, there are the C traditionalists who take the argument "the programmer knows what she wants" further, and say that if the programmer wrote x+1, then x+1 should be computed, and the programmer wants to be responsible for what happens -- overflow or no overflow.

In this example, it seems simple for the compiler to warn (perhaps only under some option asking for such warnings) that it has generated code for "(x+1)>y" under the assumption that x+1 does not overflow. Such warnings would please the traditionalists, especially if the compiler had options to suppress optimisations, like this one, that assume no overflow. Understandably, the modernists are not eager to bloat the compiler's optimiser and code generator with such warnings and options, which they feel are not in the modern C spirit.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
. @ .
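[Editor's note: a minimal sketch of the memcpy() idiom under discussion. It assumes a 32-bit float, which is common but not guaranteed by the C standard.]

    #include <string.h>
    #include <stdint.h>

    uint32_t float_bits(float f)
    {
        uint32_t u;
        memcpy(&u, &f, sizeof u);   /* well-defined type punning */
        return u;                   /* compilers typically emit no call
                                       here, just a register move */
    }

This is the /as if/ rule at work: the compiler must behave as if memcpy() were called, but nothing obliges it to emit an actual copy.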
Reply by ●December 8, 2014
On 08.12.2014 at 00:53, David Brown wrote:
> On 07/12/14 20:51, Niklas Holsti wrote:
>> IIRC, some of the things discussed in the thread were strict aliasing
>> rules and other pointer-punning and type conversion issues -- apparently
>> memcpy() is the only well-defined way, and the traditional "union" trick
>> is not.
>
> Type-punning through unions is defined in the standards,

I'll have to insist on an explanation quoting chapter and verse before I accept that claim. All editions of the C standard that I've seen try rather strongly to state the exact opposite of what you say there. E.g. C99 6.7.2.1p15: "The value of at most one of the members can be stored in a union object at any time." I don't see any definition of what happens if you try to retrieve a value that's not currently stored in the object, which would make that, rather obviously, undefined behaviour.
Reply by ●December 9, 2014
On 08/12/14 22:39, Hans-Bernhard Bröker wrote:
> On 08.12.2014 at 00:53, David Brown wrote:
>> On 07/12/14 20:51, Niklas Holsti wrote:
>
>>> IIRC, some of the things discussed in the thread were strict aliasing
>>> rules and other pointer-punning and type conversion issues -- apparently
>>> memcpy() is the only well-defined way, and the traditional "union" trick
>>> is not.
>
>> Type-punning through unions is defined in the standards,
>
> I'll have to insist on an explanation quoting chapter and verse before I
> accept that claim. All editions of the C standard that I've seen try
> rather strongly to state the exact opposite of what you say there. E.g.
> C99 6.7.2.1p15: "The value of at most one of the members can be stored
> in a union object at any time." I don't see any definition of what
> happens if you try to retrieve a value that's not currently stored in
> the object, which would make that, rather obviously, undefined behaviour.

The key point is clarified in a footnote in the C11 standard (draft N1570 is easily and freely available on the web, and is therefore more common than the official final version, which must be bought). Section 6.5.2.3 on page 83 (page 101 of the pdf) has a footnote:

"""
If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
"""

According to the language lawyers in comp.lang.c, this footnote in C11 clarifies the behaviour but does not change it (a change in behaviour requires a change in the main text, not just the footnotes). Ergo, type punning through a union has always been allowed, though the standards were not clear on the matter before. Previously, it was certainly /possible/ to interpret the text as meaning the standards did not specify the behaviour of type punning (note that a lack of definition is /not/ the same as "undefined behaviour" - something is only undefined behaviour in C if the standards say so explicitly). However, the committee thought that the behaviour was clear, since the representations of the members of a union were specified, and their addresses were guaranteed to be the same.

Also see 6.5p7 (of N1570 - I don't have C99 handy), which says:

"""
An object shall have its stored value accessed only by an lvalue expression that has one of the following types: 88)
- a type compatible with the effective type of the object,
- a qualified version of a type compatible with the effective type of the object,
- a type that is the signed or unsigned type corresponding to the effective type of the object,
- a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
- an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
- a character type.

88) The intent of this list is to specify those circumstances in which an object may or may not be aliased.
"""

That's the best I can do, I think. If you want more here, then comp.lang.c would be better than comp.arch.embedded - where I suspect most people are already asleep!

For comp.arch.embedded, the main issue is not what the standards say, but what compilers /do/ - and I haven't found a compiler that does not allow type-punning through unions. In particular, gcc makes it explicit:

<https://gcc.gnu.org/onlinedocs/gcc/Structures-unions-enumerations-and-bit-fields-implementation.html>
<https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Type-punning>
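[Editor's note: a minimal sketch of the union trick that the quoted footnote sanctions, again assuming a 32-bit float.]

    #include <stdint.h>

    union pun32 {
        float    f;
        uint32_t u;
    };

    uint32_t float_bits_union(float f)
    {
        union pun32 p;
        p.f = f;        /* store through one member */
        return p.u;     /* read through another: the bytes are
                           reinterpreted in the new type (possibly
                           a trap representation, per the footnote) */
    }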
Reply by ●December 9, 2014
On Sunday, December 7, 2014 2:52:00 PM UTC-5, Niklas Holsti wrote:
> On 14-12-06 17:07 , David Brown wrote:
>> I haven't seen the thread in comp.arch, but do you have any particular
>> situations in mind?
>
> The subject of the thread was "If It Were Easy...".
[...]
> I think I see the point on both sides of the argument. The
> traditionalists want C compilers that emit machine code that "does the
> same thing" as the source code [...]
>
> The modernists want C to have a well-defined standard and application
> portability where possible [...]
>
> I agree that, formally speaking, C was never a "portable assembler". It
> was just the simple compilers that made it appear so.

I don't see it as either/or. The balance should be adjustable through compiler options.

Yes, that may mean turning off most optimizations when I want the compiler to emit machine code the way I wrote it in C. So I have control. (This means I may have to do some careful programming to optimize at the source-code level.)

But when I am writing an application that I want to be portable, I treat C as a high-level language. This means letting the compiler writers learn all the tricks of the machine code, and taking advantage of their knowledge by switching the optimizations on at full power.

The design of C allows both worlds to meet. It does create some friction, but to me this flexibility is an advantage. You just have to know what you are doing. C will not hold your hand.
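[Editor's note: a sketch of what that control can look like with gcc. The flags named here exist in mainline gcc, though how far they restore "assembler-like" behaviour for a given program is another matter.]

    /* The classic wraparound test: its meaning depends on flags.
     *
     *   gcc -O2 t.c          the compiler may fold the result to 0,
     *                        assuming signed arithmetic never overflows
     *   gcc -O2 -fwrapv t.c  signed overflow is defined to wrap, so the
     *                        test behaves as the traditionalists expect
     *
     * Related: -fno-strict-aliasing (disables type-based alias analysis),
     * -fno-delete-null-pointer-checks (keeps null tests that follow a
     * dereference).
     */
    int increment_overflows(int x)
    {
        return x + 1 < x;   /* true only on two's-complement wraparound */
    }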
Reply by ●December 9, 2014
On Saturday, December 6, 2014 2:58:11 PM UTC-5, Les Cargill wrote:
[]
> Ah - well, it takes some digging and you have to be prepared to ignore
> differences that are smaller :) but all human languages can be arranged
> in a tree structure. Turns out there might be more in common than in
> difference. Differences tend to be things added after a population
> moved to a different place and the language evolved.
[]
> --
> Les Cargill

(Maybe we really need to start another thread on languages.)

Sorry, but after trying to learn some Chinese recently I have to disagree that the differences between human languages can be described as small.

Yes, human languages can be arranged hierarchically, just as programming languages can. But there are some large divergences in both trees.

English for the most part is based on vowels and consonants and is basically atonal. (Words mean the same thing whether you speak in a monotone or in a song.) Chinese and other oriental languages are tonal: the same consonant and vowel combination can mean vastly different things depending on the inflection. In programming languages, the divide is between languages like C and languages like LISP.

I'll say one last thing today, then I have to get back to work. (This is directed more to the entire group than to you, Les.)

There is no one language that can be used for all problems. Someone else posted about high-level languages not being flexible enough. Maybe you chose the wrong language. A specialized language has great advantages over a general-purpose language within its problem domain.

You can write C programs to read and write databases, but it is much easier and clearer to express the program in SQL. You can write a compiler in COBOL, but it may be easier and clearer to use LEX and YACC. You can write a GUI in PERL and X, but maybe C# and XAML will be easier.

In terms of programming, think like a mechanical engineer, and pick the right tool for the job.
Reply by ●December 9, 2014
Ed Prochak wrote:

[%X]

> There is no one language that can be used for all problems.
[...]
> In terms of programming, think like a mechanical engineer, and pick the
> right tool for the job.

From the basic machine languages (those understood directly by the electronic processor), the aim of every programmer should be to build a language that is specific to the application domain. You know you are getting that right when the client can begin to see how to do the stuff they know in the language you create for them. The machines need to be told how to cope with the constructs of the Application Specific Language. Once that is done, the rest becomes much easier.

As a Systems Engineer who uses Forth, mainly in High Integrity Systems, I find it gratifying when my clients really get comfortable with what grows from such a basis.

--
********************************************************************
Paul E. Bennett IEng MIET.....<email://Paul_E.Bennett@topmail.co.uk>
Forth based HIDECS Consultancy.............<http://www.hidecs.co.uk>
Mob: +44 (0)7811-639972
Tel: +44 (0)1235-510979
Going Forth Safely ..... EBA. www.electric-boat-association.org.uk..
********************************************************************







