arm-gcc: pointer to constant string

Started by pozz September 14, 2018
On 15/09/18 01:59, Richard Damon wrote:
> On 9/14/18 6:26 AM, David Brown wrote: >> In particular, for the ARM it would >> be very strange for a compiler to do anything that would not involve >> having the string literal at a fixed place in read-only flash. > > I can easily think of cases where the string would not be accessed from > Read-Only flash on the ARM, the simplest is a system where the program > was stored in a non-execute-in-place media, and was copied by some from > of loader into RAM. The string would then exist in RAM, not flash, and > while it may not be enforced 'Read Only' the program (unless it has > performed UB) is not going to invalidate the string. >
Fair enough - there is no need for the string to be in flash or read-only memory. But it is likely that it would, at least logically, be at a fixed address in the memory map somewhere.
On 15/09/18 15:24, David Brown wrote:
> On 14/09/18 21:51, Tom Gardner wrote: >> On 14/09/18 19:37, David Brown wrote: >>> On 14/09/18 19:45, Tom Gardner wrote: >>>> On 14/09/18 16:02, David Brown wrote: > > <snip> > >>>>> With C, there are three standards to choose from - C90, C99 and C11. >>>>> (There is an upcoming C17, but it is basically a bug-fix and occasional >>>>> improved wording, so there is not much to be found there except nicer >>>>> typesetting.) >>>>> >>>>> And there are also implementation-specific extensions - both compiler >>>>> specific and target-specific. >>>> >>>> Don't forget the commercially important /subsets/! The need >>>> for subsets is a warning flag in itself. >>>> >>> >>> What need of "commercially important subsets" are you talking about? >> >> MISRA-C, for example. > > That is a set of rules for C, rather than a subset as such.
That's hairsplitting, which obscures rather than illuminates.
> (And 90% of it > could be replaced by three rules - 1. Learn the language properly and don't > write invalid code.&nbsp; 2. Write in a readable fashion.&nbsp; 3. Get proper
tools and
> learn how to use them.)
Nonsense.
> Of course if a specific task, project, employer or customer has rules for how > you work, you need to know them and follow them.&nbsp; But that is not "a subset
of
> the language", or the fault of the language.
I know of three languages where subsetting is a practical necessity: C, C++, and the SPARK subset of Ada. That's significant. However SPARK's objective and *ability* is something that C/C++ cannot even dream of: provably correct implementation for use in high integrity systems. To get that you would have to subset /any/ language (except possible NewSpeak!).
>>> There /are/ compilers that implement subsets of C standards.&nbsp; But I don't
see
>>> a /need/ for them. >>> >>>> >>>> >>>>> Generally, it is not hard to know which C version you are dealing with, >>>>> and there is very little code that is valid in one version that is not >>>>> equally valid in later versions. >>>> >>>> That statement, while correct, skims over a significant problem: >>>> unfortunately too few people understand and can /predict/ what is >>>> valid and invalid. >>> >>> Of course they can.&nbsp; Very few people ever write code that is valid C90 code
>>> and not valid C99 or C11 code.&nbsp; There are /very/ few incompatibilities. >>> About the only one of significance I can think of is that functions no longer >>> have implicit declarations.&nbsp; What examples can you give of reasonable,
valid
>>> C90 code that is not C99 compatible, or C99 code that is not C11 compatible? >>> Or C99 code that is not gnu99 compatible? >> >> You miss my point. > > Not unlikely! > >> >> I am not arguing about the /differences/ between, say C99 and C11, >> but C of /any/ standard. > > I still don't get it.&nbsp; Do you mean, say, the difference between C99 on gcc
and
> C99 on IAR ?&nbsp; Or the difference between C on an ARM and C on an AVR?
No. Very, /very/ few - if any - developers understand all the undefined and implementation defined behaviour of any standard of C or C++, and how to reconcile different parts of the same standard (or differing implementations of the same standard)! Statements from those that were involved in some of the standardisation efforts lead me to believe that's also the case for - commercial compiler writers, several of whom have had requested requested opinions about how standard clause X can be reconciled with standard clause Y - the committee itself, e.g. infamously with their refusal to believe C++ templates were Turing complete until somebody rubbed their faces in proof! No, I can't quote chapter and verse. While watching the shenanigans from the sidelines it became apparent to me that C/C++ was becoming (early/mid 90s) part of the problem as well as part of the solution.
Am 14.09.2018 um 20:56 schrieb David Brown:
> On 14/09/18 19:57, Stefan Reuther wrote: >> Am 14.09.2018 um 12:26 schrieb David Brown: >>> No, it is not - not for decades. In C, "string" is defined in 7.1.1 of >>> the current standard as "A string is a contiguous sequence of characters >>> terminated by and including the first null character." >> >> That is the library definition of "string". The language definition of a >> string literal is in &sect;6.4.5 which says: >> >> # In translation phase 7, a byte or code of value zero is appended to >> # each multibyte character sequence that results from a string literal >> # or literals. The multibyte character sequence is then used to >> # initialize an array of static storage duration and length just >> # sufficient to contain the sequence. [...] >> >> (paragraph 5 in ISO 9899:1999, paragraph 6 in n1548) > > Yes, I know the difference between a string and a string literal - I > said that in a bit you snipped. > > And the library definition of a "string" /is/ the C definition of > "string". The term is italicised in 7.1.1, denoting that it is defined > in that paragraph.
The library makes no mention of storage classes, meaning: it does not care about storage classes. The language defines what happens with a string literal in &sect;6.4.5p5/6, namely: construction of an array of static storage duration, with content that - lo and behold - fulfils the library's requirements of a string, by being a zero-terminated array of characters.
> I know that's what the OP's code needs in order to be correct - the > question is if the C standards guarantee that his code does what he wants.
The question was: is a string literal's address and content valid after the function containing it exits? The answer is: yes.
>> TL;DR: OP's code is safe, language-wise. > > Can you back that up with references to the standard?
See above. Stefan
On 16/09/18 02:28, Tom Gardner wrote:
> I know of three languages where subsetting is a practical > necessity: C, C++, and the SPARK subset of Ada. That's > significant. > > However SPARK's objective and *ability* is something that > C/C++ cannot even dream of: provably correct implementation > for use in high integrity systems. To get that you would have > to subset /any/ language (except possible NewSpeak!).
You must be unfamiliar with the *formally proven* C compiler that was part of the sel4 (formally-proven microkernel) project. I've met some of the people, who work here in Sydney. They proved the operating system guarantees *down to the silicon* for a specific ARM 9 chip (with MMU). Clifford Heath.
On 16/09/18 04:58, Clifford Heath wrote:
> On 16/09/18 02:28, Tom Gardner wrote: >> I know of three languages where subsetting is a practical >> necessity: C, C++, and the SPARK subset of Ada. That's >> significant. >> >> However SPARK's objective and *ability* is something that >> C/C++ cannot even dream of: provably correct implementation >> for use in high integrity systems. To get that you would have >> to subset /any/ language (except possible NewSpeak!). > > You must be unfamiliar with the *formally proven* C compiler > that was part of the sel4 (formally-proven microkernel) project. > I've met some of the people, who work here in Sydney. > > They proved the operating system guarantees *down to the silicon* > for a specific ARM 9 chip (with MMU).
Reference please, and has the claim withstood knowledgeable scrutiny? How did they formally prove the compiler? That in itself is "entertaining" for an incompletely defined language such as C. Does "down to the silicon" include the silicon itself? The A9s are complex devices with caches, interrupts etc. How was the effect of those taken into account in the proof?
On 16/09/18 08:42, Tom Gardner wrote:
> On 16/09/18 04:58, Clifford Heath wrote: >> On 16/09/18 02:28, Tom Gardner wrote: >>> I know of three languages where subsetting is a practical >>> necessity: C, C++, and the SPARK subset of Ada. That's >>> significant. >>> >>> However SPARK's objective and *ability* is something that >>> C/C++ cannot even dream of: provably correct implementation >>> for use in high integrity systems. To get that you would have >>> to subset /any/ language (except possible NewSpeak!). >> >> You must be unfamiliar with the *formally proven* C compiler >> that was part of the sel4 (formally-proven microkernel) project. >> I've met some of the people, who work here in Sydney. >> >> They proved the operating system guarantees *down to the silicon* >> for a specific ARM 9 chip (with MMU). > > Reference please, and has the claim withstood knowledgeable > scrutiny? > > How did they formally prove the compiler? That in itself is > "entertaining" for an incompletely defined language such as C. > > Does "down to the silicon" include the silicon itself? > > The A9s are complex devices with caches, interrupts etc. > How was the effect of those taken into account in the proof?
Just noticed you claim the /compiler/ was formally proven; too little coffee :( What are the caveats/limits to that proof? Was it a full standard C specification (which standard?) or just a subset of C. Basically I regard the claim of a formally specified C compiler as "extraordinary", and hence it requires "extraordinary proof".
On 16/09/18 17:42, Tom Gardner wrote:
> On 16/09/18 04:58, Clifford Heath wrote: >> On 16/09/18 02:28, Tom Gardner wrote: >>> I know of three languages where subsetting is a practical >>> necessity: C, C++, and the SPARK subset of Ada. That's >>> significant. >>> >>> However SPARK's objective and *ability* is something that >>> C/C++ cannot even dream of: provably correct implementation >>> for use in high integrity systems. To get that you would have >>> to subset /any/ language (except possible NewSpeak!). >> >> You must be unfamiliar with the *formally proven* C compiler >> that was part of the sel4 (formally-proven microkernel) project. >> I've met some of the people, who work here in Sydney. >> >> They proved the operating system guarantees *down to the silicon* >> for a specific ARM 9 chip (with MMU). > > Reference please, and has the claim withstood knowledgeable > scrutiny? > > How did they formally prove the compiler? That in itself is > "entertaining" for an incompletely defined language such as C. > > Does "down to the silicon" include the silicon itself? > > The A9s are complex devices with caches, interrupts etc. > How was the effect of those taken into account in the proof?
It it too hard for you to type "sel4" into Google? The first hit is <https://sel4.systems>, but there are many resources for this, including the backgrounder on L4 family microkernels on Wikipedia. You can download all the code, all the specifications (which are executable) and the compilers and provers, and run them yourself. This is the culmination of a massive multi-decade effort by some of the world's smartest people. It's not some maverick's hobby.
On 16/09/18 11:18, Clifford Heath wrote:
> On 16/09/18 17:42, Tom Gardner wrote: >> On 16/09/18 04:58, Clifford Heath wrote: >>> On 16/09/18 02:28, Tom Gardner wrote: >>>> I know of three languages where subsetting is a practical >>>> necessity: C, C++, and the SPARK subset of Ada. That's >>>> significant. >>>> >>>> However SPARK's objective and *ability* is something that >>>> C/C++ cannot even dream of: provably correct implementation >>>> for use in high integrity systems. To get that you would have >>>> to subset /any/ language (except possible NewSpeak!). >>> >>> You must be unfamiliar with the *formally proven* C compiler >>> that was part of the sel4 (formally-proven microkernel) project. >>> I've met some of the people, who work here in Sydney. >>> >>> They proved the operating system guarantees *down to the silicon* >>> for a specific ARM 9 chip (with MMU). >> >> Reference please, and has the claim withstood knowledgeable >> scrutiny? >> >> How did they formally prove the compiler? That in itself is >> "entertaining" for an incompletely defined language such as C. >> >> Does "down to the silicon" include the silicon itself? >> >> The A9s are complex devices with caches, interrupts etc. >> How was the effect of those taken into account in the proof? > > It it too hard for you to type "sel4" into Google?
Google orders results on what it thinks you are interested in :( It is easy for you to provide a definitive link.
> The first hit is <https://sel4.systems>, but there are > many resources for this, including the backgrounder > on L4 family microkernels on Wikipedia.
The sel4 FAQ includes "The reduced proof assumptions mean that we do not need to trust the compiler or linker, but there may still be faults remaining in specific low-level parts of the kernel (TLB, cache handling, handwritten assembly, boot code). These parts are thoroughly manually reviewed." and "From C code to binary code, we employ a set of automated widely-used verification tools. These are: SONOLAR, Z3, Isabelle/HOL, and HOL4." That looks like the *compiler* isn't formally verified. It also makes me ask the question as to whether C is "merely" an intermediate language autogenerated from a HOL spec. Having said that, the commercial deployment is certainly interesting and significant.
> You can download all the code, all the specifications > (which are executable) and the compilers and provers, > and run them yourself.
Sorry, life's too short!
> This is the culmination of a massive multi-decade effort > by some of the world's smartest people. It's not some > maverick's hobby.
I've seen many such activities since the early 80s, some of them also based on LCF. They went nowhere.
On 16/09/18 21:25, Tom Gardner wrote:
> On 16/09/18 11:18, Clifford Heath wrote: >> On 16/09/18 17:42, Tom Gardner wrote: >>> On 16/09/18 04:58, Clifford Heath wrote: >>>> On 16/09/18 02:28, Tom Gardner wrote: >>>>> I know of three languages where subsetting is a practical >>>>> necessity: C, C++, and the SPARK subset of Ada. That's >>>>> significant. >>>>> >>>>> However SPARK's objective and *ability* is something that >>>>> C/C++ cannot even dream of: provably correct implementation >>>>> for use in high integrity systems. To get that you would have >>>>> to subset /any/ language (except possible NewSpeak!). >>>> >>>> You must be unfamiliar with the *formally proven* C compiler >>>> that was part of the sel4 (formally-proven microkernel) project. >>>> I've met some of the people, who work here in Sydney. >>>> >>>> They proved the operating system guarantees *down to the silicon* >>>> for a specific ARM 9 chip (with MMU). >>> >>> Reference please, and has the claim withstood knowledgeable >>> scrutiny? >>> >>> How did they formally prove the compiler? That in itself is >>> "entertaining" for an incompletely defined language such as C. >>> >>> Does "down to the silicon" include the silicon itself? >>> >>> The A9s are complex devices with caches, interrupts etc. >>> How was the effect of those taken into account in the proof? >> >> It it too hard for you to type "sel4" into Google? > > Google orders results on what it thinks you are interested > in :( It is easy for you to provide a definitive link. > > >> The first hit is <https://sel4.systems>, but there are >> many resources for this, including the backgrounder >> on L4 family microkernels on Wikipedia. > > The sel4 FAQ includes "The reduced proof assumptions mean > that we do not need to trust the compiler or linker, but > there may still be faults remaining in specific low-level > parts of the kernel (TLB, cache handling, handwritten > assembly, boot code). These parts are thoroughly manually > reviewed." and "From C code to binary code, we employ a > set of automated widely-used verification tools. These > are: SONOLAR, Z3, Isabelle/HOL, and HOL4." > > That looks like the *compiler* isn't formally verified.
The C code is annotated with special comments that contain guidance for the automated provers. The provers analyse the C code, and the final binary, and (using knowledge of both C and the ARM instruction set) can formally prove the equivalence. So no, the compiler is not proven - but all of its specific outputs are proven to be an accurate translation of the respective inputs. That's good enough. It might not work for all possible C code, but it doesn't need to; it just needs to work for the actual code being compiled.
> It also makes me ask the question as to whether C is > "merely" an intermediate language autogenerated from a > HOL spec.
No. However the specification (the invariants that must be proved) are written in literate Haskell.
>> This is the culmination of a massive multi-decade effort >> by some of the world's smartest people. It's not some >> maverick's hobby. > > I've seen many such activities since the early 80s, some > of them also based on LCF. They went nowhere.
There has been no result like this anywhere before, period. Besides that, sel4 and its precursors are already in significant deployments, but guess what? They're all secure systems and don't tend to talk about it. Clifford Heath.
On 15/09/18 18:28, Tom Gardner wrote:
> On 15/09/18 15:24, David Brown wrote: >> On 14/09/18 21:51, Tom Gardner wrote: >>> On 14/09/18 19:37, David Brown wrote: >>>> On 14/09/18 19:45, Tom Gardner wrote: >>>>> On 14/09/18 16:02, David Brown wrote: >> >> <snip> >> >>>>>> With C, there are three standards to choose from - C90, C99 and C11. >>>>>> (There is an upcoming C17, but it is basically a bug-fix and >>>>>> occasional >>>>>> improved wording, so there is not much to be found there except nicer >>>>>> typesetting.) >>>>>> >>>>>> And there are also implementation-specific extensions - both compiler >>>>>> specific and target-specific. >>>>> >>>>> Don't forget the commercially important /subsets/! The need >>>>> for subsets is a warning flag in itself. >>>>> >>>> >>>> What need of "commercially important subsets" are you talking about? >>> >>> MISRA-C, for example. >> >> That is a set of rules for C, rather than a subset as such. > > That's hairsplitting, which obscures rather than illuminates.
It is no different than programming in Forth, Ada, Python, Fortran, or any other language. What makes you think C is special?
> > >> (And 90% of it could be replaced by three rules - 1. Learn the >> language properly and don't write invalid code.&nbsp; 2. Write in a >> readable fashion.&nbsp; 3. Get proper tools and learn how to use them.) > > Nonsense.
You haven't read the Misra rules, have you? I didn't mention that the remaining 10% of the rules go against the C standards and compiler norms by inventing their own non-standard type system. I suppose that /does/ mean it counts as another type of C, but it is certainly not a standard one.
> > >> Of course if a specific task, project, employer or customer has rules >> for how you work, you need to know them and follow them.&nbsp; But that is >> not "a subset of the language", or the fault of the language. > > I know of three languages where subsetting is a practical > necessity: C, C++, and the SPARK subset of Ada. That's > significant.
I think you'll find it applies to /all/ programming languages. For some languages, there is enough consensus in these "subsets" and enough popularity of them that they are known popularly - like SPARK and MISRA. For others, they will be smaller. If company guidelines say that when programming Python you must use capital letters for class names and never write "from module import *", then you have a subset of the language just as surely as if they say "use MISRA for C".
> > However SPARK's objective and *ability* is something that > C/C++ cannot even dream of: provably correct implementation > for use in high integrity systems. To get that you would have > to subset /any/ language (except possible NewSpeak!). >
I know that is SPARK's objective. Most sets of guidelines or coding standards aim towards more reliable and less error-prone coding (though few aspire to claiming they can be provably correct).
> > >>>> There /are/ compilers that implement subsets of C standards.&nbsp; But I >>>> don't see a /need/ for them. >>>> >>>>> >>>>> >>>>>> Generally, it is not hard to know which C version you are dealing >>>>>> with, >>>>>> and there is very little code that is valid in one version that is >>>>>> not >>>>>> equally valid in later versions. >>>>> >>>>> That statement, while correct, skims over a significant problem: >>>>> unfortunately too few people understand and can /predict/ what is >>>>> valid and invalid. >>>> >>>> Of course they can.&nbsp; Very few people ever write code that is valid >>>> C90 code and not valid C99 or C11 code.&nbsp; There are /very/ few >>>> incompatibilities. About the only one of significance I can think of >>>> is that functions no longer have implicit declarations.&nbsp; What >>>> examples can you give of reasonable, valid C90 code that is not C99 >>>> compatible, or C99 code that is not C11 compatible? Or C99 code that >>>> is not gnu99 compatible? >>> >>> You miss my point. >> >> Not unlikely! >> >>> >>> I am not arguing about the /differences/ between, say C99 and C11, >>> but C of /any/ standard. >> >> I still don't get it.&nbsp; Do you mean, say, the difference between C99 on >> gcc and C99 on IAR ?&nbsp; Or the difference between C on an ARM and C on >> an AVR? > > No. Very, /very/ few - if any - developers understand > all the undefined and implementation defined behaviour > of any standard of C or C++, and how to reconcile different > parts of the same standard (or differing implementations > of the same standard)!
I think I see what you are getting at now - you are saying there are things in C that are undefined (or implementation-defined) that many C programmers don't realise are a problem. Yes, I can agree with that - though I think the problem is often overrated. The great majority of undefined behaviour in C is obviously undefined because there is no sensible definition for it - and there is no way anyone could reasonably expect code to "work". That applies to things like buffer overruns, out of bounds pointers, etc. The most obvious category of undefined behaviours where people get things wrong are, I would say, accessing data through incompatible pointers and signed integer overflow. In the case of signed integer overflow, the code is almost certainly wrong regardless of any definition of the behaviour. The rest is just a matter of not knowing the language as well as they should, and thus getting things wrong. That happens in all languages. I am not in any way saying that C or C++ are perfect languages, or that their standards are ideal, or anything like that. I am merely saying they are not as bad as many people make out, and not different in principle from other languages.
> > Statements from those that were involved in some of the > standardisation efforts lead me to believe that's also > the case for > &nbsp;- commercial compiler writers, several of whom have had > &nbsp;&nbsp; requested requested opinions about how standard clause > &nbsp;&nbsp; X can be reconciled with standard clause Y > &nbsp;- the committee itself, e.g. infamously with their refusal to > &nbsp;&nbsp; believe C++ templates were Turing complete until somebody > &nbsp;&nbsp; rubbed their faces in proof! > > No, I can't quote chapter and verse. While watching the > shenanigans from the sidelines it became apparent to me > that C/C++ was becoming (early/mid 90s) part of the problem > as well as part of the solution.