arm-gcc: pointer to constant string| page 3

Reply by David Brown ●September 15, 20182018-09-15

On 15/09/18 01:59, Richard Damon wrote:
> On 9/14/18 6:26 AM, David Brown wrote:
>>   In particular, for the ARM it would
>> be very strange for a compiler to do anything that would not involve
>> having the string literal at a fixed place in read-only flash.
> 
> I can easily think of cases where the string would not be accessed from
> Read-Only flash on the ARM, the simplest is a system where the program
> was stored in a non-execute-in-place media, and was copied by some from
> of loader into RAM. The string would then exist in RAM, not flash, and
> while it may not be enforced 'Read Only' the program (unless it has
> performed UB) is not going to invalidate the string.
> 

Fair enough - there is no need for the string to be in flash or 
read-only memory.  But it is likely that it would, at least logically, 
be at a fixed address in the memory map somewhere.

Reply by Tom Gardner ●September 15, 20182018-09-15

On 15/09/18 15:24, David Brown wrote:
> On 14/09/18 21:51, Tom Gardner wrote:
>> On 14/09/18 19:37, David Brown wrote:
>>> On 14/09/18 19:45, Tom Gardner wrote:
>>>> On 14/09/18 16:02, David Brown wrote:
> 
> <snip>
> 
>>>>> With C, there are three standards to choose from - C90, C99 and C11.
>>>>> (There is an upcoming C17, but it is basically a bug-fix and occasional
>>>>> improved wording, so there is not much to be found there except nicer
>>>>> typesetting.)
>>>>>
>>>>> And there are also implementation-specific extensions - both compiler
>>>>> specific and target-specific.
>>>>
>>>> Don't forget the commercially important /subsets/! The need
>>>> for subsets is a warning flag in itself.
>>>>
>>>
>>> What need of "commercially important subsets" are you talking about?
>>
>> MISRA-C, for example.
> 
> That is a set of rules for C, rather than a subset as such.  

That's hairsplitting, which obscures rather than illuminates.


> (And 90% of it 
> could be replaced by three rules - 1. Learn the language properly and don't 
> write invalid code.&nbsp; 2. Write in a readable fashion.&nbsp; 3. Get proper tools and 
> learn how to use them.)

Nonsense.


> Of course if a specific task, project, employer or customer has rules for how 
> you work, you need to know them and follow them.&nbsp; But that is not "a subset of 
> the language", or the fault of the language.

I know of three languages where subsetting is a practical
necessity: C, C++, and the SPARK subset of Ada. That's
significant.

However SPARK's objective and *ability* is something that
C/C++ cannot even dream of: provably correct implementation
for use in high integrity systems. To get that you would have
to subset /any/ language (except possible NewSpeak!).



>>> There /are/ compilers that implement subsets of C standards.&nbsp; But I don't see 
>>> a /need/ for them.
>>>
>>>>
>>>>
>>>>> Generally, it is not hard to know which C version you are dealing with,
>>>>> and there is very little code that is valid in one version that is not
>>>>> equally valid in later versions.
>>>>
>>>> That statement, while correct, skims over a significant problem:
>>>> unfortunately too few people understand and can /predict/ what is
>>>> valid and invalid.
>>>
>>> Of course they can.&nbsp; Very few people ever write code that is valid C90 code 
>>> and not valid C99 or C11 code.&nbsp; There are /very/ few incompatibilities.  
>>> About the only one of significance I can think of is that functions no longer 
>>> have implicit declarations.&nbsp; What examples can you give of reasonable, valid 
>>> C90 code that is not C99 compatible, or C99 code that is not C11 compatible?  
>>> Or C99 code that is not gnu99 compatible?
>>
>> You miss my point.
> 
> Not unlikely!
> 
>>
>> I am not arguing about the /differences/ between, say C99 and C11,
>> but C of /any/ standard.
> 
> I still don't get it.&nbsp; Do you mean, say, the difference between C99 on gcc and 
> C99 on IAR ?&nbsp; Or the difference between C on an ARM and C on an AVR?

No. Very, /very/ few - if any - developers understand
all the undefined and implementation defined behaviour
of any standard of C or C++, and how to reconcile different
parts of the same standard (or differing implementations
of the same standard)!

Statements from those that were involved in some of the
standardisation efforts lead me to believe that's also
the case for
  - commercial compiler writers, several of whom have had
    requested requested opinions about how standard clause
    X can be reconciled with standard clause Y
  - the committee itself, e.g. infamously with their refusal to
    believe C++ templates were Turing complete until somebody
    rubbed their faces in proof!

No, I can't quote chapter and verse. While watching the
shenanigans from the sidelines it became apparent to me
that C/C++ was becoming (early/mid 90s) part of the problem
as well as part of the solution.

Reply by Stefan Reuther ●September 15, 20182018-09-15

Am 14.09.2018 um 20:56 schrieb David Brown:
> On 14/09/18 19:57, Stefan Reuther wrote:
>> Am 14.09.2018 um 12:26 schrieb David Brown:
>>> No, it is not - not for decades.  In C, "string" is defined in 7.1.1 of
>>> the current standard as "A string is a contiguous sequence of characters
>>> terminated by and including the first null character."
>>
>> That is the library definition of "string". The language definition of a
>> string literal is in &sect;6.4.5 which says:
>>
>> # In translation phase 7, a byte or code of value zero is appended to
>> # each multibyte character sequence that results from a string literal
>> # or literals. The multibyte character sequence is then used to
>> # initialize an array of static storage duration and length just
>> # sufficient to contain the sequence. [...]
>>
>> (paragraph 5 in ISO 9899:1999, paragraph 6 in n1548)
> 
> Yes, I know the difference between a string and a string literal - I
> said that in a bit you snipped.
> 
> And the library definition of a "string" /is/ the C definition of
> "string".  The term is italicised in 7.1.1, denoting that it is defined
> in that paragraph.

The library makes no mention of storage classes, meaning: it does not
care about storage classes.

The language defines what happens with a string literal in &sect;6.4.5p5/6,
namely: construction of an array of static storage duration, with
content that - lo and behold - fulfils the library's requirements of a
string, by being a zero-terminated array of characters.

> I know that's what the OP's code needs in order to be correct - the
> question is if the C standards guarantee that his code does what he wants.

The question was: is a string literal's address and content valid after
the function containing it exits? The answer is: yes.

>> TL;DR: OP's code is safe, language-wise.
> 
> Can you back that up with references to the standard?

See above.


  Stefan

Reply by Clifford Heath ●September 16, 20182018-09-16

On 16/09/18 02:28, Tom Gardner wrote:
> I know of three languages where subsetting is a practical
> necessity: C, C++, and the SPARK subset of Ada. That's
> significant.
> 
> However SPARK's objective and *ability* is something that
> C/C++ cannot even dream of: provably correct implementation
> for use in high integrity systems. To get that you would have
> to subset /any/ language (except possible NewSpeak!).

You must be unfamiliar with the *formally proven* C compiler
that was part of the sel4 (formally-proven microkernel) project.
I've met some of the people, who work here in Sydney.

They proved the operating system guarantees *down to the silicon*
for a specific ARM 9 chip (with MMU).

Clifford Heath.

Reply by Tom Gardner ●September 16, 20182018-09-16

On 16/09/18 04:58, Clifford Heath wrote:
> On 16/09/18 02:28, Tom Gardner wrote:
>> I know of three languages where subsetting is a practical
>> necessity: C, C++, and the SPARK subset of Ada. That's
>> significant.
>>
>> However SPARK's objective and *ability* is something that
>> C/C++ cannot even dream of: provably correct implementation
>> for use in high integrity systems. To get that you would have
>> to subset /any/ language (except possible NewSpeak!).
> 
> You must be unfamiliar with the *formally proven* C compiler
> that was part of the sel4 (formally-proven microkernel) project.
> I've met some of the people, who work here in Sydney.
> 
> They proved the operating system guarantees *down to the silicon*
> for a specific ARM 9 chip (with MMU).

Reference please, and has the claim withstood knowledgeable
scrutiny?

How did they formally prove the compiler? That in itself is
"entertaining" for an incompletely defined language such as C.

Does "down to the silicon" include the silicon itself?

The A9s are complex devices with caches, interrupts etc.
How was the effect of those taken into account in the proof?

Reply by Tom Gardner ●September 16, 20182018-09-16

On 16/09/18 08:42, Tom Gardner wrote:
> On 16/09/18 04:58, Clifford Heath wrote:
>> On 16/09/18 02:28, Tom Gardner wrote:
>>> I know of three languages where subsetting is a practical
>>> necessity: C, C++, and the SPARK subset of Ada. That's
>>> significant.
>>>
>>> However SPARK's objective and *ability* is something that
>>> C/C++ cannot even dream of: provably correct implementation
>>> for use in high integrity systems. To get that you would have
>>> to subset /any/ language (except possible NewSpeak!).
>>
>> You must be unfamiliar with the *formally proven* C compiler
>> that was part of the sel4 (formally-proven microkernel) project.
>> I've met some of the people, who work here in Sydney.
>>
>> They proved the operating system guarantees *down to the silicon*
>> for a specific ARM 9 chip (with MMU).
> 
> Reference please, and has the claim withstood knowledgeable
> scrutiny?
> 
> How did they formally prove the compiler? That in itself is
> "entertaining" for an incompletely defined language such as C.
> 
> Does "down to the silicon" include the silicon itself?
> 
> The A9s are complex devices with caches, interrupts etc.
> How was the effect of those taken into account in the proof?

Just noticed you claim the /compiler/ was formally proven;
too little coffee :(

What are the caveats/limits to that proof?

Was it a full standard C specification (which standard?)
or just a subset of C.

Basically I regard the claim of a formally specified C
compiler as "extraordinary", and hence it requires
"extraordinary proof".

Reply by Clifford Heath ●September 16, 20182018-09-16

On 16/09/18 17:42, Tom Gardner wrote:
> On 16/09/18 04:58, Clifford Heath wrote:
>> On 16/09/18 02:28, Tom Gardner wrote:
>>> I know of three languages where subsetting is a practical
>>> necessity: C, C++, and the SPARK subset of Ada. That's
>>> significant.
>>>
>>> However SPARK's objective and *ability* is something that
>>> C/C++ cannot even dream of: provably correct implementation
>>> for use in high integrity systems. To get that you would have
>>> to subset /any/ language (except possible NewSpeak!).
>>
>> You must be unfamiliar with the *formally proven* C compiler
>> that was part of the sel4 (formally-proven microkernel) project.
>> I've met some of the people, who work here in Sydney.
>>
>> They proved the operating system guarantees *down to the silicon*
>> for a specific ARM 9 chip (with MMU).
> 
> Reference please, and has the claim withstood knowledgeable
> scrutiny?
> 
> How did they formally prove the compiler? That in itself is
> "entertaining" for an incompletely defined language such as C.
> 
> Does "down to the silicon" include the silicon itself?
> 
> The A9s are complex devices with caches, interrupts etc.
> How was the effect of those taken into account in the proof?

It it too hard for you to type "sel4" into Google?

The first hit is <https://sel4.systems>, but there are
many resources for this, including the backgrounder
on L4 family microkernels on Wikipedia.

You can download all the code, all the specifications
(which are executable) and the compilers and provers,
and run them yourself.

This is the culmination of a massive multi-decade effort
by some of the world's smartest people. It's not some
maverick's hobby.

Reply by Tom Gardner ●September 16, 20182018-09-16

On 16/09/18 11:18, Clifford Heath wrote:
> On 16/09/18 17:42, Tom Gardner wrote:
>> On 16/09/18 04:58, Clifford Heath wrote:
>>> On 16/09/18 02:28, Tom Gardner wrote:
>>>> I know of three languages where subsetting is a practical
>>>> necessity: C, C++, and the SPARK subset of Ada. That's
>>>> significant.
>>>>
>>>> However SPARK's objective and *ability* is something that
>>>> C/C++ cannot even dream of: provably correct implementation
>>>> for use in high integrity systems. To get that you would have
>>>> to subset /any/ language (except possible NewSpeak!).
>>>
>>> You must be unfamiliar with the *formally proven* C compiler
>>> that was part of the sel4 (formally-proven microkernel) project.
>>> I've met some of the people, who work here in Sydney.
>>>
>>> They proved the operating system guarantees *down to the silicon*
>>> for a specific ARM 9 chip (with MMU).
>>
>> Reference please, and has the claim withstood knowledgeable
>> scrutiny?
>>
>> How did they formally prove the compiler? That in itself is
>> "entertaining" for an incompletely defined language such as C.
>>
>> Does "down to the silicon" include the silicon itself?
>>
>> The A9s are complex devices with caches, interrupts etc.
>> How was the effect of those taken into account in the proof?
> 
> It it too hard for you to type "sel4" into Google?

Google orders results on what it thinks you are interested
in :( It is easy for you to provide a definitive link.

> The first hit is <https://sel4.systems>, but there are
> many resources for this, including the backgrounder
> on L4 family microkernels on Wikipedia.

The sel4 FAQ includes "The reduced proof assumptions mean
that we do not need to trust the compiler or linker, but
there may still be faults remaining in specific low-level
parts of the kernel (TLB, cache handling, handwritten
assembly, boot code). These parts are thoroughly manually
reviewed." and "From C code to binary code, we employ a
set of automated widely-used verification tools. These
are: SONOLAR, Z3, Isabelle/HOL, and HOL4."

That looks like the *compiler* isn't formally verified.
It also makes me ask the question as to whether C is
"merely" an intermediate language autogenerated from a
HOL spec.

Having said that, the commercial deployment is certainly
interesting and significant.

> You can download all the code, all the specifications
> (which are executable) and the compilers and provers,
> and run them yourself.

Sorry, life's too short!

> This is the culmination of a massive multi-decade effort
> by some of the world's smartest people. It's not some
> maverick's hobby.

I've seen many such activities since the early 80s, some
of them also based on LCF. They went nowhere.

Reply by Clifford Heath ●September 16, 20182018-09-16

On 16/09/18 21:25, Tom Gardner wrote:
> On 16/09/18 11:18, Clifford Heath wrote:
>> On 16/09/18 17:42, Tom Gardner wrote:
>>> On 16/09/18 04:58, Clifford Heath wrote:
>>>> On 16/09/18 02:28, Tom Gardner wrote:
>>>>> I know of three languages where subsetting is a practical
>>>>> necessity: C, C++, and the SPARK subset of Ada. That's
>>>>> significant.
>>>>>
>>>>> However SPARK's objective and *ability* is something that
>>>>> C/C++ cannot even dream of: provably correct implementation
>>>>> for use in high integrity systems. To get that you would have
>>>>> to subset /any/ language (except possible NewSpeak!).
>>>>
>>>> You must be unfamiliar with the *formally proven* C compiler
>>>> that was part of the sel4 (formally-proven microkernel) project.
>>>> I've met some of the people, who work here in Sydney.
>>>>
>>>> They proved the operating system guarantees *down to the silicon*
>>>> for a specific ARM 9 chip (with MMU).
>>>
>>> Reference please, and has the claim withstood knowledgeable
>>> scrutiny?
>>>
>>> How did they formally prove the compiler? That in itself is
>>> "entertaining" for an incompletely defined language such as C.
>>>
>>> Does "down to the silicon" include the silicon itself?
>>>
>>> The A9s are complex devices with caches, interrupts etc.
>>> How was the effect of those taken into account in the proof?
>>
>> It it too hard for you to type "sel4" into Google?
> 
> Google orders results on what it thinks you are interested
> in :( It is easy for you to provide a definitive link.
> 
> 
>> The first hit is <https://sel4.systems>, but there are
>> many resources for this, including the backgrounder
>> on L4 family microkernels on Wikipedia.
> 
> The sel4 FAQ includes "The reduced proof assumptions mean
> that we do not need to trust the compiler or linker, but
> there may still be faults remaining in specific low-level
> parts of the kernel (TLB, cache handling, handwritten
> assembly, boot code). These parts are thoroughly manually
> reviewed." and "From C code to binary code, we employ a
> set of automated widely-used verification tools. These
> are: SONOLAR, Z3, Isabelle/HOL, and HOL4."
> 
> That looks like the *compiler* isn't formally verified.

The C code is annotated with special comments that contain
guidance for the automated provers. The provers analyse
the C code, and the final binary, and (using knowledge of
both C and the ARM instruction set) can formally prove
the equivalence.

So no, the compiler is not proven - but all of its
specific outputs are proven to be an accurate translation
of the respective inputs. That's good enough.

It might not work for all possible C code, but it doesn't
need to; it just needs to work for the actual code being
compiled.

> It also makes me ask the question as to whether C is
> "merely" an intermediate language autogenerated from a
> HOL spec.

No. However the specification (the invariants that must
be proved) are written in literate Haskell.

>> This is the culmination of a massive multi-decade effort
>> by some of the world's smartest people. It's not some
>> maverick's hobby.
> 
> I've seen many such activities since the early 80s, some
> of them also based on LCF. They went nowhere.

There has been no result like this anywhere before,
period.

Besides that, sel4 and its precursors are already in
significant deployments, but guess what? They're all
secure systems and don't tend to talk about it.

Clifford Heath.

Reply by David Brown ●September 16, 20182018-09-16

On 15/09/18 18:28, Tom Gardner wrote:
> On 15/09/18 15:24, David Brown wrote:
>> On 14/09/18 21:51, Tom Gardner wrote:
>>> On 14/09/18 19:37, David Brown wrote:
>>>> On 14/09/18 19:45, Tom Gardner wrote:
>>>>> On 14/09/18 16:02, David Brown wrote:
>>
>> <snip>
>>
>>>>>> With C, there are three standards to choose from - C90, C99 and C11.
>>>>>> (There is an upcoming C17, but it is basically a bug-fix and 
>>>>>> occasional
>>>>>> improved wording, so there is not much to be found there except nicer
>>>>>> typesetting.)
>>>>>>
>>>>>> And there are also implementation-specific extensions - both compiler
>>>>>> specific and target-specific.
>>>>>
>>>>> Don't forget the commercially important /subsets/! The need
>>>>> for subsets is a warning flag in itself.
>>>>>
>>>>
>>>> What need of "commercially important subsets" are you talking about?
>>>
>>> MISRA-C, for example.
>>
>> That is a set of rules for C, rather than a subset as such. 
> 
> That's hairsplitting, which obscures rather than illuminates.

It is no different than programming in Forth, Ada, Python, Fortran, or 
any other language.  What makes you think C is special?

> 
> 
>> (And 90% of it could be replaced by three rules - 1. Learn the 
>> language properly and don't write invalid code.&nbsp; 2. Write in a 
>> readable fashion.&nbsp; 3. Get proper tools and learn how to use them.)
> 
> Nonsense.

You haven't read the Misra rules, have you?  I didn't mention that the 
remaining 10% of the rules go against the C standards and compiler norms 
by inventing their own non-standard type system.  I suppose that /does/ 
mean it counts as another type of C, but it is certainly not a standard one.

> 
> 
>> Of course if a specific task, project, employer or customer has rules 
>> for how you work, you need to know them and follow them.&nbsp; But that is 
>> not "a subset of the language", or the fault of the language.
> 
> I know of three languages where subsetting is a practical
> necessity: C, C++, and the SPARK subset of Ada. That's
> significant.

I think you'll find it applies to /all/ programming languages.  For some 
languages, there is enough consensus in these "subsets" and enough 
popularity of them that they are known popularly - like SPARK and MISRA. 
  For others, they will be smaller.  If company guidelines say that when 
programming Python you must use capital letters for class names and 
never write "from module import *", then you have a subset of the 
language just as surely as if they say "use MISRA for C".

> 
> However SPARK's objective and *ability* is something that
> C/C++ cannot even dream of: provably correct implementation
> for use in high integrity systems. To get that you would have
> to subset /any/ language (except possible NewSpeak!).
> 

I know that is SPARK's objective.  Most sets of guidelines or coding 
standards aim towards more reliable and less error-prone coding (though 
few aspire to claiming they can be provably correct).

> 
> 
>>>> There /are/ compilers that implement subsets of C standards.&nbsp; But I 
>>>> don't see a /need/ for them.
>>>>
>>>>>
>>>>>
>>>>>> Generally, it is not hard to know which C version you are dealing 
>>>>>> with,
>>>>>> and there is very little code that is valid in one version that is 
>>>>>> not
>>>>>> equally valid in later versions.
>>>>>
>>>>> That statement, while correct, skims over a significant problem:
>>>>> unfortunately too few people understand and can /predict/ what is
>>>>> valid and invalid.
>>>>
>>>> Of course they can.&nbsp; Very few people ever write code that is valid 
>>>> C90 code and not valid C99 or C11 code.&nbsp; There are /very/ few 
>>>> incompatibilities. About the only one of significance I can think of 
>>>> is that functions no longer have implicit declarations.&nbsp; What 
>>>> examples can you give of reasonable, valid C90 code that is not C99 
>>>> compatible, or C99 code that is not C11 compatible? Or C99 code that 
>>>> is not gnu99 compatible?
>>>
>>> You miss my point.
>>
>> Not unlikely!
>>
>>>
>>> I am not arguing about the /differences/ between, say C99 and C11,
>>> but C of /any/ standard.
>>
>> I still don't get it.&nbsp; Do you mean, say, the difference between C99 on 
>> gcc and C99 on IAR ?&nbsp; Or the difference between C on an ARM and C on 
>> an AVR?
> 
> No. Very, /very/ few - if any - developers understand
> all the undefined and implementation defined behaviour
> of any standard of C or C++, and how to reconcile different
> parts of the same standard (or differing implementations
> of the same standard)!

I think I see what you are getting at now - you are saying there are 
things in C that are undefined (or implementation-defined) that many C 
programmers don't realise are a problem.  Yes, I can agree with that - 
though I think the problem is often overrated.  The great majority of 
undefined behaviour in C is obviously undefined because there is no 
sensible definition for it - and there is no way anyone could reasonably 
expect code to "work".  That applies to things like buffer overruns, out 
of bounds pointers, etc.  The most obvious category of undefined 
behaviours where people get things wrong are, I would say, accessing 
data through incompatible pointers and signed integer overflow.  In the 
case of signed integer overflow, the code is almost certainly wrong 
regardless of any definition of the behaviour.

The rest is just a matter of not knowing the language as well as they 
should, and thus getting things wrong.  That happens in all languages.

I am not in any way saying that C or C++ are perfect languages, or that 
their standards are ideal, or anything like that.  I am merely saying 
they are not as bad as many people make out, and not different in 
principle from other languages.

> 
> Statements from those that were involved in some of the
> standardisation efforts lead me to believe that's also
> the case for
>  &nbsp;- commercial compiler writers, several of whom have had
>  &nbsp;&nbsp; requested requested opinions about how standard clause
>  &nbsp;&nbsp; X can be reconciled with standard clause Y
>  &nbsp;- the committee itself, e.g. infamously with their refusal to
>  &nbsp;&nbsp; believe C++ templates were Turing complete until somebody
>  &nbsp;&nbsp; rubbed their faces in proof!
> 
> No, I can't quote chapter and verse. While watching the
> shenanigans from the sidelines it became apparent to me
> that C/C++ was becoming (early/mid 90s) part of the problem
> as well as part of the solution.