EmbeddedRelated.com Forums

Engineering degree for embedded systems

Started by hogwarts July 27, 2017
David Brown <david.brown@hesbynett.no> writes:
> all bugs - be it UB, overflows, misunderstandings about the
> specifications, mistakes in the specifications, incorrect algorithms,
> incorrect functions - whatever. UB is not special in that way.
Yes UB is special. All those non-UB bugs you mention will have a defined behaviour that just isn't the behaviour that you wanted. UB, as the name implies, has no defined behaviour at all: anything can happen, including the proverbial nasal demons.
> And what do you mean by "this becomes political" ?
I can't speak for Les, but guaranteeing C programs to be free of UB is so difficult that one can debate whether writing complex critical programs in C is morally irresponsible. That type of debate tends to take on a political flavor like PC vs Mac, Emacs vs Vi, and other similar burning issues.
On 08/08/17 08:22, Paul Rubin wrote:
> I can't speak for Les, but guaranteeing C programs to be free of UB is
> so difficult that one can debate whether writing complex critical
> programs in C is morally irresponsible. That type of debate tends to
> take on a political flavor like PC vs Mac, Emacs vs Vi, and other
> similar burning issues.
Yes, in all respects. And more people /think/ they can avoid UB than can actually achieve that nirvana. That's dangerous Dunning-Kruger territory.
On 07/08/17 19:40, Tom Gardner wrote:
> On 07/08/17 17:36, Phil Hobbs wrote:
>> My stuff is all pthreads, because std::thread didn't exist at the time,
>> but it does now, so presumably Boehm's input has been taken into account.
>
> I'm told C/C++12 /finally/ has a memory model, so perhaps
> that will (a few decades too late) ameliorate the problem.
> We'll see, but I'm not holding my breath.
You mean C11/C++11 ? There are, I believe, very minor differences between the memory models of C11 and C++11, but they are basically the same. And they provide the required synchronisation and barrier mechanisms in a standard form. Whether people will use them appropriately or not, is another matter. In the embedded world there seems to be a fair proportion of people that still think C89 is a fine standard to use.

Standard atomics and fences in embedded C basically means gcc 4.9 or newer, when C11 support was complete. For C++ it was a little earlier. I don't know what other C or C++ compilers for embedded use have C11/C++11 support, but gcc is the main one, especially for modern standards support. GNU ARM Embedded had 4.9 at the end of 2014, but it takes time for manufacturer-supplied toolchains to update.

So yes, C11/C++11 solves the problem in a standardised way - but it will certainly take time before updated tools are in common use, and before people make use of the new features. I suspect this will happen mainly in the C++ world, where C++11 is a very significant change from older C++ and it can make sense to move to C++11 almost as a new language.

Even then, I expect most people will either rely on their OS primitives to handle barriers and fences, or use simple full barriers:

  C11:    atomic_thread_fence(memory_order_seq_cst);
  C++11:  std::atomic_thread_fence(std::memory_order_seq_cst);

replacing

  gcc Cortex-M:  asm volatile("dmb" : : : "memory");
  Linux:         mb()

The tools have all existed, even though C and C++ did not have memory models before C11/C++11. CPUs, OSes, and compilers all had memory models, even though they might not have been explicitly documented. And people got some things right, and some things wrong, at that time. I think the same thing will apply now that they /have/ memory models.
On 08/08/17 09:23, David Brown wrote:
> On 07/08/17 19:40, Tom Gardner wrote:
>> On 07/08/17 17:36, Phil Hobbs wrote:
>>
>>> My stuff is all pthreads, because std::thread didn't exist at the time,
>>> but it does now, so presumably Boehm's input has been taken into account.
>>
>> I'm told C/C++12 /finally/ has a memory model, so perhaps
>> that will (a few decades too late) ameliorate the problem.
>> We'll see, but I'm not holding my breath.
>
> You mean C11/C++11 ?
Oh.... picky picky picky :)
> Whether people will use them
> appropriately or not, is another matter.
My experience is that they won't. That's for two reasons:

1) not really understanding threading/synchronisation issues, because they are only touched upon in schools. Obviously that problem is language agnostic.

2) any subtleties in the C/C++ specification and implementation "suboptimalities"; I expect those will exist :(

Plus, of course, as you note below...
> In the embedded world there
> seems to be a fair proportion of people that still think C89 is a fine
> standard to use.
...
> So yes, C11/C++11 solves the problem in a standardised way - but it will
> certainly take time before updated tools are in common use, and before
> people make use of the new features.
ISTR that in the early-mid naughties there was a triumphant announcement of the first /complete/ C or C++ compiler - 5 or 6 years after the standard was published! Of course many compilers had implemented a usable subset before that. No, didn't save a reference :(
> And people got some things right, and some things wrong, at that time. > I think the same thing will apply now that they /have/ memory models.
Agreed. I'm gobsmacked that it took C/C++ so long to get around to that /fundamental/ requirement. The absence and the delay reflects very badly on the C/C++ community.
On 08/08/17 10:46, Tom Gardner wrote:
> On 08/08/17 09:23, David Brown wrote:
>> On 07/08/17 19:40, Tom Gardner wrote:
>>> On 07/08/17 17:36, Phil Hobbs wrote:
>>
>>>> My stuff is all pthreads, because std::thread didn't exist at the time,
>>>> but it does now, so presumably Boehm's input has been taken into
>>>> account.
>>>
>>> I'm told C/C++12 /finally/ has a memory model, so perhaps
>>> that will (a few decades too late) ameliorate the problem.
>>> We'll see, but I'm not holding my breath.
>>
>> You mean C11/C++11 ?
>
> Oh.... picky picky picky :)
Well, if you decide to look this up on google, it should save you a few false starts.
>> Whether people will use them
>> appropriately or not, is another matter.
>
> My experience is that they won't. That's for two reasons:
> 1) not really understanding threading/synchronisation
> issues, because they are only touched upon in schools.
> Obviously that problem is language agnostic.
Agreed. This stuff is hard to understand if you want to get correct /and/ optimally efficient.
> 2) any subtleties in the C/C++ specification and
> implementation "suboptimalities"; I expect those will
> exist :(
I have read through the specs and implementation information - quite a bit of work has gone into making it possible to write safe code that is more efficient than was previously possible (or at least practical). It is not so relevant for small embedded systems, where you generally have a single core and little in the way of write buffers - there is not much, if anything, to be gained by replacing blunt full memory barriers with tuned load-acquire and store-release operations. But for bigger systems with multiple CPUs, a full barrier can cost hundreds of cycles.

There is one "suboptimality" - the "consume" memory order. It's a bit weird, in that it is mainly relevant to the Alpha architecture, whose memory model is so weak that in "x = *p;" it can fetch the contents of *p before seeing the latest update of p. Because the C11 and C++11 specs are not clear enough on "consume", all implementations (AFAIK) bump this up to the stronger "acquire", which may be slightly slower on some architectures.
> Plus, of course, as you note below...
>
>> In the embedded world there
>> seems to be a fair proportion of people that still think C89 is a fine
>> standard to use.
>
> ...
>
>> So yes, C11/C++11 solves the problem in a standardised way - but it will
>> certainly take time before updated tools are in common use, and before
>> people make use of the new features.
>
> ISTR that in the early-mid naughties there was a triumphant
> announcement of the first /complete/ C or C++ compiler - 5
> or 6 years after the standard was published! Of course many
> compilers had implemented a usable subset before that.
Things have changed a good deal since then. The major C++ compilers (gcc, clang, MSVC) have complete C++11 and C++14 support, with gcc and clang basically complete on the C++17 final drafts. gcc has "concepts", slated for C++20 pretty much "as is", and MSVC and clang have prototype "modules" which are also expected for C++20 (probably based on MSVC's slightly better version).

<http://en.cppreference.com/w/cpp/compiler_support>

These days a feature does not make it into the C or C++ standards unless there is a working implementation in at least one major toolchain to test it out in practice.
> No, didn't save a reference :(
>
>> And people got some things right, and some things wrong, at that time.
>> I think the same thing will apply now that they /have/ memory models.
>
> Agreed.
>
> I'm gobsmacked that it took C/C++ so long to get around to
> that /fundamental/ requirement. The absence and the delay
> reflects very badly on the C/C++ community.
As I said, people managed fine without it. Putting together a memory model that the C folks and C++ folks could agree on for all the platforms they support is not a trivial effort - and I am very glad they agreed here. Of course I agree that it would have been nice to have had it earlier. The thread support (as distinct from the atomic support, including memory models) is far too little, far too late and I doubt if it will have much use.
On 08/08/17 10:26, David Brown wrote:
> On 08/08/17 10:46, Tom Gardner wrote:
>> [snip]
>>
>> Oh.... picky picky picky :)
>
> Well, if you decide to look this up on google, it should save you a bit
> of false starts.
Unfortunately google doesn't prevent idiots from making tyupos :) (Or is that fortunately?)
> [snip]
>
> I have read through the specs and implementation information - quite a
> bit of work has gone into making it possible to write safe code that is
> more efficient than was previously possible (or at least practical). It
> is not so relevant for small embedded systems, where you generally have
> a single core and little in the way of write buffers - there is not
> much, if anything, to be gained by replacing blunt full memory barriers
> with tuned load-acquire and store-release operations. But for bigger
> systems with multiple cpus, a full barrier can cost hundreds of cycles.
Agreed, with the caveat that "small" ain't what it used to be. Consider Zynqs: dual-core ARMs with caches and, obviously, FPGA fabric. Consider single 32 core MCUs for £25 one-off. (xCORE) There are many other examples, and that trend will continue.
> There is one "suboptimality" - the "consume" memory order. It's a bit
> weird, in that it is mainly relevant to the Alpha architecture, whose
> memory model is so weak that in "x = *p;" it can fetch the contents of
> *p before seeing the latest update of p. Because the C11 and C++11
> specs are not clear enough on "consume", all implementations (AFAIK)
> bump this up to the stronger "acquire", which may be slightly slower on
> some architectures.
One of C/C++'s problems is deciding to cater for, um, weird and obsolete architectures. I see /why/ they do that, but on Mondays Wednesdays and Fridays I'd prefer a concentration on doing common architectures simply and well.
>> ISTR that in the early-mid naughties there was a triumphant
>> announcement of the first /complete/ C or C++ compiler - 5
>> or 6 years after the standard was published! Of course many
>> compilers had implemented a usable subset before that.
>
> Things have changed a good deal since then. The major C++ compilers
> (gcc, clang, MSVC) have complete C++11 and C++14 support, with gcc and
> clang basically complete on the C++17 final drafts. gcc has "concepts",
> slated for C++20 pretty much "as is", and MSVC and clang have prototype
> "modules" which are also expected for C++20 (probably based on MSVC's
> slightly better version).
>
> <http://en.cppreference.com/w/cpp/compiler_support>
>
> These days a feature does not make it into the C or C++ standards unless
> there is a working implementation in at least one major toolchain to
> test it out in practice.
Yes, but I presume that was also the case in the naughties. (I gave up following the detailed C/C++ shenanigans during the interminable "cast away constness" philosophical discussions) The point was about the first compiler that (belatedly) correctly implemented /all/ the features.
>>> And people got some things right, and some things wrong, at that time.
>>> I think the same thing will apply now that they /have/ memory models.
>>
>> Agreed.
>>
>> I'm gobsmacked that it took C/C++ so long to get around to
>> that /fundamental/ requirement. The absence and the delay
>> reflects very badly on the C/C++ community.
>
> As I said, people managed fine without it. Putting together a memory
> model that the C folks and C++ folks could agree on for all the
> platforms they support is not a trivial effort - and I am very glad they
> agreed here. Of course I agree that it would have been nice to have had
> it earlier. The thread support (as distinct from the atomic support,
> including memory models) is far too little, far too late and I doubt if
> it will have much use.
While there is no doubt people /thought/ they managed, it is less clear cut that it was "fine". I'm disappointed that thread support might not be as useful as desired, but memory model and atomic is more important.
On 08/08/17 09:22, Paul Rubin wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> all bugs - be it UB, overflows, misunderstandings about the
>> specifications, mistakes in the specifications, incorrect algorithms,
>> incorrect functions - whatever. UB is not special in that way.
>
> Yes UB is special. All those non-UB bugs you mention will have a
> defined behaviour that just isn't the behaviour that you wanted. UB, as
> the name implies, has no defined behaviour at all: anything can happen,
> including the proverbial nasal demons.
Bugs are problems, no matter whether they have defined behaviour or undefined behaviour. But it is sometimes possible to limit the damage caused by a bug, and it can certainly be possible to make it easier or harder to detect.

The real question is: would it help to give a definition to typical C "undefined behaviour", like signed integer overflow or access outside of array bounds?

Take the first case - signed integer overflow. If you want to give it a defined behaviour, you must pick one of several mechanisms. You could use two's complement wraparound. You could use saturated arithmetic. You could use "trap representations" - like NaN in floating point. You could have an exception mechanism like C++. You could have an error handler mechanism. You could have a software interrupt or trap.

Giving a defined "ordinary" behaviour like wrapping would be simple and appear efficient. However, it would mean that the compiler would be unable to spot problems at compile time (the best time to spot bugs!), and it would stop the compiler from a number of optimisations that let the programmer write simple, clear code while relying on the compiler to generate good results. Any kind of trap or error handler would necessitate a good deal of extra run-time cost, and negate even more optimisations. The compiler could not even simplify "x + y - y" to "x", because "x + y" might overflow.

It is usually a simple matter for a programmer to avoid signed integer overflow. Common methods include switching to unsigned integers, or simply increasing the size of the integer types. Debugging tools can help spot problems, such as the "sanitizers" in gcc and clang, but these are of limited use in embedded systems.

Array bounds checking would also involve a good deal of run-time overhead, as well as re-writing of C code (since you would need to track bounds as well as pointers). And what do you do when you have found an error?

C is like a chainsaw. It is very powerful, and lets you do a lot of work quickly - but it is also dangerous if you don't know what you are doing. Remember, however, that no matter how safe and idiot-proof your tree-cutting equipment is, you are still at risk from the falling tree.
>> And what do you mean by "this becomes political" ?
>
> I can't speak for Les, but guaranteeing C programs to be free of UB is
> so difficult that one can debate whether writing complex critical
> programs in C is morally irresponsible. That type of debate tends to
> take on a political flavor like PC vs Mac, Emacs vs Vi, and other
> similar burning issues.
I would certainly agree that a good deal of code that is written in C, should have been written in other languages. It is not the right tool for every job. But it /is/ the right tool for many jobs - and UB is part of what makes it the right tool. However, you need to understand what UB is, how to avoid it, and how the concept can be an advantage.
On 08/08/17 12:09, Tom Gardner wrote:
> On 08/08/17 10:26, David Brown wrote:
>> [snip]
>>
>> I have read through the specs and implementation information - quite a
>> bit of work has gone into making it possible to write safe code that is
>> more efficient than was previously possible (or at least practical). It
>> is not so relevant for small embedded systems, where you generally have
>> a single core and little in the way of write buffers - there is not
>> much, if anything, to be gained by replacing blunt full memory barriers
>> with tuned load-acquire and store-release operations. But for bigger
>> systems with multiple cpus, a full barrier can cost hundreds of cycles.
>
> Agreed, with the caveat that "small" ain't what it used to be.
> Consider Zynqs: dual-core ARMs with caches and, obviously, FPGA
> fabric.
True. I'd be happy to see people continue to use full memory barriers - they may not be speed optimal, but they will lead to correct code. Let those who understand the more advanced synchronisation stuff use acquire-release. And of course a key point is for people to use RTOS features when they can - again, using a mutex or semaphore might not be as efficient as a fancy lock-free algorithm, but it is better to be safe than fast.
> Consider single 32 core MCUs for £25 one-off. (xCORE)
The xCORE is a bit different, as is the language you use and the style of the code. Message passing is a very neat way to swap data between threads or cores, and is inherently safer than shared memory.
> There are many other examples, and that trend will continue. >
Yes.
>> There is one "suboptimality" - the "consume" memory order. It's a bit
>> weird, in that it is mainly relevant to the Alpha architecture, whose
>> memory model is so weak that in "x = *p;" it can fetch the contents of
>> *p before seeing the latest update of p. Because the C11 and C++11
>> specs are not clear enough on "consume", all implementations (AFAIK)
>> bump this up to the stronger "acquire", which may be slightly slower on
>> some architectures.
>
> One of C/C++'s problems is deciding to cater for, um,
> weird and obsolete architectures. I see /why/ they do
> that, but on Mondays Wednesdays and Fridays I'd prefer
> a concentration on doing common architectures simply
> and well.
In general, I agree. In this particular case, the Alpha is basically obsolete - but it is certainly possible that future cpu designs would have equally weak memory models. Such a weak model is easier to make faster in hardware - you need less synchronisation, cache snooping, and other such details.
> [snip]
>
>> These days a feature does not make it into the C or C++ standards unless
>> there is a working implementation in at least one major toolchain to
>> test it out in practice.
>
> Yes, but I presume that was also the case in the naughties.
No, not to the same extent. Things move faster now, especially in the C++ world. C++ is on a three year update cycle now. The first ISO standard was C++98, with C++03 being a minor update 5 years later. It took until C++11 to get a real new version (with massive changes) - and now we are getting real, significant improvements every 3 years.
> (I gave up following the detailed C/C++ shenanigans during the
> interminable "cast away constness" philosophical discussions)
>
> The point was about the first compiler that (belatedly)
> correctly implemented /all/ the features.
>
> [snip]
>
>> As I said, people managed fine without it. Putting together a memory
>> model that the C folks and C++ folks could agree on for all the
>> platforms they support is not a trivial effort - and I am very glad they
>> agreed here. Of course I agree that it would have been nice to have had
>> it earlier. The thread support (as distinct from the atomic support,
>> including memory models) is far too little, far too late and I doubt if
>> it will have much use.
>
> While there is no doubt people /thought/ they managed,
> it is less clear cut that it was "fine".
>
> I'm disappointed that thread support might not be as
> useful as desired, but memory model and atomic is more
> important.
The trouble with thread support in C11/C++11 is that it is limited to very simple features - mutexes, condition variables and simple threads. But real-world use needs priorities, semaphores, queues, timers, and many other features. Once you are using RTOS-specific APIs for all these, you would use the RTOS APIs for threads and mutexes as well, rather than <threads.h> calls.
On 08/08/17 11:56, David Brown wrote:
> On 08/08/17 12:09, Tom Gardner wrote:
>> [snip]
>>
>> Agreed, with the caveat that "small" ain't what it used to be.
>> Consider Zynqs: dual-core ARMs with caches and, obviously, FPGA
>> fabric.
>
> True. I'd be happy to see people continue to use full memory barriers -
> they may not be speed optimal, but they will lead to correct code. Let
> those who understand the more advanced synchronisation stuff use
> acquire-release. And of course a key point is for people to use RTOS
> features when they can - again, using a mutex or semaphore might not be
> as efficient as a fancy lock-free algorithm, but it is better to be safe
> than fast.
Agreed.
>> Consider single 32 core MCUs for £25 one-off. (xCORE)
>
> The xCORE is a bit different, as is the language you use and the style
> of the code. Message passing is a very neat way to swap data between
> threads or cores, and is inherently safer than shared memory.
Well, you can program xCOREs in C/C++, but I haven't investigated that on the principle that I want to "kick the tyres" of xC. ISTR seeing that the "interface" mechanisms in xC are shared memory underneath, optionally involving memory copies. That is plausible since xC interfaces have an "asynchronous nonblocking" "notify" and "clear notification" annotations on methods. Certainly they are convenient to use and get around some pain points in pure CSP message passing. I'm currently in two minds as to whether I like any departure from CSP purity :)
> [snip]
>
>> One of C/C++'s problems is deciding to cater for, um,
>> weird and obsolete architectures. I see /why/ they do
>> that, but on Mondays Wednesdays and Fridays I'd prefer
>> a concentration on doing common architectures simply
>> and well.
>
> In general, I agree. In this particular case, the Alpha is basically
> obsolete - but it is certainly possible that future cpu designs would
> have equally weak memory models. Such a weak model is easier to make
> faster in hardware - you need less synchronisation, cache snooping, and
> other such details.
Reasonable, but given the current fixation on the mirage of globally-coherent memory, I wonder whether that is a lost cause. Sooner or later people will have to come to terms with non-global memory and multicore processing and (preferably) message passing. Different abstractions and tools /will/ be required. Why not start now, from a good sound base? Why hobble next-gen tools with last-gen problems?
>> I'm disappointed that thread support might not be as
>> useful as desired, but memory model and atomic is more
>> important.
>
> The trouble with thread support in C11/C++11 is that it is limited to
> very simple features - mutexes, condition variables and simple threads.
> But real-world use needs priorities, semaphores, queues, timers, and
> many other features. Once you are using RTOS-specific APIs for all
> these, you would use the RTOS APIs for threads and mutexes as well,
> rather than <threads.h> calls.
That makes a great deal of sense to me, and it brings into question how much it is worth bothering about it in C/C++. No doubt I'll come to my senses before too long :)
On 08/08/17 16:56, Tom Gardner wrote:
> On 08/08/17 11:56, David Brown wrote:
>> On 08/08/17 12:09, Tom Gardner wrote:
>>> On 08/08/17 10:26, David Brown wrote:
> [snip]
>
>>> Consider single 32 core MCUs for £25 one-off. (xCORE)
>>
>> The xCORE is a bit different, as is the language you use and the style
>> of the code. Message passing is a very neat way to swap data between
>> threads or cores, and is inherently safer than shared memory.
>
> Well, you can program xCOREs in C/C++, but I haven't
> investigated that on the principle that I want to "kick
> the tyres" of xC.
>
> ISTR seeing that the "interface" mechanisms in xC are
> shared memory underneath, optionally involving memory
> copies. That is plausible since xC interfaces have an
> "asynchronous nonblocking" "notify" and "clear
> notification" annotations on methods. Certainly they
> are convenient to use and get around some pain points
> in pure CSP message passing.
The actual message passing can be done in several ways. IIRC, it will use shared memory within the same cpu (8 logical cores), and channels ("real" message passing) between cpus. However, as long as it logically uses message passing then it is up to the tools to get the details right - it frees the programmer from having to understand about ordering, barriers, etc.
> I'm currently in two minds as to whether I like
> any departure from CSP purity :)
>
> [snip]
>
> Reasonable, but given the current fixation on the mirage
> of globally-coherent memory, I wonder whether that is a
> lost cause.
>
> Sooner or later people will have to come to terms with
> non-global memory and multicore processing and (preferably)
> message passing. Different abstractions and tools /will/
> be required. Why not start now, from a good sound base?
> Why hobble next-gen tools with last-gen problems?
That is /precisely/ the point - if you view it from the other side. A key way to implement message passing is to use shared memory underneath - but you isolate the messy details from the ignorant programmer. If you have written the message-passing library correctly, using features such as "consume" ordering, then the high-level programmer can think in terms of passing messages while the library and the compiler conspire to give optimal, correct code even on very weak memory model CPUs.

You are never going to get away from shared memory systems - for some kinds of multi-threaded application, shared memory is much, much more efficient than message passing. But it would be good if multi-threaded apps used message passing more often, as it is easier to get correct.
>>> I'm disappointed that thread support might not be as
>>> useful as desired, but memory model and atomic is more
>>> important.
>>
>> The trouble with thread support in C11/C++11 is that it is limited to
>> very simple features - mutexes, condition variables and simple threads.
>> But real-world use needs priorities, semaphores, queues, timers, and
>> many other features. Once you are using RTOS-specific APIs for all
>> these, you would use the RTOS APIs for threads and mutexes as well,
>> rather than <threads.h> calls.
>
> That makes a great deal of sense to me, and it
> brings into question how much it is worth bothering
> about it in C/C++. No doubt I'll come to my senses
> before too long :)
