EmbeddedRelated.com
Forums
The 2024 Embedded Online Conference

C++ threads versus PThreads for embedded Linux on ARM micro

Started by gp.k...@gmail.com July 20, 2018
We're starting an embedded Linux C++ project with an ARM micro and using GCC V7.  Can anyone suggest pros and cons of using C++ Threads versus PThreads (Posix threads).

On 20/07/18 13:01, gp.kiwi@gmail.com wrote:
> We're starting an embedded Linux C++ project with an ARM micro and using GCC V7. Can anyone suggest pros and cons of using C++ Threads versus PThreads (Posix threads).
C++ threads are always a wrapper around an underlying library. So if you are using C++ on Linux, the C++ threads /are/ pthreads. These are the points I can think of for preferring C++ threads:

+ You have a nice class/template library with C++ threads, instead of a C function interface.
+ You have RAII classes for locks and other synchronisation objects.
+ You have consistency with other C++ thread systems.
+ Your compiler may understand that your code is threaded.
- You need at least C++11 (but that has huge advantages anyway, compared to older C++).
- It is marginally more fiddly if you need the underlying thread details for features not supported by the C++ thread library.
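To make the RAII point and the "underlying thread details" point concrete, here is a minimal sketch. The names Counter, run_counter and worker_sched_policy are invented for this example, and the second function assumes that std::thread::native_handle() returns a pthread_t, which is the case for libstdc++ on Linux but is not guaranteed by the C++ standard.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>
#include <pthread.h>
#include <sched.h>

// Hypothetical shared counter protected by a mutex.
struct Counter {
    std::mutex m;
    long value = 0;

    void increment() {
        // RAII: the mutex is released when `guard` goes out of scope,
        // including on an early return or an exception.
        std::lock_guard<std::mutex> guard(m);
        ++value;
    }
};

// Starts `nthreads` threads that each increment the counter `iters` times.
long run_counter(int nthreads, int iters) {
    assert(nthreads > 0 && iters >= 0);
    Counter c;
    std::vector<std::thread> workers;
    for (int i = 0; i < nthreads; ++i)
        workers.emplace_back([&c, iters] {
            for (int j = 0; j < iters; ++j) c.increment();
        });
    for (auto& t : workers) t.join();
    return c.value;
}

// Reaches through to pthreads for something std::thread does not expose:
// reading the scheduling policy of a running thread.
// Assumption: native_handle() is a pthread_t (libstdc++ on Linux).
int worker_sched_policy() {
    std::mutex gate;
    std::unique_lock<std::mutex> hold(gate);  // keeps the worker parked
    std::thread t([&gate] { std::lock_guard<std::mutex> wait(gate); });

    int policy = -1;
    sched_param param{};
    int rc = pthread_getschedparam(t.native_handle(), &policy, &param);

    hold.unlock();  // let the worker finish
    t.join();
    return rc == 0 ? policy : -1;
}
```

Calling run_counter(4, 1000) should give 4000, and worker_sched_policy() normally reports SCHED_OTHER for a thread created with default attributes. The same native_handle() route is how you would set priorities or affinity, which the C++ library does not cover.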
On Saturday, July 21, 2018 at 6:19:31 AM UTC+12, David Brown wrote:
> On 20/07/18 13:01, gp...@gmail.com wrote:
>> We're starting an embedded Linux C++ project with an ARM micro and using GCC V7. Can anyone suggest pros and cons of using C++ Threads versus PThreads (Posix threads).
>
> C++ threads are always a wrapper around an underlying library. So if you are using C++ on Linux, the C++ threads /are/ pthreads. These are the points I can think of for preferring C++ threads:
>
> + You have a nice class/template library with C++ threads, instead of a C function interface.
> + You have RAII classes for locks and other synchronisation objects.
> + You have consistency with other C++ thread systems.
> + Your compiler may understand that your code is threaded.
> - You need at least C++11 (but that has huge advantages anyway, compared to older C++).
> - It is marginally more fiddly if you need the underlying thread details for features not supported by the C++ thread library.
That's great, thanks.
graeme.prentice@gmail.com writes:
> That's great, thanks.
Also, ask yourself if you really need threads in the first place. Depending on what you're doing, you may be better off with multiple processes. That gets rid of a lot of lock and race hazards, and if the processes can communicate through sockets, that improves scalability by making it easier for you to distribute your program across multiple machines if you run out of cpu cores on your original machine.
On Saturday, July 21, 2018 at 1:28:02 PM UTC+12, Paul Rubin wrote:
> Also, ask yourself if you really need threads in the first place. Depending on what you're doing, you may be better off with multiple processes. That gets rid of a lot of lock and race hazards, and if the processes can communicate through sockets, that improves scalability by making it easier for you to distribute your program across multiple machines if you run out of cpu cores on your original machine.
Thanks for the suggestion. The micro is an ARM9 LPC3250 SOM (we're forced to use this at the moment) which I believe is single core (it's hard to find out for some reason) but it could easily change in future. Based on a previous project, race conditions and deadlocks are a major headache so I'm hoping the core data will be written to by one thread only, maybe with lock-free queues. The CPU data cache is 32KB and it's probably "write through". We would have to do some performance tests to see if multi-processes and sockets is viable.
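For the single-writer idea with lock-free queues, a minimal single-producer/single-consumer ring buffer might look like the sketch below. SpscQueue and transfer_sum are names made up for this illustration, and it is only correct with exactly one pushing thread and one popping thread. Whether std::atomic<std::size_t> is genuinely lock-free on a particular ARM9 toolchain is worth checking with is_lock_free() rather than assuming.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Single-producer/single-consumer ring buffer (illustrative sketch).
// Safe only with one pushing thread and one popping thread.
template <typename T, std::size_t N>
class SpscQueue {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");
    T buf_[N];
    std::atomic<std::size_t> head_{0};  // next slot to read, written by consumer
    std::atomic<std::size_t> tail_{0};  // next slot to write, written by producer

public:
    bool push(const T& v) {
        std::size_t t = tail_.load(std::memory_order_relaxed);
        if (t - head_.load(std::memory_order_acquire) == N) return false;  // full
        buf_[t & (N - 1)] = v;
        tail_.store(t + 1, std::memory_order_release);  // publish the element
        return true;
    }

    bool pop(T& out) {
        std::size_t h = head_.load(std::memory_order_relaxed);
        if (h == tail_.load(std::memory_order_acquire)) return false;  // empty
        out = buf_[h & (N - 1)];
        head_.store(h + 1, std::memory_order_release);  // free the slot
        return true;
    }
};

// Demo: the calling thread produces 0..count-1, a second thread sums them.
long transfer_sum(int count) {
    assert(count >= 0);
    SpscQueue<int, 64> q;
    long sum = 0;
    std::thread consumer([&] {
        int v = 0;
        for (int received = 0; received < count;) {
            if (q.pop(v)) { sum += v; ++received; }
            else std::this_thread::yield();
        }
    });
    for (int i = 0; i < count;) {
        if (q.push(i)) ++i;
        else std::this_thread::yield();
    }
    consumer.join();  // join makes the consumer's `sum` visible here
    return sum;
}
```

The busy-wait with yield() is only there to keep the demo short; a real design would probably block or poll on a timer when the queue is empty.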
On 21/07/18 08:58, graeme.prentice@gmail.com wrote:
> On Saturday, July 21, 2018 at 1:28:02 PM UTC+12, Paul Rubin wrote:
>> Also, ask yourself if you really need threads in the first place. Depending on what you're doing, you may be better off with multiple processes.
There are certainly tasks that are better handled as multiple processes rather than multiple threads. (But note that it is not an either/or choice - often the best solution uses both.)
>> That gets rid of a lot of lock and race hazards,
No, it does not - it merely changes them. If your separate threads of execution need to synchronise, communicate, or agree about shared resources, then there is no theoretical difference in the types of hazards, races, or other such problems whether you use multiple threads or multiple processes. The details change, and the types of synchronisation objects used can change, but they do not go away. Some may be handled by the OS rather than the application, however - for example, a pipe between processes will let you communicate without worrying about locks for the underlying shared data structure, at the cost of being a lot less efficient than shared memory in threads.

Multiple processes have higher resource costs, and they make it a lot harder to use tools such as "-fsanitize=thread" to find problems. On the other hand, they make it easier to break the problem down into separate tasks that are handled independently and tested independently. That helps if you have different developers - or even different programming languages.
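As an illustration of the pipe case, here is a minimal sketch in which a child process sends a message to its parent with no application-level locking at all; the kernel serialises access to the pipe buffer. The function name pipe_from_child is invented for this example, and error handling is kept to the bare minimum.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a child that writes `msg` into a pipe; the parent reads it back.
// Returns an empty string if the pipe or the fork cannot be created.
std::string pipe_from_child(const std::string& msg) {
    int fds[2];
    if (pipe(fds) != 0) return {};

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return {};
    }

    if (pid == 0) {                  // child: writer
        close(fds[0]);
        ssize_t written = write(fds[1], msg.data(), msg.size());
        (void)written;               // a real program would check this
        close(fds[1]);
        _exit(0);
    }

    close(fds[1]);                   // parent: reader
    std::string out;
    char buf[256];
    ssize_t n;
    // read() returns 0 once the child has closed its write end.
    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        out.append(buf, static_cast<std::size_t>(n));
    close(fds[0]);
    waitpid(pid, nullptr, 0);        // reap the child
    return out;
}
```

The ordering problem has not gone away: the parent still has to wait for the data. It has simply moved into the blocking read() call instead of a mutex or condition variable.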
>> and if the processes can communicate through sockets, that improves scalability by making it easier for you to distribute your program across multiple machines if you run out of cpu cores on your original machine.
True. This can also be useful during development when you might have some of the bits running on your target system, and other bits running on your host computer (perhaps under a debugger).
> Thanks for the suggestion. The micro is an ARM9 LPC3250 SOM (we're forced to use this at the moment) which I believe is single core (it's hard to find out for some reason) but it could easily change in future.
That doesn't matter for the choice of threads, processes or both.
> Based on a previous project, race conditions and deadlocks are a major headache so I'm hoping the core data will be written to by one thread only, maybe with lock-free queues. The CPU data cache is 32KB and it's probably "write through". We would have to do some performance tests to see if multi-processes and sockets is viable.
Multiple processes are slower than multiple threads, and sockets are much slower than in-process queues. But the sockets are more flexible. You might find you want an abstraction that can use either method as a backend, and change during different stages of development.
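One way such an abstraction could look is sketched below: a small message interface with an in-process queue backend and a Unix-domain socket backend. Channel, QueueChannel, SocketChannel and round_trip are all hypothetical names, the 512-byte message limit is an arbitrary choice for the example, and the socket backend keeps both ends in one object purely so the demo is self-contained.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <stdexcept>
#include <string>
#include <utility>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical message-passing interface seen by the application.
class Channel {
public:
    virtual ~Channel() = default;
    virtual bool send(const std::string& msg) = 0;
    virtual bool receive(std::string& msg) = 0;  // blocks until a message arrives
};

// Backend 1: in-process queue guarded by a mutex and condition variable.
class QueueChannel : public Channel {
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::string> q_;

public:
    bool send(const std::string& msg) override {
        {
            std::lock_guard<std::mutex> g(m_);
            q_.push_back(msg);
        }
        cv_.notify_one();
        return true;
    }
    bool receive(std::string& msg) override {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !q_.empty(); });
        msg = std::move(q_.front());
        q_.pop_front();
        return true;
    }
};

// Backend 2: Unix-domain datagram socket pair (one datagram per message).
class SocketChannel : public Channel {
    static constexpr std::size_t kMaxMsg = 512;  // arbitrary limit for this sketch
    int fds_[2] = {-1, -1};

public:
    SocketChannel() {
        if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds_) != 0)
            throw std::runtime_error("socketpair failed");
    }
    ~SocketChannel() override {
        close(fds_[0]);
        close(fds_[1]);
    }
    SocketChannel(const SocketChannel&) = delete;
    SocketChannel& operator=(const SocketChannel&) = delete;

    bool send(const std::string& msg) override {
        assert(msg.size() <= kMaxMsg);
        ssize_t n = ::send(fds_[0], msg.data(), msg.size(), 0);
        return n == static_cast<ssize_t>(msg.size());
    }
    bool receive(std::string& msg) override {
        char buf[kMaxMsg];
        ssize_t n = ::recv(fds_[1], buf, sizeof buf, 0);
        if (n < 0) return false;
        msg.assign(buf, static_cast<std::size_t>(n));
        return true;
    }
};

// Application code depends only on the interface, not on the backend.
std::string round_trip(Channel& ch, const std::string& msg) {
    std::string out;
    if (!ch.send(msg) || !ch.receive(out)) return {};
    return out;
}
```

With this shape, round_trip() behaves the same for a QueueChannel and a SocketChannel, so the backend can be swapped between development on the host and deployment on the target without touching the callers.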
On 21.7.18 09:58, graeme.prentice@gmail.com wrote:
> Thanks for the suggestion. The micro is an ARM9 LPC3250 SOM (we're forced to use this at the moment) which I believe is single core (it's hard to find out for some reason) but it could easily change in future. Based on a previous project, race conditions and deadlocks are a major headache so I'm hoping the core data will be written to by one thread only, maybe with lock-free queues. The CPU data cache is 32KB and it's probably "write through". We would have to do some performance tests to see if multi-processes and sockets is viable.
For number of cores on your system, if you have Linux running on the target, have a look at /proc/cpuinfo:

tauno@pi2:~ $ cat /proc/cpuinfo
processor        : 0
model name       : ARMv7 Processor rev 5 (v7l)
BogoMIPS         : 38.40
Features         : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer  : 0x41
CPU architecture : 7
CPU variant      : 0x0
CPU part         : 0xc07
CPU revision     : 5

processor        : 1
model name       : ARMv7 Processor rev 5 (v7l)
BogoMIPS         : 38.40
Features         : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer  : 0x41
CPU architecture : 7
CPU variant      : 0x0
CPU part         : 0xc07
CPU revision     : 5

processor        : 2
model name       : ARMv7 Processor rev 5 (v7l)
BogoMIPS         : 38.40
Features         : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer  : 0x41
CPU architecture : 7
CPU variant      : 0x0
CPU part         : 0xc07
CPU revision     : 5

processor        : 3
model name       : ARMv7 Processor rev 5 (v7l)
BogoMIPS         : 38.40
Features         : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm
CPU implementer  : 0x41
CPU architecture : 7
CPU variant      : 0x0
CPU part         : 0xc07
CPU revision     : 5

Hardware         : BCM2835
Revision         : a01041
Serial           : 0000000064d34ba1

---

The above is from a Raspberry Pi 2.

--
-TV
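The same information can be queried from inside the program. This is a minimal sketch, assuming a Linux/glibc target where sysconf() accepts _SC_NPROCESSORS_ONLN (a common extension rather than something every POSIX system provides); the function names are made up for this example.

```cpp
#include <cassert>
#include <unistd.h>

// Number of cores currently online, as reported by the kernel.
// Falls back to 1 if the query fails.
long online_cores() {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

// True when the program is running on a single-core system.
bool is_single_core() {
    long n = online_cores();
    assert(n >= 1);
    return n == 1;
}
```

std::thread::hardware_concurrency() is the portable C++ alternative, but it is only a hint and is allowed to return 0 when the count is unknown.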
gp.kiwi@gmail.com wrote:
> We're starting an embedded Linux C++ project with an ARM micro and using GCC V7. Can anyone suggest pros and cons of using C++ Threads versus PThreads (Posix threads).
I'd suggest thinking about a design for how you'd measure which works in context for you. If I run pthreads on a big Linux machine it'll be different from running them in a VM. Similarly, it'll be different on a RasPi 3 sized ARM computer.

--
Les Cargill
On 07/20/2018 11:58 PM, graeme.prentice@gmail.com wrote:
> On Saturday, July 21, 2018 at 1:28:02 PM UTC+12, Paul Rubin wrote:
>> Also, ask yourself if you really need threads in the first place. Depending on what you're doing, you may be better off with multiple processes. That gets rid of a lot of lock and race hazards, and if the processes can communicate through sockets, that improves scalability by making it easier for you to distribute your program across multiple machines if you run out of cpu cores on your original machine.
>
> Thanks for the suggestion. The micro is an ARM9 LPC3250 SOM (we're forced to use this at the moment) which I believe is single core (it's hard to find out for some reason) but it could easily change in future. Based on a previous project, race conditions and deadlocks are a major headache so I'm hoping the core data will be written to by one thread only, maybe with lock-free queues. The CPU data cache is 32KB and it's probably "write through". We would have to do some performance tests to see if multi-processes and sockets is viable.
3250 is single-core. Happens to be a part we use a lot around here, although we always go bare-metal rather than running Linux. Cache is programmable through the page-table as to whether it's write-through or not.

The reason you're having trouble determining much of this information is that NXP bought large chunks of that chip wholesale from ARM without anyone there actually understanding it. So the NXP documentation is spotty and occasionally wrong (let me tell you of our I2C-based woes).

There's a document available directly from ARM, ARM DDI 0198E, that is specifically the ARM926EJ-S Technical Reference Manual. Getting into the details on the 3250 is nearly impossible without it.

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.
You can also communicate among processes through shared memory (e.g. mmap).

To look at it another way, processes require explicit sharing, while threads share
implicitly.

On an embedded system, the heavier cost of process switching may be
important.
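To illustrate the mmap suggestion and the "explicit sharing" point, here is a minimal sketch in which a parent and a forked child share one int through an anonymous shared mapping. The function name share_via_mmap is invented for this example, MAP_ANONYMOUS is assumed to be available (it is on Linux), and the only synchronisation is the parent waiting for the child to exit.

```cpp
#include <cassert>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// The child writes `value` into shared memory; the parent reads it back.
// Returns -1 if the mapping, the fork or the wait fails.
int share_via_mmap(int value) {
    // Explicit sharing: only this mapping is visible to both processes.
    void* p = mmap(nullptr, sizeof(int), PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    int* shared = static_cast<int*>(p);
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(p, sizeof(int));
        return -1;
    }
    if (pid == 0) {            // child
        *shared = value;
        _exit(0);
    }

    // Parent: the child's exit is the only synchronisation in this sketch.
    int status = 0;
    pid_t reaped = waitpid(pid, &status, 0);
    int result = (reaped == pid) ? *shared : -1;
    munmap(p, sizeof(int));
    return result;
}
```

Everything outside the mapping stays private to each process after fork(), which is the opposite of the thread case. With both processes running concurrently you would still need a process-shared mutex, semaphore or atomics inside the mapping.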

