C++ threads versus PThreads for embedded Linux on ARM micro

Started by gp.k...@gmail.com July 20, 2018
On Thu, 02 Aug 2018 12:24:49 +0200, David Brown
<david.brown@hesbynett.no> wrote:

>On 01/08/18 21:11, StateMachineCOM wrote:
>> On Wednesday, August 1, 2018 at 2:46:08 PM UTC-4,
>> upsid...@downunder.com wrote:
>>> It seems Cummings has reinvented the wheel :-). Those principles
>>> were used already in the 1970's to implement real time systems
>>> under RSX-11 on PDP-11. Later on these principles were also used on
>>> real time systems under RMX-80 for 8080 and similar kernels.
>>
>> I recommended Cummings's article only because most of the
>> participants of this discussion seem to be firmly in the "sequential
>> programming with sharing and blocking" camp.
>
>I can't speak for anyone else in this discussion, but I am in the
>"Sequential programming with sharing and blocking is /one/ way to handle
>things, but not the only way. Indeed, there /is/ no single right way" camp.
>
>> For this traditional way
>> of thinking, the full immersion into the reactive programming
>> principles might be too big of a shock. Cummings arrives at these
>> principles by trial-and-error, intuition, and experience. He also
>> does not call the concepts by strange names like "active object
>> (actor) pattern" or "reactive programming".
>
>He also gets something /seriously/ wrong. His model of "a typical
>thread" on page 4 is completely incorrect - and his whole argument
>breaks down because of that.
>
>/Some/ threads are event driven services - they wait for messages coming
>in, process them, and then go back to waiting for another message. This
>is basically a cooperative multi-tasking system - the sort of thing we
>had to use before multi-threading and multi-tasking OS's were suitable
>for small embedded systems. (It's also what we had in Windows for long
>/after/ every other "big" OS was multi-tasking.)
>
>Threads like that can be a very useful structure, and can be very
>efficient ways to model certain types of problem. They are excellent
>for handling user interaction, and are thus very popular for the main
>thread in gui programs on desktops. They are also marvellous as pretty
>much the only thread (barring perhaps timers and interrupts) on many
>embedded systems - in particular, you know that when the thread is in
>the "Process message 1" part, it can't be in the "Process message 2"
>part, and thus you can avoid a great deal of locking or other
>synchronisation.
>
>But a key point is that a thread like that should avoid (or at least
>minimise) any blocking or locking, and usually it should avoid any code
>that takes a long time to run (possibly by breaking it up, using a state
>machine).

To be fair, message passing does not force a traditional cooperative tasking model. It just requires that any mutable object only be accessible from a single message queue handler thread.
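
As an illustration of the point above (a mutable object reachable only from one message queue handler thread), here is a minimal C++ sketch. The `Counter` class and its `add`/`stop` members are hypothetical names invented for this example, not anything from the thread or from Cummings' paper. Only the handler thread ever touches `total_`; other threads just enqueue messages.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical example: `total_` is owned by one handler thread.
class Counter {
public:
    Counter() : worker_([this] { run(); }) {}
    ~Counter() { stop(); }

    // Callable from any thread: it only enqueues a message.
    void add(int amount) {
        post([this, amount] { total_ += amount; });
    }

    // Let the handler drain the queue, join it, and return the final value.
    int stop() {
        // Must not be called from the handler thread itself.
        assert(std::this_thread::get_id() != worker_.get_id());
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        ready_.notify_one();
        if (worker_.joinable()) worker_.join();
        return total_;  // safe: the handler thread has finished
    }

private:
    void post(std::function<void()> msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(msg));
        }
        ready_.notify_one();
    }

    void run() {
        for (;;) {
            std::function<void()> msg;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return done_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stop requested, nothing left
                msg = std::move(queue_.front());
                queue_.pop();
            }
            msg();  // runs on the handler thread only, outside the lock
        }
    }

    std::mutex mutex_;  // protects queue_ and done_, not total_
    std::condition_variable ready_;
    std::queue<std::function<void()>> queue_;
    bool done_ = false;
    int total_ = 0;
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

Note that the queue itself still needs a lock; what goes away is any locking around the owned state. Nothing here is cooperative: the handler thread can be preempted like any other.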
On Thu, 02 Aug 2018 16:14:47 +0300, upsidedown@downunder.com wrote:

>On Thu, 02 Aug 2018 12:24:49 +0200, David Brown
><david.brown@hesbynett.no> wrote:
>
>>/Some/ threads are event driven services - they wait for messages coming
>>in, process them, and then go back to waiting for another message. This
>>is basically a cooperative multi-tasking system - the sort of thing we
>>had to use before multi-threading and multi-tasking OS's were suitable
>>for small embedded systems. (It's also what we had in Windows for long
>>/after/ every other "big" OS was multi-tasking.)
>
>Windows NT 3.5 and later on was a full blown priority based
>multitasking system with close resemblance to RSX-11 and VMS.

Windows NT 3.1, actually.
On Thu, 02 Aug 2018 12:24:49 +0200, David Brown
<david.brown@hesbynett.no> wrote:

>But a key point is that a [event driven thread] should avoid (or at least
>minimise) any blocking or locking, and usually it should avoid any code
>that takes a long time to run (possibly by breaking it up, using a state
>machine).
>
>Other threads can have completely different structures. In particular,
>they /can/ have blocking operations because it is fine for them to block
>- the blocking operations are part of the normal flow of the sequential
>processes that are clear in the code and easy to track. That might mean
>you have more threads than you would otherwise need, and a corresponding
>loss in overall system efficiency, but that is the price you pay for
>better modularisation and clearer code flow. And you can have a range
>of different types of synchronisation - using whatever makes the best
>balance between efficiency, clear coding, and provably correct
>synchronisation (i.e., you can be absolutely sure you have no deadlocks,
>livelocks, etc.).

One serious problem is that too many programmers are much better at figuring out what CAN be done in parallel than they are at figuring out what SHOULD be done in parallel. Having too many threads generally is worse than having too few.

>[Cummings] also discusses the idea of the call to thread B being essentially
>non-blocking, and with B sending its response as a message back to A.
>But for some reason he says that should be a different message queue
>that A would block on, giving the same problem but with a far more
>complex structure - I can't see why anyone would suggest that as a
>possible solution.

I [am not sure but] am thinking that Cummings was intending to demonstrate blocking RPC by forcing B's reply to be a higher priority than other messages waiting for A.

>The response from B should either go into the main
>event queue for A (perhaps this queue would need to be prioritised), or
>(if the OS has the support) it could go in a separate queue with A
>waiting for /either/ queue, maybe with a priority mechanism.

As you alluded, the difficulty is that not all systems offer priority aware message queues. E.g., VxWorks for a long time offered only FIFO message queues. If you needed priority messaging, you had to use multiple queues or implement something yourself. Moreover, at the time I was fighting with it [circa v4.x], message queues could not be monitored using select(). It was a real PITA juggling multiple queues and multiple network connections.
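
Where the OS offers only FIFO queues, the "implement something yourself" option can be fairly small. The following is a hedged sketch of a prioritised mailbox layered on `std::priority_queue`. The names `Message`, `LessUrgent` and `PriorityMailbox` are made up for this example, and "higher number means more urgent" is an assumed convention. It is deliberately single-threaded; a real version would add the usual mutex and condition variable around it.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

struct Message {
    int priority;       // assumed convention: higher value = more urgent
    std::uint64_t seq;  // arrival number, keeps FIFO order within a priority
    std::string text;
};

// Ordering for std::priority_queue: returns true when `a` should be
// delivered after `b`.
struct LessUrgent {
    bool operator()(const Message& a, const Message& b) const {
        if (a.priority != b.priority) return a.priority < b.priority;
        return a.seq > b.seq;  // same priority: earlier arrival goes first
    }
};

// Hypothetical mailbox: one queue, priority-aware, FIFO among equals.
class PriorityMailbox {
public:
    void send(int priority, std::string text) {
        queue_.push(Message{priority, next_seq_++, std::move(text)});
    }

    bool empty() const { return queue_.empty(); }

    // Precondition: the mailbox is not empty.
    Message receive() {
        assert(!queue_.empty());
        Message m = queue_.top();
        queue_.pop();
        return m;
    }

private:
    std::priority_queue<Message, std::vector<Message>, LessUrgent> queue_;
    std::uint64_t next_seq_ = 0;
};
```

With this, a reply from B sent at priority 1 is received by A ahead of ordinary events already waiting at priority 0, which is roughly the effect of the "blocking RPC" reading of Cummings suggested above, without a second queue.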

>Really, most or all of the problems he sees could be solved by saying
>"/Design/ your multi-threaded system carefully, and don't be afraid of
>splitting tasks into new threads" rather than coding first and thinking
>later, programming by trial and error, and using hacks instead of
>re-structuring as needed.

As I said above, the problem is figuring out what should be split off into a thread vs what can be.

>Exchanging your toolbox full of hammers for a toolbox full of
>screwdrivers does not make for good software development.

+1

George
On 02/08/18 19:15, George Neuner wrote:
> One serious problem is that too many programmers are much better at
> figuring out what CAN be done in parallel than they are at figuring
> out what SHOULD be done in parallel.
>
> Having too many threads generally is worse than having too few.

:)

For "embarrassingly parallel" applications such as telecom systems, 1-4 "worker" threads per core is a good starting point.
(Please get a real newsreader and a real newsserver, instead of using 
Google Groups for posting.  GG screws up Usenet formatting and line 
endings, so the result is that when replying to you, other people have 
to choose between messed up text quotation, or messed up code quotation. 
  You'll also find it far better for Usenet access.)

On 02/08/18 18:12, StateMachineCOM wrote:
> On Thursday, August 2, 2018 at 6:24:52 AM UTC-4, David Brown wrote:
>> Other threads can have completely different structures. In
>> particular, they /can/ have blocking operations because it is fine
>> for them to block - the blocking operations are part of the normal
>> flow of the sequential processes that are clear in the code and
>> easy to track.
>
> There is no denying that sequential solution to a sequential problem
> is the simplest and most efficient. For example (which I provide here
> for the benefit of the whole NG), if you have a thread that must
> handle a sequence of events ABC, you might hard-code the sequence in
> the following pseudo-code:
>
>     wait_for_semaphore_signaling_evt_A();
>     process_evt_A();
>     wait_for_queue_signaling_evt_B();
>     process_evt_B();
>     wait_for_evt_flag_signaling_evt_C();
>     process_evt_C();
>
> But the problem is that *most* of real-life problems are NOT
> sequential.

So use sequential-style coding for sequential problems, and use other styles in other situations.

> So, in the example above, later in the development cycle
> it might become clear that the thread also needs to handle a (perhaps
> rare) sequence of events ABBA. At this point, the thread has to be
> completely re-designed, perhaps in the following way:
>
>     wait_for_semaphore_signaling_evt_A();
>     process_evt_A();
>     wait_for_queue_signaling_evt_B();
>     process_evt_B();
>     wait_for_queue_signaling_evt_BC();
>     switch (evt_type) {
>     case B:
>         process_evt_B();
>         break;
>     case C:
>         process_evt_C();
>         break;
>     }
>
> At this point, the thread structure becomes a "hybrid" of sequential
> and "event-driven".

If the logic is hybrid, then use a hybrid code structure. The idea that there is only one correct way of writing your code is simply /wrong/.

> Specifically, B can be followed by another B or
> C, which requires a more generic OS mechanism to wait for both B and
> C *simultaneously* (which most likely is a queue rather than
> event-flag).

Lots of RTOS's have support for waiting for multiple objects. Failing that, it is not hard to solve such challenges in other ways. You don't have to use a queue, and you don't have to re-structure everything around the possibility that one part of your code might have to wait for two different things.
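
To make "you don't have to use a queue" concrete, here is a small sketch in standard C++ of waiting for either of two events using one mutex and one condition variable. `Signal`, `EventPair`, `signal()` and `wait_any()` are hypothetical names for this example; an RTOS with event flags or a wait-for-multiple-objects call would provide the equivalent directly. Checking B before C when both are pending is an arbitrary choice here.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

enum class Signal { B, C };

// Hypothetical helper: counts pending B and C events and lets one
// thread block until either kind is available.
class EventPair {
public:
    // Callable from any thread.
    void signal(Signal s) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (s == Signal::B) {
                ++pending_b_;
            } else {
                ++pending_c_;
            }
        }
        ready_.notify_one();
    }

    // Blocks until a B or a C is pending, consumes it, and says which.
    Signal wait_any() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return pending_b_ > 0 || pending_c_ > 0; });
        assert(pending_b_ > 0 || pending_c_ > 0);
        if (pending_b_ > 0) {
            --pending_b_;
            return Signal::B;
        }
        --pending_c_;
        return Signal::C;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    int pending_b_ = 0;
    int pending_c_ = 0;
};
```

The sequential thread from the quoted pseudo-code keeps its shape: only the third wait becomes `wait_any()` followed by a `switch` on the result, and the rest of the code is untouched.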

> Moreover, downstream of the generic wait for both B and
> C, the code needs to check which one actually arrived (hence the
> 'switch').
>
> The main point is that people (including Dr. Cummings) have observed
> that sequential code almost always degenerates that way, so they
> propose a simple, generic thread structure that is *flexible* to
> accommodate any event sequence, which is the event-loop structure.
> Cummings' article stops at this, but of course real-life threads must
> "remember" certain event sequences, because the behavior depends on
> it. For example, if you build a vending machine, events A, B, and C
> might represent "product selection", "payment", "product dispense".
> The sequence is obviously important and other sequences should not be
> allowed (e.g., AC -- selection and dispensing, but without payment).
> Here is where state machines come in, but this discussion is perhaps
> for another time.

The standard "vending machine" solution is done in a /completely/ different manner - it is the running example in the classic book "Communicating Sequential Processes" which has a far neater, clearer and more efficient handling of multiple threads than that paper. The book is available freely online - I would recommend it.
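
For readers following the quoted A/B/C example, this is roughly what the event-driven state-machine version looks like in C++. It is only a sketch of the approach StateMachineCOM describes, not the CSP formulation from the book, and `VendEvent`, `VendState` and `VendingMachine` are names invented here. The explicit `state_` variable is exactly the "remembered" sequence information that the two sides of this argument disagree about.

```cpp
#include <cassert>  // assert(), for callers that want to check results

// A, B and C from the quoted example.
enum class VendEvent { Select, Pay, Dispense };
enum class VendState { Idle, Selected, Paid };

// Hypothetical state machine: accepts only Select -> Pay -> Dispense.
class VendingMachine {
public:
    // Returns true if the event is legal in the current state.
    // An out-of-sequence event is rejected and the state is unchanged.
    bool handle(VendEvent e) {
        switch (state_) {
        case VendState::Idle:
            if (e == VendEvent::Select) {
                state_ = VendState::Selected;
                return true;
            }
            break;
        case VendState::Selected:
            if (e == VendEvent::Pay) {
                state_ = VendState::Paid;
                return true;
            }
            break;
        case VendState::Paid:
            if (e == VendEvent::Dispense) {
                state_ = VendState::Idle;
                return true;
            }
            break;
        }
        return false;
    }

    VendState state() const { return state_; }

private:
    VendState state_ = VendState::Idle;
};
```

On a fresh machine, `handle(VendEvent::Select)` is accepted, while a following `handle(VendEvent::Dispense)` returns false and leaves the machine in `Selected`, which is the forbidden "AC" sequence being rejected.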

>
>> The "problem" described on pages 5-6 of the (Cummings') article
>> stem from two errors by the author. One is to mix up the
>> structures and responsibilities of the threads...
>
> This is misunderstanding of the main premise of the article. Having
> found a generic thread structure (event-loop), Cummings (implicitly)
> assumes that this, and only this, generic structure is allowed.

Why? There is no justification for such an assumption - implicitly or not. He has picked a single thread structure and somehow decided that's the only one to use, and further that there should only be two threads in his system no matter how he changes the tasks at hand. All the problems he gets are due to his poor choice of structure and stubbornness in sticking to it. It's not that I misunderstand the premise of the article - it's simply that I think it is a pointless premise. It would be like deciding that I will program in C, but all my loops will be "do {...} while ()" loops - and then complaining that simple counting loops look ugly.

> He
> then moves on to explaining how to use the thread structure correctly
> and how NOT to use it incorrectly.

No, he goes on to talk about how to get something working - ugly, unclear and separate from the logical flow of the task at hand, but working - despite the self-imposed limitations.

> The event-driven thread structure
> is so superior to sequential code that he doesn't even consider that
> someone might still revert to the "old".

When I first looked at the paper, I thought maybe he had a point. More thought, and the discussion here, has convinced me that he does not. No, event-driven thread structures are /not/ superior. They do have their uses and there are cases where they work well - fortunately I knew that before looking at the paper, because his efforts to force their usage inappropriately could easily convince people that they are always a poor choice.

> It is a bit like a
> programmer once exposed to structured programming will typically not
> consider going back to GOTOs.

Think a little bit harder, and you will see that it is the event-driven model that is ending up as GOTOs. It is not even managing a good old-fashioned GOSUB from BASIC years - you have to use global state variables to keep track of where you have been and where you are going.

>
>> This is basically a cooperative multi-tasking system - the sort of
>> thing we had to use before multi-threading and multi-tasking OS's
>> were suitable for small embedded systems. (It's also what we had
>> in Windows for long /after/ every other "big" OS was
>> multi-tasking.)
>
> This might be another misconception, which might be coming from the
> historical baggage surrounding event-driven systems.

I know what event-driven systems are.

> The generic
> thread structure recommended in Cummings' article *combines*
> preemptive multitasking with event-driven thread structure. Threads
> don't need to "cooperate" to yield the CPU to each other.

I realise that. But it is much the same structure as for pre-emptive multi-tasking.

> Instead,
> any longer processing in a low-priority thread can be preempted (many
> times if need be) by higher-priority threads. This is determined by
> the preemptive kernel and its scheduling policy, without any explicit
> coding on the application developer part.
>
>
>> The most amazing thing to me is that so many people think they know
>> the "best" method.
>
> Indeed, if you stick with the "sequential programming based on
> shared-state concurrency and blocking", there is no "best" method.

No, if you do proper development you understand there is no "best" method. Sometimes event-driven threads /are/ the best choice, or part of the best choice - but they are most certainly not /always/ the best choice. Please understand what I am saying here - I have nothing against event-driven threads used in the right place. I have everything against using them in the wrong place, or thinking that they are always the right choice.

> You need to devise the structure of each and every thread from
> scratch, carefully choosing your blocking mechanisms (semaphore vs.
> queue vs. event-flags vs. select, etc.).

Yes.

> You then need to worry about
> race conditions and carefully apply mutual exclusion.

Yes. You need to /design/ your code - you need to plan it, you need to be willing to change it to handle different requirements if that becomes necessary. You need to separate your code into clearly defined parts, and understand their interactions. This is /always/ the case. You can't just say "This guy said in a paper that event-driven threads were magic. So we'll just use them instead of thinking".

> The blocking
> threads tend to be unresponsive,

Nonsense. If your threads are not as responsive as you need, you will have to fix the design (or perhaps it can't be handled by the hardware you have). That applies whatever kind of blocking your threads have.

> so to evolve the system you need to
> keep adding new threads to be able to handle new event sequences.
> This proliferation of threads leads to more sharing, because now two
> threads that wanted to be one need to share large data structures.

You are inventing problems faster than that paper's author did.

>
> Alternatively, you can choose to work at a *higher level of
> abstraction*, with encapsulated event-driven threads (active
> objects). The threads wait generically on event-queue at the top of
> the loop and don't block otherwise. This allows the threads to remain
> *responsive*, so adding new events is easy. This also means that such
> threads can handle much more functionality than sequential threads.
> This reduces the need for sharing ("share nothing" principle). And
> finally, this thread structure offers high-enough level of
> abstraction and the *right* abstraction to apply event-driven state
> machines, graphical modeling, code generation, and other such modern
> programming techniques.
>

Nonsense. If you think your magic thread structures have eliminated sharing and synchronisation, or the need to think carefully about your design, you have misunderstood everything about multi-threaded coding. Using different structures, or different synchronisation primitives, does not change /anything/ about the fundamentals of what data passes around, what synchronisation is needed, and what parts need to wait for which other parts. It changes the details, and the choices can make a big difference on how clear and simple the code is, how efficient it is, and how much effort it takes to write, read and maintain the code. Event-driven threads are the best choice in some cases, and the worst choice in others as your mess of state variables means your logic is spread out all over the place instead of a clear, neat code flow.
On 02/08/18 19:14, Tom Gardner wrote:
> On 02/08/18 17:56, upsidedown@downunder.com wrote:
>> All you need is the ability to have stacks in RAM and in addition
>> instructions for loading and storing the stack pointer from/to memory.
>
> Unfortunately that is difficult to implement in C, so youngsters
> don't think of it.
>

This particular youngster grew up on small microcontrollers programmed in assembly. "All you need is the RAM" is not helpful when you have 512 bytes in total, and the stack pointer is limited to accessing the first 128 bytes of that. I have worked on microcontrollers where the context switches involved in handling an interrupt could easily take 50 µs or more - a proper RTOS would be far too high overhead. Of course you can have more minimal OS's with very limited features (all the way down to "protothreads"), and perhaps cooperative multi-tasking rather than pre-emptive. But then you are not going to have threads with multiple message queues like the ones under discussion.
On 02/08/18 19:23, Robert Wessel wrote:
> On Thu, 02 Aug 2018 16:14:47 +0300, upsidedown@downunder.com wrote:
>
>> On Thu, 02 Aug 2018 12:24:49 +0200, David Brown
>> <david.brown@hesbynett.no> wrote:
>>
>>> /Some/ threads are event driven services - they wait for messages coming
>>> in, process them, and then go back to waiting for another message. This
>>> is basically a cooperative multi-tasking system - the sort of thing we
>>> had to use before multi-threading and multi-tasking OS's were suitable
>>> for small embedded systems. (It's also what we had in Windows for long
>>> /after/ every other "big" OS was multi-tasking.)
>>
>> Windows NT 3.5 and later on was a full blown priority based
>> multitasking system with close resemblance to RSX-11 and VMS.
>
>
> Windows NT 3.1, actually.
>

Yes, of course. Win NT 3.1 was rarely seen in the wild, but it did exist.
On 02/08/18 20:21, Tom Gardner wrote:
> On 02/08/18 19:15, George Neuner wrote:
>> One serious problem is that too many programmers are much better at
>> figuring out what CAN be done in parallel than they are at figuring
>> out what SHOULD be done in parallel.
>>
>> Having too many threads generally is worse than having too few.
>
> :)
>
> For "embarrassingly parallel" applications such as telecom
> systems, 1-4 "worker" threads per core is a good starting
> point.

Ah, that's a different matter. Here you are talking about threads that do a job, send out a result, and then close down. (Usually for efficiency you have a thread pool, and it is a "job" object that is activated, run, and closed down, rather than the whole thread. But logically, it is the same.) It doesn't matter which of these worker threads is running at any time, you simply want to make efficient use of the cpu resources and have everything completed in the end. In an RTOS we are usually talking about threads that need to be alive at the same time, spend most of their time blocked somewhere, and which need to communicate and be able to wake each other. You typically only have one cpu core, but you might have dozens of threads.
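
A rough C++ sketch of the "worker threads running job objects" arrangement described above, for contrast with the RTOS case. `run_jobs` and its `workers_per_core` parameter are hypothetical, chosen to echo the "1-4 worker threads per core" rule of thumb. The jobs are independent, so it does not matter which worker picks up which one.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical helper: run independent jobs on a small pool of workers
// and collect their results in job order.
std::vector<int> run_jobs(const std::vector<std::function<int()>>& jobs,
                          unsigned workers_per_core = 1) {
    assert(workers_per_core > 0);
    unsigned cores = std::thread::hardware_concurrency();
    if (cores == 0) cores = 1;  // the call may return 0 if the count is unknown
    const unsigned worker_count = cores * workers_per_core;

    std::vector<int> results(jobs.size());
    std::atomic<std::size_t> next{0};  // index of the next unclaimed job

    std::vector<std::thread> pool;
    for (unsigned w = 0; w < worker_count; ++w) {
        pool.emplace_back([&] {
            // Each index is claimed by exactly one worker, so every
            // results[j] is written by one thread only.
            for (std::size_t j = next.fetch_add(1); j < jobs.size();
                 j = next.fetch_add(1)) {
                results[j] = jobs[j]();
            }
        });
    }
    for (std::thread& t : pool) t.join();
    return results;
}
```

The workers here never wait on each other and exit once the job list is exhausted, which is the opposite of the typical RTOS picture of long-lived threads that spend most of their time blocked.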
@David Brown: As they say: "you can lead a horse to water, but you can't make it drink". I rest my case.


On 02/08/18 17:12, StateMachineCOM wrote:
> At this point, the thread structure becomes a "hybrid" of sequential and "event-driven". Specifically, B can be followed by another B or C, which requires a more generic OS mechanism to wait for both B and C *simultaneously* (which most likely is a queue rather than event-flag). Moreover, downstream of the generic wait for both B and C, the code needs to check which one actually arrived (hence the 'switch').

Of course some processors have that capability in their hardware, e.g.

    select {
        // suspend until one of these events occurs and then
        // resume with 10ns latency
        message from channel A:
            do_this();
            break;
        message from channel B:
            do_that();
            break;
        timeout:
            scram_reactor();
            break;
        input on port C:
            read it and record time it arrived
            break;
        output completed on port D:
            do next output on port D at 12.96us after last output on port D
            break;
    }

(RTOS? What RTOS?)