EmbeddedRelated.com

Task, process, thread, etc.

Started by Don Y March 31, 2021
On 4/1/2021 5:59 AM, Dimiter_Popoff wrote:
> Back then I called a "task" what is a ...task, code running > with its own stacks (user/system) and being put into use > by the scheduler.
This is what I've called a thread. Threads are the only things that execute code. A thread executes code on a hardware *processor*. (a processor only supports a single thread -- more later).
> Whether it runs on this or that core is irrelevant.
Ditto. Threads are schedulable entities, regardless of which *processor* they run on (again, using my terminology)
> And I took a decision to call a "process" > a group of tasks having the same common data section > (each task points to one).
This is roughly similar to what I call a task -- resources exist in RAM (memory, stack, thread state, "file handles", etc.). But, additionally, I consider resources things like scheduling parameters, access permissions (capabilities), etc.
> Obviously this is very different > to what people would think of as a "process" on other > systems where they call a process what I call a task, > I think interchangeably. So I try to phase that term out > by not using it.
The problem with "process" is that many folks think of the one-thread, one-process computing model (of days gone by). Hence, my desire to avoid "process" for fear of it conjuring up a single execution entity.
> A thread is a dangerous term to use, to me it means a thread > in a multi-threaded processor core. Which is much more > hardware than software related, a virtual core can just run > yet another task to another virtual core within the same > physical one.
I call the hardware that "runs" a thread's code a processor. A "core" can contain multiple processors. A "host" can contain multiple cores. A node can contain multiple hosts. E.g., a node is likened to a PCB with some number of "CPUs" on it. Each CPU "chip" is a host. Each host can contain more than one core -- each of which can contain more than one processor.
The distinction is important because I treat each of these things as formal objects. Code can manipulate those objects if the code has the proper "permissions" (capabilities). So, a thread running *somewhere* can diddle with the scheduler for a particular *processor* -- even if it doesn't reside on the same core, host or node! In this way, code can bind specific "threads" to specific processors in specific cores on specific hosts on specific nodes, etc.
Likewise, if I want to shut down a node and migrate all of its resources and threads to some *other* node, I can just stall the scheduler(s) on the node and wait for everything to naturally (or, maybe UNnaturally?) idle. Then, while quiescent, move the resources over to some other node that I've preselected. And, finally, remove power from (or otherwise repurpose) the original node.
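As a rough illustration of the node/host/core/processor hierarchy described above, here is a minimal C sketch. Every name in it (`processor_id_t`, `thread_t`, `bind_thread`, `same_node`, `same_host`) is hypothetical and invented for this example; none of it is taken from the actual codebase, which presumably reaches these objects through capability handles rather than plain structs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical address of one hardware "processor" in the hierarchy:
 * node (PCB) -> host (CPU chip) -> core -> processor. */
typedef struct {
    unsigned node;
    unsigned host;
    unsigned core;
    unsigned processor;
} processor_id_t;

/* Hypothetical thread record; a thread is the only thing that executes code. */
typedef struct {
    int            id;
    bool           bound;   /* true once pinned to a specific processor */
    processor_id_t where;   /* meaningful only when bound is true */
} thread_t;

/* Pin a thread to one processor. Returns false if given a NULL thread. */
static bool bind_thread(thread_t *t, processor_id_t p)
{
    if (t == NULL)
        return false;
    t->where = p;
    t->bound = true;
    return true;
}

/* True if both processors sit on the same node (the same PCB). */
static bool same_node(processor_id_t a, processor_id_t b)
{
    return a.node == b.node;
}

/* True if both processors sit on the same host (same chip on the same node). */
static bool same_host(processor_id_t a, processor_id_t b)
{
    return same_node(a, b) && a.host == b.host;
}
```

A caller would fill in a `processor_id_t` for the target and call `bind_thread`; a migration step like the one described could use `same_node` to decide which threads have to move when a node is shut down.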
> Other than that I see no need for other names. Keeping it > simple is nice, especially when it comes down to the > basics like the ones I talk about here.
See above.
> There are more complex things to think and talk about > of course, but these I leave to being various "objects" > (dps has its inherent runtime system of objects) and to > whatever "actions" you can "do" by/with these (I don't > use the words "methods" and "apply", I just did not know > these when I wrote the first implementation of the dps > object system back around 1995, and my words seem to > better describe what I have written anyway).
In my world, EVERYTHING is an object! Threads, tasks, processors, cores, nodes, hosts, memory, etc. I deal with them via "handles" that the OS manages for tasks (tasks own resources, the threads don't!). In each handle is a reference to a particular object and a set of methods that the handle-holder can invoke on the object (capabilities). This is enforced client-side, by the OS. So, you can't waste an object's "time" by sending spurious ILLEGAL requests to it (DoS attack) and expecting it to decide if your request is "allowed"; the kernel will ensure that you waste *your* time doing that (and time is a resource that the kernel tracks; abuse yours and you cease to exist -- but the intended "victim" object is unaffected)
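The handle scheme above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `METHOD_*` flags, `handle_t`, `task_account_t`, and `kernel_check` are all hypothetical names, and the "budget" counter is a stand-in for whatever time accounting the real kernel does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical method identifiers for some object type. */
enum {
    METHOD_READ    = 1u << 0,
    METHOD_WRITE   = 1u << 1,
    METHOD_DESTROY = 1u << 2
};

/* A handle pairs an object reference with the methods its holder may invoke. */
typedef struct {
    void    *object;
    uint32_t allowed;   /* bitmask of permitted METHOD_* values */
} handle_t;

/* Per-task accounting (assumed): the caller pays for every request it makes. */
typedef struct {
    unsigned budget;
} task_account_t;

/* Check performed on the caller's side, before anything reaches the object.
 * Returns true only if the request may be forwarded to the object. */
static bool kernel_check(task_account_t *caller, const handle_t *h, uint32_t method)
{
    if (caller->budget == 0)
        return false;        /* the caller has used up its own time */
    caller->budget--;        /* the *caller* is charged, legal request or not */
    return method != 0 && (h->allowed & method) == method;
}
```

The design point is that a rejected request costs the requester something and costs the target object nothing, since the object never sees it.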
> Wait a second, you may be asking from an end user perspective. > Well, my reply does not apply then... :-). I thought if > people like the population of this group while I wrote it...
No, you answered in the spirit I intended! :>
On 4/1/2021 6:23 AM, antispam@math.uni.wroc.pl wrote:
> Don Y <blockedofcourse@foo.invalid> wrote: >> I've been refactoring some of my RTOS documentation. Comments from >> the reviewers suggest there's still some confusion as to terms >> (despite the fact that I explicitly define them! :< ) >> >> All seem to understand the notion of a "thread". >> >> And, to a lesser extent, an "application" (this one's a bit >> harder as there's often no clear-cut distinctions; do you >> tie it to a "pre-packaged set of algorithms"). >> >> I had opted to use "task" instead of "process" to describe >> resource containers. Too many folks with single-threaded >> process experience brought that baggage to their understanding. >> "Task" lets me avoid that. > > There are still a lot of folks who were introduced to > basic OS concepts on MVS.
But, are they writing code for embedded systems? Have they not noticed the *differences* between the execution environments encountered on those "devices" and big iron? Like most, I learned to code in an environment that made it LOOK like I had sole control of the machine; if other jobs were running on it, I didn't know, or care -- my code was entirely self contained; it didn't try to reach out to other co-executing "programs". Nor did I realize that there were other "routines" executing that enabled disk access, user I/O, etc. OTOH, moving into the embedded world, you "saw" EVERYTHING! Because you *wrote* everything! The notion of just reading a UART when you needed a character from the user was quickly dispelled by the reality of *losing* characters! (oops!)
> And there "task" is what > currently is called "thread". I dare to say that > fraction of people who saw "multihtread task" is > much lower than fraction of people who saw "multithreaded > process". Anyway, IMO "Task" much stronger suggest > single thread than process.
The fact that multithreaded process "makes sense" as a notion indicates folks are used to NON-multithreaded process! :> Regardless, you (all) can see the sort of confusion that exists among terms that we tend to think are "standardized". I don't see any way out other than just formal definition and relying on the user (developer) to read the documentation. "Standards are great! Everyone should have one!" :-)
Don Y <blockedofcourse@foo.invalid> wrote:
> On 4/1/2021 6:23 AM, antispam@math.uni.wroc.pl wrote: > > Don Y <blockedofcourse@foo.invalid> wrote: > >> I've been refactoring some of my RTOS documentation. Comments from > >> the reviewers suggest there's still some confusion as to terms > >> (despite the fact that I explicitly define them! :< ) > >> > >> All seem to understand the notion of a "thread". > >> > >> And, to a lesser extent, an "application" (this one's a bit > >> harder as there's often no clear-cut distinctions; do you > >> tie it to a "pre-packaged set of algorithms"). > >> > >> I had opted to use "task" instead of "process" to describe > >> resource containers. Too many folks with single-threaded > >> process experience brought that baggage to their understanding. > >> "Task" lets me avoid that. > > > > There are still a lot of folks who were introduced to > > basic OS concepts on MVS. > > But, are they writing code for embedded systems? Have they > not noticed the *differences* between the execution environments > encountered on those "devices" and big iron?
A significant part of a successful solution to a new problem is using similarities to old, solved problems. And, of course, dealing with the differences. And low-level problems have not changed that much since mainframe times.
> Like most, I learned to code in an environment that made it > LOOK like I had sole control of the machine; if other jobs > were running on it, I didn't know, or care -- my code was > entirely self contained; it didn't try to reach out to > other co-executing "programs". > > Nor did I realize that there were other "routines" executing > that enabled disk access, user I/O, etc. > > OTOH, moving into the embedded world, you "saw" EVERYTHING! > Because you *wrote* everything! The notion of just reading a UART > when you needed a character from the user was quickly dispelled > by the reality of *losing* characters! (oops!)
I grew up in an environment where getting access to computer books was much easier than getting access to computers. One of my first introductions was a translation of an IBM book from, IIRC, 1962. The reality described in this book was long gone when I read it, but since the book described concepts, it applied reasonably well to a changed world. A significant part of my early experience was on the ZX Spectrum: it had no hardware UART and it was pretty clear that to get characters you need to poll at the right time, otherwise bits will be lost (I did not try a real UART, but I played a bit with the tape reading procedure).
--
Waldek Hebisch
On 4/2/2021 3:46 PM, antispam@math.uni.wroc.pl wrote:
> Don Y <blockedofcourse@foo.invalid> wrote: >> On 4/1/2021 6:23 AM, antispam@math.uni.wroc.pl wrote: >>> Don Y <blockedofcourse@foo.invalid> wrote: >>>> I've been refactoring some of my RTOS documentation. Comments from >>>> the reviewers suggest there's still some confusion as to terms >>>> (despite the fact that I explicitly define them! :< ) >>>> >>>> All seem to understand the notion of a "thread". >>>> >>>> And, to a lesser extent, an "application" (this one's a bit >>>> harder as there's often no clear-cut distinctions; do you >>>> tie it to a "pre-packaged set of algorithms"). >>>> >>>> I had opted to use "task" instead of "process" to describe >>>> resource containers. Too many folks with single-threaded >>>> process experience brought that baggage to their understanding. >>>> "Task" lets me avoid that. >>> >>> There are still a lot of folks who were introduced to >>> basic OS concepts on MVS. >> >> But, are they writing code for embedded systems? Have they >> not noticed the *differences* between the execution environments >> encountered on those "devices" and big iron? > > Significant part in sucessfull solution of new problem > is using similarities to old, solved problems. And of > course dealing with differences. And low-level problems > did not change that much from mainframe times.
"Similarities". I.e., things change in subtle ways. This is fine for a bird's eye view of a problem/solution. But, when you have to be specific (e.g., in documenting the solution), subtleties make a difference. You don't want folks coming away with the wrong understanding: "Oh, I thought you meant..."
>> Like most, I learned to code in an environment that made it >> LOOK like I had sole control of the machine; if other jobs >> were running on it, I didn't know, or care -- my code was >> entirely self contained; it didn't try to reach out to >> other co-executing "programs". >> >> Nor did I realize that there were other "routines" executing >> that enabled disk access, user I/O, etc. >> >> OTOH, moving into the embedded world, you "saw" EVERYTHING! >> Because you *wrote* everything! The notion of just reading a UART >> when you needed a character from the user was quickly dispelled >> by the reality of *losing* characters! (oops!) > > I had grown in environment where getting access to computer > books was much easier than getting access to computers.
My first "hands on" experience was with an ASR-33 and acoustic modem -- saving programs on paper tape. From there, I moved *up* (?) to punching cards for batch submission on a small IBM box. At school ("university"), a wide collection of machines which had been designed by professors and grad students -- so, there were lots of similarities... and lots of differences. Then, to i4004 in industry. 8080 quickly followed by 8085. Then, Z80, 6809, 1802, 2650, 68000, 16032, etc. Again, lots of similarities between them -- but, also, many differences. And, of course, the type of code I was writing was no longer at a very high level of abstraction (as it was on a Hollerith card).
> One of my first introductions was translation of IBM book > from IIRC 1962. The reality described in this book was > long gone whan I read it, but since book described concepts > it applied reasonably well to changed world. Significant
But concepts that were "novel" are now not just passe but actually obsolete in the day-to-day vernacular. When was the last time someone talked about a DASD? VTOC? IPL? WCS? etc.
> part of my early experience was on ZX Spectrum: it had no > hardware UART and it was pretty clear that to get characters > you need to poll at right time, otherwise bits will be lost > (I did not try real UART, but I played a bit with tape > reading procedure).
So, for example, trying to recount your experiences to a "youngster" today would land on deaf ears; they wouldn't understand the concepts nor issues you faced.
On 4/1/2021 6:39 AM, Don Y wrote:
> On 4/1/2021 5:59 AM, Dimiter_Popoff wrote: >> Back then I called a "task" what is a ...task, code running >> with its own stacks (user/system) and being put into use >> by the scheduler. > > This is what I've called a thread. Thread's are the only things > that execute code. A thread executes code on a hardware *processor*. > (a processor only supports a single thread -- more later). > >> Whether it runs on this or that core is irrelevant. > > Ditto. Threads are schedulable entities, regardless of which > *processor* they run on (again, using my terminology) > >> And I took a decision to call a "process" >> a group of tasks having the same common data section >> (each task points to one). > > This is roughly similar to what I call a task -- resources > exist in RAM (memory, stack, thread state, "file handles", > etc.). But, additionally, I consider resources things like > scheduling parameters, access permissions (capabilities), etc. > >> Obviously this is very different >> to what people would think of as a "process" on other >> systems where they call a process what I call a task, >> I think interchangeably. So I try to phase that term out >> by not using it. > > The problem with "process" is that many folks think of the one-thread, > one-process computing model (of days gone by). Hence, my desire to > avoid "process" for fear of it conjuring up a single execution entity.
(sigh) I've been overruled (despite it being *my* codebase! :< ) OTOH, there are more of *them* than there are of *me* so it's probably easier for me to adapt to their desired terminology than stick with mine! As long as they are willing to take responsibility for understanding the distinction(s)! Process = resource container. Now, "task" has no meaning. (I'm not sure introducing it in place of "thread" would be well received; But, I can try!) I'll probably wait to edit all the references, function labels, etc. until I get a final verdict on all this...
On 4/4/2021 23:15, Don Y wrote:
> On 4/1/2021 6:39 AM, Don Y wrote: >> On 4/1/2021 5:59 AM, Dimiter_Popoff wrote: >>> Back then I called a "task" what is a ...task, code running >>> with its own stacks (user/system) and being put into use >>> by the scheduler. >> >> This is what I've called a thread. Thread's are the only things >> that execute code. A thread executes code on a hardware *processor*. >> (a processor only supports a single thread -- more later). >> >>> Whether it runs on this or that core is irrelevant. >> >> Ditto. Threads are schedulable entities, regardless of which >> *processor* they run on (again, using my terminology) >> >>> And I took a decision to call a "process" >>> a group of tasks having the same common data section >>> (each task points to one). >> >> This is roughly similar to what I call a task -- resources >> exist in RAM (memory, stack, thread state, "file handles", >> etc.). But, additionally, I consider resources things like >> scheduling parameters, access permissions (capabilities), etc. >> >>> Obviously this is very different >>> to what people would think of as a "process" on other >>> systems where they call a process what I call a task, >>> I think interchangeably. So I try to phase that term out >>> by not using it. >> >> The problem with "process" is that many folks think of the one-thread, >> one-process computing model (of days gone by). Hence, my desire to >> avoid "process" for fear of it conjuring up a single execution entity. > > (sigh) I've been overruled (despite it being *my* codebase! :< ) > > OTOH, there are more of *them* than there are of *me* so it's probably > easier for me to adapt to their desired terminology than stick with mine! > As long as they are willing to take responsibility for understanding > the distinction(s)! > > Process = resource container. > > Now, "task" has no meaning. (I'm not sure introducing it in place > of "thread" would be well received; But, I can try!)
> > I'll probably wait to edit all the references, function labels, etc. > until I get a final verdict on all this...
Don,
I think that since you see everything as an object you can afford more descriptive names, not necessarily made of one word.
I do that with objects in DPS. Here is how they can go. The simplest object - the root of all objects - is called "something", i.e. everything is something, LOL. Then there is a "piece of memory", which carries more information about itself. Then there is a "generic object", which carries some more information which one will need anyway - like how to set/get a parameter in a standardized way, where it is allocated/who has allocated it (that would be its "container" - for example, a "memory pool" which is a "piece of memory"). And from there you go further, you have a "file reference", "directory reference", "directory view details", "directory view icons" etc. etc., the point is the names can be self explanatory.
Now I never had the need to treat a task as an object itself but obviously one can think of a more self explanatory word for that, too. I have tasks referred to by objects - say, the ip_link refers to the IP input task (defragmentation etc. sort of thing), the tcp_connection object refers to a tcp input task (reordering/linking incoming segments etc.).
I suppose "tcp_connection" is a nice example of a self explanatory name. You could call what you have as a "task" (which you call a "thread") say a "running program" or something (just trying to give an example, I don't like it particularly well, just can't come up with anything better but I am sure you can manage that if you give it some more time).
Dimiter
======================================================
Dimiter Popoff, TGI http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/
Hi Dimiter,

On 4/4/2021 2:03 PM, Dimiter_Popoff wrote:
> On 4/4/2021 23:15, Don Y wrote:
>>> The problem with "process" is that many folks think of the one-thread, >>> one-process computing model (of days gone by). Hence, my desire to >>> avoid "process" for fear of it conjuring up a single execution entity. >> >> (sigh) I've been overruled (despite it being *my* codebase! :< ) >> >> OTOH, there are more of *them* than there are of *me* so it's probably >> easier for me to adapt to their desired terminology than stick with mine! >> As long as they are willing to take responsibility for understanding >> the distinction(s)! >> >> Process = resource container. >> >> Now, "task" has no meaning. (I'm not sure introducing it in place >> of "thread" would be well received; But, I can try!) >> >> I'll probably wait to edit all the references, function labels, etc. >> until I get a final verdict on all this... > > Don, > I think that since you see everything as an object you can afford > more descriptive names, not necessarily made of one word.
Yes, of course. But, some things (objects) already have a "naming history" that people are comfortable with. Call a thread an "execution unit"? The processor on which it executes an "executor unit"? <grin> Would it be strained to call a process a "Resource Container"?
> I do that with objects in DPS. Here is how they can go. > The simplest object - the root of all objects - is called > "something", i.e. everything is something, LOL. > Then there is a "piece of memory", which carries more information > about itself.
I have "memory objects" which are treated as entities. E.g., I can pass a memory object to a function (or process; that is, to a thread that will act on it in a process) in much the same way that I can pass an "int". There are "exception handlers", "deadline handlers", "deadlines", "exceptions", "scheduling criteria", etc. These are easier to name because there's no "legacy" that you have to overcome.
> Then there is a "generic object", which carries some more information > which one will need anyway - like how to set/get a parameter in a > standardized way, where it is allocated/who has allocated it (that > would be its "container" - for example, a "memory pool" which is > a "piece of memory"). And from there you go further, you have a > "file reference", "directory reference", "directory view details", > "directory view icons" etc. etc., the point is the names can be > self explanatory. Now I never had the need to treat a task as an > object itself but obviously one can think of a more self explanatory > word for that, too. I have tasks referred to by objects - say, > the ip_link refers to the IP input task (defragmentation etc. sort > of thing), the tcp_connection object refers to a tcp input > task (reordering/linking incoming segments etc.).
A process is referenced as:
    process_t MyProcess;
Using MyProcess in a method that is defined for a process_t will invoke the code associated with the method *on* that process_t.
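To make "a method defined for a process_t" concrete, here is a small sketch. Only the names `process_t` and `MyProcess` come from the post; the struct layout, the `PROCESS_MAX_THREADS` limit, and the two functions are assumptions made up for the example. In the real system a `process_t` is presumably an opaque handle rather than a visible struct.

```c
#include <stdbool.h>
#include <stddef.h>

#define PROCESS_MAX_THREADS 4   /* arbitrary limit for this sketch */

/* Assumed layout: a process is a container that records which threads it holds. */
typedef struct {
    int    thread_ids[PROCESS_MAX_THREADS];
    size_t thread_count;        /* zero is a legal value */
} process_t;

/* Hypothetical method: place one more thread in the container.
 * Returns false when the container is full. */
static bool process_add_thread(process_t *p, int thread_id)
{
    if (p->thread_count == PROCESS_MAX_THREADS)
        return false;
    p->thread_ids[p->thread_count++] = thread_id;
    return true;
}

/* Hypothetical method: report how many threads the container holds. */
static size_t process_thread_count(const process_t *p)
{
    return p->thread_count;
}
```

With these definitions, `process_t MyProcess = { .thread_count = 0 };` followed by `process_thread_count(&MyProcess)` invokes the method on that particular process, and an empty container is a perfectly valid one.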
> I suppose "tcp_connection" is a nice example of a self explanatory > name. You could call what you have as a "task" (which you call > a "thread") say a "running program" or something (just trying to > give an example, I don't like it particularly well, just can't > come up with anything better but I am sure you can manage that > if you give it some more time).
I've found coming up with MEANINGFUL names for things is tiring; There are many concepts/objects that are incredibly similar. So, you end up finessing terms that could easily be interchanged for each other (yet can't as they are actually different things)
The biggest problem comes in documentation (once folks are USING things, they learn what those things mean and how they are to be used). You don't want to have to employ qualifiers on generic terms:
- an RPC can take no arguments and NOT return a result
- an RPC can take no arguments and return a result
- an RPC can take arguments and NOT return a result
- an RPC can take arguments and return a result
- an RPC that doesn't return a result can "return" without confirmation that the remote procedure has actually been invoked!
[And, substitute IPC for RPC]
Do you add these qualifiers to each use of the "RPC" term in the documentation? Or, do you come up with different terms for each *type* of RPC?
A process (Resource Container) can contain any number of threads, including zero. If you're discussing the process in the context of being a server for a particular class of objects, do you refer to it as a single-threaded server? Multiple-threaded? etc.
Etc.
BTW, Happy Easter! (I think you may still have a few hours left...)
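One way to picture the "different term for each *type* of RPC" option is to give each combination in the list above its own name. The sketch below does that in C; the kind names (`RPC_NOTIFY`, `RPC_QUERY`, `RPC_COMMAND`, `RPC_CALL`) and the `rpc_traits_t` fields are invented for illustration and are not terms anyone in the thread proposed.

```c
#include <stdbool.h>

/* Hypothetical names for the four argument/result combinations. */
typedef enum {
    RPC_NOTIFY,    /* no arguments, no result */
    RPC_QUERY,     /* no arguments, returns a result */
    RPC_COMMAND,   /* arguments, no result */
    RPC_CALL       /* arguments, returns a result */
} rpc_kind_t;

/* The qualifiers that would otherwise be spelled out at every use. */
typedef struct {
    bool takes_args;
    bool returns_result;
    bool waits_for_invocation;  /* false: may "return" before the remote side runs */
} rpc_traits_t;

/* Map the argument/result qualifiers onto a single named kind. */
static rpc_kind_t rpc_classify(rpc_traits_t t)
{
    if (t.takes_args)
        return t.returns_result ? RPC_CALL : RPC_COMMAND;
    return t.returns_result ? RPC_QUERY : RPC_NOTIFY;
}

/* The fifth case in the list: only an RPC with no result can come back
 * without confirmation that the remote procedure was invoked. */
static bool rpc_may_return_early(rpc_traits_t t)
{
    return !t.returns_result && !t.waits_for_invocation;
}
```

Whether four or five distinct names are easier on a reader than repeated qualifiers is exactly the documentation trade-off raised above; the sketch only shows that the combinations are few enough to enumerate.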
On 4/4/2021 2:42 PM, Don Y wrote:

> Yes, of course. But, some things (objects) already have a > "naming history" that people are comfortable with. > > Call a thread an "execution unit"? The processor on > which it executes an "executor unit"? <grin> > > Would it be strained to call a process a "Resource Container"?
I floated that idea... and it was **promptly** shot down! Amusingly, folks like to talk about a process as if it is an active entity - even KNOWING that the process is the container FOR the active entities and not the executing code! And, the notion of using "Resource Container" in a sentence as if it was in any way "active" fell on deaf ears. <shrug> It's amusing to see the implicit baggage that comes with our choices of terms! (admit defeat; tweak the code/docs and be done with it!)
On 4/5/2021 8:29, Don Y wrote:
> On 4/4/2021 2:42 PM, Don Y wrote: > >> Yes, of course. But, some things (objects) already have a >> "naming history" that people are comfortable with. >> >> Call a thread an "execution unit"? The processor on >> which it executes an "executor unit"? <grin> >> >> Would it be strained to call a process a "Resource Container"? > > I floated that idea... and it was **promptly** shot down! > > Amusingly, folks like to talk about a process as if it is an > active entity - even KNOWING that the process is the container > FOR the active entities and not the executing code! > > And, the notion of using "Resource Container" in a sentence > as if it was in any way "active" fell on deaf ears. > > <shrug> It's amusing to see the implicit baggage that comes > with our choices of terms! > > (admit defeat; tweak the code/docs and be done with it!)
I am not surprised at that :-). My reaction was similar. I call "containers" the objects which have allocated the memory for another object - so when an object is told to "getlost" it asks its container to deallocate it, apart from whatever else it has to do (close file(s), connection(s) etc.). But I think of a "container" as a jar with a lid, you know. If I understand correctly, what you want to name is a group of tasks (which you call threads); why not just call it "group of tasks" or something? Then gradually let the language migrate towards just "group" on its own, that is, in a natural way.
Dimiter
On 4/5/2021 12:12 PM, Dimiter_Popoff wrote:
> On 4/5/2021 8:29, Don Y wrote: >> On 4/4/2021 2:42 PM, Don Y wrote: >> >>> Yes, of course. But, some things (objects) already have a >>> "naming history" that people are comfortable with. >>> >>> Call a thread an "execution unit"? The processor on >>> which it executes an "executor unit"? <grin> >>> >>> Would it be strained to call a process a "Resource Container"? >> >> I floated that idea... and it was **promptly** shot down! >> >> Amusingly, folks like to talk about a process as if it is an >> active entity - even KNOWING that the process is the container >> FOR the active entities and not the executing code! >> >> And, the notion of using "Resource Container" in a sentence >> as if it was in any way "active" fell on deaf ears. >> >> <shrug> It's amusing to see the implicit baggage that comes >> with our choices of terms! >> >> (admit defeat; tweak the code/docs and be done with it!) > > I am not surprised at that :-). My reaction was similar.
Yes. But, amusing as the *container* isn't an active entity! Yet, we speak of it as if it was -- as a proxy for the threads it contains! So, it's important that folks discussing an implementation have a shared lexicon; someone talking about "processes" may or may not be thinking in terms of "threads"!
> I call a "container" objects which have allocated the memory > for another object - so when an object is told to "getlost" > it asks its container to deallocate it apart from whatever > else it has to do (close file(s), connection(s) etc.). > But I think of "container" as of a jar with a lid you know. > > If I understand what you want to name is a group of tasks > (which you call threads); why not just call it "group of > tasks" or something. Then gradually let the language migrate towards > just "group"on its own, that is in a natural way.
It's not just a "group of threads" (trying to avoid mixing terms); it's a group of threads executing in an isolated address space sharing a set of capabilities for a set of objects and access to specific code/data...
[By contrast, a (non-specific) "group of threads" can span multiple "processes" to solve a particular problem (hence "job").]
It is the sharing and coupling of the threads in that "container" to which you want to draw attention. E.g., two such threads can stomp on each other's data -- hence the need for mutexes/locks or some other cooperative sharing algorithm. Two such threads can access the same set of resources (e.g., thread A can acquire a resource yet thread B can actually be the one who uses it without the need for any special "protocol" to exchange "ownership").
In the context of an object server, *any* thread *could* service a request for any object contained in its "process"; or, specific threads could be assigned to specific objects' requests; etc. But, a thread in ANOTHER *process* couldn't service that request (for THAT object)!
*Within* a process, you have to be more disciplined, as a developer. But, you have greater leeway in terms of what you can do -- and get away with! The OS isn't playing policeman *inside* the process as it would *between* processes. Communication can be nearly instantaneous; you just agree as to how each thread will access a particular shared structure/buffer. No need for the OS to get involved in moving information across protection boundaries.
All threads in a process are colocated on the same *host*. OTOH, a "group of threads" (cuz threads are the only things that actually *do* anything!) executing in different processes can reside on different hosts to achieve a common goal.
It's all the hidden assumptions (what I call baggage) that lead to misunderstanding. If you have to resort to overly precise language, then it becomes difficult reading. :< Hence, trying to find terms that "feel" natural to people.
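The remark above that two threads in the same process can stomp on each other's data, hence the need for a mutex, can be sketched with POSIX threads. POSIX is used here purely as a familiar stand-in; it is an assumption, not the API of the RTOS under discussion, and `shared_t`, `worker`, and `run_two_threads` are hypothetical names.

```c
#include <pthread.h>
#include <stddef.h>

/* State shared by both threads; it lives in the one address space they share. */
typedef struct {
    pthread_mutex_t lock;
    long            count;
    int             iterations;
} shared_t;

/* Each thread bumps the shared counter, taking the lock around every update. */
static void *worker(void *arg)
{
    shared_t *s = arg;
    for (int i = 0; i < s->iterations; i++) {
        pthread_mutex_lock(&s->lock);
        s->count++;                 /* without the lock, updates could be lost */
        pthread_mutex_unlock(&s->lock);
    }
    return NULL;
}

/* Run two threads against the same counter and return the final count,
 * or -1 if a thread could not be created. */
static long run_two_threads(int iterations)
{
    shared_t s = { .count = 0, .iterations = iterations };
    pthread_t a, b;

    pthread_mutex_init(&s.lock, NULL);
    if (pthread_create(&a, NULL, worker, &s) != 0) {
        pthread_mutex_destroy(&s.lock);
        return -1;
    }
    if (pthread_create(&b, NULL, worker, &s) != 0) {
        pthread_join(a, NULL);
        pthread_mutex_destroy(&s.lock);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_mutex_destroy(&s.lock);
    return s.count;
}
```

Note that nothing crosses a protection boundary here: both threads simply agree to take `lock` before touching `count`, which is the "discipline inside the process" the post refers to. Threads in different processes would need the OS to carry the data between them instead.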
If I was describing a particular *algorithm* (that is likely to evolve, over time), I could be a bit looser in my prose. But, when discussing fundamentals (which are likely invariant), there's less wiggle room. If I say "5", I mean *5*... not 4 or 6!
<shrug> I'll post you a draft copy when I get some of the illustrations finished (busy picking oranges this past week+)
