Hi Don,
Thanks for your quick reply!
That was indeed a pretty long and interesting explanation!!
Karthik
On Monday, 22 September 2014 02:34:45 UTC+5:30, Don Y wrote:
> Hi Karthik,
>
> On 9/21/2014 5:40 AM, Karthik Balaguru wrote:
>
> > Hi, I have a few queries on the best possible software architecture.
>
> None -- without a clear definition of the application domain! :>
>
> > Processes are heavyweight: they appear to occupy more memory, take more
> > time to create/start, incur increased latency during context switches,
> > and have separate memory spaces that necessitate heavy IPC mechanisms.
> > Threads are lightweight and share memory space.
>
> The easiest way to think of the distinction is: threads are active entities
> (i.e., they are the "things" that "execute code"). Processes are containers
> that hold resources -- which can include (one or more) *threads*!
>
> I.e., a process is like its own little "machine" -- with its own memory,
> access privileges, priorities (in the context of the "machine" in which it
> resides), etc.
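>
> A toy C sketch of that "container vs. active entity" split (assuming
> Linux with gcc and -pthread; the names here are made up purely for
> illustration). A second thread mutates memory that its sibling sees,
> while a forked process only ever touches its own private copy:
>
>   #include <pthread.h>
>   #include <stdio.h>
>   #include <stdlib.h>
>   #include <sys/wait.h>
>   #include <unistd.h>
>
>   static int shared = 0;      /* visible to every thread in THIS process */
>
>   static void *worker(void *arg)
>   {
>       shared++;               /* threads share the process's memory */
>       return NULL;
>   }
>
>   int main(void)
>   {
>       pthread_t tid;
>
>       /* A thread: a new active entity inside the SAME container. */
>       pthread_create(&tid, NULL, worker, NULL);
>       pthread_join(tid, NULL);
>       printf("after thread:  shared = %d\n", shared);   /* 1 */
>
>       /* A process: a whole new container, with a COPY of this memory. */
>       pid_t pid = fork();
>       if (pid == 0) {
>           shared++;           /* increments the CHILD's copy only */
>           exit(0);
>       }
>       waitpid(pid, NULL, 0);
>       printf("after process: shared = %d\n", shared);   /* still 1 */
>       return 0;
>   }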
>
> So, if the "system"/machine has certain shared resources (I/O devices, etc.),
> it is the *process* that requests (by the actions of one of its threads)
> those resources and, eventually, gains ownership/access to them. (I.e., thread
> #1 in process A can request a resource and, when made available, thread #5
> in process A can *use* that resource -- but none of the threads in process B
> can, at that time.)
>
> Given that processes contain threads (in this conceptualization), you can see
> why it is "more expensive" to switch processes than it is to switch threads.
>
> You can also see why two threads in a process can compete to access a resource
> THAT THE PROCESS OWNS (either because it was explicitly requested from "the
> system" by "some thread" in that process; OR it was implicitly granted to that
> process when the process was instantiated: e.g., "shared memory" IN the
> process's address space). You are glossing over the potential case where
> two or more PROCESSES have to compete in "the system" for other "shared
> resources".
>
> > However, I realized that threads also enter into contention for
> > resources/memory due to the resources shared among them, which in turn
> > becomes a kind of bottleneck for a multi-threaded architecture but not
> > for a multiple-process-based architecture.
>
> It's still a bottleneck. If two or more processes want to share some data,
> they either do so via "shared memory" (assuming the OS supports this between
> processes) -- which requires SOME form of "access/contention resolution" -- or
> by a more expensive solution (e.g., IPC/RPC). In either case, SOMETHING is
> handling the fact that contention can exist.
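>
> (What that "SOMETHING" can look like on Linux -- a sketch, assuming
> POSIX shm_open/mmap with a process-shared mutex doing the
> access/contention resolution. Error checking omitted; build with
> -pthread, and with -lrt on older glibc:)
>
>   #include <fcntl.h>
>   #include <pthread.h>
>   #include <stdio.h>
>   #include <sys/mman.h>
>   #include <sys/wait.h>
>   #include <unistd.h>
>
>   /* Data two PROCESSES will share -- contention still gets resolved! */
>   struct shared_block {
>       pthread_mutex_t lock;   /* process-shared mutex guards the counter */
>       int counter;
>   };
>
>   int main(void)
>   {
>       int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
>       ftruncate(fd, sizeof(struct shared_block));
>       struct shared_block *blk = mmap(NULL, sizeof *blk,
>                                       PROT_READ | PROT_WRITE,
>                                       MAP_SHARED, fd, 0);
>
>       /* The mutex must be marked PROCESS_SHARED to work across fork. */
>       pthread_mutexattr_t attr;
>       pthread_mutexattr_init(&attr);
>       pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
>       pthread_mutex_init(&blk->lock, &attr);
>       blk->counter = 0;
>
>       if (fork() == 0) {                      /* child process */
>           pthread_mutex_lock(&blk->lock);
>           blk->counter++;
>           pthread_mutex_unlock(&blk->lock);
>           _exit(0);
>       }
>       wait(NULL);
>       printf("counter = %d\n", blk->counter); /* 1: same physical memory */
>       shm_unlink("/demo_shm");
>       return 0;
>   }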
>
> > Also, the workaround for having thread-local storage does not seem to be
> > straightforward. This also makes me believe that maintaining a
> > multi-threaded application can be a bit more complex than a
> > multiple-process architecture.
>
> There is no concept of a thread's "(private) memory space" -- though you
> can easily arrange for this (e.g., each thread has its own pushdown stack!
> anything thread #1 does that is implemented via the stack is effectively
> private -- though a rogue thread can still scribble on it!).
>
> By contrast, each (single-threaded) process has a unique, disjoint memory
> space "guaranteed" by the OS at the process's instantiation (I am assuming
> you have a "real/nonTOY OS").
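>
> (That said, the thread-local-storage "workaround" is less painful on
> Linux/C than it sounds -- C11 gives you _Thread_local (GCC also
> accepts __thread). A minimal sketch:)
>
>   #include <pthread.h>
>   #include <stdio.h>
>
>   /* Each thread gets its OWN copy of this variable.  Nothing is
>      shared, so no locking is needed to touch it. */
>   static _Thread_local int per_thread_count = 0;
>
>   static void *worker(void *arg)
>   {
>       for (int i = 0; i < 1000; i++)
>           per_thread_count++;        /* this thread's copy only */
>       printf("thread %ld counted %d\n", (long)arg, per_thread_count);
>       return NULL;
>   }
>
>   int main(void)
>   {
>       pthread_t t[2];
>       for (long i = 0; i < 2; i++)
>           pthread_create(&t[i], NULL, worker, (void *)i);
>       for (int i = 0; i < 2; i++)
>           pthread_join(t[i], NULL);
>       return 0;   /* each thread reports 1000 -- no interference */
>   }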
>
> > Also, the performance of a multi-process architecture will be better due to
> > the separate memory spaces (this avoids locking or serialization of
> > execution in the multi-process case), and this seems to take away the
> > advantage of the lower context-switch time of a multi-threaded
> > application!!
>
> No. If there is no contention, there is no locking required beyond what
> is implicitly present when "thread #1" is scheduled to execute while the
> other threads are (temporarily) blocked.
>
> Contention has costs, period. You can structure your code so that these
> costs are minimized. E.g., in a consumer/producer model of sharing,
> the two threads never actually compete for the same "object" -- an
> object that is being produced is invisible to a consumer waiting to
> consume it! Likewise, an object that has BEEN produced is no longer
> of interest to its producer!
>
> The process model gives an (incorrect) illusion of greater separation
> only because it "makes sharing (between PROCESSES) harder". If
> you impose on your code the same restrictions that separate process
> spaces impose (i.e., never compete for data for which you have
> no NEED to compete -- as if it were NOT POSSIBLE), then the costs
> of sharing are the same -- none.
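>
> Here is that "no actual competition" consumer/producer arrangement as
> a C sketch: a single-producer/single-consumer ring using C11 atomics.
> (Busy-waiting keeps the sketch short; a real one would block.)
>
>   #include <pthread.h>
>   #include <stdatomic.h>
>   #include <stdio.h>
>
>   #define SLOTS 8u
>
>   /* Producer only advances `head`; consumer only advances `tail`.
>      Neither ever touches an "object" the other is working on. */
>   static int slot[SLOTS];
>   static atomic_uint head, tail;
>
>   static void *producer(void *arg)
>   {
>       for (int i = 1; i <= 20; i++) {
>           while (atomic_load(&head) - atomic_load(&tail) == SLOTS)
>               ;                                 /* ring full: wait */
>           slot[atomic_load(&head) % SLOTS] = i; /* invisible so far... */
>           atomic_fetch_add(&head, 1);           /* ...published HERE */
>       }
>       return NULL;
>   }
>
>   static void *consumer(void *arg)
>   {
>       for (int i = 1; i <= 20; i++) {
>           while (atomic_load(&tail) == atomic_load(&head))
>               ;                                 /* ring empty: wait */
>           printf("%d\n", slot[atomic_load(&tail) % SLOTS]);
>           atomic_fetch_add(&tail, 1);           /* producer is done with it */
>       }
>       return NULL;
>   }
>
>   int main(void)
>   {
>       pthread_t p, c;
>       pthread_create(&p, NULL, producer, NULL);
>       pthread_create(&c, NULL, consumer, NULL);
>       pthread_join(p, NULL);
>       pthread_join(c, NULL);
>       return 0;
>   }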
>
> > Kindly let me know if this understanding is correct, or correct it with
> > appropriate inputs.
> >
> > I understand that the software architecture is mainly based on the type of
> > application/requirement. Considering the development environment to be
> > Linux with C on single-core/multi-core processors, I would like to know
> > for which types of applications we should go in for a multi-threaded
> > software architecture, and for which types we should go in for a
> > multiple-process-based software architecture. Is there any matrix sheet
> > that maps/lists the types of requirements/applications and the possible
> > software architecture for each?
>
> (sigh) *BIG* (complex) question. Essentially, you have to look at the
> benefits of "tightly coupled" execution (threads) vs. more "loosely
> coupled" (processes). And, the overhead involved in each sharing case.
> Likewise, the potential for (the illusion of) concurrency and the
> periods involved.
>
> E.g., any time an "execution context" (threaded or single-threaded) has to
> block on <something> (resource, user, I/O, etc.), there is an
> opportunity for some other execution context to "do meaningful work"
> (note that this is not the same thing as a GUARANTEE that it will
> be able to do meaningful work!).
>
> How often this occurs, and the amount (percent?) of time that the
> blocked process is suspended -- relative to the rate at which "new
> work" arrives -- determines how much time you can afford to "waste"
> in the overhead of your model (thread v. process).
>
> E.g., if work is represented by cars arriving at a toll booth
> (your job being to monitor the presence of individual cars, the
> receipt of appropriate payment from each, and the control of the
> "gate" allowing paid vehicles to pass), you could (all else being
> equal) create a single-threaded process that does:
>     wait for car;
>     wait for payment;
>     raise gate;
>     lather, rinse, repeat
> And spawn N instances of this process -- one for each "lane"
> at the toll booth (binding the appropriate instances of "car
> sensor", "coin counter", "gate actuator" to each instance).
> The "procedure" (I am trying to avoid using the word "process")
> is inherently serial -- easily handled by a single thread.
>
> [A car doesn't arrive at lane 4, pay at gate 7 and then exit
> at gate 2!]
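>
> In C on Linux, that lane-per-process arrangement might look like the
> sketch below. (The wait_for_* and raise_gate bodies are made-up
> stand-ins for real drivers -- here they just sleep to simulate
> blocking on the hardware.)
>
>   #include <stdio.h>
>   #include <sys/wait.h>
>   #include <unistd.h>
>
>   #define LANES 4
>
>   /* Hypothetical per-lane hardware hooks. */
>   static void wait_for_car(int lane)     { sleep(1); }
>   static void wait_for_payment(int lane) { sleep(2); }
>   static void raise_gate(int lane)  { printf("lane %d: gate up\n", lane); }
>
>   static void run_lane(int lane)
>   {
>       for (;;) {                   /* lather, rinse, repeat */
>           wait_for_car(lane);
>           wait_for_payment(lane);
>           raise_gate(lane);
>       }
>   }
>
>   int main(void)
>   {
>       for (int lane = 0; lane < LANES; lane++) {
>           if (fork() == 0) {       /* one process per lane... */
>               run_lane(lane);      /* ...each inherently serial */
>               _exit(0);
>           }
>       }
>       while (wait(NULL) > 0)       /* children never exit in this sketch */
>           ;
>       return 0;
>   }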
>
> THE PROCESSES HAVE NOTHING TO SAY TO EACH OTHER! So, there is
> no contention *between* them.
>
> Most of the time, a process is waiting for (the next thing)
> to happen. I.e., while waiting for payment, it doesn't have
> to deal with "another car" -- even though another car *may*
> be arriving in some other lane! So, the cost of multiple
> processes (time) is largely hidden in that "wait time".
>
> You can, similarly, design this as a set of THREADS in the
> exact same way! Each thread has nothing to share with the
> other threads!
>
> [Keep this in mind as you read each of the following examples.
> "Thread" can often be replaced by "process"; but, you will
> have to think of everything else going on in the particular
> example to evaluate how (in)effective that solution might be!]
>
> Imagine, instead, writing this process as a set of threads:
> one that waits for the car; another that waits for payment;
> a third that raises the gate (and, presumably, ensures the
> car has passed successfully). These threads need to
> share information -- you don't want the gate_raiser thread
> to raise the gate before the payment_received thread has
> vouched for the vehicle's compliance!
>
> That shared information can be as simple as a shared "state"
> variable: {AWAITING_CAR, AWAITING_PAYMENT, RAISING_GATE}.
> Each thread can be responsible for monitoring the variable
> to determine when it is appropriate to "start" AND updating
> the variable when it has finished its assigned chore.
> I.e., only one thread is ever "holding" the variable
> (able to write to it!).
>
> [threads could also directly start/unblock each other in
> succession... lots of ways to skin this cat]
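>
> As a pthreads sketch (one lane, and the "chores" are just printfs):
> the shared state variable, with a condition variable so nobody has to
> poll. Note that only the thread whose turn it is ever WRITES the
> variable:
>
>   #include <pthread.h>
>   #include <stdio.h>
>
>   enum state { AWAITING_CAR, AWAITING_PAYMENT, RAISING_GATE };
>
>   static enum state lane_state = AWAITING_CAR;
>   static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
>   static pthread_cond_t  changed = PTHREAD_COND_INITIALIZER;
>
>   /* Wait for the state `mine`, do the chore, hand off to `next`. */
>   static void take_turn(enum state mine, enum state next, const char *chore)
>   {
>       pthread_mutex_lock(&lock);
>       while (lane_state != mine)
>           pthread_cond_wait(&changed, &lock);
>       printf("%s\n", chore);            /* stand-in for the real work */
>       lane_state = next;
>       pthread_cond_broadcast(&changed); /* wake whoever is next */
>       pthread_mutex_unlock(&lock);
>   }
>
>   static void *car_watcher(void *a)
>   {
>       for (;;) take_turn(AWAITING_CAR, AWAITING_PAYMENT, "car arrived");
>   }
>
>   static void *payment_taker(void *a)
>   {
>       for (;;) take_turn(AWAITING_PAYMENT, RAISING_GATE, "payment ok");
>   }
>
>   static void *gate_raiser(void *a)
>   {
>       for (;;) take_turn(RAISING_GATE, AWAITING_CAR, "gate raised");
>   }
>
>   int main(void)
>   {
>       pthread_t t[3];
>       pthread_create(&t[0], NULL, car_watcher,   NULL);
>       pthread_create(&t[1], NULL, payment_taker, NULL);
>       pthread_create(&t[2], NULL, gate_raiser,   NULL);
>       for (int i = 0; i < 3; i++)
>           pthread_join(t[i], NULL);     /* never returns; a demo loop */
>       return 0;
>   }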
>
> You could likewise use a set of *processes* to do this:
> each process (pedantically, the single thread *in* each
> process) responsible for blocking on a particular condition,
> etc. But, processes cost more and are heavier-footed than
> threads.
>
> In the "process" implementation, the sharing has to happen
> through some OS-supported mechanism -- *if* processes
> are prohibited from accessing each other's (or *SYSTEM*!)
> resources. In the thread implementation, threads within a
> shared "container" can freely exchange information
> (relying on synchronization primitives provided by the OS
> *or* on constraints inherent in the algorithm: "YOU set
> this, I will CLEAR it").
>
> You could, also, have one giant "process" with lots of
> threads -- one that handles the entire toll booth. (again,
> lots of ways to skin this cat... I'll let you sort out the
> "more obvious" ones)
>
> Threads could sit "awaiting events". A set of "accepting payment"
> threads (responsible for verifying proper payment) can sit
> waiting for "CAR_ARRIVED" events (messages). When such an event
> is detected, the first WAITING/blocked thread consumes it and
> begins execution (the event obviously has to specify the lane
> on which the waiting car was detected!).
>
> [The next "accepting payment" thread -- IF ANY (possibly a configurable
> option... you might have fewer threads than lanes, etc., depending on
> the expected interarrival times of "cars") -- then steps up and awaits
> the NEXT "CAR_ARRIVED" event. This may be on the same lane as the
> immediately preceding event -- or, another lane entirely!]
>
> The "accepting payment" thread recently activated (above) now
> sits waiting for "coin received" events (from the specific lane
> that it is monitoring!). When it has processed enough of these
> to indicate proper payment, it generates a PAYMENT_RECEIVED event
> (tagged with the corresponding lane number) and then goes back to
> waiting for another "CAR_ARRIVED" event.
>
> [I.e., this flavor of thread can only handle CAR_ARRIVED events!]
>
> Similarly, another set (of one or more) "raising gate" threads
> sits waiting for PAYMENT_RECEIVED events and acts accordingly.
>
> Here, you need as many "raising gate" threads as there are gates
> that you want to be able to raise CONCURRENTLY! (e.g., if you
> don't mind letting other "paid customers" wait while you raise
> the gate for customer X, then you only need enough of those
> threads to raise *one* gate at a time!)
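>
> Sketched with pthreads: one small blocking queue per event flavor, a
> pool of "accepting payment" threads, and one "raising gate" thread.
> collect_fare() is a made-up stand-in for the coin-counter driver:
>
>   #include <pthread.h>
>   #include <stdio.h>
>   #include <unistd.h>
>
>   /* A tiny blocking queue of lane numbers (overflow unguarded!). */
>   struct queue {
>       int lane[16], head, count;
>       pthread_mutex_t lock;
>       pthread_cond_t  nonempty;
>   };
>   static struct queue car_arrived  = { .lock = PTHREAD_MUTEX_INITIALIZER,
>                                        .nonempty = PTHREAD_COND_INITIALIZER };
>   static struct queue payment_done = { .lock = PTHREAD_MUTEX_INITIALIZER,
>                                        .nonempty = PTHREAD_COND_INITIALIZER };
>
>   static void post(struct queue *q, int lane)
>   {
>       pthread_mutex_lock(&q->lock);
>       q->lane[(q->head + q->count++) % 16] = lane;
>       pthread_cond_signal(&q->nonempty);   /* wake ONE waiting thread */
>       pthread_mutex_unlock(&q->lock);
>   }
>
>   static int get(struct queue *q)          /* first blocked thread wins */
>   {
>       pthread_mutex_lock(&q->lock);
>       while (q->count == 0)
>           pthread_cond_wait(&q->nonempty, &q->lock);
>       int lane = q->lane[q->head];
>       q->head = (q->head + 1) % 16;
>       q->count--;
>       pthread_mutex_unlock(&q->lock);
>       return lane;
>   }
>
>   static void collect_fare(int lane) { usleep(1000); }  /* driver stub */
>
>   /* Consume CAR_ARRIVED, watch THAT lane until the fare is in, then
>      announce PAYMENT_RECEIVED and go back to await the next car. */
>   static void *accept_payment(void *arg)
>   {
>       for (;;) {
>           int lane = get(&car_arrived);
>           collect_fare(lane);
>           post(&payment_done, lane);
>       }
>   }
>
>   /* Consume PAYMENT_RECEIVED and lift that lane's gate. */
>   static void *raise_gate(void *arg)
>   {
>       for (;;)
>           printf("lane %d: gate up\n", get(&payment_done));
>   }
>
>   int main(void)
>   {
>       pthread_t p[2], g;              /* fewer payment threads than lanes! */
>       for (int i = 0; i < 2; i++)
>           pthread_create(&p[i], NULL, accept_payment, NULL);
>       pthread_create(&g, NULL, raise_gate, NULL);
>       for (int lane = 0; lane < 4; lane++)   /* four cars show up */
>           post(&car_arrived, lane);
>       sleep(1);                       /* let the pools drain the queues */
>       return 0;
>   }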
>
> Yet another way of doing this is to have N copies of a generic thread
> that is capable of processing *any* sort of event -- i.e., having
> a dispatch table (switch statement) at the start to route the
> event to the appropriate processing code fragment. In this case,
> you need only enough threads to handle the total number of "things"
> that can be happening at one time (i.e., one thing on each lane).
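>
> A sketch of that generic-worker shape -- the dispatch table is just a
> switch. get_any_event() and the handle_*() routines below are stubs,
> standing in for a single "any flavor" event queue and the real
> processing fragments:
>
>   #include <pthread.h>
>   #include <stdio.h>
>   #include <unistd.h>
>
>   enum ev_type { CAR_ARRIVED, COIN_RECEIVED, PAYMENT_RECEIVED };
>   struct event { int lane; enum ev_type type; };
>
>   static struct event get_any_event(void)   /* stub: a real one blocks */
>   {
>       usleep(100000);
>       return (struct event){ 0, CAR_ARRIVED };
>   }
>   static void handle_car(int lane)  { printf("car, lane %d\n", lane);  }
>   static void handle_coin(int lane) { printf("coin, lane %d\n", lane); }
>   static void handle_paid(int lane) { printf("paid, lane %d\n", lane); }
>
>   /* N copies of THIS one routine replace the per-flavor thread pools. */
>   static void *generic_worker(void *arg)
>   {
>       for (;;) {
>           struct event e = get_any_event();   /* block for ANY event */
>           switch (e.type) {                   /* ...then route it */
>           case CAR_ARRIVED:      handle_car(e.lane);  break;
>           case COIN_RECEIVED:    handle_coin(e.lane); break;
>           case PAYMENT_RECEIVED: handle_paid(e.lane); break;
>           }
>       }
>   }
>
>   int main(void)
>   {
>       pthread_t w[4];              /* one per "simultaneous thing" */
>       for (int i = 0; i < 4; i++)
>           pthread_create(&w[i], NULL, generic_worker, NULL);
>       pthread_join(w[0], NULL);    /* never returns in this sketch */
>       return 0;
>   }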
>
> Ah, but what is to prevent a cock-up in The System (or an exploit
> by a savvy user?) from allowing the vehicle's initial arrival
> to be immediately followed by a PAYMENT_RECEIVED event -- i.e.,
> BEFORE the "accepting payment" thread has even been activated?
> (we have a technical term for this: "bug")
>
> In the initial "serial process", this wasn't possible: the code
> that was executed after payment was received COULDN'T run until
> a car had been detected AND coins counted. The design of the
> code precluded that possibility. To "exploit" the system,
> a user would have to synthesize all of the preceding events
> to "advance" the algorithm to the point where it was ready to lift
> the gate.
>
> OK, let's build a SHARED OBJECT that indicates the "state" of each
> of the lanes! That way, the "raising gate" thread won't invoke
> the actuator unless it sees all of the required prerequisites in
> place -- even if "signaled" by a PAYMENT_RECEIVED event!
>
> Now, you have several entities trying to update that state AT THE
> SAME TIME THAT OTHERS ARE TRYING TO EXAMINE IT. "Contention"
> that affects the entire application's performance -- ONE bottleneck
> (instead of a "bottleneck per lane" -- or NO bottlenecks!)
>
> Imagine if the cost of ATOMICALLY accessing this object was a fat
> system call -- because it resided somewhere that all PROCESSES
> could access (contrast with THREADS)!
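>
> (Among THREADS, that shared object can be as cheap as one mutex -- but
> it is still ONE lock serializing EVERY lane. A sketch, reusing the
> earlier state names; only the payment-verifier is supposed to set
> RAISING_GATE:)
>
>   #include <pthread.h>
>   #include <stdbool.h>
>   #include <stdio.h>
>
>   enum state { AWAITING_CAR, AWAITING_PAYMENT, RAISING_GATE };
>
>   #define LANES 4
>   static enum state lane_state[LANES];   /* all start at AWAITING_CAR */
>   static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;
>
>   /* Gate raiser: refuse to actuate unless the prerequisites are in
>      place -- even if a (possibly forged!) PAYMENT_RECEIVED said so. */
>   static bool try_raise_gate(int lane)
>   {
>       pthread_mutex_lock(&table_lock);
>       bool ok = (lane_state[lane] == RAISING_GATE);
>       if (ok)
>           lane_state[lane] = AWAITING_CAR;   /* reset for next customer */
>       pthread_mutex_unlock(&table_lock);
>       return ok;       /* caller lifts the gate only on true */
>   }
>
>   int main(void)
>   {
>       lane_state[2] = RAISING_GATE;   /* pretend lane 2 was vouched for */
>       printf("lane 2: %s\n", try_raise_gate(2) ? "raised" : "refused");
>       printf("lane 3: %s\n", try_raise_gate(3) ? "raised" : "refused");
>       return 0;
>   }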
>
> In each case, you decide how much information you are sharing and
> with whom you are sharing it. A single thread that runs a single
> lane from start to finish is IMPLICITLY sharing data with itself:
> it saw the car arrive on its assigned lane, it watched as the
> coins were deposited in the coin acceptor on that lane, then it
> raised the gate for that lane -- before returning to await the
> next arrival.
>
> As you split the "chore" into finer pieces -- or split the
> handling of it into different/disjoint "execution contexts" -- you
> need to pass more information between those contexts. E.g., passing
> events of the form (<lane>, <event_type>) to a set of generic
> "handlers" moves the sharing into the "event system".
>
> [whether that is a FIFO, shared memory, IPC, etc.]
>
> OTOH, you increase the possibilities for concurrency and more
> efficient use of resources. (Why have N "raise gate" processes
> if drivers can afford to wait for THEIR gate to be lifted?
> Perhaps the gate-lift mechanism can ONLY lift a single gate at
> a time -- motor and gears/clutches.)
>
> Sorry for the long-winded explanation. I will promptly be derided
> for it. But, hopefully it shows you different approaches (that
> exploit "potential parallelism/decomposition" in different ways)
> and the potential consequences of those approaches.
>
> You have to look at your workload and see what approach makes the
> most sense. Interconnections are expensive in any algorithm!