My application runs 24/7/365. There's no "reset" switch. So, components (hardware AND software) that are upgraded are done so while the system is live.

[I suspect there are six nines services on-line that behave similarly. But, if inspected "in the small", I wonder how clean the switchovers are? And, how durable the connections?]

I can instantiate a new instance of a server and replace the bindings to the old server with bindings to the new. So, any *new* attempts to use the service are automatically routed away from the old server and handled, instead, by the new instance.

Any connections/transactions to the old server can continue to be handled, there. When each client closes its connection to the server, the old server will see one less client to service. This is sort of equivalent to a zombie process (even though the process is actually still functioning as intended).

And, no *new* clients will ever appear (because the new instance is handling them). So, eventually, the server will detect "no more clients" and exit(). This frees up the resources that the old service was using.

[This is true even if an "old" client tries to reopen a connection to the service!]

The problem comes with connections that are more durable/persistent. There comes a time when you want to more aggressively *kill* the "zombie", even though it is performing as intended, etc. -- if only to reclaim those resources!

Doing so WITHOUT INTERRUPTING THE SERVICE is, of course, the desired way of meeting this goal (crashing the service is almost always a sign that you got lazy in the implementation).

So, the old service needs to be told to "migrate clients" to the new instance of the service.

Of course, the mechanics of migrating a specific client for a specific service are highly dependent on that service, etc. Any internal state associated with the client has to be abstracted to a form that the new server can interpret and map to its new implementation. This, then, has to be conveyed to the new service along with the endpoints for the existing client(s).

The first question: should this be implemented as a "force clients to the new service" (i.e., shed ALL clients)? Or, as "force THIS client to the new service" -- iterated over the set of existing current clients?

[The net result in each case should be the same]

The former gives finer-grained control over the reallocation of resources/connections/activities. But, I can't really see how it would be used with *less* than the full set of current clients (so, is it needless detail?).

The second question goes to who should effect this decision. Imposing it "from on high" ignores the particular requirements of the service in question -- it assumes all services are equally easy/difficult from which to migrate clients.

OTOH, the only place that has global knowledge of what's happening (resource-wise) in the system is "on high".

My current thinking is between two alternatives:
- signal the new service that it should *acquire* the old clients (from OLD_SERVICE) and let "it" sort out the most expeditious way of doing so;
- signal the old service that it should *shed* the old clients (to NEW_SERVICE) and let it figure out how best to effect that change

I think I prefer the former as it lets the new service assume the agency implied by "on high"; *it* acts to effect the desired change but with a better idea of what is involved in doing that ("on high" being ignorant of the specifics of this service). So, it can ensure its own house is in order before contacting the old service and imposing the directive on it.

In this way, the old service can make decisions as to which clients should be migrated first (there is always the threat of an unceremonious process shutdown imposed without consent "at some time", hereafter). And, it can negotiate with the new service as to the best way of exchanging state information for those connections (e.g., define the protocol as well as the mechanism).
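The rebind, drain, and "acquire" steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Server`, `Registry`, `acquire_clients` and all of their methods are hypothetical names invented for the sketch, not anything from the poster's system, and per-client state is reduced to a plain dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical service instance that tracks its open client connections."""
    version: str
    clients: dict = field(default_factory=dict)  # client id -> per-client state
    running: bool = True

    def open(self, client_id, state=None):
        self.clients[client_id] = state if state is not None else {}

    def close(self, client_id):
        # One client fewer to service.
        self.clients.pop(client_id, None)

    def export_client(self, client_id):
        # Hand over the client's state and forget the client.
        return self.clients.pop(client_id)


class Registry:
    """Hypothetical name -> server binding, consulted only for *new* connections."""

    def __init__(self):
        self.bindings = {}
        self.retired = []

    def bind(self, name, server):
        old = self.bindings.get(name)
        if old is not None:
            self.retired.append(old)  # old instance lingers to drain its clients
        self.bindings[name] = server

    def connect(self, name, client_id):
        server = self.bindings[name]
        server.open(client_id)
        return server

    def reap(self):
        # A retired server with no remaining clients can exit and free resources.
        for server in list(self.retired):
            if not server.clients:
                server.running = False
                self.retired.remove(server)


def acquire_clients(new, old):
    """The 'acquire' variant: the new service pulls every client from the old one."""
    for client_id in list(old.clients):
        new.open(client_id, old.export_client(client_id))


registry = Registry()
v1 = Server("v1")
registry.bind("file", v1)
registry.connect("file", "a")
registry.connect("file", "b")

v2 = Server("v2")
registry.bind("file", v2)          # new requests now go to v2
registry.connect("file", "c")

v1.close("a")
registry.reap()                    # v1 still holds "b", so it keeps running
still_running_with_b = v1.running

acquire_clients(v2, v1)            # forcibly migrate the lingering client
registry.reap()                    # v1 is drained and can exit
```

Here the per-client and all-clients variants collapse into the same loop, which matches the observation that the net result should be identical; the hard part in practice is what `export_client` has to hand over.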
Dynamic upgrading/Hot-swapping a service
Started by ●April 20, 2017
Reply by ●April 20, 2017
On 4/20/2017 4:05 PM, Don Y wrote:

> The first question: should this be implemented as a "force clients
> to the new service" (i.e., shed ALL clients)? Or, as "force THIS client
> to the new service" -- iterated over the set of existing current clients?

> The former gives finer-grained control over the reallocation of

s/former/latter/

> resources/connections/activities. But, I can't really see how it
> would be used with *less* than the full set of current clients
> (so, is it needless detail?).
Reply by ●April 21, 2017
On Thu, 20 Apr 2017 16:05:00 -0700, Don Y <blockedofcourse@foo.invalid> wrote:

>My application runs 24/7/365. There's no "reset" switch. So,
>components (hardware AND software) that are upgraded are done
>so while the system is live.
>
>[I suspect there are six nines services on-line that behave
>similarly. But, if inspected "in the small", I wonder how
>clean the switchovers are? And, how durable the connections?]

Before (re)inventing the wheel, take a look how VAXcluster (now VMScluster) has done it since the 1980's.
Reply by ●April 21, 2017
On 4/21/2017 12:45 AM, upsidedown@downunder.com wrote:

> On Thu, 20 Apr 2017 16:05:00 -0700, Don Y
> <blockedofcourse@foo.invalid> wrote:
>
>> My application runs 24/7/365. There's no "reset" switch. So,
>> components (hardware AND software) that are upgraded are done
>> so while the system is live.
>>
>> [I suspect there are six nines services on-line that behave
>> similarly. But, if inspected "in the small", I wonder how
>> clean the switchovers are? And, how durable the connections?]
>
> Before (re)inventing the wheel, take a look how VAXcluster (now
> VMScluster) has done it since the 1980's.

In cluster environments, nodes tend to be indivisible entities. When you update the software on a node, you update the node, itself. You "kick off" the processes that are running on the node just prior to the upgrade (even if that means migrating them to another node -- with an OLD copy of the service they are using at the time) and summarily replace the node (and the services it provides).

[If you've migrated the existing connections to services that WERE running on that node to some other node, then you still have those clients running on that other node -- potentially indefinitely with the OLD server code!]

Imagine, instead, that some processes are using the "file service" (whatever THAT is!) on *a* node. You want to upgrade the file service code (without affecting any of the other services that are running on the node) WHILE the file service remains in use on that node.

I.e., install the new service and start it running. Change the service registry to reference the new server instance for *new* service requests (i.e., any files that are accessed AFTER this point will be handled by the NEW file service). Allow the old file service instance to remain active to finish servicing any existing connections.

*Eventually*, the preexisting connections will be completely serviced (those files closed, etc.). Because all NEW requests are handled by the NEW service, the old service will eventually find itself with no work to do -- no active connections (clients). At that point, it can terminate itself with no deleterious impact on the system.

The problem comes with clients that "linger" on the old service longer than you'd like. E.g., imagine a process that opens a dribble file and leaves it open FOREVER. That would stake a continuous claim on the old service preventing it from ever being "replaced".

Or, you might be in a *hurry* to replace a service -- before the clients currently using it are naturally *done* with it.

So, you need a way of migrating the *active* connections to another server ALONG WITH THE INTERNAL STATE associated with each of those connections.

For a file service, that state might include an inode number, access mode (R/W), current file offset (for read or write), any buffered data (to be written or already read-ahead), any media I/O actions "in progress", etc.

But, the *new* service may associate different state with each connection as dictated by *its* implementation. So, simply "copying" the state from the old service to the new service won't suffice; there needs to be some "state translation" that takes place to ensure the client's connection remains semantically intact across the transition between servers.

Or, a modification of the server contract that allows any server to simply state, "I quit" and let the clients figure out how to recover or restore their use of that service (boo, hiss!).

I'll be meeting up with some local colleagues, tonight, for 12oz curls. I'll see if any of the guys who work in "enterprises" can shed some light on the approach they take to this sort of thing. Though I suspect their users are more "transient" than persistent. So, will leave a service in short order as a natural consequence of their operation (in which case, just registering the new service and waiting would suffice).
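The "state translation" step for the file-service example might look something like the sketch below. Everything in it is an assumption made for illustration: the field names, the neutral dictionary layout, the 4096-byte block size, and the convention that buffered write data sits immediately before the current offset are all hypothetical, not taken from any real file service.

```python
from dataclasses import dataclass


@dataclass
class OldConnState:
    """Per-connection state as a hypothetical old file service keeps it."""
    inode: int
    mode: str            # "R" or "W"
    offset: int          # client's current file offset, in bytes
    write_buffer: bytes  # accepted from the client, not yet on media;
                         # assumed to end exactly at `offset`


@dataclass
class NewConnState:
    """Per-connection state as a hypothetical new file service keeps it."""
    inode: int
    readable: bool
    writable: bool
    block: int           # offset expressed as a block number ...
    block_offset: int    # ... plus an offset within that block
    pending: list        # queued writes as (absolute offset, data) pairs


BLOCK_SIZE = 4096  # assumed block size of the new implementation


def export_state(old):
    """Old service: abstract its internal state into a neutral form."""
    return {
        "inode": old.inode,
        "access": old.mode,
        "offset": old.offset,
        "unwritten": old.write_buffer,
    }


def import_state(neutral):
    """New service: map the neutral form onto its own representation."""
    block, block_offset = divmod(neutral["offset"], BLOCK_SIZE)
    pending = []
    if neutral["unwritten"]:
        start = neutral["offset"] - len(neutral["unwritten"])
        pending.append((start, neutral["unwritten"]))
    return NewConnState(
        inode=neutral["inode"],
        readable=neutral["access"] == "R",
        writable=neutral["access"] == "W",
        block=block,
        block_offset=block_offset,
        pending=pending,
    )


old = OldConnState(inode=42, mode="W", offset=10000, write_buffer=b"abc")
new = import_state(export_state(old))
```

The point of the neutral form is that neither server needs to know the other's internal layout; each only has to agree on the exchange format, which is the part the two services would negotiate.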
Reply by ●April 21, 2017
On Fri, 21 Apr 2017 10:43:28 -0700, Don Y <blockedofcourse@foo.invalid> wrote:

>So, you need a way of migrating the *active* connections to another
>server ALONG WITH THE INTERNAL STATE associated with each of those
>connections.

It is more than a quarter of a century since I have been running a large VAXcluster with a dozen cabinet-size CPUs, but I'll try to remember some of the details.

If you have multiple CPUs with shared (and mirrored) disks, switching an active process from one CPU to another is quite easy. As long as the OS supports process checkpointing or swapping out a complete process to disk, things are easy. Instead of swapping in a process from disk back into the original CPU, just swap it in to another CPU :-)

In a VAX cluster, application programs refer to resources, such as disks, by logical names. It is the responsibility of the system manager to maintain the day to day mapping between the logical disk names and the physical disk names. Some logical names are lists (one logical name translates to multiple physical resources, and the OS selects the first physical device available from the list).

In those days, dumb terminals were used. With the Ethernet/serial converters (DecServer xxx) running the LAT protocol, it was quite easy to automatically connect a dumb terminal user from one CPU to another.
Reply by ●April 21, 2017
On 4/21/2017 1:06 PM, upsidedown@downunder.com wrote:

> If you have multiple CPUs with shared (and mirrored) disks, switching
> an active process from one CPU to another is quite easy. As long
> as the OS supports process checkpointing or swapping out a complete
> process to disk, things are easy. Instead of swapping in a process
> from disk back into the original CPU, just swap it in to another
> CPU :-)

That's not the same thing. That's *migrating* a process to a different CPU. You're moving the entire state of the process to resume execution on another CPU. All the "variables" AND all the instructions that interpret those variables!

I want to "alter the executable" while it is running -- change the instructions and (somehow) tweak the variables so their current values "make sense" when interpreted by a different set of instructions!

I'm typing a "followup" to your message using Thunderbird. I (the human) can be regarded as a client of Thunderbird. I am engaged in an interaction with it -- my CONNECTION to it persists continuously as I am typing this message.

WHILE I AM TYPING, I want something to be able to sneak in and REPLACE the copy of Thunderbird that is executing in my computer's memory -- not just the copy that resides on the disk (which Windows won't examine until the next time I *load* Thunderbird) -- and to do so such that this message ends up intact as it is eventually posted to the NNTP server.

I.e., to do this, you'd need to capture a copy of what I've typed up to the instant the upgrade is switched in *under* me. It would have to know how the windows that the old Thunderbird instance was using were maintained by the OS, and the source of keystrokes and other user interface events.

It would be messy and tedious to get it "right" -- but not impossible.

A far easier goal would be to swap the executable bound to "Thunderbird.exe" so that the next time I invoked Thunderbird, I'd get the NEW executable; let my current interaction run to completion with the *old* executable!

But, there's no guarantee that I will terminate this Thunderbird session anytime soon. Or *ever*!

The OS can forcibly move the user interface connections to another process running on the same -- or different -- node. But, that doesn't mean the client's (i.e., user's) experience will be "continuous" or coherent.
Reply by ●April 22, 2017
On Fri, 21 Apr 2017 17:51:42 -0700, Don Y <blockedofcourse@foo.invalid> wrote:

>The OS can forcibly move the user interface connections to another
>process running on the same -- or different -- node. But, that doesn't
>mean the client's (i.e., user's) experience will be "continuous"
>or coherent.

I still do not see what your actual problem is.

Just swap the MAC addresses between the activating and passivating server, and neither the client nor the client application notices anything special.

On the server side, with stateless protocols such as UDP and LAT, things are quite straightforward.

With stateful protocols like TCP, things get hairy if the protocol state is maintained in kernel mode and is not swapped out and in to another process along with the user-mode code. With the TCP stack in user mode, this should not be a big problem.
Reply by ●April 22, 2017
On 4/22/2017 4:09 AM, upsidedown@downunder.com wrote:

> I still do not see what your actual problem is.

Find a piece of software that is currently executing: your microwave oven controller, your PC (consider it a *collection* of software), your calculator, your ....

Now, WHILE it is "solving some particular problem for which it was designed", pause the clock and replace all the INSTRUCTIONS in the program(s) with a new, revised program (it does <whatever> only "better": the 8 digit calculator now handles 12 digits; the microwave oven now has 6 other types of cycles; the PC is now running Windows 11 instead of DOS 3.3; etc.).

Let the clock resume. None of the actions that were running at the time the clock was PAUSED should have been affected by the upgrade. I.e., if the calculator was in the middle of computing "14!", it should continue to completion -- from wherever it happened to have been, at the time -- yielding the correct result.

Note, however, that the result should now be displayed as 8.71782912*10^10 or 87178291200 and NOT as 8.7178291*10^10 to reflect the extra precision that it has internally as well as the extended "display"/reporting capability (assuming, of course, that the original executable was interrupted before any loss of precision).

Put something in your microwave oven. Set the timer to X. After an arbitrary amount of time, pause the process (processor) and replace the ROMs. Resume the process. EXPECT the entire process -- start to finish -- to proceed exactly as it would have had you not replaced the ROMs!

> Just swap the MAC addresses between the activating and passivating
> server, and neither the client nor the client application notices
> anything special.
>
> On the server side, with stateless protocols such as UDP and LAT, things
> are quite straightforward.

Communication protocols aren't the only places where state is involved. Start counting out loud. The next time you encounter a person, switch to another language. I.e., the algorithm by which you determine the next ordinal to speak has changed. But, you've still got to remember which was the *last* previously spoken!
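The calculator scenario can be mimicked in a few lines of Python. This is a toy sketch, not a model of any real calculator firmware: `FactorialState`, the `old_*`/`new_*` functions and the decision to "pause the clock" after six steps are all made up for illustration. What it shows is that the in-progress state (which factor comes next, and the running product) survives the swap, while both the stepping code and the display code change underneath it.

```python
from dataclasses import dataclass


@dataclass
class FactorialState:
    """Snapshot of an in-progress n! computation (hypothetical neutral form)."""
    n: int
    next_factor: int
    acc: int


def old_step(state):
    # Old executable: one multiplication per step.
    state.acc *= state.next_factor
    state.next_factor += 1


def old_display(value):
    # Old executable: 8 significant digits.
    return f"{value:.7e}"


def new_step(state):
    # New executable: up to two multiplications per step.
    for _ in range(2):
        if state.next_factor <= state.n:
            state.acc *= state.next_factor
            state.next_factor += 1


def new_display(value):
    # New executable: integers of up to 12 digits are shown exactly.
    return str(value) if abs(value) < 10**12 else f"{value:.11e}"


def done(state):
    return state.next_factor > state.n


state = FactorialState(n=14, next_factor=2, acc=1)

for _ in range(6):          # old code runs for a while (acc reaches 7!)
    old_step(state)

paused_at = (state.next_factor, state.acc)   # "pause the clock"; swap the code

while not done(state):      # new code resumes from the same state
    new_step(state)

result = state.acc
```

This only works because the old code was interrupted before it had thrown away any precision; had the old executable already rounded `acc` to 8 digits, no amount of new code could recover the missing digit.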
Reply by ●April 23, 2017
On 4/22/2017 11:55, Don Y wrote:> On 4/22/2017 4:09 AM, upsidedown@downunder.com wrote: >> I still do not see what your actual problem is. > > Find a piece of software that is currently executing: your > microwave oven controller, your PC (consider it a *collection* > of software), your calculator, your .... > > Now, WHILE it is "solving some particular problem for which it > was designed", pause the clock and replace all the INSTRUCTIONS > in the program(s) with a new, revised program (it does <whatever> > only "better" (the 8 digit calculator now handles 12 digits; the > microwave oven now has 6 other types of cycles; the PC is now > running Windows 11 instead of DOS 3.3; etc.) > > Let the clock resume. None of the actions that were running > at the time the clock was PAUSED should have been affected by > the upgrade. I.e., if the calculator was in the middle of > computing "14!", it should continue to completion -- from > wherever it happened to have been, at the time -- yielding > the correct result. > > Note, however, that the result should now be displayed as > 8.71782912*10^10 or 87178291200 and NOT as 8.7178291*10^10 > to reflect the extra precision that it has internally > as well as the extended "display"/reporting capability > (assuming, of course, that the original executable was > interrupted before any loss of precision).Since this group is for embedded processing, it is fair to ask why the original calculator would have a display with more that 8 significant figures.> > Put something in your microwave oven. Set the timer to X. > After an arbitrary amount of time, pause the process (processor) > and replace the ROMs. Resume the process. EXPECT the entire > process -- start to finish -- to proceed exactly as it would > have had you not replaced the ROMs!This assumes that you can replace the ROMs by some hot-swap process that does not kill power to the RAM/registers that hold the state and quickly enough that the food will not cool substantially. 
Also, the old program state must be coded so that the new ROMs read and operate on it properly. It sounds like a lot of work.

>> Just swap the MAC addresses between the activating and passivating
>> server and the client nor the client application doesn't noticing
>> anything special.
>>
>> On the server side with stateless protocols such as UDP and LAT things
>> are quite straight forward.
>
> Communication protocols aren't the only places where state is involved.
> Start counting out loud. The next time you encounter a person, switch
> to another language. I.e., the algorithm by which you determine the
> next ordinal to speak has changed. But, you've still got to remember
> which was the *last* previously spoken!
>
>> With state full protocols like TCP, things get hairy, if the protocol
>> state is maintained in kernel mode, if it is not swapped out and in
>> into an other process with the user mode code. With the TCP stack in
>> user mode, this should not be a big problem.

--
Best wishes,
--Phil
pomartel At Comcast(ignore_this) dot net
Reply by ●April 23, 2017
On 4/23/2017 10:01 AM, Phil Martel wrote:
> On 4/22/2017 11:55, Don Y wrote:
>> On 4/22/2017 4:09 AM, upsidedown@downunder.com wrote:
>>> I still do not see what your actual problem is.
>>
>> Find a piece of software that is currently executing: your
>> microwave oven controller, your PC (consider it a *collection*
>> of software), your calculator, your ....
>>
>> Now, WHILE it is "solving some particular problem for which it
>> was designed", pause the clock and replace all the INSTRUCTIONS
>> in the program(s) with a new, revised program (it does <whatever>
>> only "better" (the 8 digit calculator now handles 12 digits; the
>> microwave oven now has 6 other types of cycles; the PC is now
>> running Windows 11 instead of DOS 3.3; etc.)
>>
>> Let the clock resume. None of the actions that were running
>> at the time the clock was PAUSED should have been affected by
>> the upgrade. I.e., if the calculator was in the middle of
>> computing "14!", it should continue to completion -- from
>> wherever it happened to have been, at the time -- yielding
>> the correct result.
>>
>> Note, however, that the result should now be displayed as
>> 8.71782912*10^10 or 87178291200 and NOT as 8.7178291*10^10
>> to reflect the extra precision that it has internally
>> as well as the extended "display"/reporting capability
>> (assuming, of course, that the original executable was
>> interrupted before any loss of precision).
> Since this group is for embedded processing, it is fair to ask why the original
> calculator would have a display with more than 8 significant figures.

Why does the calculator *function* have to be implemented in a calculator *package*? Do you not use <math.h> in your embedded applications? With the tiniest bit of imagination, one should be able to consider a new math library that had greater precision *or* different algorithms that converged faster than the previous implementation.
Given that you (I) cannot shut the application down "for maintenance", how would you replace the library (used by multiple modules) in the application while the system was powered up and operating? (see my previous examples for steps)

Replace "library" with "service" and you have my original question (i.e., most libraries can be implemented *as* services with the re-formalization of the interface communication overhead)

>> Put something in your microwave oven. Set the timer to X.
>> After an arbitrary amount of time, pause the process (processor)
>> and replace the ROMs. Resume the process. EXPECT the entire
>> process -- start to finish -- to proceed exactly as it would
>> have had you not replaced the ROMs!
> This assumes that you can replace the ROMs by some hot-swap process that does
> not kill power to the RAM/registers that hold the state and quickly enough that
> the food will not cool substantially.

Again, imagination suggests you could implement the ROMs (i.e., the program TEXT) in other media that *can* be (effectively) replaced "between one clock cycle and the next". This is all old technology. The problem lies in doing so while some consumer (client) might be ACTIVELY executing within that block of program TEXT.

> Also, the old program state must be
> coded so that the new ROMs read and operate on it properly.

No, that isn't necessary. In fact, different algorithms may use inconsistent state vectors so that mapping from one algorithm to another is not possible. That doesn't preclude "interrupting" existing processing, replacing the TEXT and finishing the processing with the "new" algorithm.

> It sounds like a lot of work.

That's why things like Windows want you to reboot so often! :>

OTOH, web sites and enterprise systems regularly roll out updates WHILE still providing services -- because the cost of shutting the systems/services down for that update can be substantial ("We're sorry, but the on-line banking transaction that you are engaged in AT THIS MOMENT will be aborted. Please try again later.")

(Would you want to have to *stop* your car to have the code in the ABS system updated -- given that stopping the car might not be possible, reliably, given the current state of the ABS code? :> )
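
To make the calculator example concrete, here is a minimal sketch (Python, purely illustrative; nothing here is the actual system). It assumes the *easy* case: the in-progress state can be expressed in an implementation-neutral form that both versions understand, and the swap happens at a step boundary rather than mid-instruction. All of the names (`FactorialState`, `CalcV1`, `CalcV2`, `Service`, `rebind`) are hypothetical. The old version has an 8-digit display and multiplies one factor per step; the new one has a 12-digit display and a different step algorithm. Clients only ever hold the `Service` indirection, so the binding behind it can be replaced while "14!" is still being computed.

```python
from dataclasses import dataclass


@dataclass
class FactorialState:
    """Implementation-neutral snapshot of a factorial in progress (hypothetical)."""
    n: int       # we are computing n!
    next_i: int  # next factor still to be multiplied in
    acc: int     # exact running product, so no precision has been lost yet


def show(value: int, digits: int) -> str:
    """Render value on a display limited to `digits` significant digits."""
    if len(str(value)) <= digits:
        return str(value)
    mantissa, exponent = f"{value:.{digits - 1}e}".split("e")
    return f"{mantissa}*10^{int(exponent)}"


class CalcV1:
    """Old implementation: 8-digit display, one factor per step."""
    digits = 8

    def step(self, s: FactorialState) -> FactorialState:
        return FactorialState(s.n, s.next_i + 1, s.acc * s.next_i)

    def display(self, value: int) -> str:
        return show(value, self.digits)


class CalcV2:
    """New implementation: 12-digit display, up to two factors per step."""
    digits = 12

    def step(self, s: FactorialState) -> FactorialState:
        acc, i = s.acc, s.next_i
        for _ in range(2):
            if i > s.n:
                break
            acc, i = acc * i, i + 1
        return FactorialState(s.n, i, acc)

    def display(self, value: int) -> str:
        return show(value, self.digits)


class Service:
    """Indirection that clients hold; the binding behind it can be replaced."""

    def __init__(self, impl):
        self._impl = impl

    def rebind(self, impl):
        self._impl = impl

    def step(self, state):
        return self._impl.step(state)

    def display(self, value):
        return self._impl.display(value)


def factorial(service, n, swap_after=None, new_impl=None):
    """Compute n! through the service, optionally rebinding after some steps."""
    state, steps = FactorialState(n, 2, 1), 0
    while state.next_i <= state.n:
        if steps == swap_after:
            service.rebind(new_impl)  # the "pause the clock, swap the TEXT" moment
        state = service.step(state)
        steps += 1
    return service.display(state.acc)


print(factorial(Service(CalcV1()), 14))               # 8.7178291*10^10
print(factorial(Service(CalcV1()), 14, 5, CalcV2()))  # 87178291200
```

The second call starts under the old implementation, is rebound after five steps, and finishes under the new one with the full-precision result. The harder case raised above, where the two algorithms' state vectors cannot be mapped onto each other, is NOT covered by this sketch; there the new implementation would have to restart or otherwise re-derive the work from the client's original request.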