Hi,

Is there any good literature on the software design of redundant systems? I have some ideas, but I am not sure whether they are adequate, or even correct.

I am thinking along the lines of separating the redundancy logic from the business logic. Today no software application is an island: applications communicate with each other through some form of IPC, or by accessing shared data (e.g. shared memory and files). Applications also start timers, and a lot of processing is triggered when those timers expire. I am thinking of maintaining the redundancy state in the supporting software, outside the business logic of the application.

Specifically, I will create wrapper functions around IPC system calls, I/O calls and timer calls, and maintain the redundancy state inside those wrappers. For example, if the redundancy state is standby, the IPC wrappers will not send any messages, the shared memory wrappers will not access the shared memory, the file wrappers will not access shared files, and the timer wrappers will not set any timers. The advantage I see in this approach is that the business logic of the application is completely oblivious to the redundancy state. When the redundancy state switches to active, all these wrappers are turned on and begin to work normally.

An alternative approach is to call a function that returns immediately on the active side but blocks on the standby side until the redundancy state changes to active. While this is easier to implement, its disadvantage is that upon switchover, execution resumes only from that point onwards.

No discussion of redundancy is complete without a discussion of data synchronization and the need for checkpointing.
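Going back to the wrapper idea for a moment, here is a minimal Python sketch of what I have in mind. All the names (`Role`, `RedundancyGate`, `guarded_send`, `guarded_timer`) are made up for illustration, and the "IPC send" is just any callable passed in, so treat this as a sketch of the idea rather than a real implementation. It also ignores the race where the state changes between the check and the call.

```python
import threading
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    STANDBY = "standby"


class RedundancyGate:
    """Holds the redundancy state (hypothetical helper, not a standard API)."""

    def __init__(self, role=Role.STANDBY):
        self._role = role
        self._lock = threading.Lock()

    def is_active(self):
        with self._lock:
            return self._role is Role.ACTIVE

    def set_role(self, role):
        with self._lock:
            self._role = role


def guarded_send(gate, send_fn, message):
    """Wrapper around an IPC send: forwards the message only when active.

    Returns True if the message was handed to send_fn, False if suppressed.
    """
    if not gate.is_active():
        return False
    send_fn(message)
    return True


def guarded_timer(gate, delay_s, callback):
    """Wrapper around timer creation: no timer is set while in standby."""
    if not gate.is_active():
        return None
    timer = threading.Timer(delay_s, callback)
    timer.daemon = True
    timer.start()
    return timer


# The business logic calls the wrappers the same way in both states.
gate = RedundancyGate()                   # starts in standby
sent = []                                 # stands in for a real IPC channel
guarded_send(gate, sent.append, "hello")  # suppressed while standby
gate.set_role(Role.ACTIVE)                # switchover
guarded_send(gate, sent.append, "hello")  # now delivered
```

The point of passing the gate into the wrappers is that the calling code never tests the redundancy state itself; only the wrappers do.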
Data synchronization of persistent data seems to be a lot easier than data synchronization of memory-resident data. In the former case we could potentially rely on external utilities and operating system capabilities (e.g. timestamps on files) to maintain this synchronization, using some criterion (e.g. time-based or number of updates).

For synchronization of memory-resident data, I have the following in mind. I "register" a certain region of process memory with a "memory duplication service". This service runs on both the active and the standby side, in its own thread. Any data written anywhere in this region of memory on the active side gets copied to the standby side. Of course the physical memory addresses inside the two instances of the application (primary and secondary) will be different, but within these address spaces the relative offsets will be the same (after all, it is the same software that runs in both active and standby mode). To duplicate some data from active to standby, you merely need to provide its offset from the beginning of the region and its size. If more than one region of memory is "registered", the memory region identifier may also need to be provided.

I have looked far and wide to see if there are any standards for redundancy management. The only standard I have found so far is X.751 from ITU-T. However, this standard only deals with the management aspect of redundancy. Unfortunately this document reads like scripture: it is extremely cryptic, and it takes at least a few readings before you get it. For example, it took me a long time to realize that PRIMARY and SECONDARY are roles in the fallback relationship, while BACKEDUP and BACKUP are roles in the backup relationship. I had initially assumed them to be synonymous.

To wrap up, I would appreciate it if someone could suggest some software strategies for building redundant systems.

Regards,
Bhat
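P.S. To make the memory duplication idea above more concrete, here is a minimal Python sketch. It is only an illustration under several assumptions: a registered region is modelled as a `bytearray`, the names `Duplicator` and `Replica` are made up, the transport is a plain callable rather than a real link between machines, and writes go through an explicit `write()` call instead of being detected by a background thread.

```python
class Replica:
    """Standby side: applies (region_id, offset, data) updates."""

    def __init__(self):
        self.regions = {}

    def register(self, region_id, size):
        self.regions[region_id] = bytearray(size)

    def apply(self, update):
        region_id, offset, data = update
        buf = self.regions[region_id]
        if offset < 0 or offset + len(data) > len(buf):
            raise ValueError("update outside registered region")
        # Same relative offset as on the active side.
        buf[offset:offset + len(data)] = data


class Duplicator:
    """Active side: writes locally, then forwards the change."""

    def __init__(self, transport):
        self.regions = {}
        self.transport = transport  # callable taking one update tuple

    def register(self, region_id, size):
        self.regions[region_id] = bytearray(size)

    def write(self, region_id, offset, data):
        buf = self.regions[region_id]
        if offset < 0 or offset + len(data) > len(buf):
            raise ValueError("write outside registered region")
        buf[offset:offset + len(data)] = data
        # Only the region identifier, offset and bytes cross over;
        # absolute addresses never do.
        self.transport((region_id, offset, bytes(data)))


# Both sides register the same region identifier and size.
standby = Replica()
standby.register(1, 16)
active = Duplicator(standby.apply)
active.register(1, 16)
active.write(1, 4, b"\x01\x02")  # offset 4, size 2
```

After the write, the two copies of region 1 hold the same bytes at the same offsets, which is all the scheme relies on.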
Software design for redundant systems
Started by ●January 14, 2005
Reply by ●January 15, 2005