To provide high availability and multi-level resilience against equipment failures, the NJGS will apply a subsystem-level redundancy scheme to each of its subsystems. Under this scheme, two or more independent instances of each subsystem provide fully redundant functionality. The scheme requires several safeguards so that neither a subsystem failure nor the recovery operations that follow it cause any loss of operational data.
This paper discusses the key elements of the subsystem-level redundancy scheme and the mechanism by which the NJGS recovers from a subsystem failure. It focuses on the failure scenarios that can arise during the recovery process, and on the technical and procedural safeguards needed to ensure data integrity across subsystem instances.