“Breaking the Four 9s Barrier” as Published in The Connection

Published by Connect (formerly ITUG), written by Dr. Bill Highleyman, Paul J. Holenstein, and Dr. Bruce Holenstein

  • Part 6: RPO and RTO (12/2003)
    Explains Recovery Time Objective (RTO), the time it takes to recover from a disaster, and Recovery Point Objective (RPO), the amount of data lost in a disaster, and how the different recovery architectures affect each.
  • Part 5: The Ultimate Architecture (9/2003)
    Uses the concepts explored in the previous articles to suggest a system architecture that can dramatically increase system availability at little additional cost.
  • Part 4: Facts of Life (6/2003)
    Explores what really makes systems fail and what, if anything, can be done about it. (This article draws heavily on actual experience documented by Jim Gray, one of the major contributors to NonStop computing.)
  • Part 3: Sync Replication (4/2003)
    Compares the efficiencies of synchronous replication techniques that may be used to keep database replicates in a split system in exact synchronism, thus avoiding database corruption due to update collisions.
  • Part 2: System Splitting (2/2003)
    Points out that splitting a system, a common architecture for disaster recovery, significantly improves reliability at little or no additional cost.
  • Part 1: The 9s Game (11/2002)
    Discusses the dramatic increase in availability that can be achieved by making a system redundant, and therefore able to tolerate one or more failures. Redundancy is the basis for NonStop systems, which can achieve availabilities of four 9s (the system is up 99.99% of the time).
  • Part 0: Intro and About the Author (9/2002)
    How to improve the availability of systems so that the loss of any significant capacity is measured in centuries rather than years, at little or no additional cost.
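The "9s" arithmetic summarized in Part 1 can be illustrated with a minimal Python sketch. It assumes the simplified textbook model in which node failures are independent and the system stays up as long as any one of its n redundant nodes is up; it ignores repair time and common-mode failures, and the function names are hypothetical rather than taken from the articles. Under those assumptions, 99.99% availability corresponds to roughly 52.6 minutes of downtime per year, and two nodes that are each 99% available together reach about four 9s.

```python
# Minimal sketch of availability arithmetic ("the 9s game").
# Assumption (hypothetical simplified model): node failures are independent,
# and the system is up as long as at least one of its n nodes is up.

MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960 minutes


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of downtime per year for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def redundant_availability(node_availability: float, n: int) -> float:
    """Availability of n independent nodes, any one of which can carry the load.

    The system is down only when all n nodes are down at the same time,
    which under independence happens with probability (1 - a) ** n.
    """
    return 1.0 - (1.0 - node_availability) ** n


if __name__ == "__main__":
    print(f"four 9s: {downtime_minutes_per_year(0.9999):.1f} min/year")
    for n in (1, 2, 3):
        a = redundant_availability(0.99, n)
        print(f"n={n}: availability={a:.6f}, "
              f"downtime={downtime_minutes_per_year(a):.2f} min/year")
```

Real systems fall short of this model because failures are rarely fully independent, which is why the simple formula should be read as an upper bound on what redundancy can deliver rather than a prediction.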
