While high availability efforts cover what you do to prevent an outage, disaster recovery efforts address what is done to re-establish high availability after an outage. Centralpoint’s N-tiered architecture, shown below, allows failover sites or clones to be run, so that clients can set up and configure the disaster recovery and failover arrangements that meet their project's requirements.
Centralpoint's Master Enterprise License lets clients manage multiple instances (and clones) of its existing architecture. This enables the client to configure high availability scenarios including disaster recovery, failover, and even regional distribution of load balancing. As much as possible, disaster recovery procedures and responsibilities should be formulated before an actual outage occurs. The decision to initiate an automated or manual failover and recovery plan should be driven by active monitoring and alerts, and tied to pre-established thresholds. The scope of a sound disaster recovery plan should include:
- Granularity of failure and recovery. Depending upon the location and type of failure, you can take corrective action at different levels; that is, data center, infrastructure, platform, application, or workload.
- Investigative source material. Baseline and recent monitoring history, system alerts, event logs, and diagnostic queries should all be readily accessible by appropriate parties.
- Coordination of dependencies. Within the application stack, and across stakeholders, what are the system and business dependencies?
- Decision tree. A predetermined, repeatable, validated decision tree that includes role responsibilities, fault triage, failover criteria in terms of goals, and prescribed recovery steps.
- Validation. After taking steps to recover from the outage, what must be done to verify that the system has returned to normal operations?
- Documentation. Capture all of the above items in a set of documentation, with sufficient detail and clarity that a third-party team can execute the recovery plan with minimal assistance. This type of documentation is commonly referred to as a ‘run book’ or a ‘cook book’.
- Recovery rehearsals. Regularly exercise the disaster recovery plan to establish baseline expectations for RTO (recovery time objective) goals, and consider regularly rotating the primary production workload between the primary site and each of the disaster recovery sites.
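
To illustrate how a failover decision can be tied to pre-established thresholds, and how a rehearsal can be checked against an RTO goal, here is a minimal Python sketch. It is not part of Centralpoint or any monitoring product: the names `Thresholds`, `Action`, `decide`, and `met_rto`, and the numeric values, are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "no action"
    MANUAL_REVIEW = "page on-call for a manual failover decision"
    AUTOMATED_FAILOVER = "initiate automated failover"


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical values; a real plan would agree on these before any outage.
    review_after_failures: int = 3
    failover_after_failures: int = 6


def decide(consecutive_failed_checks: int, thresholds: Thresholds) -> Action:
    """Map a monitoring signal to the action agreed in the decision tree."""
    if consecutive_failed_checks >= thresholds.failover_after_failures:
        return Action.AUTOMATED_FAILOVER
    if consecutive_failed_checks >= thresholds.review_after_failures:
        return Action.MANUAL_REVIEW
    return Action.NONE


def met_rto(recovery_seconds: float, rto_seconds: float) -> bool:
    """Check a rehearsal's measured recovery time against the RTO goal."""
    return recovery_seconds <= rto_seconds
```

In this sketch, the thresholds live in one immutable object so that they can be reviewed and recorded in the run book alongside the decision tree, rather than being scattered through monitoring scripts.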