Essentially, the preparation and availability of the personnel make all the difference. A famous IT colleague once quipped, “When things go wrong, people get stupider.” In our experience, that is absolutely true. Almost no one is at their best in the wee hours of the morning over a long holiday weekend, which is invariably when a system crash happens.
The members of the applications team are the ones who can save the business, as they usually know what ultimately needs to be done to “fix” what is wrong and get the application and data back online and functioning properly. Having technical people who know performance, data communications, security, etc., matters as well; but when data loss is suspected or the application keeps faulting, the applications team is the best source of information for determining what matters now and what can wait until later. These employees must be educated and practiced in the recovery procedures for the application. Companies must invest in formal testing, practice, and education on how to recover operations when a serious fault occurs.
HPE NonStop Servers “hide” localized faults so well from staff that many become complacent and think they and their environments are invincible. But this complacency can kill a business if it does not prepare for the inevitable faults, such as datacenter fires, regional power grid outages, or even the horrors of the next 9/11.
“Recovery” is a uni-directional, active/passive term associated with the older disaster recovery architectures. More advanced architectures with higher levels of availability can be deployed to avoid the need for a recovery, meaning that the application services at another location actually survive the geographic fault and continue the application processing.
We always advise customers to move beyond a uni-directional, active/passive “recovery” architecture to the more advanced and higher-performing sizzling-hot-takeover and fully active/active, automatic-failover architectures to achieve these benefits.
Enscribe allows much more flexibility than SQL environments in recovering lost or broken data and files. However, consider what is being used to back up the environment. Is it tape- or virtual-tape-based backup/restore? Is it data replication? Or is it nothing, meaning that no method for business continuity (BC) is in use?
Please realize that each HPE NonStop file system supports several file/table types, including structured and unstructured types, and each type requires a slightly different recovery process.
Regardless, any form of backup/restore or online dump/recovery will take more time than using a change data capture (CDC)-based data replication engine to replicate, back up, and recover data.
At the time of failure, data replication products start with a target database already loaded and synchronized with the source database; therefore, most of the data is already available to applications in the backup location. Generally, the data replication product lags the changes being made by the application at the source by a few seconds, or perhaps a few minutes in special cases; this lag is called latency. Failing over is then generally quite fast: the data replication engine might need to clean up any transactions that were incomplete at the time of failure, but the database is then usually available for the applications to come up from that point forward. In some architectures (e.g., active/active), the target application environment could already be running and is thus available instantaneously. This practice is best, as it reduces the risk that a failover fault may occur that would prevent the recovery from proceeding successfully.
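The failover cleanup described above can be illustrated with a toy model. The Python sketch below does not represent how any particular replication product works; the names `Change`, `Target`, and `demo_failover` are hypothetical, and it assumes the target buffers each transaction's changes until the commit arrives, so that cleanup at failover amounts to discarding transactions that never committed.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    tx_id: int   # transaction that made the change at the source
    key: str
    value: str


@dataclass
class Target:
    # Data visible to applications at the backup location.
    committed: dict = field(default_factory=dict)
    # tx_id -> changes received but not yet committed (assumed buffering model).
    pending: dict = field(default_factory=dict)

    def apply_change(self, change: Change) -> None:
        """Hold a replicated change until its transaction commits."""
        self.pending.setdefault(change.tx_id, []).append(change)

    def apply_commit(self, tx_id: int) -> None:
        """Make all changes of a committed transaction visible."""
        for change in self.pending.pop(tx_id, []):
            self.committed[change.key] = change.value

    def fail_over(self) -> list:
        """Discard transactions that were incomplete when the source failed."""
        discarded = sorted(self.pending)
        self.pending.clear()
        return discarded


def demo_failover():
    target = Target()
    target.apply_change(Change(1, "acct-1", "100"))
    target.apply_commit(1)
    # The source fails before the commit for transaction 2 is replicated.
    target.apply_change(Change(2, "acct-2", "250"))
    discarded = target.fail_over()
    return target.committed, discarded
```

In this sketch, only the data from the committed transaction remains visible after `fail_over()`, which is the consistent starting point from which the applications can come up.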
Recognize the Benefits of a CDC Data Replication Engine
Some data replication architectures enable an application to remain online during a recovery, whereas application outages are longer with tape/virtual tape solutions. We generally view backup/restore (or online dumps/roll-forward) as having a mean time to recover (MTTR) of hours to days vs. seconds to minutes for data replication. Since the backup database is available at all times, the SQL programs on the target can already be SQL-compiled, so no extra time is needed for that operation. Essentially, during a failover, the data replication engine handles the data, which allows the team to focus on ensuring that the application is up and running and that network communications are appropriately switched to route user requests to the surviving system and application environment.
In a BC context, the whole goal of data replication is to keep one or more copies of the data up-to-date in another location. If the copy is geographically dispersed, then true geographic disaster tolerance exists, because when a failure occurs, applications can quickly be recovered using the remote data copy. Data replication should be used to maintain a consistent and complete copy of the data in another location that is quickly accessible to the applications and is current at all times. By current, we mean that the copy is up-to-date or has only very low latency, where the changes being replicated into it are only a moment (e.g., sub-second to a few seconds) behind the time when the changes were made at the source database.
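The notion of a "current" copy can be made concrete with a small calculation. The Python sketch below is only an illustration: the function names and the five-second threshold are assumptions chosen to match the "sub-second to a few seconds" range mentioned above, not values taken from any product.

```python
from datetime import datetime, timedelta

# Hypothetical threshold for treating the target copy as "current".
CURRENT_THRESHOLD = timedelta(seconds=5)


def replication_latency(source_commit: datetime, target_apply: datetime) -> timedelta:
    """Time between a change committing at the source and being applied at the target."""
    return target_apply - source_commit


def is_current(latency: timedelta, threshold: timedelta = CURRENT_THRESHOLD) -> bool:
    """True if the target is no further behind the source than the threshold."""
    return latency <= threshold


def demo_latency():
    committed = datetime(2024, 1, 1, 12, 0, 0)
    applied = committed + timedelta(milliseconds=800)
    latency = replication_latency(committed, applied)
    return latency.total_seconds(), is_current(latency)
```

A monitoring job could evaluate a check like `is_current` on a schedule and alert when the target falls further behind than the business can tolerate.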
Tape/virtual tape works on a different principle. It provides a data copy capability and is perfect for making an accurate and consistent copy of data at a particular point in time, freezing that copy at the time it was taken. However, the copy itself is not directly available for applications to use, and if it is needed, it will take some time (hours to days) to restore. Tape/virtual tape should be used to save a consistent and complete copy of the data at a particular point in time (e.g., daily, weekly, monthly, quarterly, or yearly). It is also quite useful for storing online dumps and audit trails in case they are later needed for file or table recovery.
The Modern Datacenter Needs Both Tape/Virtual Tape Operations and Data Replication
The issue with data replication is that it is so fast and absolute. Accidentally purging a necessary file or table on the source will lead to that purge being replicated to the target, so the file or table is lost on both systems. However, tape/virtual tape can accurately retrieve a snapshot of that file or table as of the point when it was backed up, and recovery can proceed from there.
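The interplay between the two techniques can be shown with a toy model. In the Python sketch below, plain dictionaries stand in for the source database, the replicated target, and a tape/virtual tape backup; all function names and the `ACCOUNTS` table are hypothetical, and no real NonStop utility or replication product is being modeled.

```python
import copy


def take_snapshot(database: dict) -> dict:
    """Freeze a point-in-time copy, as a tape/virtual tape backup would."""
    return copy.deepcopy(database)


def purge_with_replication(source: dict, target: dict, name: str) -> None:
    """Purge a table on the source; replication repeats the purge on the target."""
    source.pop(name, None)
    target.pop(name, None)


def restore_from_snapshot(database: dict, snapshot: dict, name: str) -> None:
    """Bring a table back as it existed when the snapshot was taken."""
    database[name] = copy.deepcopy(snapshot[name])


def demo_purge_and_restore():
    source = {"ACCOUNTS": ["row-1", "row-2"]}
    target = copy.deepcopy(source)          # replicated copy
    snapshot = take_snapshot(source)        # backup taken at this point

    # A later change is replicated to the target but is not in the backup.
    source["ACCOUNTS"].append("row-3")
    target["ACCOUNTS"].append("row-3")

    # The accidental purge is replicated, so both copies lose the table.
    purge_with_replication(source, target, "ACCOUNTS")
    lost_everywhere = "ACCOUNTS" not in source and "ACCOUNTS" not in target

    # The backup returns the table only up to the time of the snapshot.
    restore_from_snapshot(source, snapshot, "ACCOUNTS")
    return lost_everywhere, source["ACCOUNTS"]
```

Note that the restored table does not contain `row-3`: the snapshot recovers the data only to the backup point, and any later changes would have to be recovered by other means, such as the saved audit trails.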