What is disaster recovery?
Disaster Recovery (DR) encompasses the procedures, policies or processes that prepare an organization’s vital IT infrastructure to effectively recover from natural or human-induced disasters, and ensure business continuity.
From cyber-attacks and equipment failure to hurricanes and other natural disasters, DR needs to cover any scenario that threatens the availability of IT infrastructure. In recent years, Disaster Recovery has assumed an increasingly prominent role in enterprise computing budgets, often accounting for 20-25% of IT computing expenses.
Having the right disaster recovery plan
A disaster recovery plan (DRP) delineates how an organization will respond to any given disaster scenario, with the goal of supporting time-sensitive business processes and functions, and maintaining full business continuity.
A DRP contains both responsive and preventative elements, and is a key part of a company’s Business Continuity Planning (BCP). On the responsive side, a DRP delineates numerous disaster scenarios and defines a detailed response to each, with the aim of minimizing that event’s negative impact. On the preventative side, a DRP aims to minimize the negative effects of specific scenarios by defining what the organization needs to do in order to avoid them.
More specifically, a DRP needs to anticipate and delineate a plan of action in response to the loss of such mission-critical IT components and services as:
- Complete computer room environments
- Critical IT hardware including network infrastructure, servers, desktop or laptop computers, wireless devices, and peripherals
- Service provider connectivity
- Enterprise software applications
- Data storage devices or applications
To achieve maximum efficacy and keep costs in check, organizations should plan to leverage a combination of internal resources and vendor-supported solutions in their Disaster Recovery planning. The optimal internal/vendor mix depends on the organization’s specific disaster recovery objectives, which are measured in terms of Recovery Time Objective (RTO) and Recovery Point Objective (RPO). Recovery Time Objective can be roughly defined as the maximum amount of time a business can function without system availability, whereas Recovery Point Objective expresses how old the data may be once systems recover, that is, how much recent data the business can afford to lose.
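As a rough illustration of how the two objectives are measured, the sketch below compares a single incident against an RTO and an RPO. The function name, field names, and all timestamps are hypothetical and chosen only for this example; a real assessment would draw these values from monitoring and replication logs.

```python
from datetime import datetime, timedelta


def assess_recovery(failure_at, last_replicated_at, restored_at, rto, rpo):
    """Compare one incident against the RTO and RPO (illustrative only)."""
    # Downtime: how long the business ran without the system.
    downtime = restored_at - failure_at
    # Data age: how old the most recent replicated data was at the moment of failure.
    data_age = failure_at - last_replicated_at
    return {
        "downtime": downtime,
        "data_age": data_age,
        "rto_met": downtime <= rto,
        "rpo_met": data_age <= rpo,
    }


# Hypothetical incident: failure at noon, last replica 10 minutes earlier,
# service restored 3 hours later.
failure = datetime(2024, 1, 1, 12, 0)
report = assess_recovery(
    failure_at=failure,
    last_replicated_at=failure - timedelta(minutes=10),
    restored_at=failure + timedelta(hours=3),
    rto=timedelta(hours=4),
    rpo=timedelta(minutes=15),
)
print(report)
```

In this made-up incident both objectives are met: 3 hours of downtime against a 4-hour RTO, and 10 minutes of data age against a 15-minute RPO.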
Data center disaster recovery
Data center operators face numerous challenges in meeting an organization’s RTO and RPO. A key challenge is data synchronization: how best to ensure that data in all alternate locations is fresh, so as to guarantee service consistency and business continuity even in the event of a disaster?
To some extent, the answer to this question lies in the level of replication, which can be defined as the frequency with which the receiving system (the backup environment) acknowledges the receipt of data from the sending system (the production environment). The most common replication methods are:
- Synchronous Replication – The safest, yet most resource-demanding replication method. In a synchronous replication scenario, the receiving system acknowledges every single change received from the sending system. Adopting this method requires maintenance of a “hot” backup site, and it is most effective in combination with “hot” failover solutions and Global Server Load Balancing (GSLB) solutions.
- Semi-Synchronous Replication – The receiving system sends an acknowledgement only after a series of changes has been received. This method of synchronization corresponds to the “warm” failover approach, and may be the right choice for services that – in the event of a disaster – can allow for some loss of data and a reasonable amount of downtime.
- Asynchronous Replication – Replication is faster but less safe, as the sending system simply continues to send data without waiting for any response. Corresponding to the “cold” failover approach, this method is best suited for static resources or scenarios in which data loss is acceptable.
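The difference between the three methods can be sketched as a toy simulation that counts acknowledgements and the changes still unconfirmed when sending stops. This is not a real replication protocol: the function name, the mode strings, and the `batch_size` parameter are assumptions made for illustration, and the semi-synchronous case follows the definition given above (one acknowledgement per series of changes).

```python
MODES = ("synchronous", "semi-synchronous", "asynchronous")


def replicate(num_changes, mode, batch_size=3):
    """Return (acknowledgements sent, changes left unconfirmed) for a toy run."""
    if mode not in MODES:
        raise ValueError(f"unknown replication mode: {mode}")
    acks = 0
    unconfirmed = 0
    for _ in range(num_changes):
        unconfirmed += 1
        if mode == "synchronous":
            # Every single change is acknowledged by the receiving system.
            acks += 1
            unconfirmed = 0
        elif mode == "semi-synchronous" and unconfirmed == batch_size:
            # One acknowledgement covers a whole series of changes.
            acks += 1
            unconfirmed = 0
        # Asynchronous: the sender keeps sending and never receives a response.
    return acks, unconfirmed


for mode in MODES:
    print(mode, replicate(10, mode))
```

The second value in each result is a stand-in for exposure: the number of changes the sending system cannot be sure the backup environment holds if a failure occurs at that moment.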
When creating a DRP, organizations need to ensure that their failover policy is fully in line with their synchronization method of choice.
For example, the “hot-hot” synchronization/failover policy ensures that data is always 100% synchronized, and that a parallel system is always ready to take over for the production system with minimal latency or downtime.
However, if a data center has chosen asynchronous replication, the expense of maintaining a hot failover server may not be warranted, as data would not necessarily be fully replicated at any given moment of failure.
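The pairing of replication method and failover approach described above can be written down as a small lookup, which a DRP review could use to flag mismatched choices. The table and function names are hypothetical; the pairings themselves are the ones given in the replication list (synchronous with hot, semi-synchronous with warm, asynchronous with cold).

```python
# Pairings taken from the replication methods described in this article.
FAILOVER_FOR_REPLICATION = {
    "synchronous": "hot",
    "semi-synchronous": "warm",
    "asynchronous": "cold",
}


def policy_is_aligned(replication, failover):
    """Return True when the failover tier matches the replication method."""
    return FAILOVER_FOR_REPLICATION[replication] == failover


# A hot failover server paired with asynchronous replication is flagged,
# since the data may not be fully replicated at the moment of failure.
print(policy_is_aligned("synchronous", "hot"))
print(policy_is_aligned("asynchronous", "hot"))
```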
Finally, effective data center disaster recovery requires an off-premises failover device that monitors system health and reroutes traffic in real time to a backup data center in the event of failure.
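A minimal sketch of the monitoring-and-rerouting logic might look like the following. The class name, site names, and the rule of switching after three consecutive failed health checks are all assumptions for illustration; the health check is passed in as a plain function so the example stays self-contained, and failing back to the primary site is deliberately left out.

```python
class FailoverMonitor:
    """Toy health monitor that reroutes to a backup site (illustrative only)."""

    def __init__(self, primary, backup, check, failure_threshold=3):
        self.primary = primary
        self.backup = backup
        self.check = check  # callable: site name -> True if healthy
        self.failure_threshold = failure_threshold  # assumed policy, not from the article
        self.failures = 0
        self.active = primary

    def poll(self):
        """Run one health check and return the site that should receive traffic."""
        if self.active == self.primary:
            if self.check(self.primary):
                self.failures = 0
            else:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    # Reroute traffic to the backup data center.
                    self.active = self.backup
        return self.active


# Simulated health results for the primary site: one success, then an outage.
results = iter([True, False, False, False, True])
monitor = FailoverMonitor("dc-primary", "dc-backup", check=lambda site: next(results))
routes = [monitor.poll() for _ in range(5)]
print(routes)
```

Requiring several consecutive failures before rerouting is one common way to avoid failing over on a single dropped check; the right threshold depends on the RTO the organization has set.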