Business Continuity and Disaster Recovery Basics: Making a Plan
A friend of mine lives on a high prairie ranch surrounded by highly flammable Gambel oak.
Besides trimming the brush to create a defensive perimeter around his house, he keeps a bag packed with important, difficult-to-reproduce documents, such as birth certificates and passports. In the event of a wildfire, his plan is to grab the bag, his cell phone and the backup drive for his computer, then evacuate in his car. He’s willing to let everything else be “toast” and become part of his claim to the insurance company.
It’s impossible to clear all valuable items from his home, so he has selectively planned to save what is most important. Whether we’re talking about your personal belongings or your enterprise’s most valuable assets, having a plan like this is the first critical step toward an effective disaster recovery posture.
Disaster Recovery Planning: Determining What’s Essential
A big part of business continuity and disaster recovery planning is just deciding what to recover. Certainly, it’s “simpler” to just plan to recover all operations for all applications, regardless of any individual application’s level of importance to business continuity. But providing resources—financial or otherwise—to recover everything is not feasible for every organization, nor is it necessarily a wise use of limited time, money and staff.
Some compliance standards dictate what must be protected, and organizations may be audited to prove that the data is in fact protected. This can make it easy to determine what to protect, since in many ways the regulations have already decided for you.
Most businesses, however, just need to restore operations in order to remain viable after a disaster. For them, what gets saved usually comes down to what makes the company money. Let’s walk through a couple of hypothetical examples.
Business Continuity Case No. 1
A COO originally placed priority on preserving his organization’s accounting systems; he wanted to ensure that his business was still able to bill for services even after a disaster.
However, he rethought his strategy to focus on revenue-generating systems: He opted to replicate operational systems that supported customers—the systems that actually contributed to his company’s bottom line. Accounting systems were still next up on the list of priorities, but the COO opted to use cloud backups, with a plan to rebuild systems within a month of disaster—quick enough to make the next invoicing period.
In other words, he gave first priority to the systems that make money and second priority to the billing systems, while still protecting the critical data those billing systems hold.
Business Continuity Case No. 2
An organization’s human resource records were preserved in offsite cloud backups, but it didn’t have a concrete plan for restoring them after a disaster event. At the time, HR systems were still run on local servers. (Most HR support systems have since moved to hosted applications, so this is less of an issue now.)
A disaster-recovery-as-a-service (DRaaS) offering would quickly solve the issue regardless, providing straightforward redundancy for critical systems.
Related to HR were the servers that fed training content to employees, which were relegated to the “toast” category. That doesn’t mean they are lost forever, though: a cloud backup service will preserve the training data until a plan can be formulated to restore the servers.
But the important part of the decision-making process is determining what needs to be online right away and what can be restored at a later time. I always ask prospective customers about their priorities.
As a third party, I find it useful to understand the decision-making process and whether there might be biases or gaps. But there should always be internal stakeholders asking the same questions before disaster strikes.
Recovery operations can be chaotic enough without having to triage applications’ importance on the fly. Having a prioritized list of applications ready to go is invaluable.
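One way to keep such a list ready is to record each application with a recovery tier and a strategy, then sort by tier. The sketch below is only an illustration: the `App` class, the tier numbers and the strategy wording are hypothetical, loosely modeled on the two cases above, and are not a prescribed format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class App:
    name: str
    tier: int      # hypothetical scale: 1 = restore first, higher = can wait
    strategy: str  # how the application is protected today


# Illustrative entries drawn from the two cases above; tiers are assumptions.
RECOVERY_PLAN = [
    App("Training content servers", 3, "cloud backup; restore once a plan exists"),
    App("Customer-facing operational systems", 1, "replicated"),
    App("Accounting systems", 2, "cloud backup; rebuild within a month"),
]


def recovery_order(apps):
    """Return applications sorted by tier, then by name for a stable order."""
    return sorted(apps, key=lambda a: (a.tier, a.name))


for app in recovery_order(RECOVERY_PLAN):
    print(f"Tier {app.tier}: {app.name} ({app.strategy})")
```

Keeping the list as data rather than prose makes it easy to review with stakeholders and to re-sort when priorities change.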
How Should I Use Risk Impact Analysis to Make the Most of Limited Resources?
Beyond the monetary cost of what you choose to save, there is the cost in effort and manpower. As the common paraphrase of poet Robert Burns’ 1785 poem “To a Mouse” goes, the “best laid plans of mice and men often go astray.”
Still, it’s best to plan out as much as possible and triage applications ahead of a disaster. This way, your IT staff already know where to spend effort and what to put on the back burner when disaster strikes.
A risk impact analysis helps you think through the probability of disaster and how much any given event would impact your business, allowing you to find a balance between your tolerance for risk and what is actually likely to happen.
For example, a server being corrupted during patch operations is high-probability but might only have a medium impact on customers, unless it’s an extended outage. On the other hand, a long-duration regional power outage affecting the entire Eastern Interconnection grid is far less likely but would have an enormous impact.
Risk impact analyses usually include:
- potential threats
- probability of threat occurring
- human impact
- property impact
- business impact
The spreadsheet used for a risk assessment can be made more elaborate by adding up the three impact scores (each rated 1-3) and then multiplying the sum by the probability (also 1-3) to produce an overall score (here, the maximum score is 27). See below for an example of a fictitious firm located in St. Louis, Missouri:
Threat | Probability | Human Impact | Property Impact | Business Impact | Total Score |
Key server failure | 3 | 1 | 1 | 3 | 15 |
Hurricane striking St. Louis | 1 | 3 | 3 | 3 | 9 |
Earthquake striking St. Louis | 3 | 3 | 3 | 3 | 27 |
Loss of manufacturing space to fire | 2 | 3 | 3 | 3 | 18 |
In the above example, a key server failure is nearly guaranteed to happen and would have a major impact on the business. A hurricane striking St. Louis is highly unlikely and garners a low overall score.
However, because St. Louis is near the New Madrid Fault, the firm is at risk of a catastrophic earthquake that would affect personnel, property and the business, so this threat scores very high.
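The scoring rule can also be written as a few lines of code instead of a spreadsheet formula. This is a minimal sketch that reuses the St. Louis figures from the table; the `Threat` class and the `ranked` helper are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    probability: int      # 1 (unlikely) to 3 (likely)
    human_impact: int     # each impact is rated 1 to 3
    property_impact: int
    business_impact: int

    def score(self) -> int:
        """Sum the three impact ratings, then multiply by probability."""
        impacts = self.human_impact + self.property_impact + self.business_impact
        return self.probability * impacts


# Values copied from the St. Louis table.
ST_LOUIS = [
    Threat("Key server failure", 3, 1, 1, 3),
    Threat("Hurricane striking St. Louis", 1, 3, 3, 3),
    Threat("Earthquake striking St. Louis", 3, 3, 3, 3),
    Threat("Loss of manufacturing space to fire", 2, 3, 3, 3),
]


def ranked(threats):
    """Return threats ordered from highest to lowest total score."""
    return sorted(threats, key=lambda t: t.score(), reverse=True)


for threat in ranked(ST_LOUIS):
    print(f"{threat.score():>2}  {threat.name}")
```

Sorting by score puts the threats that deserve planning attention first, which is the same reading the table invites.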
To show how much this can vary from location to location, the following is based on another fictitious firm in Cheyenne, Wyoming:
Threat | Probability | Human Impact | Property Impact | Business Impact | Total Score |
Key server failure | 3 | 1 | 1 | 3 | 15 |
Hurricane striking Cheyenne | 1 | 3 | 3 | 3 | 9 |
Earthquake striking Cheyenne | 1 | 3 | 3 | 3 | 9 |
Loss of manufacturing space to fire | 2 | 3 | 3 | 3 | 18 |
The key server failure keeps the same score, and Cheyenne is probably as unlikely as St. Louis to experience a hurricane. However, Cheyenne has a less than 5 percent chance of experiencing any seismic activity[1] and therefore has a very low probability of an earthquake. This business may thus want to prioritize not only mitigating server failures but also planning for fire within the manufacturing space.
How to Build Your Business Continuity and Disaster Recovery Plan: Exercises and Templates
I’m a big fan of checklists because I think they’re easier to follow than wordy, step-by-step instructions. (I also buy Atul Gawande’s The Checklist Manifesto for all of my direct reports.) And my friend agrees: If a wildfire is threatening his ranch, he can go down his list quickly and get out.
Similarly, creating a simple five- to 10-page plan that is fast, readable, testable and executable will be far more useful than a complex or highly detailed plan that is hard to follow. And once it’s been written, it should be tested, whether formally or informally, and certainly before it’s needed.
You can get started by downloading our Business Impact Analysis Template.
I also recommend defining a communications plan. This might include:
- A phone tree of key business leaders
- A preplanned location to meet at time of disaster (e.g., at another business, a hotel conference room, the CEO’s house, etc.)
- Audio and/or web bridge information for internal discussions
- An external plan for how to communicate to customers (e.g., using a Twitter feed, a status website with up-to-the-minute information, etc.)
The value of a thoughtful business continuity and disaster recovery plan cannot be overstated. Just as you can’t clear the house of all your belongings in the middle of a wildfire, you can’t save every system when catastrophe strikes, so it’s important to determine ahead of time which systems and resources must be saved.
[1] James C. Case, Rachel N. Toner, and Robert Kirkwood, Wyoming State Geological Survey, Basic Seismological Characterization for Laramie County, Wyoming, September 2002.