DNS for Disaster Recovery Site Failover, Explained
As a Disaster Recovery as a Service (DRaaS) provider, our solutions engineering team fields a lot of questions about the mechanics of the failover process. One of the most common: How are my customers and my data redirected to HorizonIQ’s hosted environment in a DR scenario? The short answer: We redirect traffic to a failover site using a standard DNS method.
For the (slightly) longer version, let’s dive into the process, as well as some of the terminology you’ll encounter along the way.
How Do I Use DNS for Site Redirection?
HorizonIQ provides fixed public IP addresses at the DR site. Since these addresses are known, it’s fairly easy to predetermine how they will be used at time of disaster (ATOD) or at time of test (ATOT). To understand how the redirect works, keep in mind the function of two important DNS record types:
A Record
A Records are the most commonly used type of DNS record. Each one simply maps a domain or subdomain name to an IPv4 address.
CNAME
A CNAME (canonical name) record is an alias: instead of pointing to an IP address, it points one name to another name, in this case a particular A Record.
Because the IP addresses for the production and DR sites are known and fixed, we can create an A Record for each site and then use a CNAME to point the public name at whichever record is currently active.
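The relationship between the two record types can be sketched as a toy model in Python. This is not a real resolver: the two dictionaries stand in for a zone, using the paulp.com names and addresses from the example below, and the hypothetical resolve() helper simply follows a CNAME to the A Record it names.

```python
# Toy model of the A Record + CNAME setup; not a real DNS resolver.
A_RECORDS = {
    "prod.paulp.com": "108.168.254.43",  # production site
    "dr.paulp.com": "138.78.95.129",     # DR site
}
CNAMES = {
    "www.paulp.com": "prod.paulp.com",   # public alias points at the active site
}


def resolve(name: str, max_hops: int = 8) -> str:
    """Follow CNAME aliases until an A Record is found, then return its IP."""
    for _ in range(max_hops):
        if name in A_RECORDS:
            return A_RECORDS[name]
        if name not in CNAMES:
            raise KeyError(f"no record for {name}")
        name = CNAMES[name]  # alias: continue with the target name
    raise RuntimeError("CNAME chain too long (possible loop)")


print(resolve("www.paulp.com"))  # 108.168.254.43

# Failover changes only the alias; both A Records stay as they are.
CNAMES["www.paulp.com"] = "dr.paulp.com"
print(resolve("www.paulp.com"))  # 138.78.95.129
```

Keeping both A Records permanently in place is what makes the switch a one-line change in either direction.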
Determining the Time to Live
The time to live (TTL) tells resolvers and browsing computers how long, in seconds, they may cache a DNS record before requesting it again, so it dictates how quickly a change to your DNS information takes effect. In a DR scenario, the CNAME’s TTL should be fairly short so that browsing computers don’t cache it for too long. If an event occurs, you want them redirected quickly rather than waiting for their cached DNS records to expire.
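The caching behavior is easiest to see with a simulated clock. The sketch below is a simplified, hypothetical stub cache (real resolvers and operating systems add their own caching layers and policies), and for simplicity it stores the final address for www.paulp.com rather than the CNAME chain. It answers from cache until the TTL runs out, so a client that looked up the name just before a failover keeps getting the old address for up to one TTL.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CachedAnswer:
    ip: str
    expires_at: float  # time on the simulated clock, in seconds


class StubCache:
    """Toy client-side cache that reuses an answer until its TTL runs out."""

    def __init__(self, authoritative: Dict[str, str], ttl: int) -> None:
        self.authoritative = authoritative  # what the DNS service answers right now
        self.ttl = ttl
        self.cache: Dict[str, CachedAnswer] = {}

    def lookup(self, name: str, now: float) -> str:
        entry = self.cache.get(name)
        if entry is not None and now < entry.expires_at:
            return entry.ip  # still fresh by TTL, even if the record has changed
        ip = self.authoritative[name]  # missing or expired: ask again
        self.cache[name] = CachedAnswer(ip, now + self.ttl)
        return ip


zone = {"www.paulp.com": "108.168.254.43"}  # www currently answers with production
client = StubCache(zone, ttl=60)
print(client.lookup("www.paulp.com", now=0))   # fetched and cached until t=60
zone["www.paulp.com"] = "138.78.95.129"        # failover: the record now points to DR
print(client.lookup("www.paulp.com", now=30))  # still 108.168.254.43 from cache
print(client.lookup("www.paulp.com", now=60))  # TTL expired: 138.78.95.129
```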
You can set a TTL as low as one second, but short TTLs make clients query more often, which puts increased demand on your DNS server or DNS service provider. Some DNS providers’ plans limit the number of DNS queries per second (QPS), so lowering the TTL might require moving to a higher-priced plan.
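A rough way to size that extra demand: a caching resolver that honors the TTL asks your servers about a given name at most once per TTL, so the steady-state load for one name is bounded by roughly the number of active resolvers divided by the TTL. The figure of 10,000 resolvers below is made up for illustration, not a measurement.

```python
def estimated_peak_qps(active_resolvers: int, ttl_seconds: int) -> float:
    """Rough upper bound on queries per second for one name.

    Assumes every resolver honors the TTL and has continuous demand for the
    record, so each one re-queries at most once per TTL.
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return active_resolvers / ttl_seconds


# 10,000 resolvers is an illustrative assumption.
for ttl in (3600, 300, 60, 1):
    print(f"TTL {ttl:>4} s: about {estimated_peak_qps(10_000, ttl):,.1f} queries/s")
```

Because the load scales with 1/TTL, cutting the TTL from a minute to a second multiplies the worst-case query rate by 60, which is why the QPS limits on a provider plan matter.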
We typically recommend setting the TTL to match or be shorter than your recovery time objective (RTO). You will also want to check with your DNS registrar to find out how often they update their zones. For example, VeriSign refreshes zones every three minutes, which fits well within an RTO of an hour like the one in the example below.
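The TTL guidance can be written as a simple budget check. This sketch assumes the worst case is one publishing delay at your provider (the three-minute figure above is used only as an example) plus one full TTL for a client that cached the old answer just before the change. It ignores resolvers that do not honor TTLs and the time needed to detect the outage and decide to fail over, both of which also count against the RTO.

```python
def worst_case_redirect_seconds(ttl_seconds: int, publish_delay_seconds: int) -> int:
    """Worst case before every TTL-honoring client sees a changed record:
    the change may wait one publish delay before it is served, and a client
    that cached the old answer just before then keeps it for a full TTL."""
    return publish_delay_seconds + ttl_seconds


def ttl_fits_rto(ttl_seconds: int, publish_delay_seconds: int, rto_seconds: int) -> bool:
    """True if the DNS part of a failover fits inside the RTO."""
    return worst_case_redirect_seconds(ttl_seconds, publish_delay_seconds) <= rto_seconds


# Illustrative numbers: 60 s TTL, 3-minute publish delay, 1-hour RTO.
print(worst_case_redirect_seconds(60, 3 * 60))   # 240
print(ttl_fits_rto(60, 3 * 60, 60 * 60))         # True
print(ttl_fits_rto(24 * 3600, 3 * 60, 60 * 60))  # False: a one-day TTL exceeds the RTO
```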
Let’s walk through an example of how the redirect will work given the following information:
- DNS domain of paulp.com
- Production site with an IP of 108.168.254.43
- DR site with an IP of 138.78.95.129
- RTO of 1 hour (60 minutes)
I’ll first create an A Record for the production site with a TTL of 60 seconds:
prod.paulp.com 108.168.254.43
In a BIND zone file (a text file describing your DNS zones) for paulp.com, this record might look like this:
prod    60    IN    A    108.168.254.43
To test this, I can ping prod.paulp.com.
I can then create an A Record for the DR site:
dr.paulp.com 138.78.95.129
If I have an active server at the DR site, this is easy to test without interrupting production. I simply ping dr.paulp.com.
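Ping confirms both that the name resolves and that the server answers ICMP, which some servers block. To confirm just the records themselves, a small Python check like the following compares what each name resolves to with the expected address from the example above. It relies on socket.gethostbyname, which goes through the local machine’s resolver and its cache; the resolver is a parameter so the hypothetical check_records() helper can also be exercised with a stand-in function.

```python
import socket
from typing import Callable, Dict

# Expected addresses from the paulp.com example.
EXPECTED = {
    "prod.paulp.com": "108.168.254.43",
    "dr.paulp.com": "138.78.95.129",
}


def check_records(
    expected: Dict[str, str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, bool]:
    """Resolve each name and report whether it returns the expected IPv4 address."""
    results = {}
    for name, want in expected.items():
        try:
            results[name] = resolve(name) == want
        except OSError:  # socket.gaierror (lookup failure) is a subclass of OSError
            results[name] = False
    return results
```

Running check_records(EXPECTED) from a machine outside your network, before and after a test failover, shows what an ordinary client would see.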
Now I can use a CNAME record to point www.paulp.com to the production site during normal operations. In the BIND zone file, this record would look like this:
www    60    IN    CNAME    prod.paulp.com.
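If disaster strikes, I change the CNAME record to point www.paulp.com to dr.paulp.com instead. Once cached copies of the old record expire, browsing computers are sent to the DR site at 138.78.95.129.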
Note that for web servers running name-based virtual hosts, the Host header the browser sends is still www.paulp.com, so a DR web server configured with the same virtual host will recognize the request and serve the proper content.
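A name-based virtual host chooses its content from the Host header rather than from the IP address the request arrived on. The sketch below mimics that lookup with a hypothetical table (the document root path is made up); as long as the production and DR web servers carry the same entry, a request for www.paulp.com gets the same content at either site.

```python
from typing import Optional

# Hypothetical virtual-host table; the same entries exist on the production and DR servers.
VHOSTS = {
    "www.paulp.com": "/var/www/paulp",
}


def document_root(host_header: str) -> Optional[str]:
    """Return the content root chosen by Host header, as a name-based virtual
    host does. Assumes a hostname, not an IPv6 literal."""
    host = host_header.split(":", 1)[0].lower()  # drop an optional :port suffix
    return VHOSTS.get(host)


# The browser sends the same header whether DNS points to prod or DR.
print(document_root("www.paulp.com"))
```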
Overall, DNS redirection is a quick and simple way to move traffic from one site to another during a DR scenario. The process requires little automation or specialized technical knowledge, and most end users can carry it out.
If you’re in the early stages of crafting your Disaster Recovery plan and don’t know where to start, we can help.