Disaster recovery (DR) is a difficult concept in IT. It requires you not only to think through the many implications of a worst-case scenario but also to develop a sound response. You have to think about that crazy thing that may or may not actually happen while deciding how many resources to commit to that theoretical event, all without materially disrupting your business, of course. But what if your DR site served an additional purpose? What if you could make your DR site more than just an expensive store of data that may or may not save your behind in the event of a worst-case scenario? Well, in the modern data center it’s much easier to make your plan B your plan A.
Think about how you’ve traditionally thought about DR. It’s been “back up the heck out of our data and get it to some off-site place so we can restore it if we need to.” The obvious question becomes: how does that backup data end up becoming a live application environment? How long will it take to transfer the backup data to new hardware? For that matter, what infrastructure are we really planning to restore to that’s geographically diverse from our production environment? At the core of these questions is data availability, and availability is really what DR is all about.
So many of today’s applications are web-based, and many also include native mobile and tablet components. As an infrastructure professional in charge of the availability of these applications, you’ve probably implemented redundancy, load balancing, and replication in some form. If one of the application servers goes down, traffic fails over to another node. Well, you know what? This is how your DR site should work too.
At OpsCompass we approach DR from the standpoint of making your plan B your plan A. What that means is that to be truly resilient to a prolonged outage and achieve speedy recovery time objectives, your DR site should be a literal extension of your production environment. And this sort of setup doesn’t have to be cost-prohibitive. Capacity, hardware specs, and recovery time objectives can be tweaked to reflect financial constraints. Here are a few things that our team does to make DR a core application availability strategy, not only for us, but also for our customers:
Store backups in native hypervisor format
In a modern data center you’re probably highly virtualized. You may not be 100% there, but chances are your virtualization footprint is over 60%. Whether you’re running VMware or Hyper-V, there are built-in tools for snapshots and other ways to quickly and easily migrate VMs from host to host. Keeping backups in the hypervisor’s native format means those same tools can work with them directly.
Use VM replication
We use tools that allow us to replicate live VMs at our production sites to powered-down VMs at our DR site. The DR VMs aren’t consuming compute resources while they’re powered down, yet they can still be quickly powered on in the event of a disaster. The result is a cost-effective replica of the production environment that can be turned on in short order.
Pre-stage VM recovery
In many application environments, certain dependencies need to be in place before the application is fully operational, so certain VMs need to go live before others. An example would be a domain controller needing to be up before you can bring up your SQL Server. We use tools that allow us to pre-stage the order of recovery in our DR site. When we say “recovery”, we really mean the order in which powered-down replica VMs get turned on, so this process is quick.
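To make the ordering idea concrete, here is a minimal sketch in Python of how a boot order could be derived from a dependency map. It is not the tooling we use; the VM names and the boot_waves helper are hypothetical, and the only assumption is that each VM lists the VMs that must be running before it.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical VM names. Each VM maps to the set of VMs that must be up first.
DEPENDENCIES = {
    "domain-controller": set(),
    "sql-server": {"domain-controller"},
    "file-server": {"domain-controller"},
    "app-server": {"sql-server"},
}


def boot_waves(dependencies):
    """Group VMs into waves; VMs in the same wave can be powered on together."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are up
        waves.append(ready)
        sorter.done(*ready)  # mark this wave as started so dependents unlock
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(boot_waves(DEPENDENCIES), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Grouping into waves rather than a single flat list keeps the process quick: anything that only depends on the domain controller can be powered on at the same time.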
Use tiered storage
In our production and DR environments we use flash-optimized tiered storage. What this means is that for any workload we can define, based on its profile, whether it runs on SSD or on 7k spinning disks. This allows us to save money by keeping VMs on cheap 7k storage until they need to become production, at which point we can provide tens of thousands of IOPS to our applications by moving them into the SSD tier.
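As a rough illustration of that placement policy, the Python sketch below models standby replicas sitting on the 7k tier and being promoted to SSD when they become production. The Workload class, the role labels, and the promote function are all hypothetical; a real storage platform would expose its own policy controls.

```python
from dataclasses import dataclass

# Hypothetical tier labels for illustration only.
SSD = "ssd"
HDD_7K = "7k"


@dataclass
class Workload:
    name: str
    role: str  # "production" or "standby" (assumed labels)
    tier: str


def target_tier(workload):
    """Production workloads belong on SSD; everything else stays on 7k disks."""
    return SSD if workload.role == "production" else HDD_7K


def promote(workloads):
    """Flip replicas to production and return the tier moves that were needed."""
    moves = []
    for workload in workloads:
        workload.role = "production"
        wanted = target_tier(workload)
        if workload.tier != wanted:
            moves.append((workload.name, workload.tier, wanted))
            workload.tier = wanted
    return moves


if __name__ == "__main__":
    replicas = [Workload("sql-replica", "standby", HDD_7K)]
    for name, old, new in promote(replicas):
        print(f"{name}: {old} -> {new}")
```

The point of the sketch is that the tier is a function of the workload's role, so failing over is a role change and the storage move follows from it.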
This was a quick primer on how we do DR for ourselves and our customers. Technology is so business-critical today that we feel it’s more important than ever to keep iterating and innovating on DR and application availability.