March 1, 2023
Backing up your enterprise data and applications is a no-brainer. Most everyone has experienced that moment of panic when a hardware failure sinks in and you realize the project you’ve been working on is never coming back. When we’re talking about an entire company’s IT infrastructure, an outage means dozens or hundreds of projects with hefty downtime costs.
There are many different types of backup though, so you need to design a solution that brings critical systems back online as fast as possible while still backing up and maintaining multiple versions of your important data and applications, and perhaps even storing archive data in long-term cold storage. That takes a combination of backup and disaster recovery.
You might have a backup plan in place, but backups are not disaster recovery (and disaster recovery is not ideal for backup, either). Backup is intended as a long-term, low-cost solution for storing data, applications, configurations, etc. Disaster recovery is designed to get the most critical portions of your IT infrastructure back online as fast as possible.
That means storage and bandwidth costs tend to be higher with DR, but recovery times are measured in minutes rather than hours or days. Let’s take a look at the other ways the two methods differ.
An important consideration in your data protection strategy is how much data you can afford to lose. One metric we use to define this is Recovery Point Objective, or RPO. RPO is the longest acceptable gap between when data changes and when that change reaches your backup or target environment. In other words, it is the maximum amount of recent data you are willing to lose.
After an initial backup to external and/or offsite storage, backups are only performed at set intervals. These backups may be done every day, once per week, or less often as needed. Your backup job frequency defines your RPO. For instance, if you run a daily backup job, then your worst-case data loss, and therefore the best RPO you can commit to, is 24 hours.
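To make the relationship between backup frequency and RPO concrete, here is a minimal Python sketch. The function names are hypothetical and chosen for illustration, and it assumes every backup job completes successfully and instantly, which real jobs do not.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Data changed after the last completed backup is lost when a failure hits."""
    return failure - last_backup


def meets_rpo(backup_interval: timedelta, rpo_target: timedelta) -> bool:
    """A periodic backup can only meet an RPO at least as long as its interval.

    Assumption: jobs finish instantly and never fail; in practice, add the job
    duration and allow for missed runs.
    """
    return backup_interval <= rpo_target


# A nightly backup at 02:00 followed by a failure at 23:30 the same day
loss = data_loss_window(datetime(2023, 3, 1, 2, 0), datetime(2023, 3, 1, 23, 30))
print(loss)  # 21:30:00

# A daily job satisfies a 24-hour RPO but not a 4-hour one
print(meets_rpo(timedelta(hours=24), timedelta(hours=24)))  # True
print(meets_rpo(timedelta(hours=24), timedelta(hours=4)))   # False
```

If the business can only tolerate a few hours of lost data for a given system, a daily backup job alone will not get you there.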
A disaster recovery plan will vary based on your specific organizational requirements, but in general it will involve synchronous copying of data to an off-site target. This means that the virtual or physical machines included in your DR plan are copied over to a failover site as soon as data is changed. This way, if your primary infrastructure goes down, you can power up a very recent version of those critical servers. Because changes reach the target environment almost immediately, DR gives you a much lower RPO and therefore less potential data loss.
Because your backups are performed less frequently, the retention period can be very long. You simply continue to add new data and file versions to your backup set and store it for as long as your storage capacity will allow.
With disaster recovery, the data retention window is much smaller. Because data is generally replicated synchronously, many more individual checkpoints are created, which requires more storage. This window of checkpoints is often referred to as a journal. The journal size is ultimately defined by how much storage you have available, but is generally around 24-48 hours.
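As a rough illustration of how journal length and storage trade off, the sketch below multiplies an assumed steady data change rate by the journal window. This is a back-of-the-envelope estimate, not any vendor's sizing formula. The function names and the 5 GB per hour change rate are hypothetical, and it ignores compression and per-checkpoint overhead.

```python
def journal_size_gb(change_rate_gb_per_hour: float, journal_hours: float) -> float:
    """Approximate storage needed to hold a journal of the given length.

    Assumption: data changes at a constant rate and every change is kept.
    """
    return change_rate_gb_per_hour * journal_hours


def journal_hours_supported(storage_gb: float, change_rate_gb_per_hour: float) -> float:
    """Approximate journal length that a fixed amount of storage can hold."""
    if change_rate_gb_per_hour <= 0:
        raise ValueError("change rate must be positive")
    return storage_gb / change_rate_gb_per_hour


# A 48-hour journal for systems changing about 5 GB per hour
print(journal_size_gb(5, 48))  # 240

# The same 240 GB holds only 24 hours if the change rate doubles
print(journal_hours_supported(240, 10))  # 24.0
```

Real change rates spike during busy periods, so it is worth measuring your own workloads before settling on a journal size.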
Backup and disaster recovery can also differ in what pieces of your infrastructure you can restore.
Backups are traditionally designed to allow for granular recovery of files or databases. This allows you to go back and grab a specific version of a spreadsheet before it was overwritten, or restore a database for testing alongside your production environment. In contrast, disaster recovery aims to restore entire systems, whether that’s a single virtual machine image or an entire datastore.
We are actually starting to see a convergence on this front, where backup products, such as Veeam, can back up entire VM images, and DR products, such as Zerto, can restore individual files from a replicated image. These features often involve some compromise to your overall data protection strategy, so we suggest you review your requirements thoroughly.
Two reasons you want to carefully design your DR plan around frequency, retention, and versioning are cost and time concerns. A synchronous, rapidly restored disaster recovery environment is going to become very expensive if it covers your entire infrastructure, mostly due to bandwidth costs. A fast recovery is also impossible if terabytes or petabytes of data must be transferred.
Which brings us to the final difference between backup and DR: Recovery Time Objective, or RTO. RTO is the target for how long it should take to recover your data and bring systems back online after an outage.
With backups, recovery can take hours or even days depending on how much data you need to restore. Since backup data is often stored on a different storage platform from your production environment, that data first has to be copied back. If an entire server is lost, you may have to build a new one from scratch and reinstall all of your applications before the data can be restored.
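To see why restoring from backup can stretch into days, here is a minimal Python sketch that estimates the copy time alone. The function name and the 80 percent link efficiency are assumptions for illustration, and the estimate leaves out server rebuilds and application reinstalls, which add further time.

```python
def restore_hours(data_tb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to copy backup data back over a network link.

    data_tb    -- data to restore, in decimal terabytes (10**12 bytes)
    link_mbps  -- nominal link speed in megabits per second
    efficiency -- assumed fraction of the link actually usable (hypothetical default)
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    data_bits = data_tb * 1e12 * 8
    effective_bits_per_second = link_mbps * 1e6 * efficiency
    return data_bits / effective_bits_per_second / 3600


# 10 TB over a 1 Gbps link at 80% efficiency: roughly 27.8 hours
print(round(restore_hours(10, 1000), 1))
```

Even on a fast link, a multi-terabyte restore is measured in hours before any rebuild work begins, which is why backup alone rarely meets an RTO of minutes.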
Your disaster recovery plan, even if it entails a large portion of your infrastructure, is designed to fail over and/or restore entire systems in minutes. Replicated data is usually stored on the same storage and hosting platforms that will host the environment in a failover, so data doesn’t need to be copied a second time. Oftentimes a DR failover is as simple as powering on virtual machines in the public cloud or another data center.
As you can see, it’s vital to have both backup and DR strategies for your IT infrastructure, as each serves a different purpose and has different benefits. Keep all of your data backed up so you’re less likely to lose files, even from years and years ago; but design a smart DR plan so you can get critical business systems back up and running quickly in the event of a natural disaster, local outage, or any other cause of downtime.
Posted by Solutions Architect Josh Larsen