Business continuity is an organization’s ability to adjust to and survive a disruption, whether its cause is organizational, safety-related, environmental, or a technology failure. On the technology side, protecting the business’s ability to operate and recover is called backup/recovery at the local level and disaster recovery (DR) when the recovery process must also account for geography to handle large-scale problems such as natural disasters.
A practical, logical framework for planning backup/recovery (and many other areas) is the concept of layers and building blocks. Start by mapping your recoverability needs, expressed as a recovery time objective (RTO) and a recovery point objective (RPO), to the types of data you have and the form they take: does your process back up each building block, and do you know how the blocks connect? Next, assess the ability of your processes, people, and technology platform to deliver the required level of recovery. Then implement backup solutions that address:
- Company or industry/regulatory-driven data retention and recoverability requirements
- Service class or tiering system by system or data object
- Network capabilities and hardware (e.g., LAN, WAN, VPN) across your enterprise’s data centers and DR site(s)
- Virtualization and management software and VMs (usually VMware or Hyper-V)
- Tools and management information (utilities, certificates, admin account vaults, encryption keys, licenses, vendor support contracts, account numbers, and contact information)
- Containers (e.g., Kubernetes)
- Data that resides on “bare metal” infrastructure like switches, firewalls, specialty server appliances, and virtual tape libraries
- Operating system images (and dependencies like the local boot and swap/paging spaces)
- Data resident on filesystems (e.g., transaction/interface files, documents, configuration files, applications, and system and application logs)
- Databases and their related transaction/archive logs
- System-to-system consistency and restorability
- Single points of failure (SPOF) in your backup/recovery and DR infrastructure
- People (and partners) with knowledge, access, and documentation to act on an issue
- IT disaster recovery requirements
- Regular disaster recovery testing (at least once per year)
- Business process resiliency & recovery to ensure overall business continuity
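To make the mapping of RTO/RPO to building blocks concrete, here is a minimal Python sketch of a gap check. The tier names, target hours, and inventory entries are all hypothetical, and it assumes the worst-case data loss for a block is roughly its backup interval; a real assessment would draw these numbers from your own SLAs and measured restore times.

```python
from dataclasses import dataclass


@dataclass
class BuildingBlock:
    name: str
    tier: str
    backup_interval_hours: float  # how often a backup is taken
    restore_hours: float          # measured or estimated time to restore


# Hypothetical targets per service tier, in hours: (RPO, RTO).
TIER_TARGETS = {
    "production": (4, 8),
    "sandbox": (24, 72),
}


def find_gaps(blocks):
    """Return (name, objective, actual, target) for every missed objective."""
    gaps = []
    for block in blocks:
        rpo, rto = TIER_TARGETS[block.tier]
        # Assumption: worst-case data loss is about one backup interval.
        if block.backup_interval_hours > rpo:
            gaps.append((block.name, "RPO", block.backup_interval_hours, rpo))
        if block.restore_hours > rto:
            gaps.append((block.name, "RTO", block.restore_hours, rto))
    return gaps


# Illustrative inventory; the names and numbers are made up.
inventory = [
    BuildingBlock("erp-database", "production", 1, 6),
    BuildingBlock("file-server", "production", 24, 4),
    BuildingBlock("qa-sandbox", "sandbox", 24, 96),
]

for name, objective, actual, target in find_gaps(inventory):
    print(f"{name}: {objective} gap, actual {actual}h vs target {target}h")
```

Keeping the tier targets in one table makes it easy to see which building blocks fall short when a tier's requirements change.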
Is the Proof In Your Backup/Recovery Pudding?
The best proof of an effective backup solution is to demonstrate it by exercising your recovery processes and technologies regularly. Common ways to do that are system clones, database copies, and cloning or moving VMs. These processes are often used to copy a production database to a QA or sandbox database so developers can test with production-level data quality and volumes. If you haven’t had a backup/recovery assessment completed in a while, consider getting a fresh one done using current guidelines and an up-to-date asset inventory, since the inventory has almost certainly changed.
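One simple way to back a recovery exercise with evidence is to compare checksums of the restored files against a reference copy. The sketch below is one possible approach, not a feature of any particular backup product; the function names are illustrative, and it assumes the reference copy has not changed since the backup was taken (in practice you might record the manifest at backup time instead).

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_restore(reference_root, restored_root):
    """Return (missing, changed) relative paths in the restored tree."""
    reference = build_manifest(reference_root)
    restored = build_manifest(restored_root)
    missing = sorted(set(reference) - set(restored))
    changed = sorted(
        name
        for name in reference.keys() & restored.keys()
        if reference[name] != restored[name]
    )
    return missing, changed
```

Calling `compare_restore` with the reference directory and the restore target returns two lists; both being empty is a reasonable signal that the file-level restore is complete and intact.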
Special Mention on Protecting Against Ransomware
You should also periodically test recovering sets of individual test files to a point in time. How far back you can go depends on how often you run file-level backups, how many versions you keep, and how long the full retention period is. Note that your backup policies, and thus your recoverability, may (and probably should) vary by SLA tier, e.g., production vs. sandboxes. File-level recovery matters most during a ransomware attack: such attacks were reported to be up 148% in 2021 compared to 2020. They often target shared storage such as file servers and shared drives on Office 365, encrypting your files and making them unusable.
At that point, you hope you have a ransomware protection system in place (ask if you’re interested) and can recover all of the files with minimal loss (is losing less than 12-24 hours of work acceptable?). You might hire a cybersecurity firm to see if they can find a way to resolve it. Or, if those options fail and the business has been crippled for 1-2 weeks on average, you might pay millions of dollars in ransom to get access restored. Studies show that organizations that pay the ransom are also more than 50% likely to be hit again. Cyberattacks are a modern-day threat that will only grow. While the tools and solutions to protect against such attacks are improving, nothing beats having a sound and proven backup/recovery solution to give you options, minimize the damage, and resume operations as quickly and efficiently as possible.
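The relationship between backup frequency, versions kept, and how far back you can recover can be worked out with simple date arithmetic. The following Python sketch assumes evenly spaced backups that all succeeded; the dates, the daily interval, and the seven-version retention are hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta


def recovery_points(latest_backup, interval, versions_kept):
    """List available restore points, newest first, assuming even spacing."""
    return [latest_backup - i * interval for i in range(versions_kept)]


def last_clean_point(points, infection_time):
    """Return the newest restore point taken before the files were encrypted."""
    clean = [point for point in points if point < infection_time]
    return max(clean) if clean else None


# Hypothetical policy: nightly file-level backups at 02:00, 7 versions kept.
points = recovery_points(
    latest_backup=datetime(2024, 3, 10, 2, 0),
    interval=timedelta(hours=24),
    versions_kept=7,
)

infection = datetime(2024, 3, 8, 14, 0)  # assumed time the encryption began
clean = last_clean_point(points, infection)

if clean is None:
    print("No clean restore point: retention window is too short")
else:
    print(f"Restore from {clean}, losing {infection - clean} of changes")
```

If the attack went unnoticed for longer than the retention window, `last_clean_point` returns `None`, which is the case that retention planning is meant to prevent.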
Trust But Verify With a Game
Still feeling ambitious and want to take it to the next level? Conduct a “war games” simulation in which the lead administrator has been captured (or is off on their honeymoon again). The rest of the team must prove they can execute a series of tasks to continue supporting the business. Execute the DR test plan and document lessons learned, noting new hosts, passwords, technologies, etc., that might have been introduced (or upgraded or replaced) since the last DR test. In this way, your organization will quickly confirm, layer by layer, process by process, object by object, how well the backup/recovery processes and enabling technologies meet the organization’s DR requirements.
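Before running the simulation, it can help to check on paper which tasks would have nobody left to perform them. This small Python sketch does that for a hypothetical task roster; the task names and people are invented, and a real roster would come from your DR runbook.

```python
def uncovered_tasks(task_owners, unavailable):
    """Return tasks that nobody can perform once `unavailable` people are gone."""
    away = set(unavailable)
    return sorted(
        task for task, people in task_owners.items() if not set(people) - away
    )


def single_owner_tasks(task_owners):
    """Return tasks that depend on exactly one person (a people SPOF)."""
    return sorted(
        task for task, people in task_owners.items() if len(set(people)) == 1
    )


# Hypothetical roster: which people can execute each DR task.
dr_tasks = {
    "restore database": ["alice", "bob"],
    "fail over VPN": ["alice", "carol"],
    "rotate encryption keys": ["alice"],
    "rebuild VM hosts": ["bob"],
}

print("Blocked without alice:", uncovered_tasks(dr_tasks, ["alice"]))
print("Single-owner tasks:", single_owner_tasks(dr_tasks))
```

Any task in either list is a candidate for cross-training or better documentation before the next DR test.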
Conclusion
By remaining vigilant through planning, regular testing, and spot-checking your operational processes with a DR mindset when issues occur, your team will stay confident that its backup/recovery capability will be there when a disaster inevitably strikes.