We’re all familiar with the most common threat vector: a hacker or other bad actor gains access to your cloud infrastructure and exploits system vulnerabilities. A less talked-about threat vector is the deliberate or accidental deletion of data in-house. Either way, the threat is likely to become a reality sooner or later. So how can you be prepared when that time comes?
To ensure high availability and disaster recovery capabilities that improve manageability and enable business continuity, you first need to identify what you want to protect – static infrastructure, dynamic infrastructure, persistent data, and dependencies – then create a disaster recovery plan that covers critical infrastructure and data.
Static infrastructure includes pieces of infrastructure that don’t change very often, including networking infrastructure, firewall and VPN appliances, as well as platform state files. For example, in the Kubernetes world static infrastructure might include cluster configuration and master/minion user data.
Tools like Terraform or AWS CloudFormation can be used for static infrastructure configuration management (CM). Make sure to hold regular code reviews and engineering reviews, as well as keep a version change history. Consider applying your static infrastructure configuration to multiple regions on your cloud platform, which gives you a secondary disaster recovery site without much extra cost.
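As a rough sketch of the multi-region idea, the Python below applies one CloudFormation template to several regions. It assumes the AWS SDK for Python (boto3), which this article does not otherwise cover; the template, the stack name, and the `deploy_to_regions` helper are hypothetical and only illustrate the shape of the approach.

```python
import json

# Hypothetical minimal template: a single VPC. A real template would describe
# the full static network layout.
TEMPLATE = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Resources": {
        "Vpc": {
            "Type": "AWS::EC2::VPC",
            "Properties": {"CidrBlock": "10.0.0.0/16"},
        }
    },
}


def deploy_to_regions(client_for_region, stack_name, template, regions):
    """Create the same stack in every region and return the stack IDs by region.

    client_for_region is any callable that returns a CloudFormation client for
    a region name, for example (assuming boto3 is installed):
        lambda region: boto3.client("cloudformation", region_name=region)
    """
    body = json.dumps(template)
    stack_ids = {}
    for region in regions:
        client = client_for_region(region)
        response = client.create_stack(StackName=stack_name, TemplateBody=body)
        stack_ids[region] = response["StackId"]
    return stack_ids
```

Passing the client factory in as an argument keeps the template and the list of regions under version control while leaving credentials and region selection to the caller.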
Dynamic infrastructure includes pieces of the application stack that may change over your application’s lifecycle, such as server configuration, security groups, launch configurations, autoscaling groups, and load balancers.
Ansible is one commonly used DevOps tool for automating application deployment and orchestrating infrastructure changes. Regardless of which tool you use, you need to track all dynamic infrastructure changes. CM, versioning, and engineering reviews are critical, as is the ability to completely rebuild the application stack if it’s destroyed or compromised in any way.
Say a bad actor gains access to your infrastructure. If you consistently maintain and monitor a single source of truth for your security policies, instance configuration, and resource provisioning, your application stack can be up and running quickly in a different region.
Persistent data is your company’s lifeblood. It’s where user information and application configuration and logic are stored. Most cloud storage resources allow you to regularly snapshot your data for backup purposes. To prevent loss, ensure snapshots are mirrored to a secondary region or account. Some tools to consider:
- Amazon Relational Database Service (RDS) allows regular point-in-time snapshots of your database. Snapshots can be used to restore instances and are easily copied to other regions and accounts.
- Amazon Elastic Block Store (EBS) volumes hold data within a single availability zone, and snapshots of them can be taken on a scheduled basis. Snapshot permissions can be modified to give other AWS accounts access, and snapshots can be copied between regions and accounts.
- ElastiCache is an Amazon web service for in-memory caching or key-value storage. If you plan on maintaining state with this tool, be sure to set up automatic snapshots and ship backups just as you would a database.
- S3 is Amazon cloud storage for the Internet. All data in S3 buckets should be replicated to another account or region. A “write once, delete never” bucket policy and object versioning let you keep multiple versions in case your data is overwritten or deleted.
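To make the snapshot mirroring in the list above concrete, here is a minimal sketch for EBS, assuming the AWS SDK for Python (boto3), which is not part of the original discussion. The two helper functions are hypothetical names; the client methods they call, `copy_snapshot` and `modify_snapshot_attribute`, are the boto3 EC2 operations for copying and sharing snapshots.

```python
def copy_snapshot_to_region(dest_ec2, source_region, snapshot_id):
    """Copy an EBS snapshot into the region that dest_ec2 is bound to.

    dest_ec2 is an EC2 client created for the *destination* region, for
    example boto3.client("ec2", region_name="us-west-2").
    Returns the ID of the new snapshot in the destination region.
    """
    response = dest_ec2.copy_snapshot(
        SourceRegion=source_region,
        SourceSnapshotId=snapshot_id,
        Description=f"DR copy of {snapshot_id} from {source_region}",
    )
    return response["SnapshotId"]


def share_snapshot_with_account(ec2, snapshot_id, account_id):
    """Let another AWS account create volumes from the given snapshot."""
    ec2.modify_snapshot_attribute(
        SnapshotId=snapshot_id,
        Attribute="createVolumePermission",
        OperationType="add",
        UserIds=[account_id],
    )
```

Encrypted snapshots need additional key sharing, which this sketch does not cover. A similar pattern should apply to RDS snapshots, but check the relevant client documentation before relying on it.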
Consider maintaining a separate cloud account used only for backups. For example, create one AWS account strictly for backups, set it to “write once, delete never” to maintain a permanent backup history, sync persistent data on a regular basis, and limit the number of users who can access that data. Then if something happens to your main AWS account, you’ll still have your backup account.
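One way to approximate the “delete never” half of that setting is a bucket policy that denies every delete action. The sketch below only builds the policy document; the bucket name and the `deny_delete_policy` helper are hypothetical, and a real backup account would likely need further controls that are not shown here.

```python
import json


def deny_delete_policy(bucket_name):
    """Build an S3 bucket policy that denies deletes for every principal."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllDeletes",
                "Effect": "Deny",
                "Principal": "*",
                "Action": [
                    "s3:DeleteObject",         # removing the current object
                    "s3:DeleteObjectVersion",  # removing an older version
                    "s3:DeleteBucket",         # removing the bucket itself
                ],
                # Bucket-level and object-level ARNs, so each action above
                # has a resource it applies to.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


# Hypothetical bucket name, used only to show the generated document.
print(json.dumps(deny_delete_policy("example-backup-bucket"), indent=2))
```

With boto3, the resulting document could be applied with `put_bucket_policy(Bucket=..., Policy=json.dumps(policy))`, and versioning enabled with `put_bucket_versioning`, so that overwrites create new versions instead of replacing data. Because the deny applies to every principal, routine cleanup is blocked too, which is the point of a permanent backup history.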
Application and infrastructure-level dependencies exist within your workloads. If something happens to these dependencies, you might not be able to rebuild your infrastructure after a disaster. Some tools to consider:
- An Amazon Machine Image (AMI) is a snapshot of a provisioned operating system saved as a disk image. Your application stack may rely on custom base images, so make sure to back up AMI snapshots in case you need to rebuild. Shipping AMI snapshots to a secondary region enables disaster recovery if the primary region is compromised or has extended downtime.
- Route 53 is Amazon’s DNS web service. Back in October, a DDoS attack took down a major DNS provider, and with it much of the Internet for most of the East Coast. If you use Route 53 for DNS, you need a backup plan. That way, if AWS has a significant outage, you can take your zone backups to another DNS provider and get back online quickly.
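A zone backup is most useful in a format another DNS provider can import. The sketch below converts record sets, in the dictionary shape returned by Route 53’s `list_resource_record_sets` call, into BIND-style zone file lines. The `record_sets_to_zone_lines` helper is a hypothetical name, and the conversion is deliberately simplified: it does not unescape special characters in record names or handle routing policies.

```python
def record_sets_to_zone_lines(record_sets):
    """Convert Route 53 record sets into BIND-style zone file lines.

    Returns (lines, skipped). Alias records are specific to Route 53 and have
    no zone file equivalent, so they are listed in skipped for manual review.
    """
    lines, skipped = [], []
    for rs in record_sets:
        if "AliasTarget" in rs:
            skipped.append(f"{rs['Name']} {rs['Type']}")
            continue
        for record in rs.get("ResourceRecords", []):
            lines.append(
                f"{rs['Name']} {rs['TTL']} IN {rs['Type']} {record['Value']}"
            )
    return lines, skipped


# Sample data in the API's response shape (documentation-only addresses).
sample = [
    {"Name": "example.com.", "Type": "A", "TTL": 300,
     "ResourceRecords": [{"Value": "192.0.2.10"}]},
    {"Name": "www.example.com.", "Type": "A",
     "AliasTarget": {"DNSName": "lb.example.com.",
                     "HostedZoneId": "ZEXAMPLE",
                     "EvaluateTargetHealth": False}},
]
lines, skipped = record_sets_to_zone_lines(sample)
```

In practice the record sets would come from the boto3 Route 53 client, paging through `list_resource_record_sets` for each hosted zone, with the output stored alongside your other backups on a schedule.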
How can you make sure the cloud doesn’t bring you down? By keeping close tabs on what information has changed, who changed it, and when.
What’s most important is that you understand your infrastructure backup needs, then put the right custom backup and recovery strategy in place for your business.