RTO & RPO Best Practices in Hybrid Architectures
OSDC - May 2019
Fernando Hönig
fernando@nubego.io
fernandohonig
RTO vs RPO
What is this?
© 2019, nubeGO or its Affiliates. All rights reserved.
RTO vs RPO
Apples vs Oranges
RTO (Recovery Time Objective)
How quickly you need to recover: the target time you set for restoring service.
RPO (Recovery Point Objective)
Focused on data and your company’s tolerance for losing it.
It is determined by the time between data backups and the amount of data that could be lost in between backups.
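The RPO definition above reduces to simple arithmetic. The Python sketch below is an illustration, not part of the original talk: it assumes each backup captures all data written up to its completion time, so the worst-case loss is the longest gap between consecutive backups (or between the last backup and now). The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Sequence


def worst_case_data_loss(backup_times: Sequence[datetime], now: datetime) -> timedelta:
    """Largest window of writes a failure could destroy, given backup completion times.

    Assumes each backup captures everything written up to the moment it completes.
    """
    if not backup_times:
        raise ValueError("no backups: all data is at risk")
    points = sorted(backup_times) + [now]
    # The worst moment to fail is just before the next backup completes,
    # so the exposure is the longest gap between consecutive recovery points.
    return max(later - earlier for earlier, later in zip(points, points[1:]))


def meets_rpo(backup_times: Sequence[datetime], now: datetime, rpo: timedelta) -> bool:
    """True if the worst-case data-loss window fits inside the RPO."""
    return worst_case_data_loss(backup_times, now) <= rpo
```

With daily backups every gap is one day, so `meets_rpo(..., rpo=timedelta(hours=12))` returns False no matter how reliable each individual backup is: only a shorter backup interval (or continuous replication) can tighten the RPO.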
RPO and RTO
RPO: The business can recover from losing (at most) the last 12 hours of data.
RTO: The application can be unavailable for a maximum of 1 hour.
AVAILABILITY CONCEPTS
High Availability: minimizing downtime for your application.
Backup: making your data safe.
Disaster Recovery: getting your applications and data back after a major disaster.
What could go wrong?
HOW DO WE FIX IT? QUICKLY?
Small events: instance restart failure, application deployment failure.
Large-scale events: Availability Zones down, unavailable services.
Colossal events: unavailable region, infrastructure destruction by error.
Latest Events
Small events: instance restart failure, application deployment failure.
Large-scale events: GitHub S3 AZ Unavailable, UK’s Petition System Unavailable.
Colossal events: Data Unavailable - Failed Backups, GitLab Database Destruction.
DISASTER PLANNING
RECOVERY OPTIONS
Operating System
Machine Images: snapshot them to other regions and share them across your accounts/projects.
UserData: create scripts to execute during startup.
Patch/update your OS and stay up to date.
Storage
Object storage: replicate to other regions; enable versioning.
Block storage: create point-in-time snapshots; copy snapshots across regions and accounts.
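Point-in-time snapshots only help if you can pick the right one during a restore: typically the newest snapshot taken at or before the moment the data was still good. A minimal Python sketch of that lookup, assuming the snapshot catalog is available as (creation time, snapshot id) pairs; the ids are hypothetical placeholders, and listing real snapshots from a provider API is out of scope.

```python
import bisect
from datetime import datetime
from typing import List, Optional, Tuple

Snapshot = Tuple[datetime, str]  # (creation time, snapshot id)


def snapshot_for_point_in_time(snapshots: List[Snapshot], target: datetime) -> Optional[str]:
    """Return the id of the newest snapshot created at or before `target`, or None."""
    ordered = sorted(snapshots)
    times = [created for created, _ in ordered]
    # bisect_right returns the index of the first snapshot strictly newer than target.
    idx = bisect.bisect_right(times, target)
    if idx == 0:
        return None  # every snapshot is newer than the target point in time
    return ordered[idx - 1][1]
```

For example, if corruption was noticed at 18:00 and snapshots exist at 00:00 and 12:00, the 12:00 snapshot is chosen; the writes between 12:00 and 18:00 are the data loss your RPO has to tolerate.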
Machine Images and Snapshots
[Figure: AMI lifecycle, https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/images/ami_lifecycle.png]
Networking
DNS: enable health checks; enable latency-based routing records.
Load balancing: failover options; health checks based on HTTP status codes.
VPC: extend your network to the cloud.
Direct Connect: enable fast and consistent replication/backup from on-premises environments to the cloud.
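DNS failover driven by HTTP health checks can be sketched as a small state machine. This is a simplified model, not any provider's implementation: it assumes 2xx and 3xx responses count as healthy (Amazon Route 53 HTTP checks, for instance, accept status codes from 200 up to 399) and that an endpoint is marked unhealthy only after an assumed threshold of 3 consecutive failed checks. Endpoint names are hypothetical.

```python
from dataclasses import dataclass


def is_healthy_status(status_code: int) -> bool:
    """Treat 2xx and 3xx responses as healthy."""
    return 200 <= status_code < 400


@dataclass
class Endpoint:
    name: str
    failure_threshold: int = 3  # consecutive failures before marking unhealthy (assumed)
    consecutive_failures: int = 0
    healthy: bool = True

    def record_check(self, status_code: int) -> None:
        """Update health state from one HTTP check result."""
        if is_healthy_status(status_code):
            # Simplification: recover on the first successful check.
            self.consecutive_failures = 0
            self.healthy = True
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.failure_threshold:
                self.healthy = False


def resolve(primary: Endpoint, secondary: Endpoint) -> str:
    """Failover routing: answer with the primary while healthy, else the secondary.

    The secondary is assumed to be available; real resolvers handle both-down cases.
    """
    return primary.name if primary.healthy else secondary.name
```

Requiring several consecutive failures avoids flapping to the DR site on a single slow or failed check, at the cost of adding roughly threshold × check interval to detection time, and therefore to the RTO.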
Databases
RDS: snapshot data and save it in a separate region. Combine Read Replicas with Multi-AZ to build a resilient disaster recovery strategy.
Infrastructure
IaC: use templates to quickly deploy collections of resources as needed. Treat your infrastructure as code: test it and deploy new changes with your application releases.
BACKUP AND RESTORE
Backup Phase
● Take backups of current systems.
● Store backups in object storage services.
● Document the procedure to restore from backup in the cloud.
● Know which machine image to use; build your own as needed.
● Know how to restore the system from backups.
● Know how to switch to the new system.
● Know how to configure the deployment.
Backup Options
Files: NFS, SMB
Volumes: iSCSI
Tapes: iSCSI Virtual Tape Library
Hybrid Backup
[Figure: AWS Backup hybrid architecture, https://d1.awsstatic.com/product-marketing/AWS%20Backup/product-page-diagram_aws_backup_hybrid.e5132f9c5fd6cd0299187d8d41147a3f7964d09a.png]
Restore Phase (in case of disaster…)
● Retrieve backups from object storage.
● Bring up the required infrastructure:
   ○ Cloud instances with prepared machine images, load balancers, etc.
   ○ Use infrastructure as code to automate deployment of core networking.
● Restore the system from backup.
● Switch over to the new system:
   ○ Adjust DNS records to point to the cloud systems.
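Each step of the restore phase consumes part of the RTO. A rough Python sketch, assuming the steps run sequentially and that you have timed each one (ideally in a rehearsal): the estimate is detection time plus the sum of the step durations. The durations in the usage example are invented placeholders, not measurements.

```python
from datetime import timedelta
from typing import List, Tuple

Step = Tuple[str, timedelta]  # (step description, measured duration)


def estimated_rto(detection: timedelta, steps: List[Step]) -> timedelta:
    """Sequential restore: detection time plus every step run back to back."""
    return detection + sum((duration for _, duration in steps), timedelta())


def rto_report(detection: timedelta, steps: List[Step], target: timedelta) -> str:
    """Compare the estimate against the RTO target."""
    estimate = estimated_rto(detection, steps)
    verdict = "meets" if estimate <= target else "misses"
    return f"estimated RTO {estimate} {verdict} target {target}"


if __name__ == "__main__":
    # Hypothetical durations for illustration only.
    plan = [
        ("Retrieve backups from object storage", timedelta(minutes=20)),
        ("Bring up infrastructure from IaC templates", timedelta(minutes=15)),
        ("Restore system from backup", timedelta(minutes=30)),
        ("Switch over and update DNS records", timedelta(minutes=5)),
    ]
    print(rto_report(timedelta(minutes=10), plan, target=timedelta(hours=1)))
    # prints: estimated RTO 1:20:00 misses target 1:00:00
```

Steps that can run in parallel (for example, infrastructure provisioning while backups download) would shorten the estimate; the sequential sum is a deliberately pessimistic upper bound.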
RECOVERY STRATEGIES
Pilot Light
[Diagram: users and systems reach the primary site (web, app and database servers) through an Amazon Route 53 hosted zone; data is mirrored/replicated to a secondary DB in the cloud, while the cloud web and app servers are not running.]
[Diagram: the same layout after a disaster; the cloud web and app servers start in minutes around the replicated secondary DB, behind the Amazon Route 53 hosted zone.]
Advantage: very cost-effective (uses fewer 24/7 resources).
Preparation phase
● Set up instances to replicate or mirror data.
● Ensure that you have all supporting custom software packages available in the cloud.
● Create and maintain machine images of key servers where fast recovery is required.
● Regularly run these servers, test them, and apply any software updates and configuration changes.
● Consider automating the provisioning of cloud resources.
In case of disaster…
● Automatically bring up resources around the replicated core data set.
● Scale the system as needed to handle current production traffic.
● Switch over to the new system.
   ○ Adjust DNS records to point to the cloud.
Objectives
RTO: as long as it takes to detect the need for DR and automatically scale up the replacement system.
RPO: depends on the replication type.
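"Depends on the replication type" can be made concrete: with synchronous replication a write is acknowledged only once the replica has it, so committed data survives the loss of the primary; with asynchronous replication the replica trails the primary, and roughly the replication lag at the moment of failure is lost. A hedged Python sketch, assuming you already collect replication-lag samples from your monitoring (the mode names are hypothetical labels):

```python
from datetime import timedelta
from typing import Iterable


def replication_rpo(mode: str, lag_samples: Iterable[timedelta]) -> timedelta:
    """Approximate worst-case data loss for a replication mode.

    'sync':  acknowledged writes already exist on the replica, so ~0 loss.
    'async': the replica trails the primary; use the worst observed lag.
    """
    if mode == "sync":
        return timedelta(0)
    if mode == "async":
        samples = list(lag_samples)
        if not samples:
            raise ValueError("async mode needs lag samples to estimate RPO")
        return max(samples)
    raise ValueError(f"unknown replication mode: {mode!r}")
```

Treat the async result as a lower bound: lag often spikes under the very load or network trouble that precedes a disaster, so the lag at failure time can exceed anything observed during normal operation.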
Fully Working Low-Capacity Standby
[Diagram: users and systems reach the primary site through an Amazon Route 53 hosted zone; a low-capacity copy of the web and app tiers runs in the cloud behind Auto Scaling, with data mirrored/replicated to a secondary DB.]
[Diagram: the same standby layout, with web and app servers running at both the primary site and the low-capacity cloud site while data is mirrored/replicated to the secondary DB.]
Advantages
● Can take some production traffic at any time.
● Cost savings (IT footprint smaller than full DR).
Preparation
● Similar to Pilot Light.
● All necessary components running 24/7, but not scaled for production traffic.
● Best practice: continuous testing.
   ○ “Tickle” a statistical subset of production traffic to the DR site.
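One way to "tickle" a statistical subset of traffic to the DR site is deterministic hash-based sampling: hash a stable key such as a session id and route the request to DR when the hash falls below the chosen fraction. This Python sketch is an assumption about how it could be done at the application layer; weighted DNS records (e.g. Route 53 weighted routing) or load balancer weights are common alternatives.

```python
import hashlib


def routes_to_dr(request_key: str, dr_fraction: float) -> bool:
    """Deterministically send roughly `dr_fraction` of keys to the DR site.

    Hashing a stable key (e.g. a session id) keeps each user pinned to one site,
    and sha256 gives the same answer in every process, unlike Python's salted hash().
    """
    if not 0.0 <= dr_fraction <= 1.0:
        raise ValueError("dr_fraction must be between 0 and 1")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    # Keep 53 bits so the division is exact in a float: bucket is uniform in [0, 1).
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    return bucket < dr_fraction
```

Pinning by key matters for continuous testing: a user who lands on the DR site stays there for the whole session, so failures show up as consistent errors for a small cohort rather than random flakiness spread across everyone.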
In case of disaster…
● Immediately fail over the most critical production load.
● Adjust DNS records to point to the cloud.
● (Auto) scale the system further to handle all production load.
Objectives
RTO: for critical load, as long as it takes to fail over; for all other load, as long as it takes to scale further.
RPO: depends on the replication type.
Multi-Site Active-Active
[Diagram: users and systems reach both sites through an Amazon Route 53 hosted zone; web, app and database servers run at full capacity at the primary site and in the cloud, with data mirrored/replicated to the secondary DB.]
Advantages
● At any moment, can take all production load.
Preparation
● Similar to low-capacity standby.
● Fully scaling in/out with production load.
In case of disaster…
● Immediately fail over all production load.
Objectives
RTO: as long as it takes to fail over.
RPO: depends on the replication type.
Recovery Strategies
Backup and Restore (Cost: $)
▪ Lower priority use cases
▪ Solutions: Object Storage, Archive Storage
Pilot Light (Cost: $$)
▪ Meeting lower RTO and RPO requirements
▪ Core services
▪ Scale cloud resources in response to a DR event
Fully Working Low-Capacity Standby (Cost: $$$)
▪ Solutions that require RTO and RPO in minutes
▪ Business-critical services
Multi-Site Active-Active (Cost: $$$$)
▪ Auto-failover of your environment in the cloud to a running duplicate
SCENARIO TIME!
CASE SCENARIO #1
Bob is in charge of defining the best DR strategy for a hybrid architecture, and he set it up based on the following requirements:
● “We need to have an RTO of 60 minutes.”
● “Our backups are stored in the cloud and are taken daily.”
● “The RPO has to be less than 8 hours, and we need to be able to build a new environment quickly.”
● “Our application runs in the cloud, but our database is still in our local datacenter.”
RTO = 1 h
RPO = 8 h
CAN IT BE ACHIEVED?
RTO/RPO: There is no certainty they can achieve a 1 h RTO and an 8 h RPO. Backups run daily, so up to 24 hours of data could be lost: the RPO can’t be 8 hours.
DATABASE (ON PREM): How much time would it take to build a new DB and import the data? How much time would it take to copy the backups from the cloud to your on-prem DB?
CODE
● APP: Is your app code parameterized with variables so it can cope with a change of endpoints?
● INFRA: Is your infrastructure treated as code? Can you deploy a new environment within tens of minutes?
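The APP question is about not hard-coding endpoints. A minimal Python sketch, assuming the database endpoint is supplied through environment variables; the variable names DB_HOST, DB_PORT and DB_NAME and the PostgreSQL default port 5432 are illustrative assumptions, not part of the scenario.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DatabaseConfig:
    host: str
    port: int
    name: str


def load_db_config(env: Optional[Mapping[str, str]] = None) -> DatabaseConfig:
    """Read DB endpoint settings from the environment instead of hard-coding them.

    Switching from the on-prem DB to a cloud copy then means changing
    environment variables (or a DNS record), not shipping a code release.
    """
    env = os.environ if env is None else env
    try:
        host = env["DB_HOST"]  # required: fail fast if unset
    except KeyError:
        raise RuntimeError("DB_HOST is not set") from None
    return DatabaseConfig(
        host=host,
        port=int(env.get("DB_PORT", "5432")),  # assumed PostgreSQL default
        name=env.get("DB_NAME", "app"),         # hypothetical default name
    )
```

Failing fast on a missing DB_HOST is a deliberate choice: during a failover, an app that silently falls back to the old on-prem endpoint is harder to diagnose than one that refuses to start.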
TIPS TIME!
MTTR: How to reduce it?
START SIMPLE
CHECK FOR SOFTWARE LICENSING ISSUES
PRACTICE “GAME DAY” EXERCISES
Practice Failure Through Chaos Engineering
Chaos engineering can answer critical questions:
● Did the system fail in the way you expected?
● Were you able to fix it promptly?
● What did the monitoring data look like?
● How long did it take for the service to be available again?
Train the entire team on different roles and functions
Intensive cross-training across your engineering team reduces MTTR.
Avoid burning out tech specialists by fostering a general understanding of how to resolve issues when an incident arises!
Follow up on incidents to uncover root causes
What happened? How did it happen? What were the root causes? How can we prevent it from happening again?
Answering these questions helps reduce MTTR.
Calibrate your alerting tools
Programmatic alerting will help you sort through large amounts of information about your systems and develop clear plans for how to use the data.
Mean time to detection (MTTD): how long it takes you to detect the occurrence of a customer-impacting issue in your system.
The earlier you catch the problem, the sooner you can fix it and reduce your MTTR!
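MTTD and MTTR are both averages over incident timelines. A small Python sketch, assuming each incident records when impact started, when it was detected and when service was restored; MTTR definitions vary (some teams measure from detection), and this sketch measures from impact start so that detection time counts inside MTTR.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Incident:
    started: datetime   # when customer impact began
    detected: datetime  # when an alert or a human noticed it
    resolved: datetime  # when the service was available again


def _mean(deltas: List[timedelta]) -> timedelta:
    if not deltas:
        raise ValueError("no incidents to average")
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents: List[Incident]) -> timedelta:
    """Mean time to detection: average of (detected - started)."""
    return _mean([i.detected - i.started for i in incidents])


def mttr(incidents: List[Incident]) -> timedelta:
    """Mean time to recovery, measured from impact start to resolution."""
    return _mean([i.resolved - i.started for i in incidents])
```

Measured this way, every minute cut from MTTD is also cut from MTTR, which is why calibrated alerting is listed as an MTTR tip.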
Create runbooks
● Incident response procedures
● Monitoring and alerting practices
● Creating runbooks
Focus on the correct fix, not the fastest one
When trying to reduce MTTR, resist the urge to take shortcuts and focus on the correct fix.
Get up to 10% of your AWS bill in AWS credits to spend on your infrastructure!
nubego.io/aws-credits
Q/A
Wrap Up!
fernando@nubego.io
fernandohonig
We’re Hiring!
https://nubego.io
info@nubego.io
careers@nubego.io
+44 (0) 20 8123 5282

Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 

OSDC 2019 | RTO & RPO – Best Practices in Hybrid Architectures by Fernando Honig

  • 1. RTO & RPO: Best Practices in Hybrid Architectures. OSDC, May 2019. Fernando Hönig (fernando@nubego.io, fernandohonig)
  • 2. RTO vs RPO: What is this? © 2019, nubeGO or its Affiliates. All rights reserved.
  • 3. RTO vs RPO: Apples vs Oranges. RTO (Recovery Time Objective) describes how quickly you need to recover: it is the target time you set for the recovery. RPO (Recovery Point Objective) is focused on data and on your company's tolerance for data loss: it is determined by the time between data backups and the amount of data that could be lost in between backups.
  • 4. RPO and RTO, an example. RPO: the business can recover from losing (at most) the last 12 hours of data. RTO: the application can be unavailable for a maximum of 1 hour.
  • 5. Availability concepts. High availability: minimizing downtime for your application. Backup: making your data safe. Disaster recovery: getting your applications and data back after a major disaster.
  • 6. What could go wrong? How do we fix it? Quickly? Small events: instance restart failure, application deployment failure. Large-scale events: Availability Zones down, unavailable services. Colossal events: unavailable region, infrastructure destruction by error.
  • 7. Latest events. Small events: instance restart failure, application deployment failure. Large-scale events: GitHub S3 AZ unavailable, UK's petition system unavailable. Colossal events: data unavailable (failed backups), GitLab database destruction.
  • 8. Disaster planning: recovery options
  • 9. Disaster planning
  • 10. Operating system. Machine images: snapshot them to other regions and share them across your accounts/projects. UserData: create scripts to execute during start-up; patch/update your OS and stay up to date.
  • 11. Storage. Object storage: replicate to other regions; enable versioning. Block storage: create point-in-time snapshots; copy snapshots across regions and accounts.
  • 12. Machine images and snapshots. Diagram: AMI lifecycle (https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/images/ami_lifecycle.png)
  • 13. Networking. DNS: enable health checks; enable latency records. Load balancing: failover options; health checks with HTTP codes. VPC: extend your network to the cloud. Direct Connect: enable fast and consistent replication/backup options from on-premises environments to the cloud.
  • 14. Databases (RDS). Snapshot data and save it in a separate region. Combine read replicas with Multi-AZ to build a resilient disaster recovery strategy.
  • 15. Infrastructure as code (IaC). Use templates to quickly deploy collections of resources as needed. Treat your infrastructure as code: test it and deploy new changes with your application releases.
  • 16. Backup and restore
  • 17. Backup phase. ● Take backups of current systems. ● Store backups in object storage services. ● Document the procedure to restore from backup in the cloud. ● Know which machine template to use; build your own as needed. ● Know how to restore the system from backups. ● Know how to switch to the new system. ● Know how to configure the deployment.
  • 18. Backup options. Files: NFS, SMB. Volumes: iSCSI. Tapes: iSCSI Virtual Tape Library.
  • 19. Hybrid backup. Diagram: AWS Backup in a hybrid architecture (https://d1.awsstatic.com/product-marketing/AWS%20Backup/product-page-diagram_aws_backup_hybrid.e5132f9c5fd6cd0299187d8d41147a3f7964d09a.png)
  • 20. Restore phase. In case of disaster… ● Retrieve backups from object storage. ● Bring up the required infrastructure: cloud instances with prepared machine images, load balancers, etc. ● Use infrastructure as code to automate deployment of core networking. ● Restore the system from backup. ● Switch over to the new system: adjust DNS records to point to the cloud systems.
  • 21. Recovery strategies
  • 22. Pilot Light. Diagram (normal operation): web, app and database servers on the primary site, reached by users or systems through an Amazon Route 53 hosted zone; data mirroring/replication to a DB secondary; the cloud web and app servers are not running.
  • 23. Pilot Light. Diagram (recovery): the cloud web and app servers start in minutes around the DB secondary, which is kept current by data mirroring/replication; users or systems connect through an Amazon Route 53 hosted zone.
  • 24. Pilot Light. Advantage: very cost-effective (uses fewer 24/7 resources). Preparation phase: set up instances to replicate or mirror data. Ensure that you have all supporting custom software packages available in the cloud. Create and maintain machine images of key servers where fast recovery is required. Regularly run these servers, test them, and apply any software updates and configuration changes. Consider automating the provisioning of cloud resources.
  • 25. Pilot Light. In case of disaster… Automatically bring up resources around the replicated core data set. Scale the system as needed to handle current production traffic. Switch over to the new system: ● Adjust DNS records to point to the cloud. Objectives: RTO: as long as it takes to detect the need for DR and automatically scale up the replacement system. RPO: depends on replication type.
  • 26. Fully Working Low-Capacity Standby. Diagram: web, app and database servers on the primary site; a low-capacity copy with web and app servers in Auto Scaling groups and a database server; data mirroring/replication to a DB secondary; users or systems connect through an Amazon Route 53 hosted zone.
  • 27. Fully Working Low-Capacity Standby. Diagram: web and app servers running on both sites, the standby still at low capacity; database servers kept in sync by data mirroring/replication (DB and DB secondary); users or systems connect through an Amazon Route 53 hosted zone.
  • 28. Fully Working Low-Capacity Standby. Advantages: can take some production traffic at any time; cost savings (IT footprint smaller than a full DR site). Preparation: similar to Pilot Light; all necessary components run 24/7, but are not scaled for production traffic. Best practice, continuous testing: ● “Tickle” a statistical subset of production traffic to the DR site.
  • 29. Fully Working Low-Capacity Standby. In case of disaster... Immediately fail over the most critical production load. Adjust DNS records to point to the cloud. (Auto) scale the system further to handle all production load. Objectives: RTO: for critical load, as long as it takes to fail over; for all other load, as long as it takes to scale further. RPO: depends on replication type.
  • 30. Multi-Site Active-Active. Diagram: web, app and database servers at full capacity on both sites; data mirroring/replication between DB and DB secondary; users or systems connect through an Amazon Route 53 hosted zone.
  • 31. Multi-Site Active-Active. Advantages: at any moment, can take all production load. Preparation: similar to low-capacity standby; fully scaling in/out with production load. In case of disaster… Immediately fail over all production load. Objectives: RTO: as long as it takes to fail over. RPO: depends on replication type.
  • 32. Recovery strategies. Backup and Restore (cost: $): ▪ Lower-priority use cases ▪ Solutions: object storage, archive storage. Pilot Light (cost: $$): ▪ Meeting lower RTO and RPO requirements ▪ Core services ▪ Scale cloud resources in response to a DR event. Fully Working Low-Capacity Standby (cost: $$$): ▪ Solutions that require RTO and RPO in minutes ▪ Business-critical services. Multi-Site Active-Active (cost: $$$$): ▪ Auto-failover of your environment in the cloud to a running duplicate.
  • 33. Scenario time!
  • 34. Case scenario #1. Bob is in charge of defining the best DR strategy for a hybrid architecture, and he did the setup based on the following requirements: We need an RTO of 60 minutes. Our backups are stored in the cloud and are taken daily. The RPO has to be less than 8 hours, and we need to be able to build a new environment quickly. Our application runs in the cloud, but our database is still in our local datacenter.
  • 35. Case scenario: RTO = 1 hour, RPO = 8 hours. Can this be achieved?
  • 36. Case scenario: database, RTO/RPO, code, on-prem. There is no certainty that they can achieve a 1-hour RTO and an 8-hour RPO. Backups run daily, so the RPO cannot be 8 hours: in the worst case, up to a full day of data is lost. How long would it take to build a new DB and import the data? How long would it take to copy the data from the cloud to your on-prem DB? App: does your application code use variables to cope with a change of endpoints? Infra: is your infrastructure treated as code? Can you deploy a new environment within tens of minutes?
  • 37. Tips time!
  • 38. MTTR: how to reduce it? Start simple. Check for software licensing issues. Practice “game day” exercises.
  • 39. Practice failure through chaos engineering. Chaos engineering can answer critical questions... Did a system fail in the way you expected? Were you able to fix it promptly? What did the monitoring data look like? How long did it take for the service to be available again?
  • 40. Train the entire team on different roles and functions. Intensive cross-training across your engineering team reduces MTTR. Avoid burning out tech specialists by fostering a general understanding of how to resolve issues when an incident arises!
  • 41. Follow up on incidents to uncover root causes. Reducing MTTR: What happened? How did it happen? What were the root causes? How can we prevent it?
  • 42. Calibrate your alerting tools. Programmatic alerting will help you sort through large amounts of information about your systems and develop clear plans for how to use the data. Mean time to detection (MTTD): how long it takes you to detect the occurrence of a customer-impacting issue in your system. The earlier you catch the problem, the lower your MTTR will be!
  • 43. Create runbooks: incident response procedures, monitoring and alerting practices.
  • 44. Focus on the correct fix, not the fastest one. When trying to reduce MTTR, resist the urge to take shortcuts and focus on the correct fix.
  • 45. Get up to 10% of your AWS bill in AWS credits to spend on your infrastructure! nubego.io/aws-credits
  • 46. Q&A, wrap up! fernando@nubego.io, fernandohonig
  • 47. We’re hiring! https://nubego.io, info@nubego.io, careers@nubego.io, +44 (0) 20 8123 5282
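Slides 3 and 4 define the two objectives with a concrete target: at most 12 hours of lost data (RPO) and at most 1 hour of downtime (RTO). As a minimal sketch, assuming an incident can be described by three timestamps (last recoverable copy of the data, failure, service restored), the check below measures one incident against those targets. The function name `meets_objectives` and the example timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Targets from slide 4.
RPO = timedelta(hours=12)  # maximum tolerated data loss
RTO = timedelta(hours=1)   # maximum tolerated downtime


def meets_objectives(last_recoverable: datetime, failure: datetime, restored: datetime,
                     rpo: timedelta = RPO, rto: timedelta = RTO) -> Tuple[bool, bool]:
    """Return (rpo_met, rto_met) for one incident."""
    data_loss = failure - last_recoverable  # writes after the last copy are gone
    downtime = restored - failure           # time until users are served again
    return data_loss <= rpo, downtime <= rto


# Hypothetical incident: last backup 02:00, failure 11:30, service restored 12:45.
backup = datetime(2019, 5, 14, 2, 0)
failure = datetime(2019, 5, 14, 11, 30)
restored = datetime(2019, 5, 14, 12, 45)
print(meets_objectives(backup, failure, restored))  # (True, False): 9.5 h lost, 75 min down
```

The data-loss window only counts from the last recoverable copy; with continuous replication that copy can be much more recent than the last backup, which is why the deck's strategy slides answer the RPO question with "depends on replication type".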
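Slide 11 recommends point-in-time snapshots for block storage. During a restore, the key decision is which snapshot to use: the newest one taken before the data went bad. A minimal sketch, assuming the snapshot times have already been listed (for example through your cloud provider's snapshot API, which is not called here); `latest_snapshot_before` and the 6-hour schedule are hypothetical. With snapshots alone, the worst-case data loss is up to one snapshot interval.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional


def latest_snapshot_before(snapshots: List[datetime], point_in_time: datetime) -> Optional[datetime]:
    """Return the newest snapshot taken at or before point_in_time, or None if none exists."""
    ordered = sorted(snapshots)
    idx = bisect_right(ordered, point_in_time)  # number of snapshots <= point_in_time
    return ordered[idx - 1] if idx else None


# Hypothetical schedule: one block-storage snapshot every 6 hours.
snaps = [datetime(2019, 5, 14, h) for h in (0, 6, 12, 18)]
bad_write_at = datetime(2019, 5, 14, 14, 20)  # when the data was corrupted, not when it was noticed
restore_from = latest_snapshot_before(snaps, bad_write_at)
print(restore_from)                 # 2019-05-14 12:00:00
print(bad_write_at - restore_from)  # 2:20:00 of writes not covered by the snapshot
```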
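Slide 13 lists DNS health checks and load-balancer health checks based on HTTP codes. The sketch below models the failover decision they drive, assuming an endpoint is healthy when it answers with a 2xx or 3xx status (the range Amazon Route 53 HTTP health checks accept) and is only declared down after three consecutive failures; that threshold is an assumption to tune. The hostnames and the `resolve` helper are hypothetical, and the sketch ignores the secondary's own health, which a real DNS service also evaluates.

```python
from typing import Dict, List

# Assumption: modelled loosely on DNS failover health checks; tune both values to your setup.
HEALTHY_CODES = range(200, 400)  # 2xx and 3xx responses count as healthy
FAILURE_THRESHOLD = 3            # consecutive failed checks before an endpoint is marked down


def is_healthy(recent_codes: List[int], threshold: int = FAILURE_THRESHOLD) -> bool:
    """Healthy unless each of the last `threshold` checks returned an unhealthy status."""
    last = recent_codes[-threshold:]
    if len(last) < threshold:
        return True  # not enough consecutive failures recorded yet
    return any(code in HEALTHY_CODES for code in last)


def resolve(primary: str, secondary: str, checks: Dict[str, List[int]]) -> str:
    """Failover routing: answer with the primary while it is healthy, otherwise the secondary."""
    return primary if is_healthy(checks.get(primary, [])) else secondary


checks = {
    "onprem.example.com": [200, 503, 503, 503],  # hypothetical on-premises endpoint
    "dr.example.com": [200, 200, 200],           # hypothetical cloud DR endpoint
}
print(resolve("onprem.example.com", "dr.example.com", checks))  # dr.example.com
```

Requiring several consecutive failures trades a slightly longer detection time for fewer false failovers caused by a single slow or dropped probe.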
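Slide 15 says to treat infrastructure as code and test it. A minimal sketch, assuming AWS CloudFormation as the template format: the resource types `AWS::EC2::VPC` and `AWS::EC2::Subnet` and their `CidrBlock`/`VpcId` properties are real, while the resource names, CIDR ranges and the checks in `template_problems` are hypothetical. The template is generated by code and validated by a plain function that a CI pipeline could run before anything is deployed.

```python
import json
from ipaddress import ip_network


def dr_network_template(vpc_cidr: str = "10.20.0.0/16") -> dict:
    """Build a minimal CloudFormation template for a hypothetical DR VPC with one subnet."""
    first_subnet = next(ip_network(vpc_cidr).subnets(new_prefix=24))
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "DR core network (illustrative)",
        "Resources": {
            "DrVpc": {
                "Type": "AWS::EC2::VPC",
                "Properties": {"CidrBlock": vpc_cidr, "EnableDnsSupport": True},
            },
            "DrSubnetA": {
                "Type": "AWS::EC2::Subnet",
                "Properties": {"VpcId": {"Ref": "DrVpc"}, "CidrBlock": str(first_subnet)},
            },
        },
    }


def template_problems(template: dict) -> list:
    """Static checks that can run in CI before anything is deployed."""
    problems = []
    resources = template["Resources"]
    for name, res in resources.items():
        if res["Type"] != "AWS::EC2::Subnet":
            continue
        vpc_name = res["Properties"]["VpcId"]["Ref"]
        if vpc_name not in resources:
            problems.append(f"{name}: unknown VPC {vpc_name}")
            continue
        vpc = ip_network(resources[vpc_name]["Properties"]["CidrBlock"])
        if not ip_network(res["Properties"]["CidrBlock"]).subnet_of(vpc):
            problems.append(f"{name}: CIDR outside {vpc}")
    return problems


template = dr_network_template()
print(template_problems(template))  # [] means the template passed the checks
print(json.dumps(template, indent=2))
```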
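The restore steps on slide 20 run one after another, so the achievable RTO is roughly the sum of their durations. The sketch below adds them up against a 1-hour target, assuming sequential steps and including a detection step taken from slide 25; every duration is hypothetical and should be replaced with timings measured in your own drills.

```python
from datetime import timedelta
from typing import List, Tuple

Step = Tuple[str, timedelta]

# Hypothetical durations for the restore steps on slide 20; replace with timings from your own drills.
restore_steps: List[Step] = [
    ("Detect the disaster and decide to invoke DR", timedelta(minutes=10)),
    ("Retrieve backups from object storage", timedelta(minutes=15)),
    ("Bring up infrastructure from IaC templates and machine images", timedelta(minutes=12)),
    ("Restore the system from backup", timedelta(minutes=20)),
    ("Switch over: update DNS and wait for the old record's TTL to expire", timedelta(minutes=5)),
]


def estimated_rto(steps: List[Step]) -> timedelta:
    """Steps run one after another, so the recovery time is their sum."""
    return sum((duration for _, duration in steps), timedelta())


rto_target = timedelta(hours=1)
total = estimated_rto(restore_steps)
slowest = max(restore_steps, key=lambda step: step[1])
print(total, total <= rto_target)   # 1:02:00 False
print("Slowest step:", slowest[0])  # Restore the system from backup
```

The slowest step is the first candidate for automation or parallelization. For example, bringing up infrastructure while backups are still being retrieved would remove that step from the critical path.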
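Slide 25 says to scale the pilot-light system as needed to handle current production traffic. A back-of-the-envelope sketch of that sizing follows. It assumes load is measured in requests per second and that each instance has a known, load-tested capacity. The function `instances_needed`, the 20% headroom and the traffic figures are hypothetical. Whatever number it returns, the time to launch that many instances from your machine images counts toward the RTO on the same slide.

```python
def instances_needed(current_rps: int, rps_per_instance: int, headroom_pct: int = 20) -> int:
    """Instances needed to serve current_rps with headroom_pct percent spare capacity.

    Integer arithmetic avoids float rounding pushing an exact fit up by one instance.
    """
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    required = current_rps * (100 + headroom_pct)
    capacity = rps_per_instance * 100
    return -(-required // capacity)  # ceiling division


# Hypothetical numbers: 4,500 requests/s of production traffic, 400 requests/s per instance.
print(instances_needed(4500, 400))  # 14
```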
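Slide 28's continuous-testing practice, “tickling” a statistical subset of production traffic to the DR site, needs a stable way to pick that subset. A minimal sketch, assuming each request carries an identifier and the routing decision is made in the application or a proxy. Hashing the id makes the choice deterministic, so a retried request goes to the same site. `route_to_dr` is hypothetical. At the DNS level, Amazon Route 53 weighted routing records are an alternative way to split traffic.

```python
import hashlib


def route_to_dr(request_id: str, dr_fraction: float) -> bool:
    """Send roughly dr_fraction of requests to the DR site, always the same ones for a given id."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # roughly uniform integer in [0, 2**64)
    return bucket < dr_fraction * 2**64


# Hypothetical check: about 5% of 100,000 request ids land on the DR site.
sample = [f"req-{i}" for i in range(100_000)]
share = sum(route_to_dr(rid, 0.05) for rid in sample) / len(sample)
print(f"{share:.3f}")
```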
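Slide 32 orders the four strategies by cost, from $ to $$$$. One way to use that ordering is to pick the cheapest strategy whose RTO and RPO, as measured in your own DR tests, meet the business targets. The sketch below does that. The deck gives no per-strategy numbers, so every measurement in `options` is invented for illustration, and `Strategy` and `cheapest_meeting` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Strategy:
    name: str
    cost_rank: int       # 1 = $ ... 4 = $$$$, as on slide 32
    rto_minutes: float   # measured in your own DR tests
    rpo_minutes: float   # measured in your own DR tests


def cheapest_meeting(strategies: List[Strategy], rto_minutes: float,
                     rpo_minutes: float) -> Optional[Strategy]:
    """Cheapest strategy whose measured RTO and RPO both meet the targets, else None."""
    fits = [s for s in strategies if s.rto_minutes <= rto_minutes and s.rpo_minutes <= rpo_minutes]
    return min(fits, key=lambda s: s.cost_rank, default=None)


# Hypothetical measurements, for illustration only.
options = [
    Strategy("Backup and Restore", 1, rto_minutes=480, rpo_minutes=1440),
    Strategy("Pilot Light", 2, rto_minutes=90, rpo_minutes=15),
    Strategy("Fully Working Low-Capacity Standby", 3, rto_minutes=20, rpo_minutes=5),
    Strategy("Multi-Site Active-Active", 4, rto_minutes=2, rpo_minutes=1),
]
choice = cheapest_meeting(options, rto_minutes=60, rpo_minutes=480)
print(choice.name if choice else "no strategy fits")  # Fully Working Low-Capacity Standby
```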
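The first two questions on slide 36 can be answered with simple arithmetic. The sketch below assumes there is no database replication (only the daily backups from slide 34) and that the copy runs over a dedicated link. The backup duration, database size and link speed are hypothetical. In this example, the copy alone takes longer than the 1-hour RTO, before any database import starts.

```python
from datetime import timedelta


def worst_case_rpo(backup_interval: timedelta, backup_duration: timedelta = timedelta()) -> timedelta:
    """Data lost if the failure hits just before the next backup completes (no replication)."""
    return backup_interval + backup_duration


def transfer_time(size_gb: float, link_gbps: float, utilization: float = 1.0) -> timedelta:
    """Time to copy size_gb gigabytes (decimal) over a link of link_gbps gigabits per second."""
    return timedelta(seconds=size_gb * 8 / (link_gbps * utilization))


rpo_target, rto_target = timedelta(hours=8), timedelta(hours=1)

# Bob's scenario: daily backups; the 30-minute backup window and the 500 GB / 1 Gbps figures are hypothetical.
rpo = worst_case_rpo(timedelta(days=1), timedelta(minutes=30))
copy = transfer_time(500, 1)
print(rpo, rpo <= rpo_target)    # 1 day, 0:30:00 False
print(copy, copy <= rto_target)  # 1:06:40 False
```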
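One question on slide 39 is how long the service took to become available again. The sketch below measures that during a game day. It assumes you have a health check you can call (for example an HTTP probe, not shown) and that the clock starts when the failure is injected. `time_to_recover` is hypothetical. The clock and sleep functions are injectable, so the example runs instantly against a simulated outage.

```python
import time
from typing import Callable, Optional


def time_to_recover(check: Callable[[], bool], poll_interval_s: float = 5.0,
                    timeout_s: float = 3600.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Poll `check` after a failure is injected; return seconds until it passes, or None on timeout.

    Accuracy is limited to the polling interval.
    """
    start = clock()
    while clock() - start <= timeout_s:
        if check():
            return clock() - start
        sleep(poll_interval_s)
    return None


# Simulated run with a fake clock so the example finishes instantly.
fake_now = [0.0]

def fake_clock() -> float:
    return fake_now[0]

def fake_sleep(seconds: float) -> None:
    fake_now[0] += seconds

def service_up() -> bool:
    return fake_now[0] >= 42  # hypothetical: service recovers 42 s after the failure

print(time_to_recover(service_up, clock=fake_clock, sleep=fake_sleep))  # 45.0
```

The result is 45 seconds rather than 42 because recovery is only observed at the next 5-second poll. A shorter interval gives a more precise figure at the cost of more probe traffic.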
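Slide 42 defines MTTD and links it to MTTR. The sketch below computes both from an incident log. It assumes each incident records when customer impact started, when it was detected and when service was restored. Definitions of MTTR vary; this one counts from the start of impact, so faster detection lowers it, which matches the slide's point. The `Incident` record and sample data are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Incident:
    started: datetime   # customer impact begins
    detected: datetime  # first alert fires (or first report arrives)
    resolved: datetime  # service restored


def mean(durations: List[timedelta]) -> timedelta:
    return sum(durations, timedelta()) / len(durations)


def mttd(incidents: List[Incident]) -> timedelta:
    return mean([i.detected - i.started for i in incidents])


def mttr(incidents: List[Incident]) -> timedelta:
    # Assumption: measured from impact start, so detection time is part of MTTR.
    return mean([i.resolved - i.started for i in incidents])


d = datetime(2019, 5, 14)
history = [  # hypothetical incident log
    Incident(d.replace(hour=10), d.replace(hour=10, minute=20), d.replace(hour=11)),
    Incident(d.replace(hour=14), d.replace(hour=14, minute=4), d.replace(hour=14, minute=30)),
]
print(mttd(history), mttr(history))  # 0:12:00 0:45:00
```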