SlideShare a Scribd company logo
1 of 61
Apache HBase Table Snapshots
Matteo Bertozzi | Cloudera | Software Engineer / HBase Committer
Jonathan Hsieh | Cloudera | Software Engineer / HBase Committer
Jesse Yates | Salesforce.com | Software Engineer / HBase Committer
June 13, 2013
HBaseCon 2013
Outline
• Intro and Use Cases
• Usage Instructions
• Internals
• Snapshot Layout
• Snapshot Restoration
• Online Snapshots
• Conclusion
HBase Table Snapshots
Snapshot is a collection of
metadata required to
reconstitute the data near
a particular point in time
HBaseCon 2013 6/13/20133
HBase Table Snapshots
• An inexpensive way to freeze state of a
table
• A mechanism that helps backup data to
in the cluster or to a remote cluster
• Recover from user error
• Bootstrap Replication
HBaseCon 2013 6/13/20134
Old: HBase-Supported Batch Backups
• Export / Dist CP / Import
• 3 batch MR jobs
• Several extra copies of data
• High latency (hours)
• Impacts existing low-latency
workloads
• Copy Table
• 1 MR Job
• Single copy of data
• Incremental table copies
• High Latency (hours)
• Impacts existing workloads
Export
MR Job
Import
MR Job
Dist CP
MR Job
Copy Table
MR Job
HBaseCon 2013 6/13/20135
Upcoming: HDFS Snapshots (or DistCP backup)
• Take an hdfs snapshot of all the
files in the underlying HBase’s
data directory.
• Hfiles, hlogs, and other
metadata.
• Snapshots all tables in Hbase
• Cannot Clone tables
• “Restore As”
• Targeted for Hadoop 2.1 /
Hadoop 3.0 DistCP
HLog Append
Flush
Compact
Restart
Recover
HBaseCon 2013 6/13/20136
New: HBase Snapshot-based Backups
• Snapshot, then Export
• 1 MR Job
• Single copy of data
• Little impact on low-latency
workloads
• Export is like distcp directly
from hfds
• No incremental snapshot
copy
HBaseCon 2013
Export
Snapshot
6/13/20137
Export
• Like distcp for a snapshot manifest
• Copy data files without going through HBase’s “front door”
Export
HBaseCon 2013 6/13/20138
Recover from User Error
• How do we recover from user error?
Recovery Time
time
User Error:
drop ‘table’
Service is
restored, major data
loss
Service is down!
Panic! Black magic!
HBaseCon 2013 6/13/20139
Recovering from User Mistakes: Table Snapshots
• Snapshot the state of a table at a certain moment in time
• Restore it or Clone it later, creating a new read write table
• Export it to another cluster with minimal impact on HBase
time
User Error:
drop ‘table’
Service restored, Minor
data loss. Carry on.
Periodic
snapshot
Service is down!
Keep calm!
restore
Periodic
snapshot
HBaseCon 2013 6/13/201310
Usage
What an Admin needs to know
Configuration
• Simple hbase-site.xml configuration
<property>
<name>hbase.snapshot.enabled</name>
<value>true</value>
</property>
• Enabled by default in 0.95+
• Requires user to enable in 0.94.6.1+.
HBaseCon 2013 6/13/201312
Usage: Shell Commands
• snapshot ‘table’, ‘snapshot’
• Table can be offline or online
• list_snapshot [<regex>]
• clone_snapshot ‘snapshot’, ‘dsttable’
• restore_snapshot ‘snapshot’
• delete_snapshot ‘snapshot’
HBaseCon 2013 6/13/201313
Usage: Web UI
HBaseCon 2013 6/13/201314
Usage: Web UI
HBaseCon 2013 6/13/201315
Export: Usage
• Copy “MySnapshot” to a remote HDFS
• $ hbase class org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot
MySnapshot -copy-to hdfs:///srv2:8082/hbase -mappers 16
• With permission change on the copy
• $ hbase class org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot
MySnapshot -copy-to hdfs:///srv2:8082/hbase -chuser MyUser -chgroup MyGroup
-chmod 700 -mappers 16
HBaseCon 2013 6/13/201316
Debugging and Info
• Dump a snapshot manifest
• Writes to standard out
• Usage
• $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot
test-snapshot
• $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot
test-snapshot -files
HBaseCon 2013 6/13/201317
Metrics
• Histograms of operation completion
• snapshotTime
• cloneTime
• restoreTime
• Includes ‘extended’ metrics
• Std deviation
• Min/max
HBaseCon 2013 6/13/201318
Table Snapshot Internals
Internals
• HBase Table HDFS Layout
• Snapshot HDFS layout
• Offline Snapshots
• Restore and Clone Snapshot
• Online Snapshots
HBaseCon 2013 6/13/201320
Primer: HBase Table Layout in HDFS
• HRegions map directly to a directory structure
with table name, encoded region
name, column family and hfiles.
• In HDFS:
/hbase/Table/<enc R1>/cf/<hfile f11>
/hbase/Table/<enc R1>/cf/<hfile f12>
/hbase/Table/<enc R2>/cf/<hfile f21>
/hbase/Table/<enc R2>/cf/<hfile f22>
/hbase/Table/<enc R3>/cf/<hfile f31>
/hbase/Table/<enc R3>/cf/<hfile f32>
Table
F11 F21 F31
R1 R2 R3
6/13/2013HBaseCon 201321
Table Snapshots in the File System
• A Snapshot manifest contains references to files in the original
table.
./.hbase-snapshots
Table
F11 F21 F31
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
HBaseCon 2013 6/13/201322
Table Snapshots in the File System
• A Snapshot manifest contains references to files in the original
table.
• Each snapshot is stored in the hbase/.hbase-snapshots dir.
./.hbase-snapshots
Table
F11 F21 F31
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
HBaseCon 2013 6/13/201323
Offline Snapshots
• Disable table, then create Snapshot Manifest
• Created in temporary dir to guarantee snapshot creation
atomicity
• Includes
• Snapshot Metadata
• Table Metadata/Schema (.tableinfo)
• References to original HFiles
• Master-only file system operation
HBaseCon 2013 6/13/201324
HFile Life Cycle
• Splits and Compactions remove hfiles
• What happens to references to these files?
./.hbase-snapshots
Table
F11 F21 F31
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
HBaseCon 2013 6/13/201325
HFile Life Cycle
• Splits and Compactions remove hfiles
• What happens to references to these files?
./.hbase-snapshots
Table
F11 F21 F31
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
HBaseCon 2013 6/13/201326
HFile Life Cycle
• Splits and Compactions remove hfiles
• What happens to references to these files?
./.hbase-snapshots
Table
F11 F21
F31
+32
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
No more
Hfile??
HBaseCon 2013 6/13/201327
HFile Archiver
./.hbase-snapshots
./.archive
Table
F11 F21
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
Table files
F31
HBaseCon 2013 6/13/201328
• We archive old HFiles from compactions (HBASE-5547)
HFile Archiver
./.hbase-snapshots
./.archive
Table
F11 F21
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
F31
+32
Table files
F31
HBaseCon 2013 6/13/201329
• We archive old HFiles from compactions (HBASE-5547)
• Files stored in hbase/.archive
HFile Archiver
./.hbase-snapshots
./.archive
Table
F11 F21
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
F31
+32
Table files
F31
• We archive old HFiles from compactions (HBASE-5547)
• Files stored in hbase/.archive
• HFileCleaner ensures HFiles’ data remains available
HBaseCon 2013 6/13/201330
Restore Snapshot Internals
Restore Operations
• Restore table
• Rollback table to specific state
• Clone from snapshot (Restore As)
• Create new read-write table from snapshot
• There can be multiple replicas of a snapshot
• Export snapshot
• Send snapshot and all its data to another cluster
HBaseCon 2013 6/13/201332
Clone: Creating table from a Snapshot
• Convert snapshot manifest info into a Table.
./.hbase-snapshots
./.archive
Table
F11 F21
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
Table files
Clone
R1 R2 R3
F31
HBaseCon 2013 6/13/201333
Clone: Creating table from a Snapshot
• Convert snapshot manifest info into a Table.
./.hbase-snapshots
./.archive
Table
F11 F21
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
Table files
Clone
R1 R2 R3
F31
HBaseCon 2013 6/13/201334
Clone: Creating table from a Snapshot
./.hbase-snapshots
./.archive
Table
F11 F21
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
F31
+32
Table files
F31
Clone
R1 R2 R3
• Convert snapshot manifest info into a Table.
• HFileLinks (HBASE-6610) to mimic unix open file descriptor semantics
HBaseCon 2013 6/13/201335
Restore: Rollback to an old state
• Rollback the existing table to snapshot state
• Restores original schema if altered
• Snapshots current table, just in case
• Minimal overhead
• Smarter delete table & clone snapshot
• Handles creating/deleting regions
• Restore META
HBaseCon 2013 6/13/201336
Restore illustrated
./.hbase-snapshots
./.archive
TableSnapshot manifest
R1 R2 R3
Table files
F31
Table
F11 F21
R1 R2 R3
F31
+32 F41
R4
• Rollback “Table” to the “TableSnapshot” state
HBaseCon 2013 6/13/201337
Restore illustrated
./.hbase-snapshots
./.archive
TableSnapshot manifest
R1 R2 R3
Table files
F31
Table
F11 F21
R1 R2 R3
F31
+32 F41
R4
• Region “R4” is not present in the snapshot
• “R4” will be removed from “Table”, files moved to “.archive”
HBaseCon 2013 6/13/201338
Restore illustrated
./.hbase-snapshots
./.archive
TableSnapshot manifest
R1 R2 R3
Table files
F31
Table
F11 F21
R1 R2 R3
F31
+32
F41
• New files not present in the snapshots are moved to the archive
HBaseCon 2013 6/13/201339
Restore illustrated
./.hbase-snapshots
./.archive
TableSnapshot manifest
R1 R2 R3
Table files
F31
Table
F11 F21
R1 R2 R3
F41F3+
32
• New files not present in the snapshots are moved to the archive
• HFileLinks are created to point to old files.
HBaseCon 2013 6/13/201340
Restore failures
• The table to restore is disabled
• META and HDFS operations may fail (network issue, server down, …)
• hbck can’t repair an incomplete restore...
• Restore again!
HBaseCon 2013 6/13/201341
Export Snapshot
• Copy a full snapshot to another cluster
• All required HFiles, and Metadata
• Lots of options
• Fancy dist-cp
• Must resolve HFileLinks
• Faster than CopyTable or table export+import!
• Minimal impact on running cluster
HBaseCon 2013 6/13/201342
Online Snapshots
Online snapshots
• Take a snapshot without making the table
unavailable
• No need to disable the table
• Continue accepting reads and writes from
clients
• Challenges
• Coordinating Region Servers
• Data is in memory
• Consistency
HBaseCon 2013 6/13/201344
Offline vs Online Snapshots
Offline Online
mastermaster
RS1 RS2 RS3 RS4
verify
Snapshot
region
subprocedure
Write
manifest
per region
verify
Write
manifest
per region
6/13/2013HBaseCon 201345
Online Snapshots
• Each Region can have data in memstore and hlog, not yet Hfile
• Snapshot is missing in memory data!
./.hbase-snapshots
./.archive
Table
F11 F21
R1 R2 R3
TableSnapshot manifest
R1 R2 R3
Table files
F31
mem mem mem
HBaseCon 2013 6/13/201346
Online Snapshots
• Flush so that all in memory data written in an Hfile
• Then add to snapshot manifest
./.hbase-snapshots
./.archive
Table
F11 F21
R1 R2 R3
Table files
F31
F13 F23 F33
TableSnapshot manifest
R1 R2 R3
HBaseCon 2013 6/13/201347
Online Snapshots
• Flush so that all in memory data in an Hfile
• Then add to snapshot manifest
./.hbase-snapshots
./.archive
Table
F11 F21
R1 R2 R3
Table files
F31
F13 F23 F33
TableSnapshot manifest
R1 R2 R3
HBaseCon 2013 6/13/201348
Consistency
• Offline Snapshots
• Fully consistent snapshot
• Online Flush Snapshot
• “CopyTable” level consistency with a much smaller window.
• Time bounded by slowest region server and region flush
HBaseCon 2013 6/13/201349
Online Snapshots and Causal consistency
• Causal consistency would only allow A, AB, or neither A nor B.
• B and Not A is currently possible
Table
F11 F21
R1 R2 R3
F31
TableSnapshot manifest
R1 R2 R3
Master RS1 RS2 Client
mem mem
Flush SS
F13
HBaseCon 2013 6/13/201350
Online Snapshots and Causal consistency
• Causal consistency would only allow A, AB, or neither A nor B.
• B and Not A is currently possible
Table
F11 F21
R1 R2 R3
F31
TableSnapshot manifest
R1 R2 R3
Master RS1 RS2 Client
mem mem
Put A …
… then
Put B
F13
mem
HBaseCon 2013 6/13/201351
Online Snapshots and Causal consistency
• Causal consistency would allow A, AB, or neither A nor B.
• B and Not A is possible with Flush Snapshots
Table
F11 F21
R1 R2 R3
F31
TableSnapshot manifest
R1 R2 R3
Master RS1 RS2 Client
mem
F23
F13
Flush SS
Put B is
in but
Put A is
not!
F33
HBaseCon 2013 6/13/201352
Online snapshot attempts can fail
• If involved RS’s fail, the snapshots attempt will fail.
• Needs a way to prevent other table metadata operations
• Table Metadata Locks (0.95+)
• Avoid many snapshot failures conflicts(Ex: Online schema, splits)
• Failed attempt will report errors -- user must retry.
• o.a.h.hbase.snapshot.HBaseSnapshotException
• o.a.h.hbase.snapshot.CorruptedSnapshotException
HBaseCon 2013 6/13/201353
Development Notes
How we collaborated, built, and tested
Table Snapshots Development
• Developed in a Branch off of trunk
merged and in 0.95 and trunk.
• Feature is too big to include as a single
patch
• Does not destabilize trunk
• Does not slow time-based release
trains
• Later Backported to 0.94.6.1
src
branch
Reintegrate
into trunk
sync
HBaseCon 2013 6/13/201355
System testing with Jenkins
• Concurrently load data while taking snapshots
• Inject compactions, Kill RS’s, Meta RS, Master
• Create snapshot clones of the snapshots
• Inject Compactions, Kill RS’s, META Rs, Master
HBaseCon 2013 6/13/201356
Future Work:
• Alternative semantics and implementations
• Log Roll Snapshot (HBASE-7291)
• Store logs and replay on restore
• Faster for snapshot, slower and more complicated for restore.
• Timestamp Snapshot (HBASE-6866)
• All updates before ts in snapshot, all after not in snapshot
• Longer pause before snapshot taken
• Globally-Consistent Snapshot (HBASE-6867)
• global write lock for all regions nodes until snapshot complete.
• Expensive
• Repair tools
• Manual repairs necessary (hbck does not support yet)
HBaseCon 2013 6/13/201357
Conclusions
Feature Summary by Version
Apache 0.92.x
Apache <0.94.6.1
Apache 0.94.6.1+ Apache 0.95.0
Apache 0.96.0
Copy Table Copy Table Copy Table
Import / Export Import / Export Import / Export
Offline snapshots Offline snapshots
Flush Online
Snapshot
Flush Online
Snapshot
Table Locks
HBaseCon 2013 6/13/201359
Key Contributors
• Jesse Yates (Salesforce.com)
• HFileArchiver, Offline Snapshot, first draft online
• Matteo Bertozzi (Cloudera)
• HFileLink, Restore, clone, Testing, 0.94 backport
• Jonathan Hsieh (Cloudera)
• Online Snapshots revamp, Testing, Branch Sheppard
• Ted Yu (HortonWorks)
• Reviews
• Enis Soztutar (HortonWorks)
• Table Locks on Snapshots
HBaseCon 2013 6/13/201360
Thanks! Questions?
Matteo Bertozzi
@th30z
matteo.bertozzi@cloudera.com
Jonathan Hsieh
@jmhsieh
jon@cloudera.com
Jesse Yates
@jesse_yates
jesse.k.yates@gmail.com
HBaseCon 2013 6/13/201361

More Related Content

What's hot

ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataDataWorks Summit
 
Diving into Delta Lake: Unpacking the Transaction Log
Diving into Delta Lake: Unpacking the Transaction LogDiving into Delta Lake: Unpacking the Transaction Log
Diving into Delta Lake: Unpacking the Transaction LogDatabricks
 
Strongly Consistent Global Indexes for Apache Phoenix
Strongly Consistent Global Indexes for Apache PhoenixStrongly Consistent Global Indexes for Apache Phoenix
Strongly Consistent Global Indexes for Apache PhoenixYugabyteDB
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingDataWorks Summit
 
The State of HBase Replication
The State of HBase ReplicationThe State of HBase Replication
The State of HBase ReplicationHBaseCon
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiFlink Forward
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)DataStax Academy
 
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufWebinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufVerverica
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesDatabricks
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkBo Yang
 
Ozone- Object store for Apache Hadoop
Ozone- Object store for Apache HadoopOzone- Object store for Apache Hadoop
Ozone- Object store for Apache HadoopHortonworks
 
HBase replication
HBase replicationHBase replication
HBase replicationwchevreuil
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimDatabricks
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path ForwardAlluxio, Inc.
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBill Liu
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.Taras Matyashovsky
 

What's hot (20)

ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Diving into Delta Lake: Unpacking the Transaction Log
Diving into Delta Lake: Unpacking the Transaction LogDiving into Delta Lake: Unpacking the Transaction Log
Diving into Delta Lake: Unpacking the Transaction Log
 
Strongly Consistent Global Indexes for Apache Phoenix
Strongly Consistent Global Indexes for Apache PhoenixStrongly Consistent Global Indexes for Apache Phoenix
Strongly Consistent Global Indexes for Apache Phoenix
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
The State of HBase Replication
The State of HBase ReplicationThe State of HBase Replication
The State of HBase Replication
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)
 
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufWebinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
 
Apache flink
Apache flinkApache flink
Apache flink
 
Node Labels in YARN
Node Labels in YARNNode Labels in YARN
Node Labels in YARN
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in SparkSpark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
Spark Shuffle Deep Dive (Explained In Depth) - How Shuffle Works in Spark
 
Ozone- Object store for Apache Hadoop
Ozone- Object store for Apache HadoopOzone- Object store for Apache Hadoop
Ozone- Object store for Apache Hadoop
 
HBase replication
HBase replicationHBase replication
HBase replication
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path Forward
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.From cache to in-memory data grid. Introduction to Hazelcast.
From cache to in-memory data grid. Introduction to Hazelcast.
 

Viewers also liked

HBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestHBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestCloudera, Inc.
 
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseCloudera, Inc.
 
HBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBaseHBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBaseCloudera, Inc.
 
HBaseCon 2015: Analyzing HBase Data with Apache Hive
HBaseCon 2015: Analyzing HBase Data with Apache  HiveHBaseCon 2015: Analyzing HBase Data with Apache  Hive
HBaseCon 2015: Analyzing HBase Data with Apache HiveHBaseCon
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the BasicsHBaseCon
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshotsenissoz
 
HBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsHBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsMichael Stack
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...Yahoo Developer Network
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineData Con LA
 
hive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata Storagehive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata StorageDataWorks Summit/Hadoop Summit
 
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Sematext Group, Inc.
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop User Group
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Hortonworks
 
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...Cloudera, Inc.
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBaseCon
 
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics Cloudera, Inc.
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseCloudera, Inc.
 
HBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart MeterHBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart MeterCloudera, Inc.
 

Viewers also liked (20)

HBase Snapshots
HBase SnapshotsHBase Snapshots
HBase Snapshots
 
HBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at PinterestHBaseCon 2013: Apache HBase Operations at Pinterest
HBaseCon 2013: Apache HBase Operations at Pinterest
 
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
 
HBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBaseHBaseCon 2013: Integration of Apache Hive and HBase
HBaseCon 2013: Integration of Apache Hive and HBase
 
HBaseCon 2015: Analyzing HBase Data with Apache Hive
HBaseCon 2015: Analyzing HBase Data with Apache  HiveHBaseCon 2015: Analyzing HBase Data with Apache  Hive
HBaseCon 2015: Analyzing HBase Data with Apache Hive
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
 
Mapreduce over snapshots
Mapreduce over snapshotsMapreduce over snapshots
Mapreduce over snapshots
 
HBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbmsHBaseConEast2016: Splice machine open source rdbms
HBaseConEast2016: Splice machine open source rdbms
 
Mar 2012 HUG: Hive with HBase
Mar 2012 HUG: Hive with HBaseMar 2012 HUG: Hive with HBase
Mar 2012 HUG: Hive with HBase
 
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
October 2016 HUG: Architecture of an Open Source RDBMS powered by HBase and ...
 
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice MachineSpark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
Spark as part of a Hybrid RDBMS Architecture-John Leach Cofounder Splice Machine
 
hive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata Storagehive HBase Metastore - Improving Hive with a Big Data Metadata Storage
hive HBase Metastore - Improving Hive with a Big Data Metadata Storage
 
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
Large Scale Log Analytics with Solr (from Lucene Revolution 2015)
 
Hadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User GroupHadoop, Hbase and Hive- Bay area Hadoop User Group
Hadoop, Hbase and Hive- Bay area Hadoop User Group
 
Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks Non-Stop Hadoop for Hortonworks
Non-Stop Hadoop for Hortonworks
 
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
HBaseCon 2012 | Living Data: Applying Adaptable Schemas to HBase - Aaron Kimb...
 
HBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region ReplicasHBase Read High Availability Using Timeline-Consistent Region Replicas
HBase Read High Availability Using Timeline-Consistent Region Replicas
 
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
HBaseCon 2013: Apache Hadoop and Apache HBase for Real-Time Video Analytics
 
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBaseHBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
HBaseCon 2013: Project Valta - A Resource Management Layer over Apache HBase
 
HBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart MeterHBaseCon 2013: Being Smarter Than the Smart Meter
HBaseCon 2013: Being Smarter Than the Smart Meter
 

Similar to HBaseCon 2013: Apache HBase Table Snapshots

Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0enissoz
 
Meet HBase 2.0
Meet HBase 2.0Meet HBase 2.0
Meet HBase 2.0enissoz
 
[Altibase] 13 backup and recovery
[Altibase] 13 backup and recovery[Altibase] 13 backup and recovery
[Altibase] 13 backup and recoveryaltistory
 
Apache HBase 1.0 Release
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 ReleaseNick Dimiduk
 
HBase Backups
HBase BackupsHBase Backups
HBase BackupsHBaseCon
 
HBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBaseCon
 
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...Michael Stack
 
Hbase Backups: Backups in the Enterprise
Hbase Backups: Backups in the EnterpriseHbase Backups: Backups in the Enterprise
Hbase Backups: Backups in the EnterpriseSalesforce Engineering
 
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかApache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかToshihiro Suzuki
 
UKOUG2018 - I Know what you did Last Summer [in my Database].pptx
UKOUG2018 - I Know what you did Last Summer [in my Database].pptxUKOUG2018 - I Know what you did Last Summer [in my Database].pptx
UKOUG2018 - I Know what you did Last Summer [in my Database].pptxMarco Gralike
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0DataWorks Summit
 
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...Michael Stack
 
Less17 flashback tb3
Less17 flashback tb3Less17 flashback tb3
Less17 flashback tb3Imran Ali
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkMichael Stack
 
Oracle 12c New Features_RMAN_slides
Oracle 12c New Features_RMAN_slidesOracle 12c New Features_RMAN_slides
Oracle 12c New Features_RMAN_slidesSaiful
 
Take your database source code and data under control
Take your database source code and data under controlTake your database source code and data under control
Take your database source code and data under controlMarcin Przepiórowski
 
RMAN in 12c: The Next Generation (PPT)
RMAN in 12c: The Next Generation (PPT)RMAN in 12c: The Next Generation (PPT)
RMAN in 12c: The Next Generation (PPT)Gustavo Rene Antunez
 

Similar to HBaseCon 2013: Apache HBase Table Snapshots (20)

Meet Apache HBase - 2.0
Meet Apache HBase - 2.0Meet Apache HBase - 2.0
Meet Apache HBase - 2.0
 
Meet hbase 2.0
Meet hbase 2.0Meet hbase 2.0
Meet hbase 2.0
 
Meet HBase 2.0
Meet HBase 2.0Meet HBase 2.0
Meet HBase 2.0
 
[Altibase] 13 backup and recovery
[Altibase] 13 backup and recovery[Altibase] 13 backup and recovery
[Altibase] 13 backup and recovery
 
Apache HBase 1.0 Release
Apache HBase 1.0 ReleaseApache HBase 1.0 Release
Apache HBase 1.0 Release
 
HBase Backups
HBase BackupsHBase Backups
HBase Backups
 
HBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial IndustryHBase at Bloomberg: High Availability Needs for the Financial Industry
HBase at Bloomberg: High Availability Needs for the Financial Industry
 
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...
HBaseConAsia2018 Track2-7: A real-time backup solution for HBase with zero HB...
 
Les 12 fl_db
Les 12 fl_dbLes 12 fl_db
Les 12 fl_db
 
Hbase Backups: Backups in the Enterprise
Hbase Backups: Backups in the EnterpriseHbase Backups: Backups in the Enterprise
Hbase Backups: Backups in the Enterprise
 
Les 11 fl2
Les 11 fl2Les 11 fl2
Les 11 fl2
 
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのかApache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
Apache HBaseの現在 - 火山と呼ばれたHBaseは今どうなっているのか
 
UKOUG2018 - I Know what you did Last Summer [in my Database].pptx
UKOUG2018 - I Know what you did Last Summer [in my Database].pptxUKOUG2018 - I Know what you did Last Summer [in my Database].pptx
UKOUG2018 - I Know what you did Last Summer [in my Database].pptx
 
Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0Meet HBase 2.0 and Phoenix-5.0
Meet HBase 2.0 and Phoenix-5.0
 
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
HBaseConAsia2018 Track1-5: Improving HBase reliability at PInterest with geo ...
 
Less17 flashback tb3
Less17 flashback tb3Less17 flashback tb3
Less17 flashback tb3
 
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and SparkHBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
HBaseConAsia2018 Track2-4: HTAP DB-System: AsparaDB HBase, Phoenix, and Spark
 
Oracle 12c New Features_RMAN_slides
Oracle 12c New Features_RMAN_slidesOracle 12c New Features_RMAN_slides
Oracle 12c New Features_RMAN_slides
 
Take your database source code and data under control
Take your database source code and data under controlTake your database source code and data under control
Take your database source code and data under control
 
RMAN in 12c: The Next Generation (PPT)
RMAN in 12c: The Next Generation (PPT)RMAN in 12c: The Next Generation (PPT)
RMAN in 12c: The Next Generation (PPT)
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Recently uploaded

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 

HBaseCon 2013: Apache HBase Table Snapshots

  • 1. Apache HBase Table Snapshots Matteo Bertozzi | Cloudera | Software Engineer / HBase Committer Jonathan Hsieh | Cloudera | Software Engineer / HBase Committer Jesse Yates | Salesforce.com | Software Engineer / HBase Committer June 13, 2013 HBaseCon 2013
  • 2. Outline • Intro and Use Cases • Usage Instructions • Internals • Snapshot Layout • Snapshot Restoration • Online Snapshots • Conclusion
  • 3. HBase Table Snapshots Snapshot is a collection of metadata required to reconstitute the data near a particular point in time HBaseCon 2013 6/13/20133
  • 4. HBase Table Snapshots • An inexpensive way to freeze state of a table • A mechanism that helps backup data to in the cluster or to a remote cluster • Recover from user error • Bootstrap Replication HBaseCon 2013 6/13/20134
  • 5. Old: HBase-Supported Batch Backups • Export / Dist CP / Import • 3 batch MR jobs • Several extra copies of data • High latency (hours) • Impacts existing low-latency workloads • Copy Table • 1 MR Job • Single copy of data • Incremental table copies • High Latency (hours) • Impacts existing workloads Export MR Job Import MR Job Dist CP MR Job Copy Table MR Job HBaseCon 2013 6/13/20135
  • 6. Upcoming: HDFS Snapshots (or DistCP backup) • Take an hdfs snapshot of all the files in the underlying HBase’s data directory. • Hfiles, hlogs, and other metadata. • Snapshots all tables in Hbase • Cannot Clone tables • “Restore As” • Targeted for Hadoop 2.1 / Hadoop 3.0 DistCP HLog Append Flush Compact Restart Recover HBaseCon 2013 6/13/20136
  • 7. New: HBase Snapshot-based Backups • Snapshot, then Export • 1 MR Job • Single copy of data • Little impact on low-latency workloads • Export is like distcp directly from hfds • No incremental snapshot copy HBaseCon 2013 Export Snapshot 6/13/20137
  • 8. Export • Like distcp for a snapshot manifest • Copy data files without going through HBase’s “front door” Export HBaseCon 2013 6/13/20138
  • 9. Recover from User Error • How do we recover from user error? Recovery Time time User Error: drop ‘table’ Service is restored, major data loss Service is down! Panic! Black magic! HBaseCon 2013 6/13/20139
  • 10. Recovering from User Mistakes: Table Snapshots • Snapshot the state of a table at a certain moment in time • Restore it or Clone it later, creating a new read write table • Export it to another cluster with minimal impact on HBase time User Error: drop ‘table’ Service restored, Minor data loss. Carry on. Periodic snapshot Service is down! Keep calm! restore Periodic snapshot HBaseCon 2013 6/13/201310
  • 11. Usage What an Admin needs to know
  • 12. Configuration • Simple hbase-site.xml configuration <property> <name>hbase.snapshot.enabled</name> <value>true</value> </property> • Enabled by default in 0.95+ • Requires user to enable in 0.94.6.1+. HBaseCon 2013 6/13/201312
  • 13. Usage: Shell Commands • snapshot ‘table’, ‘snapshot’ • Table can be offline or online • list_snapshot [<regex>] • clone_snapshot ‘snapshot’, ‘dsttable’ • restore_snapshot ‘snapshot’ • delete_snapshot ‘snapshot’ HBaseCon 2013 6/13/201313
  • 14. Usage: Web UI HBaseCon 2013 6/13/201314
  • 15. Usage: Web UI HBaseCon 2013 6/13/201315
  • 16. Export: Usage • Copy “MySnapshot” to a remote HDFS • $ hbase class org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs:///srv2:8082/hbase -mappers 16 • With permission change on the copy • $ hbase class org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot MySnapshot -copy-to hdfs:///srv2:8082/hbase -chuser MyUser -chgroup MyGroup -chmod 700 -mappers 16 HBaseCon 2013 6/13/201316
  • 17. Debugging and Info • Dump a snapshot manifest • Writes to standard out • Usage • $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot test-snapshot • $ hbase org.apache.hadoop.hbase.snapshot.SnapshotInfo -snapshot test-snapshot -files HBaseCon 2013 6/13/201317
  • 18. Metrics • Histograms of operation completion • snapshotTime • cloneTime • restoreTime • Includes ‘extended’ metrics • Std deviation • Min/max HBaseCon 2013 6/13/201318
  • 20. Internals • HBase Table HDFS Layout • Snapshot HDFS layout • Offline Snapshots • Restore and Clone Snapshot • Online Snapshots HBaseCon 2013 6/13/201320
  • 21. Primer: HBase Table Layout in HDFS • HRegions map directly to a directory structure with table name, encoded region name, column family and hfiles. • In HDFS: /hbase/Table/<enc R1>/cf/<hfile f11> /hbase/Table/<enc R1>/cf/<hfile f12> /hbase/Table/<enc R2>/cf/<hfile f21> /hbase/Table/<enc R2>/cf/<hfile f22> /hbase/Table/<enc R3>/cf/<hfile f31> /hbase/Table/<enc R3>/cf/<hfile f32> Table F11 F21 F31 R1 R2 R3 6/13/2013HBaseCon 201321
  • 22. Table Snapshots in the File System • A Snapshot manifest contains references to files in the original table. ./.hbase-snapshots Table F11 F21 F31 R1 R2 R3 TableSnapshot manifest R1 R2 R3 HBaseCon 2013 6/13/201322
  • 23. Table Snapshots in the File System • A Snapshot manifest contains references to files in the original table. • Each snapshot is stored in the hbase/.hbase-snapshots dir. ./.hbase-snapshots Table F11 F21 F31 R1 R2 R3 TableSnapshot manifest R1 R2 R3 HBaseCon 2013 6/13/201323
  • 24. Offline Snapshots • Disable table, then create Snapshot Manifest • Created in temporary dir to guarantee snapshot creation atomicity • Includes • Snapshot Metadata • Table Metadata/Schema (.tableinfo) • References to original HFiles • Master-only file system operation HBaseCon 2013 6/13/201324
  • 25. HFile Life Cycle • Splits and Compactions remove hfiles • What happens to references to these files? ./.hbase-snapshots Table F11 F21 F31 R1 R2 R3 TableSnapshot manifest R1 R2 R3 HBaseCon 2013 6/13/201325
  • 26. HFile Life Cycle • Splits and Compactions remove hfiles • What happens to references to these files? ./.hbase-snapshots Table F11 F21 F31 R1 R2 R3 TableSnapshot manifest R1 R2 R3 HBaseCon 2013 6/13/201326
  • 27. HFile Life Cycle • Splits and Compactions remove hfiles • What happens to references to these files? ./.hbase-snapshots Table F11 F21 F31 +32 R1 R2 R3 TableSnapshot manifest R1 R2 R3 No more Hfile?? HBaseCon 2013 6/13/201327
  • 28. HFile Archiver ./.hbase-snapshots ./.archive Table F11 F21 R1 R2 R3 TableSnapshot manifest R1 R2 R3 Table files F31 HBaseCon 2013 6/13/201328 • We archive old HFiles from compactions (HBASE-5547)
  • 29. HFile Archiver ./.hbase-snapshots ./.archive Table F11 F21 R1 R2 R3 TableSnapshot manifest R1 R2 R3 F31 +32 Table files F31 HBaseCon 2013 6/13/201329 • We archive old HFiles from compactions (HBASE-5547) • Files stored in hbase/.archive
  • 30. HFile Archiver ./.hbase-snapshots ./.archive Table F11 F21 R1 R2 R3 TableSnapshot manifest R1 R2 R3 F31 +32 Table files F31 • We archive old HFiles from compactions (HBASE-5547) • Files stored in hbase/.archive • HFileCleaner ensures HFiles’ data remains available HBaseCon 2013 6/13/201330
  • 32. Restore Operations • Restore table • Rollback table to specific state • Clone from snapshot (Restore As) • Create new read-write table from snapshot • There can be multiple replicas of a snapshot • Export snapshot • Send snapshot and all its data to another cluster HBaseCon 2013 6/13/201332
  • 33. Clone: Creating table from a Snapshot • Convert snapshot manifest info into a Table. ./.hbase-snapshots ./.archive Table F11 F21 R1 R2 R3 TableSnapshot manifest R1 R2 R3 Table files Clone R1 R2 R3 F31 HBaseCon 2013 6/13/201333
  • 34. Clone: Creating table from a Snapshot • Convert snapshot manifest info into a Table. ./.hbase-snapshots ./.archive Table F11 F21 R1 R2 R3 TableSnapshot manifest R1 R2 R3 Table files Clone R1 R2 R3 F31 HBaseCon 2013 6/13/201334
  • 35. Clone: Creating table from a Snapshot ./.hbase-snapshots ./.archive Table F11 F21 R1 R2 R3 TableSnapshot manifest R1 R2 R3 F31 +32 Table files F31 Clone R1 R2 R3 • Convert snapshot manifest info into a Table. • HFileLinks (HBASE-6610) to mimic unix open file descriptor semantics HBaseCon 2013 6/13/201335
  • 36. Restore: Rollback to an old state • Rollback the existing table to snapshot state • Restores original schema if altered • Snapshots current table, just in case • Minimal overhead • Smarter delete table & clone snapshot • Handles creating/deleting regions • Restore META HBaseCon 2013 6/13/201336
  • 37. Restore illustrated ./.hbase-snapshots ./.archive TableSnapshot manifest R1 R2 R3 Table files F31 Table F11 F21 R1 R2 R3 F31 +32 F41 R4 • Rollback “Table” to the “TableSnapshot” state HBaseCon 2013 6/13/201337
  • 38. Restore illustrated ./.hbase-snapshots ./.archive TableSnapshot manifest R1 R2 R3 Table files F31 Table F11 F21 R1 R2 R3 F31 +32 F41 R4 • Region “R4” is not present in the snapshot • “R4” will be removed from “Table”, files moved to “.archive” HBaseCon 2013 6/13/201338
  • 39. Restore illustrated ./.hbase-snapshots ./.archive TableSnapshot manifest R1 R2 R3 Table files F31 Table F11 F21 R1 R2 R3 F31 +32 F41 • New files not present in the snapshots are moved to the archive HBaseCon 2013 6/13/201339
  • 40. Restore illustrated ./.hbase-snapshots ./.archive TableSnapshot manifest R1 R2 R3 Table files F31 Table F11 F21 R1 R2 R3 F41F3+ 32 • New files not present in the snapshots are moved to the archive • HFileLinks are created to point to old files. HBaseCon 2013 6/13/201340
  • 41. Restore failures • The table to restore is disabled • META and HDFS operations may fail (network issue, server down, …) • hbck can’t repair an incomplete restore... • Restore again! HBaseCon 2013 6/13/201341
  • 42. Export Snapshot • Copy a full snapshot to another cluster • All required HFiles, and Metadata • Lots of options • Fancy dist-cp • Must resolve HFileLinks • Faster than CopyTable or table export+import! • Minimal impact on running cluster HBaseCon 2013 6/13/201342
  • 44. Online snapshots • Take a snapshot without making the table unavailable • No need to disable the table • Continue accepting reads and writes from clients • Challenges • Coordinating Region Servers • Data is in memory • Consistency HBaseCon 2013 6/13/201344
  • 45. Offline vs Online Snapshots Offline Online mastermaster RS1 RS2 RS3 RS4 verify Snapshot region subprocedure Write manifest per region verify Write manifest per region 6/13/2013HBaseCon 201345
  • 46. Online Snapshots • Each Region can have data in memstore and hlog, not yet Hfile • Snapshot is missing in memory data! ./.hbase-snapshots ./.archive Table F11 F21 R1 R2 R3 TableSnapshot manifest R1 R2 R3 Table files F31 mem mem mem HBaseCon 2013 6/13/201346
  • 47. Online Snapshots • Flush so that all in memory data written in an Hfile • Then add to snapshot manifest ./.hbase-snapshots ./.archive Table F11 F21 R1 R2 R3 Table files F31 F13 F23 F33 TableSnapshot manifest R1 R2 R3 HBaseCon 2013 6/13/201347
  • 48. Online Snapshots • Flush so that all in memory data in an Hfile • Then add to snapshot manifest ./.hbase-snapshots ./.archive Table F11 F21 R1 R2 R3 Table files F31 F13 F23 F33 TableSnapshot manifest R1 R2 R3 HBaseCon 2013 6/13/201348
  • 49. Consistency • Offline Snapshots • Fully consistent snapshot • Online Flush Snapshot • “CopyTable” level consistency with a much smaller window. • Time bounded by slowest region server and region flush HBaseCon 2013 6/13/201349
  • 50. Online Snapshots and Causal consistency • Causal consistency would only allow A, AB, or neither A nor B. • B and Not A is currently possible Table F11 F21 R1 R2 R3 F31 TableSnapshot manifest R1 R2 R3 Master RS1 RS2 Client mem mem Flush SS F13 HBaseCon 2013 6/13/201350
  • 51. Online Snapshots and Causal consistency • Causal consistency would only allow A, AB, or neither A nor B. • B and Not A is currently possible Table F11 F21 R1 R2 R3 F31 TableSnapshot manifest R1 R2 R3 Master RS1 RS2 Client mem mem Put A … … then Put B F13 mem HBaseCon 2013 6/13/201351
  • 52. Online Snapshots and Causal consistency • Causal consistency would allow A, AB, or neither A nor B. • B and Not A is possible with Flush Snapshots Table F11 F21 R1 R2 R3 F31 TableSnapshot manifest R1 R2 R3 Master RS1 RS2 Client mem F23 F13 Flush SS Put B is in but Put A is not! F33 HBaseCon 2013 6/13/201352
  • 53. Online snapshot attempts can fail • If involved RS’s fail, the snapshots attempt will fail. • Needs a way to prevent other table metadata operations • Table Metadata Locks (0.95+) • Avoid many snapshot failures conflicts(Ex: Online schema, splits) • Failed attempt will report errors -- user must retry. • o.a.h.hbase.snapshot.HBaseSnapshotException • o.a.h.hbase.snapshot.CorruptedSnapshotException HBaseCon 2013 6/13/201353
  • 54. Development Notes How we collaborated, built, and tested
  • 55. Table Snapshots Development • Developed in a Branch off of trunk merged and in 0.95 and trunk. • Feature is too big to include as a single patch • Does not destabilize trunk • Does not slow time-based release trains • Later Backported to 0.94.6.1 src branch Reintegrate into trunk sync HBaseCon 2013 6/13/201355
  • 56. System testing with Jenkins • Concurrently load data while taking snapshots • Inject compactions, Kill RS’s, Meta RS, Master • Create snapshot clones of the snapshots • Inject Compactions, Kill RS’s, META Rs, Master HBaseCon 2013 6/13/201356
  • 57. Future Work: • Alternative semantics and implementations • Log Roll Snapshot (HBASE-7291) • Store logs and replay on restore • Faster for snapshot, slower and more complicated for restore. • Timestamp Snapshot (HBASE-6866) • All updates before ts in snapshot, all after not in snapshot • Longer pause before snapshot taken • Globally-Consistent Snapshot (HBASE-6867) • global write lock for all regions nodes until snapshot complete. • Expensive • Repair tools • Manual repairs necessary (hbck does not support yet) HBaseCon 2013 6/13/201357
  • 59. Feature Summary by Version Apache 0.92.x Apache <0.94.6.1 Apache 0.94.6.1+ Apache 0.95.0 Apache 0.96.0 Copy Table Copy Table Copy Table Import / Export Import / Export Import / Export Offline snapshots Offline snapshots Flush Online Snapshot Flush Online Snapshot Table Locks HBaseCon 2013 6/13/201359
  • 60. Key Contributors • Jesse Yates (Salesforce.com) • HFileArchiver, Offline Snapshot, first draft online • Matteo Bertozzi (Cloudera) • HFileLink, Restore, clone, Testing, 0.94 backport • Jonathan Hsieh (Cloudera) • Online Snapshots revamp, Testing, Branch Sheppard • Ted Yu (HortonWorks) • Reviews • Enis Soztutar (HortonWorks) • Table Locks on Snapshots HBaseCon 2013 6/13/201360
  • 61. Thanks! Questions? Matteo Bertozzi @th30z matteo.bertozzi@cloudera.com Jonathan Hsieh @jmhsieh jon@cloudera.com Jesse Yates @jesse_yates jesse.k.yates@gmail.com HBaseCon 2013 6/13/201361

Editor's Notes

  1. Talk about everything! Don’t glaze over internals
  2. Why does it cause extra latency? What “crushes” the cluster?
  3. Causes move expensive backups. Have a bunch of ‘write optimized files’ – HLogs and have to convert them to ‘read optimized files’ – HFIles. This isn’t a cheap process.
  4. Don’t over sell! Just say what it is.
  5. Add a quick summary of what just talked about BEFORE handoff!!!