Making KVS 10x Scalable
Sadayuki Furuhashi
PLAZMA TD Tech Talk 2018 at Shibuya
How we optimized a real-time delivery server by 10x
Senior Principal Engineer
@frsyuki
About Sadayuki Furuhashi
Founder of Treasure Data, Inc.
Located in Silicon Valley, USA.
OSS Hacker. GitHub: @frsyuki
OSS projects I initially designed: MessagePack, Fluentd, Embulk, Digdag
What's CDP KVS?
✓ Streaming data collection
✓ Bulk data collection
✓ On-demand data delivery <= CDP KVS (today's topic)
What's CDP KVS?
Source data tables (customers, behaviors) are collected in many ways,
turned into an audience data set by preprocess workflows, then into
segment data sets by segmentation workflows.
CDP KVS serves segment data sets in US, JP, (EU) to Mobile, PC, and Devices
through the JavaScript & Mobile Personalization API:
a REST API call (using the JavaScript SDK) => returned value.
Architecture (Old)
AWS US: CDP KVS Server => DynamoDB + DAX, Ignite
AWS JP: CDP KVS Server => DynamoDB + DAX, Ignite
DAX: DynamoDB's write-through cache
Ignite: distributed cache
Bulk write from Presto; random lookup by browsers / mobile.
Challenges to solve
• DynamoDB's auto-scaling doesn't scale in time
  => Request failure! (load spikes right after noon)
• Expensive Write Capacity cost
  => Request failure! Already too expensive; a bigger margin = even more expensive
Workload analysis
                    Read                         Write
API                 Random lookup by ID          Bulk write & append, no delete
Temporal locality   High (repeating visitors)    Low (daily or hourly batch)
Spatial locality    Moderate (hot & cold         High (rewrite data sets
                    data sets)                   by batch)

Number of records: 300 billion
Size of a record: 10 bytes
Size of total records: 3 TB
Read traffic: 50 requests/sec
Ideas
(A) Alternative distributed KVS (Aerospike)
(B) Storage Hierarchy on KVS
(C) Edit log shipping & Indexed archive
Idea (A)
Alternative Distributed KVS

(A) Alternative Distributed KVS
The CDP KVS Servers and Presto talk to an Aerospike cluster
(Aerospike node × 3) instead of DynamoDB + DAX and Ignite.
Aerospike: Pros & Cons
• Good: Very fast lookup
  • In-memory index + direct IO on SSDs
• Bad: Expensive (hardware & operation)
  • Same cost for both cold & hot data (large memory overhead for cold data)
  • No spatial locality for write (a batch-write becomes random-writes)
Aerospike: Storage Architecture
Aerospike node:
  DRAM (primary key index, loaded at startup / cold-start):
    hash(k01) => addr 01, size=3
    hash(k02) => addr 09, size=3
    hash(k03) => addr 76, size=3
  SSD /dev/sdb (data):
    addr 01: k01 = v01
    addr 09: k02 = v02
    addr 76: k03 = v03
GET hash(k01) probes the DRAM index, then reads the record from SSD.
✓ Primary keys (hash) are always in-memory => always fast lookup
✓ Data is always on SSD => always durable
✓ IO on SSD is direct IO (no filesystem cache) => consistently fast without warm-up
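The lookup path above can be sketched in a few lines. This is an illustrative toy, not Aerospike's code: a DRAM dictionary maps a key digest to an (offset, size) pair on the raw device, so every GET is one hash probe plus one direct read. The class and method names are invented for the sketch.

```python
# Toy model of the Aerospike storage layout: index in DRAM, data on device.
import hashlib

class TinyKV:
    def __init__(self):
        self.index = {}            # DRAM: key digest -> (offset, size)
        self.device = bytearray()  # stands in for the raw SSD device

    def _digest(self, key: str) -> bytes:
        return hashlib.sha1(key.encode()).digest()[:8]

    def put(self, key: str, value: bytes):
        offset = len(self.device)  # append-only write to the device
        self.device += value
        self.index[self._digest(key)] = (offset, len(value))

    def get(self, key: str) -> bytes:
        offset, size = self.index[self._digest(key)]  # always in DRAM
        return bytes(self.device[offset:offset + size])

kv = TinyKV()
kv.put("k01", b"v01")
print(kv.get("k01"))  # b'v01'
```

The cost implication follows directly: the index entry exists whether the record is hot or cold, which is why cold data pays the same DRAM price as hot data.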
Aerospike: System Architecture
hash(key) = Node ID, so records are spread across all nodes:
  Bulk write 1 = {k01: v01, k02: v02, k03: v03, k04: v04, k05: v05}
  Bulk write 2 = {k06: v06, k07: v07, k08: v08, k09: v09, k0a: v0a}
Batch write => random writes:
no locality, no compression, more overhead.
Note: compressing 10-byte data isn't efficient.
Aerospike: Cost estimation
• 1 record needs 64 bytes of DRAM for primary key indexing
• Storing 100 billion records (our use case) needs 6.4 TB of DRAM
• With replication-factor=3, our system needs 19.2 TB of DRAM
• It needs r5.24xlarge × 26 instances on EC2
• It costs $89,000/month (1-year reserved, convertible)
• Cost structure:
  • Very high DRAM cost per GB
  • Moderate IOPS cost
  • Low storage & CPU cost
  • High operational cost
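The DRAM figures above can be checked with integer arithmetic (the 768 GB DRAM size of r5.24xlarge is the one number brought in from AWS specs; the rest comes from the slide):

```python
# Reproducing the slide's DRAM estimate for the Aerospike option.
records = 100_000_000_000   # 100 billion records
index_bytes = 64            # DRAM per primary key index entry
replication = 3

index_gb = records * index_bytes // 10**9  # 6400 GB = 6.4 TB
total_gb = index_gb * replication          # 19200 GB = 19.2 TB
nodes = total_gb // 768                    # r5.24xlarge has 768 GB DRAM
print(index_gb, total_gb, nodes)           # 6400 19200 25
```

19.2 TB / 768 GB is exactly 25 nodes of pure index; the slide provisions 26 instances, which leaves a little headroom on top of the bare minimum.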
Idea (B)
Storage Hierarchy on KVS
Analyzing a cause of expensive DynamoDB WCU
A 3.2 KB item (PK + Col1 + Col2 per key) consumes 4 Write Capacity units,
because write capacity is charged in 1 KB increments (0.8 WCU wasted).
DynamoDB with record size <<< 1KB
Four 10-byte records (Key1 => Val1 ... Key4 => Val4) written as four items
consume 1 Write Capacity unit each:
4 Write Capacity units consumed to store 40 bytes.
99% WCU wasted!
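The waste math is just the 1 KB rounding rule applied per item; a minimal sketch:

```python
# DynamoDB charges write capacity per item, rounded up to 1 KB.
import math

def wcu(item_bytes: int) -> int:
    """Write Capacity units consumed by one item write."""
    return max(1, math.ceil(item_bytes / 1024))

# Four 10-byte records written as four separate items:
records = [10, 10, 10, 10]
total_wcu = sum(wcu(b) for b in records)   # 4 WCU
payload = sum(records)                     # 40 bytes
waste = 1 - payload / (total_wcu * 1024)   # fraction of paid capacity unused
print(total_wcu, round(waste, 3))          # 4 0.99
```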
Solution: Optimizing DynamoDB WCU overhead
Instead of one item per 10-byte record (Key1..Key4 => 1 Write Capacity each),
pack many records into a single item:
  PK = Part ID, Value = {Key1: Val1, Key2: Val2, Key3: Val3, Key4: Val4}
  => ~30 bytes => 1 Write Capacity
(Note: expected 5x - 10x compression ratio)
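Packing flips the previous example from 4 WCU to 1. A sketch, using JSON + zlib as stand-ins for the real encoding (the actual format, shown in the appendix, is MessagePack + Zstd):

```python
# Pack four small records into one compressed item => one rounded-up write.
import json, math, zlib

def wcu(item_bytes: int) -> int:
    return max(1, math.ceil(item_bytes / 1024))

records = {"Key1": "Val1", "Key2": "Val2", "Key3": "Val3", "Key4": "Val4"}
item = zlib.compress(json.dumps(records).encode())  # one item for all four
print(wcu(len(item)))  # 1 WCU for all four records
```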
(B) Storage Hierarchy on KVS
Bulk write 1 = {k01: v01, k03: v03, k06: v06, k08: v08, k0a: v0a}
Bulk write 2 = {k02: v02, k04: v04, k05: v05, k07: v07, k09: v09}
Each bulk write is compressed & written to DynamoDB + DAX,
with hash(partition id) = primary key.
Storage Hierarchy on KVS: Pros & Cons
• Good: Very scalable write & storage cost
  • Data compression (10x less write & storage cost)
  • Fewer primary keys (1 / 100,000 with 100k records in a partition)
• Bad: Complex to understand & use
  • More difficult to understand
  • Writer (Presto) must partition data by partition id
Data partitioning - write
Original data set: {k01: v01, k02: v02, ... k0a: v0a}
1. Partition records by hash: hash(key) = partition id | split id
   Partition id=71: k01, k03, k06, k08, k0a
   Partition id=69: k02, k04, k05, k07, k09
2. Encode & compress each partition into Split 1, Split 2, Split 3
3. Store into DynamoDB: one row per partition (PK=71, PK=69)
   with columns Split 1, Split 2, Split 3
Partitioning is done using Presto (GROUP BY + array_agg query).
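The write path above can be sketched end to end. This is an illustrative model, not the production code: JSON + zlib stand in for MessagePack + Zstd, the partition/split counts are tiny, and in the real system the grouping is done by a Presto GROUP BY + array_agg query rather than in Python.

```python
# Sketch of the write side: hash each key to (partition id, split id),
# group records per split, encode & compress each group into one item column.
import hashlib, json, zlib
from collections import defaultdict

NUM_PARTITIONS = 2   # illustrative; real tables use many partitions
NUM_SPLITS = 3

def locate(key: str):
    h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")
    return h % NUM_PARTITIONS, (h // NUM_PARTITIONS) % NUM_SPLITS

def bulk_write(records: dict):
    groups = defaultdict(dict)  # (partition id, split id) -> {key: value}
    for k, v in records.items():
        groups[locate(k)][k] = v
    # Encode & compress each split into one stored blob.
    return {loc: zlib.compress(json.dumps(kvs).encode())
            for loc, kvs in groups.items()}

data = {f"k{i:02x}": f"v{i:02x}" for i in range(1, 11)}
table = bulk_write(data)  # what would be written to DynamoDB
```

One bulk write thus touches at most NUM_PARTITIONS × NUM_SPLITS items, no matter how many records it carries.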
Data partitioning - read
GET k06:
1. hash(k06) = partition id | split id => k06 is at partition id=71, split id=2
2. Get the encoded split (PK=71, column Split 2) from DynamoDB through DAX (cache)
3. Scan the encoded split (k03: v03, k06: v06, ...) => return {k06: v06}
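The read path mirrors the write path: hash the key to find which single item to fetch, then scan only that split. Again a toy model (JSON + zlib instead of MessagePack + Zstd; `table` stands in for DynamoDB, which DAX would front as a cache):

```python
# Sketch of the read side: locate the split, fetch one item, scan it.
import hashlib, json, zlib

NUM_PARTITIONS = 100
NUM_SPLITS = 3

def locate(key: str):
    h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")
    return h % NUM_PARTITIONS, (h // NUM_PARTITIONS) % NUM_SPLITS

# Pretend table: (partition id, split id) -> compressed encoded split.
table = {locate("k06"): zlib.compress(json.dumps(
    {"k03": "v03", "k06": "v06"}).encode())}

def get(key: str):
    blob = table.get(locate(key))     # one DynamoDB/DAX fetch
    if blob is None:
        return None
    split = json.loads(zlib.decompress(blob))
    return split.get(key)             # scan the decoded split

print(get("k06"))  # v06
```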
Idea (C)
Edit log shipping & Indexed Archive

(C) Edit log shipping & Indexed Archive
• Writer API Nodes push a stream of bulk-write data sets to Kafka / Kinesis (+ S3)
• Indexing & Storage Nodes (RocksDB) subscribe to the stream;
  each node owns two shards (0,1 / 1,2 / 2,3 / 3,0),
  so every shard is replicated on two nodes
• Reader API Nodes read from the storage nodes
• etcd / consul manage the shard & node list
• S3 is used for backup & cold-start
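The shard layout above (4 shards, each on 2 of 4 nodes) determines where a reader sends a lookup. A minimal routing sketch, with invented node names; in the real design the shard-to-node map would come from etcd/consul rather than a constant:

```python
# Route a key to the storage nodes that hold its shard.
import hashlib

NODE_SHARDS = {"node-a": (0, 1), "node-b": (1, 2),
               "node-c": (2, 3), "node-d": (3, 0)}
NUM_SHARDS = 4

def shard_of(key: str) -> int:
    h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")
    return h % NUM_SHARDS

def nodes_for(key: str):
    s = shard_of(key)
    return sorted(n for n, shards in NODE_SHARDS.items() if s in shards)

print(nodes_for("k06"))  # the two replicas holding this key's shard
```

With this overlap, any single node can fail and every shard still has one live replica.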
Architecture of RocksDB
Optimization of RocksDB for Redis on Flash, Keren Ouaknine, Oran Agra, and Zvika Guz
Pros & Cons
• Good: Very scalable write & storage cost
  • Data compression (10x less write & storage cost)
• Bad: Expensive to implement & operate
  • Implementing 3 custom server components (stateless: Writer, Reader; stateful: Storage)
  • Operating stateful servers - more work to implement backup, restore, monitoring, alerting, etc.
• Others:
  • Flexible indexing
  • Eventually-consistent
Our decision: Storage Hierarchy on DynamoDB
• Operating stateful servers is harder than you think!
  • Note: almost all Treasure Data components are stateless (or a cache or temporary buffer)
• Even if the data format becomes complicated, stateless servers on DynamoDB are the better option for us.
Appendix: Split format
Each DynamoDB row (PK = partition id, e.g. 71, 69) holds Split 1, Split 2, Split 3.
A split stores records like {k03: v03, k06: v06, ...} as a hash table of N buckets,
serialized by MessagePack and compressed by Zstd:

zstd( msgpack( [
  msgpack( [
    [keyLen 1, keyLen 2, keyLen 3, ...],
    "key1key2key3...",
    [valLen 1, valLen 2, valLen 3, ...],
    "val1val2val3...",
  ] ),                               <= bucket 1
  msgpack( [
    [keyLen, keyLen, keyLen, ...],
    "keykeykey...",
    [valLen, valLen, valLen, ...],
    "valvalval...",
  ] ),                               <= bucket 2
  ...                                <= bucket N
] ) )

Size of a split: approx. 200KB (100,000 records).
MessagePack is nested to omit unnecessary deserialization when looking up a record.
Results
Bulk write performance:
• 6x less total time
• 8x faster single bulk-write (which loops 18 times)
DynamoDB Write Capacity consumption:
• 921k Write Capacity in 45 minutes
• 170 Write Capacity per second average (≒ 170 WCU)
What's Next?
• Implementation => DONE
• Testing => DONE
• Deploying => ongoing
• Designing future extensions => FUTURE WORK
A possible future work
Extension for streaming computation:
an on-demand read operation updates a value, i.e. random writes.

                    Read                         Write
API                 Random lookup by ID          Random write
Temporal locality   High (repeating visitors)    Low => High?
Spatial locality    Moderate (hot & cold         High => Low?
                    data sets)
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Kaya Weers
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxfnnc6jmgwh
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 

Recently uploaded (20)

Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)Design pattern talk by Kaya Weers - 2024 (v2)
Design pattern talk by Kaya Weers - 2024 (v2)
 
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptxGenerative AI - Gitex v1Generative AI - Gitex v1.pptx
Generative AI - Gitex v1Generative AI - Gitex v1.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 

Making KVS 10x Scalable

  • 1. Making KVS 10x Scalable — Sadayuki Furuhashi, PLAZMA TD Tech Talk 2018 at Shibuya. How we optimized the real-time delivery server 10x. Senior Principal Engineer @frsyuki
  • 2. About Sadayuki Furuhashi A founder of Treasure Data, Inc. Located in Silicon Valley, USA. OSS Hacker. Github: @frsyuki OSS projects I initially designed:
  • 4. What's CDP KVS? ✓ Streaming data collection ✓ Bulk data collection CDP KVS (Today's topic) ✓ On-demand data delivery
  • 5. What's CDP KVS? CDP KVS Source data tables Data collection in many ways Audience data set customers behaviors Segment data sets US, JP, (EU) Mobile, PC, Devices Segmentation workflows Preprocess workflows
  • 6. JavaScript & Mobile Personalization API
  • 7. REST API call (using JavaScript SDK) Returned value
  • 8. Architecture (Old): CDP KVS Servers in AWS JP and AWS US, backed by DynamoDB + DAX (DynamoDB's write-through cache) and Ignite (distributed cache). Presto performs bulk writes; browsers / mobile do random lookups.
  • 10. DynamoDB's auto-scaling doesn't scale in time: load spikes right after noon cause request failures!
  • 11. Expensive Write Capacity cost: it's already too expensive, and a bigger safety margin = even more expensive. Request failure!
  • 12. Workload analysis. Read: random lookup by ID; high temporal locality (repeating visitors); moderate spatial locality (hot & cold data sets). Write: bulk write & append, no delete; low temporal locality (daily or hourly batch); high spatial locality (data sets rewritten by batch). Number of records: 300 billion. Size of a record: 10 bytes. Size of total records: 3 TB. Read traffic: 50 requests/sec.
  • 13. Ideas (A) Alternative distributed KVS (Aerospike) (B) Storage Hierarchy on KVS (C) Edit log shipping & Indexed archive
  • 15. (A) Alternative Distributed KVS: replace DynamoDB + DAX and Ignite behind the CDP KVS Servers with a cluster of Aerospike nodes, fed by Presto.
  • 16. Aerospike: Pros & Cons. Good: very fast lookup (in-memory index + direct IO on SSDs). Bad: expensive (hardware & operation); same cost for both cold & hot data (large memory overhead for cold data); no spatial locality for writes (a batch-write becomes random-writes).
  • 17. Aerospike: Storage Architecture. An Aerospike node keeps the primary-key index in DRAM (hash(k01) -> addr 01, size=3; hash(k02) -> addr 09; hash(k03) -> addr 76), while the records (k01=v01, k02=v02, k03=v03) live on the SSD (/dev/sdb) at those addresses. ✓ Primary keys (hashes) are always in memory => always fast lookup. ✓ Data is always on SSD => always durable. ✓ IO on SSD is direct IO (no filesystem cache) => consistently fast without warm-up. The index is loaded at startup (cold-start).
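The lookup path on slide 17 can be sketched in a few lines (a toy model, not Aerospike code: a Python dict stands in for the DRAM hash index and a bytearray for the raw SSD device):

```python
# Aerospike-style lookup sketch: the primary-key index lives in DRAM,
# values live on SSD, so a GET is one in-memory hash lookup plus one
# direct read at a known offset and size.
ssd = bytearray(64)          # stands in for the raw SSD device
index = {}                   # DRAM: key -> (offset, size)
free = 0                     # next free offset on the "SSD"

def put(key, value):
    global free
    off = free
    ssd[off:off + len(value)] = value
    index[key] = (off, len(value))     # only the tiny index entry stays in DRAM
    free += len(value)

def get(key):
    off, size = index[key]             # one hash lookup in DRAM
    return bytes(ssd[off:off + size])  # one direct IO at a known address

put("k01", b"v01")
put("k02", b"v02")
print(get("k01"))  # b'v01'
```

This also shows the cost trade-off on slide 19: every record, hot or cold, pays for an index entry in DRAM.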
  • 18. Aerospike: System Architecture. Records are placed by hash(key) = node ID, so a bulk write (e.g. {k01..k05} and {k06..k0a}) is scattered across all Aerospike nodes. Batch write => random write: no locality, no compression, more overhead. Note: compressing 10-byte data isn't efficient.
  • 19. Aerospike: Cost estimation. One record needs 64 bytes of DRAM for primary-key indexing, so storing 100 billion records (our use case) needs 6.4 TB of DRAM; with replication-factor=3 the system needs 19.2 TB of DRAM. That means r5.24xlarge × 26 instances on EC2, costing $89,000/month (1-year reserved, convertible). Cost structure: very high DRAM cost per GB; moderate IOPS cost; low storage & CPU cost; high operational cost.
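The DRAM sizing on slide 19 checks out arithmetically (the instance count and price are quoted from the slide, not recomputed here):

```python
# 64 bytes of DRAM index per record, 100 billion records, 3 replicas.
records = 100_000_000_000
index_bytes_per_record = 64
replication = 3

dram_tb = records * index_bytes_per_record / 1e12   # TB for one copy
total_tb = dram_tb * replication                    # TB with replication-factor=3
print(dram_tb, total_tb)  # 6.4 TB per copy, 19.2 TB total
```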
  • 21. Analyzing a cause of expensive DynamoDB WCU: an item with PK=Key1 and columns Col1, Col2 totaling 3.2 KB consumes 4 Write Capacity units (billed per 1 KB, rounded up), wasting 0.8 WCU.
  • 22. DynamoDB with record size <<< 1 KB: four 10-byte records (Key1..Key4) each consume 1 full Write Capacity unit, so 4 Write Capacity units are consumed to store just 40 bytes. 99% of the WCU is wasted!
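The waste on slides 21-22 follows directly from DynamoDB's write-capacity rule (one standard write consumes 1 WCU per 1 KB of item size, rounded up), which a tiny helper makes concrete:

```python
import math

def wcu(item_size_bytes):
    # DynamoDB standard writes: 1 WCU per 1 KB per item, rounded up.
    return math.ceil(item_size_bytes / 1024)

# Four separate 10-byte records: one full WCU each.
separate = sum(wcu(10) for _ in range(4))
print(separate)                 # 4 WCU for 40 bytes of payload
print(40 / (separate * 1024))   # ~1% of the paid capacity actually carries data

# The 3.2 KB item from slide 21 likewise rounds up to 4 WCU.
print(wcu(3200))
```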
  • 23. Solution: optimizing DynamoDB WCU overhead. Instead of four 10-byte items (4 WCU), pack the records into one item: PK = partition ID, value = {Key1: Val1, Key2: Val2, Key3: Val3, Key4: Val4} (~30 bytes compressed) => 1 Write Capacity unit. (Note: expected 5x-10x compression ratio.)
  • 24. (B) Storage Hierarchy on KVS: bulk writes ({k01, k03, k06, k08, k0a} and {k02, k04, k05, k07, k09}) are compressed & written to DynamoDB + DAX, with hash(partition id) as the primary key.
  • 25. Storage Hierarchy on KVS: Pros & Cons. Good: very scalable write & storage cost, thanks to data compression (10x less write & storage cost) and far fewer primary keys (1/100,000 with 100k records per partition). Bad: complex to understand & use; the writer (Presto) must partition data by partition id.
  • 26. Data partitioning - write. The original data set {k01: v01, ..., k0a: v0a} is partitioned by hash(key) = partition id | split id (e.g. partition id=71 holds k01, k03, k06, k08, k0a; partition id=69 holds the rest). Each partition is encoded & compressed into splits (Split 1, Split 2, Split 3) and stored in DynamoDB as one row per partition (PK = partition id, one column per split). Partitioning is done with Presto (a GROUP BY + array_agg query).
  • 27. Data partitioning - read. GET k06: hash(k06) says k06 is at partition id=71, split id=2, so fetch Split 2 of row 71 from DynamoDB through DAX (cache), scan the encoded split for k06, and return {k06: v06}.
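The write and read paths of slides 26-27 can be sketched together (illustrative only: md5 stands in for the real hash, a dict for the DynamoDB table, and the real partitioning runs in Presto via a GROUP BY + array_agg query):

```python
import hashlib

NUM_PARTITIONS = 2   # toy sizes; a real table has many partitions and splits
NUM_SPLITS = 3

def locate(key):
    # hash(key) determines both the partition id and the split id.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % NUM_PARTITIONS, (h // NUM_PARTITIONS) % NUM_SPLITS

# Write: group records by (partition id, split id); each group is stored
# (encoded & compressed in the real system) as a single DynamoDB attribute.
table = {}  # stands in for DynamoDB: (partition, split) -> records

def bulk_write(records):
    for key, value in records.items():
        table.setdefault(locate(key), {})[key] = value

# Read: recompute the location, fetch one split, scan it for the key.
def get(key):
    return table.get(locate(key), {}).get(key)

bulk_write({"k%02d" % i: "v%02d" % i for i in range(1, 11)})
print(get("k06"))  # v06
```

Packing many records per stored item is what collapses thousands of 1-WCU writes into one.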
  • 28. Idea (C) Edit log shipping & Indexed Archive
  • 29. (C) Edit log shipping & Indexed Archive. Writer API nodes publish the stream of bulk-write data sets to Kafka / Kinesis (+ S3). Indexing & Storage nodes (RocksDB; shards 0,1 / 1,2 / 2,3 / 3,0) subscribe to the stream and build indexes; Reader API nodes serve reads from them. etcd / consul manage the shard & node list; S3 is used for backup & cold-start.
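A minimal model of option (C): stateless writers append bulk-write batches to a shared log, and each storage node subscribes and indexes only its shards (a list and dicts stand in for Kafka/Kinesis and RocksDB; the shard layout and replication factor are illustrative):

```python
log = []  # stands in for Kafka / Kinesis: an ordered stream of bulk-write batches

def bulk_write(batch):
    log.append(batch)  # a Writer API node only appends; it holds no state

class StorageNode:
    def __init__(self, shards, num_shards):
        self.shards = shards          # e.g. {0, 1}: each shard lives on 2 nodes
        self.num_shards = num_shards
        self.db = {}                  # stands in for RocksDB
        self.offset = 0               # consumed position in the subscribed log

    def catch_up(self):
        # Subscribe: replay new log entries, indexing only this node's shards.
        for batch in log[self.offset:]:
            for key, value in batch.items():
                if hash(key) % self.num_shards in self.shards:
                    self.db[key] = value
        self.offset = len(log)

# Shards 0,1 / 1,2 / 2,3 / 3,0 as in the slide: every shard is on 2 nodes.
nodes = [StorageNode({i, (i + 1) % 4}, 4) for i in range(4)]
bulk_write({"k%02d" % i: "v%02d" % i for i in range(10)})
for n in nodes:
    n.catch_up()
```

The eventual consistency noted on slide 31 falls out of this design: readers see a record only after the owning nodes have caught up on the log.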
  • 30. Architecture of RocksDB Optimization of RocksDB for Redis on Flash, Keren Ouaknine, Oran Agra, and Zvika Guz
  • 31. Pros & Cons. Good: very scalable write & storage cost (data compression: 10x less write & storage cost). Bad: expensive to implement & operate — it means implementing 3 custom server components (stateless: Writer, Reader; stateful: Storage), and operating stateful servers means more work to implement backup, restore, monitoring, alerting, etc. Others: flexible indexing; eventually consistent.
  • 32. Our decision: Storage Hierarchy on DynamoDB. Operating stateful servers is harder than you think! Note: almost all Treasure Data components are stateless (or a cache or temporary buffer). Even though the data format becomes more complicated, stateless servers on DynamoDB are the better option for us.
  • 33. Appendix: Split format. A split is a hash table of N buckets, serialized by MessagePack and compressed by Zstd: zstd(msgpack([bucket 1, bucket 2, ..., bucket N])), where each bucket is msgpack([[keyLen 1, keyLen 2, keyLen 3, ...], "key1key2key3...", [valLen 1, valLen 2, valLen 3, ...], "val1val2val3..."]). Size of a split: approx. 200 KB (100,000 records). The nested MessagePack lets a lookup deserialize only the one bucket it needs, omitting unnecessary deserialization.
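The bucket layout of slide 33 can be emulated in plain Python (tuples here instead of MessagePack, and no Zstd, purely to show how the length arrays let a lookup decode a single value without deserializing the rest):

```python
# One bucket: [key lengths], "concatenated keys", [value lengths], "concatenated values".
def encode_bucket(records):
    keys, vals = list(records.keys()), list(records.values())
    return (
        [len(k) for k in keys], "".join(keys),
        [len(v) for v in vals], "".join(vals),
    )

def lookup(bucket, key):
    key_lens, keys, val_lens, vals = bucket
    kpos = vpos = 0
    for klen, vlen in zip(key_lens, val_lens):
        if keys[kpos:kpos + klen] == key:
            return vals[vpos:vpos + vlen]  # slice out only the one value we need
        kpos += klen
        vpos += vlen
    return None

bucket = encode_bucket({"k03": "v03", "k06": "v06"})
print(lookup(bucket, "k06"))  # v06
```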
  • 35. Bulk write performance: 6x less total time; 8x faster single bulk-write (which loops 18 times).
  • 36. DynamoDB Write Capacity consumption: 921k Write Capacity in 45 minutes. 170 Write Capacity per second average (≒ 170 WCU).
  • 37. What's Next? • Implementation => DONE • Testing => DONE • Deploying => on-going • Designing Future Extensions => FUTURE WORK
  • 38. A possible future work: an extension for streaming computation (an on-demand read operation updates a value). Read: still random lookup by ID; temporal locality stays high (repeating visitors); spatial locality stays moderate (hot & cold data sets). Write: becomes random write; temporal locality low => high?; spatial locality high => low?