Migrating to Riak at Shareaholic

Robby Grossman, Shareaholic's Tech Lead, spoke at the first Boston Riak Meetup on August 30, 2012. These are his slides.

Riak @ Shareaholic

Robby Grossman
robby@shareaholic.com
@freerobby

Agenda

Shareaholic: Product & Tech

Why Riak: The Search for a Big Data Store

Transitioning to Riak

Riak Use Cases

Deploying to EC2

What's Shareaholic?

Monthly @ Shareaholic

Thousands of developers hitting the API

Hundreds of thousands of publishers

Tens of millions of shares & clicks

Hundreds of millions of pageviews & events

Tech @ Shareaholic

JRuby on Rails (via TorqueBox)

MySQL (master, read slave)

Elastic MapReduce (Amazon’s hosted Hadoop)

Redis

Formerly Mongo, now Riak

Why Not Mongo?

Working set needs to fit in memory

Global write lock blocks all queries, even though Mongo offers no transactions or joins

Standbys not “hot”

Why Riak?

Next @ Shareaholic

Options: HBase, Cassandra, Riak

Goals: Linear scalability; full-text search; flexible indexing; easier DevOps

HBase

Pros: Battle tested; high performance

Cons: Complex architecture; SPOFs; requires Hive for indexing/querying; expensive to deploy at small scale

Cassandra

Pros: Native secondary indices; linear scalability; tunable CAP

Cons: Known users are all domain experts; search requires Lucene; heavyweight MapReduce

Riak

Pros: Operationally simpler; linear scalability; integrated search; secondary indices; tunable CAP; vector clocks avoid time-sync problems

Cons: Multi-data center replication requires the Enterprise product; LevelDB puts high strain on CPU
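Vector clocks let replicas decide whether one version of an object supersedes another by counting updates per actor, instead of trusting synchronized server clocks. The following is a minimal sketch of the idea, not Riak's Erlang implementation; the function names (`increment`, `descends`, `compare`, `merge`) and the actor ids are hypothetical.

```javascript
// Hypothetical sketch of a vector clock: actor id -> update counter.
// Not Riak's Erlang implementation; names here are illustrative.

// Record one update by `actor`, returning a new clock.
function increment(clock, actor) {
  return { ...clock, [actor]: (clock[actor] || 0) + 1 };
}

// True if clock `a` has seen every update recorded in clock `b`.
function descends(a, b) {
  return Object.keys(b).every((actor) => (a[actor] || 0) >= b[actor]);
}

// Order two versions using only their clocks, never wall-clock time.
function compare(a, b) {
  const aSeesB = descends(a, b);
  const bSeesA = descends(b, a);
  if (aSeesB && bSeesA) return "equal";
  if (aSeesB) return "a-newer";
  if (bSeesA) return "b-newer";
  return "concurrent"; // conflicting versions the application must resolve
}

// Combine two clocks (e.g. after resolving a conflict): per-actor maximum.
function merge(a, b) {
  const out = { ...a };
  for (const [actor, count] of Object.entries(b)) {
    out[actor] = Math.max(out[actor] || 0, count);
  }
  return out;
}

const base = increment({}, "node-a");    // { "node-a": 1 }
const left = increment(base, "node-a");  // { "node-a": 2 }
const right = increment(base, "node-b"); // { "node-a": 1, "node-b": 1 }
console.log(compare(left, base));                // a-newer
console.log(compare(left, right));               // concurrent
console.log(compare(merge(left, right), right)); // a-newer
```

Two writes that each build on `base` without seeing the other come out as `concurrent`, regardless of which server's clock reads later.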

From Mongo to Riak

Migration Goals

No point at which the database goes “offline”

Product parity throughout the migration

Migration Process

1. App writes to Mongo and Riak

2. Verify data integrity

3. Import historical data

4. App reads from Riak

5. Decommission Mongo
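The five steps amount to a dual-write migration. A minimal sketch follows, assuming both stores expose async `get`/`put`; `MigratingStore` and `MemoryStore` are hypothetical names, and the in-memory class stands in for real MongoDB and Riak clients (the talk does not show Shareaholic's own code, which ran on JRuby).

```javascript
// Hypothetical dual-write wrapper for a Mongo -> Riak migration.
// `legacy` and `next` are any objects with async get(key) / put(key, value).
class MigratingStore {
  constructor(legacy, next) {
    this.legacy = legacy;
    this.next = next;
    this.readFromNext = false; // step 4: flip once data is verified
  }

  // Step 1: every write goes to both stores.
  async put(key, value) {
    await this.legacy.put(key, value);
    await this.next.put(key, value);
  }

  // Until the switch, the legacy store stays the source of truth.
  async get(key) {
    return this.readFromNext ? this.next.get(key) : this.legacy.get(key);
  }

  // Step 2: compare both stores for a sample of keys; returns mismatched keys.
  async verify(keys) {
    const mismatches = [];
    for (const key of keys) {
      const [a, b] = await Promise.all([this.legacy.get(key), this.next.get(key)]);
      if (JSON.stringify(a) !== JSON.stringify(b)) mismatches.push(key);
    }
    return mismatches;
  }

  // Step 3: import historical records that predate dual writes,
  // skipping keys the dual writes have already populated.
  async backfill(keys) {
    for (const key of keys) {
      if ((await this.next.get(key)) !== undefined) continue;
      const value = await this.legacy.get(key);
      if (value !== undefined) await this.next.put(key, value);
    }
  }
}

// In-memory stand-in for either database, for illustration only.
class MemoryStore {
  constructor() { this.data = new Map(); }
  async get(key) { return this.data.get(key); }
  async put(key, value) { this.data.set(key, value); }
}
```

Step 5 then reduces to deleting the legacy half of `put` and the `legacy` field. A production import would also need to guard against a live write landing between the backfill's read and write, which this sketch does not do.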

Use Cases

Share API

Save shared content

Uses MapReduce to populate the user dashboard

Recommendations

Sets of related pages

Generated on-demand

Publisher Analytics

Generated nightly via Hadoop

Typical stored “document” (JSON): 80 KB to 1 MB

Riak Successes

MapReduce

Handy for querying

Runs at “web page speed”

Easy to re-reduce for complex queries

Easy to test via curl
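Riak may run a reduce function more than once, passing earlier reduce output back in together with new map results, so a reduce function has to accept its own output as input. The JavaScript sketch below illustrates that property; `runInBatches` is a hypothetical simplification of the batching, not Riak's scheduler, and the input numbers are made up.

```javascript
// Re-reduce safe: output has the same shape as input (a list of numbers),
// so partial results can be fed back into the same function.
function reduceSum(values) {
  return [values.reduce((acc, v) => acc + v, 0)];
}

// Hypothetical coordinator: reduce each batch of map output together with
// the previous partial result, as a re-reducing phase may do.
function runInBatches(reduceFn, mapOutputs, batchSize) {
  let partial = [];
  for (let i = 0; i < mapOutputs.length; i += batchSize) {
    const batch = mapOutputs.slice(i, i + batchSize);
    partial = reduceFn(partial.concat(batch));
  }
  return partial;
}

// Not re-reduce safe: it counts its own earlier output as a single value.
function reduceCount(values) {
  return [values.length];
}

const shareCounts = [3, 1, 4, 1, 5, 9, 2, 6]; // hypothetical map outputs
console.log(runInBatches(reduceSum, shareCounts, 3));   // [ 31 ]
console.log(reduceSum(shareCounts));                    // [ 31 ]
console.log(runInBatches(reduceCount, shareCounts, 3)); // [ 3 ], not [ 8 ]
```

A sum gives the same answer however the inputs are batched; a naive count does not, which is the usual trap when writing reduce phases.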

Tunable CAP @ Shareaholic

Replication: primary/secondary authority

Read failure tolerance: speed vs. consistency

Write failure tolerance
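The read and write tolerance settings correspond to quorums. Riak exposes them as the bucket properties `n_val`, `r`, and `w`, plus `pr` and `pw`, which count only primary replicas. The sketch below is a hypothetical helper, not a Riak API, and it treats every replica as a primary; it shows the trade-off that a read waiting for R replies and a write waiting for W acknowledgements must overlap when R + W > N, while lower values tolerate more unavailable nodes.

```javascript
// Quorum arithmetic behind tunable CAP. n mirrors Riak's n_val (replicas per
// key), r the replies a read waits for, w the acknowledgements a write waits for.
// Simplification: treats every replica as a primary (ignores fallbacks).
function quorumProfile(n, r, w) {
  const valid =
    [n, r, w].every(Number.isInteger) && r >= 1 && w >= 1 && r <= n && w <= n;
  if (!valid) throw new RangeError("need integers with 1 <= r, w <= n");
  return {
    // Any r replicas intersect any w replicas, so a read reaches at least one
    // replica that acknowledged the latest successful write.
    readsOverlapWrites: r + w > n,
    // Replicas that can be unavailable before the operation fails.
    readFailuresTolerated: n - r,
    writeFailuresTolerated: n - w,
  };
}

console.log(quorumProfile(3, 2, 2)); // overlapping; tolerates 1 failure each way
console.log(quorumProfile(3, 1, 1)); // faster; tolerates 2 failures, no overlap
```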

Full Text Search

Built on Lucene

Make user content searchable

Make arbitrary keys queryable

“Just turn it on”

Hiccup: corrupt merge indexes

Query Example

Who’s our oldest user who’s shared something in the last minute?

```shell
# Search input: links shared during a 60-second period
# Reduce phase: Riak.reduceMin keeps the smallest value, e.g. [[2],[5],[9],[13]] => [[2]]
curl -XPOST http://localhost:8098/mapred \
  -H 'Content-Type: application/json' \
  -d '{
    "inputs": {
      "bucket": "links",
      "query": "timestamp:[1346350877 TO 1346350937}"
    },
    "query": [
      {"map": {"language": "javascript",
               "source": "function(riakObject) { return [[Riak.mapValuesJson(riakObject)[0].user_id]]; }"}},
      {"reduce": {"language": "javascript", "name": "Riak.reduceMin"}}
    ]
  }'
# Response: [[2197]]
```
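To try the map and reduce logic of the query example without a cluster, the phases can be emulated locally. In this sketch the `links` objects and their fields are hypothetical, `searchInputs` assumes the `[start TO end}` range includes the start and excludes the end (Lucene-style bounds), and `reduceMin` is a local stand-in that keeps the smallest value as the slide describes for `Riak.reduceMin`. The map phase here emits bare numbers: if it emitted single-element arrays such as `[2]` and `[13]`, JavaScript's `<` would compare them as strings, and `[2] < [13]` is false.

```javascript
// Hypothetical stored link objects (JSON values in the "links" bucket).
const links = [
  { key: "a", value: { user_id: 13, timestamp: 1346350900 } },
  { key: "b", value: { user_id: 2, timestamp: 1346350880 } },
  { key: "c", value: { user_id: 9, timestamp: 1346350000 } }, // outside window
];

// Stand-in for the search input: start inclusive, end exclusive.
function searchInputs(objects, start, end) {
  return objects.filter(
    (o) => o.value.timestamp >= start && o.value.timestamp < end
  );
}

// Map phase: emit one bare number per object.
function mapUserId(obj) {
  return [obj.value.user_id];
}

// Reduce phase: keep the smallest value; output can be re-reduced.
function reduceMin(values) {
  if (values.length === 0) return [];
  return [values.reduce((prev, next) => (prev < next ? prev : next))];
}

const mapped = searchInputs(links, 1346350877, 1346350937).flatMap(mapUserId);
console.log(reduceMin(mapped)); // [ 2 ]
```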

Riak on EC2

In a Nutshell

EC2 instance specs are poorly proportioned for LevelDB

Multiple availability zones (AZs) in one region work well

Scale vertically for better latency & consistency

Scale horizontally for more throughput per dollar

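Benchmarks

Top graph: c1.medium (1.7 GB RAM, 5 ECU)

Middle graph: m1.large (7.5 GB RAM, 4 ECU)

Bottom graph: cc1.4xlarge (23 GB RAM, 33.5 ECU)

(Graphs not reproduced: Throughput; Latency (Typical); Latency (Worst Case))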

Calculations

c1.medium (1.7 GB RAM, 5 ECU): 1758 IOPS/$-hr; worst 1% of queries: 300 ms / 800 ms

m1.large (7.5 GB RAM, 4 ECU): 1167 IOPS/$-hr; worst 1% of queries: 110 ms / 200 ms

cc1.4xlarge (23 GB RAM, 33.5 ECU): 872 IOPS/$-hr; worst 1% of queries: 47 ms / 139 ms
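The slides do not show how these figures were derived. Assuming “IOPS/$-hr” means sustained operations per second divided by the hourly instance cost, and “worst 1%” means the 99th-percentile latency, the arithmetic looks like the sketch below. Both helper names are hypothetical, and every input number is made up, not a measurement or price from the talk.

```javascript
// Throughput per dollar: operations per second divided by the cluster's
// hourly price (one reading of the slide's "IOPS/$-hr").
function iopsPerDollarHour(opsPerSecond, nodes, pricePerNodeHour) {
  return opsPerSecond / (nodes * pricePerNodeHour);
}

// Nearest-rank percentile; p = 99 marks the boundary of the slowest 1%.
function percentile(latenciesMs, p) {
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical run: 5 nodes at $0.50/hour each sustaining 4,000 ops/s.
console.log(iopsPerDollarHour(4000, 5, 0.5)); // 1600

const samples = Array.from({ length: 100 }, (_, i) => i + 1); // 1..100 ms
console.log(percentile(samples, 99)); // 99
```

Because the cost term grows with instance size, a larger instance can win on tail latency while losing on throughput per dollar, which is the pattern in the calculations above.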

Benchmark Takeaways

You can’t go “by spec”

IO is the limiting factor

RAM was never the limiting factor for keeping 1% of the keyspace in memory

Fin. Questions?

We’re Hiring!

Thanks: Tom Santero, Justin Sheehy, Ryan Zezeski, Reid Draper, and the freenode #riak crew

Robby Grossman, robby@shareaholic.com, @freerobby

What's hot

Real time dashboards with Kafka and DruidVenu Ryali

GPU Computing With Apache Spark And PythonJen Aman

Spark Summit EU talk by Ahsan Javed AwanSpark Summit

Building a derived data store using KafkaVenu Ryali

Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...confluent

Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...confluent

Mobius: C# Language Binding For SparkSpark Summit

Scaling spark on kubernetes at LyftLi Gao

Ten reasons to choose Apache Pulsar over Apache Kafka for Event Sourcing_Robe...StreamNative

Apache Kafka® at Dropboxconfluent

Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafkaconfluent

Hive & HBase For Transaction ProcessingDataWorks Summit

The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...HostedbyConfluent

Mining public datasets using opensource tools: Zeppelin, Spark and Jujuseoul_engineer

Removing performance bottlenecks with Kafka Monitoring and topic configurationKnoldus Inc.

Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Amazon Web Services

Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...confluent

Databus - LinkedIn's Change Data Capture PipelineSunil Nagaraj

Apache Superset at AirbnbBill Liu

When the Cloud is a Rockin: High Availability in Apache CloudStackJohn Burwell

What's hot (20)

Real time dashboards with Kafka and Druid

GPU Computing With Apache Spark And Python

Spark Summit EU talk by Ahsan Javed Awan

Building a derived data store using Kafka

Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...

Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...

Mobius: C# Language Binding For Spark

Scaling spark on kubernetes at Lyft

Ten reasons to choose Apache Pulsar over Apache Kafka for Event Sourcing_Robe...

Apache Kafka® at Dropbox

Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka

Hive & HBase For Transaction Processing

The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...

Mining public datasets using opensource tools: Zeppelin, Spark and Juju

Removing performance bottlenecks with Kafka Monitoring and topic configuration

Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013

Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...

Databus - LinkedIn's Change Data Capture Pipeline

Apache Superset at Airbnb

When the Cloud is a Rockin: High Availability in Apache CloudStack

Similar to Migrating to Riak at Shareaholic

How to Make Hadoop Easy, Dependable and FastMapR Technologies

Understanding Database OptionsAmazon Web Services

Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Chris Fregly

Kafka & Hadoop in RakutenRakuten Group, Inc.

Glint with Apache SparkVenkata Naga Ravi

High Performance DatabasesAmazon Web Services

Scalable Stream Processing with Apache SamzaPrateek Maheshwari

Riak at Engine Yard CloudInes Sombra

Efficient State Management With Spark 2.0 And Scale-Out DatabasesJen Aman

Efficient State Management With Spark 2.x And Scale-Out DatabasesSnappyData

Containerized Hadoop beyond KubernetesDataWorks Summit

Handling Data in Mega Scale SystemsDirecti Group

Navigating NoSQL in cloudy skiesshnkr_rmchndrn

Scaling Spark Workloads on YARN - Boulder/Denver July 2015Mac Moore

DAT101 Understanding AWS Database Options - AWS re: Invent 2012Amazon Web Services

SnappyData overview NikeTechTalk 11/19/15SnappyData

Microsoft Openness Mongo DBHeriyadi Janwar

Big Telco - Yousun JeongSpark Summit

Big Telco Real-Time Network AnalyticsYousun Jeong

SQL and NoSQL in SQL ServerMichael Rys

Similar to Migrating to Riak at Shareaholic (20)

How to Make Hadoop Easy, Dependable and Fast

Understanding Database Options

Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...

Kafka & Hadoop in Rakuten

Glint with Apache Spark

High Performance Databases

Scalable Stream Processing with Apache Samza

Riak at Engine Yard Cloud

Efficient State Management With Spark 2.0 And Scale-Out Databases

Efficient State Management With Spark 2.x And Scale-Out Databases

Containerized Hadoop beyond Kubernetes

Handling Data in Mega Scale Systems

Navigating NoSQL in cloudy skies

Scaling Spark Workloads on YARN - Boulder/Denver July 2015

DAT101 Understanding AWS Database Options - AWS re: Invent 2012

SnappyData overview NikeTechTalk 11/19/15

Microsoft Openness Mongo DB

Big Telco - Yousun Jeong

Big Telco Real-Time Network Analytics

SQL and NoSQL in SQL Server

Recently uploaded

AI as an Interface for Commercial BuildingsMemoori

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays

My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer

Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi

DevEX - reference for building teams, processes, and platformsSergiu Bodiu

Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation

"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays

Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm

Install Stable Diffusion in windows machinePadma Pradeep

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106

"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz

Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar

"ML in Production",Oleksandr BaganFwdays

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays

Story boards and shot lists for my a level piececharlottematthew16

CloudStudio User manual (basic edition):comworks

Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst

The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2

Recently uploaded (20)

AI as an Interface for Commercial Buildings

"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...

My INSURER PTE LTD - Insurtech Innovation Award 2024

Vertex AI Gemini Prompt Engineering Tips

DevEX - reference for building teams, processes, and platforms

Connect Wave/ connectwave Pitch Deck Presentation

"Debugging python applications inside k8s environment", Andrii Soldatenko

Streamlining Python Development: A Guide to a Modern Project Setup

Install Stable Diffusion in windows machine

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics

"Federated learning: out of reach no matter how close",Oleksandr Lapshyn

Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost

Unleash Your Potential - Namagunga Girls Coding Club

"ML in Production",Oleksandr Bagan

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack

Story boards and shot lists for my a level piece

CloudStudio User manual (basic edition):

Human Factors of XR: Using Human Factors to Design XR Systems

The Future of Software Development - Devin AI Innovative Approach.pdf
