SlideShare a Scribd company logo
1 of 37
Download to read offline
Riak @


     Robby Grossman
  robby@shareaholic.com
       @freerobby
Agenda

Shareaholic: Product & Tech

Why Riak: The Search for a Big Data Store

Transitioning to Riak

Riak Use Cases

Deploying to EC2
What’s   ?
Browser Tools
Sharing Buttons
Recommendations
Social Analytics
Monthly @

 Thousands of developers hitting API

 Hundreds of thousands of publishers

 Tens of millions of shares & clicks

 Hundreds of millions of pageviews & events
Tech @

JRuby on Rails (via Torquebox)

MySQL (Master, Read Slave)

Elastic MapReduce (similar to Hadoop)

Redis

Formerly Mongo, Now Riak
Why Not Mongo?


Working set needs to fit in memory

Global write lock blocks all queries
despite not having transactions/joins

Standbys not “hot”
Why Riak?
Next @
Options:      Goals:

  HBase         Linear scalability

  Cassandra     Full-text search

  Riak          Flexible indexing

                Easier Devops
HBase
Pros                  Cons

  Battle tested           Complex
                          Architecture
  High performance
                          SPOFs

                          Requires Hive for
                          Indexing/Querying

                          Expensive to deploy
                          at small scale
Cassandra
Pros                   Cons

  Native secondary       Known users all
  indices                domain experts

  Linear scalability     Search requires
                         Lucene
  Tunable CAP
                         Heavy Weight
                         MapReduce
Riak
Pros                          Cons

  Operationally simpler         Multi-data center
                                replication requires
  Linear scalability            Enterprise product

  Integrated search             leveldb puts high
                                strain on CPU
  Secondary indices

  Tunable CAP

  Vector clocks solve
  time-sync problems
From Mongo to Riak
Migration Goals



No time where database goes “offline”

Product parity throughout migration
Migration Process

1. App writes to Mongo and Riak

2. Verify data integrity

3. Import historical data

4. App reads from Riak

5. Decommission Mongo
Use Cases
Share API


Save shared content

Uses MapReduce to
populate user dashboard
Recommendations



Sets of related pages

Generated on-demand
Publisher Analytics


Generated nightly via Hadoop

Typical stored “document” (JSON)

80kb-1Mb
Riak Successes
MapReduce

Handy for querying

Runs at “web page speed”.

Easy to re-reduce for complex queries

Easy to test via CURL
Tunable CAP @


    Replication: primary/secondary authority

    Read failure tolerance: speed/consistency

    Write failure tolerance
Full Text Search

Built on Lucene

Make user content searchable

Make arbitrary keys queryable

“Just turn it on”


Hiccup: corrupt merge indexes
Query Example
  Who’s our oldest user who’s shared something in the last minute?

curl -XPOST http://localhost:8098/mapred -H 'Content-Type: application/json' -d '{
   "inputs": {
      "bucket":"links",
      "query":"timestamp:[1346350877 TO 1346350937}" //60 second period
   },
   "query":[
      {"map":{"language":"javascript","source":"function(riakObject) {
         return [[Riak.mapValuesJson(riakObject)[0].user_id]];
      }"}},
      {"reduce":{"language":"javascript",
         "name":"Riak.reduceMin" // [[2],[5],[9],[13]] => [[2]]
      }}
   ]
}'


                                    [[2197]]
Riak on EC2
In a Nutshell

EC2 specs poorly proportioned for leveldb

Multiple AZs in one location works well

Scale vertically for better latency & consistency

Scale horizontally for more throughput/$
Benchmarks

Top Graph: c1.medium (1.7G, 5 CPU)



Middle: m1.large (7.5G, 4 CPU)



Bottom: cc1.4xlarge (23G, 33.5 CPU)
Throughput
Latency (Typical)
Latency (Worst Case)
Calculations
c1.medium (1.7G, 5 CPU)
1758 IOPS/$-hr
Worst 1% of queries: 300ms/800ms

m1.large (7.5G, 4 CPU)
1167 IOPS/$-hr
Worst 1% of queries: 110ms/200ms

cc1.4xlarge (23G, 33.5 CPU)
872 IOPS/$-hr
Worst 1% of queries: 47ms/139ms
Benchmark Takeaways


 You can’t go “by spec”

 IO is limiting factor

 RAM never limiting factor for 1%
 of keyspace to be in memory
Fin. Questions?
Thanks:                 We’re Hiring!

  Tom Santero              Robby Grossman

  Justin Sheehy            robby@shareaholic.com

  Ryan Zezeski             @freerobby

  Reid Draper

  #freenode riak crew
Fin.

More Related Content

What's hot

Real time dashboards with Kafka and Druid
Real time dashboards with Kafka and DruidReal time dashboards with Kafka and Druid
Real time dashboards with Kafka and DruidVenu Ryali
 
GPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And PythonGPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And PythonJen Aman
 
Spark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed AwanSpark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed AwanSpark Summit
 
Building a derived data store using Kafka
Building a derived data store using KafkaBuilding a derived data store using Kafka
Building a derived data store using KafkaVenu Ryali
 
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...confluent
 
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...confluent
 
Mobius: C# Language Binding For Spark
Mobius: C# Language Binding For SparkMobius: C# Language Binding For Spark
Mobius: C# Language Binding For SparkSpark Summit
 
Scaling spark on kubernetes at Lyft
Scaling spark on kubernetes at LyftScaling spark on kubernetes at Lyft
Scaling spark on kubernetes at LyftLi Gao
 
Ten reasons to choose Apache Pulsar over Apache Kafka for Event Sourcing_Robe...
Ten reasons to choose Apache Pulsar over Apache Kafka for Event Sourcing_Robe...Ten reasons to choose Apache Pulsar over Apache Kafka for Event Sourcing_Robe...
Ten reasons to choose Apache Pulsar over Apache Kafka for Event Sourcing_Robe...StreamNative
 
Apache Kafka® at Dropbox
Apache Kafka® at DropboxApache Kafka® at Dropbox
Apache Kafka® at Dropboxconfluent
 
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache KafkaKafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafkaconfluent
 
Hive & HBase For Transaction Processing
Hive & HBase For Transaction ProcessingHive & HBase For Transaction Processing
Hive & HBase For Transaction ProcessingDataWorks Summit
 
The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...
The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...
The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...HostedbyConfluent
 
Mining public datasets using opensource tools: Zeppelin, Spark and Juju
Mining public datasets using opensource tools: Zeppelin, Spark and JujuMining public datasets using opensource tools: Zeppelin, Spark and Juju
Mining public datasets using opensource tools: Zeppelin, Spark and Jujuseoul_engineer
 
Removing performance bottlenecks with Kafka Monitoring and topic configuration
Removing performance bottlenecks with Kafka Monitoring and topic configurationRemoving performance bottlenecks with Kafka Monitoring and topic configuration
Removing performance bottlenecks with Kafka Monitoring and topic configurationKnoldus Inc.
 
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Amazon Web Services
 
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...confluent
 
Databus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture PipelineDatabus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture PipelineSunil Nagaraj
 
Apache Superset at Airbnb
Apache Superset at AirbnbApache Superset at Airbnb
Apache Superset at AirbnbBill Liu
 
When the Cloud is a Rockin: High Availability in Apache CloudStack
When the Cloud is a Rockin: High Availability in Apache CloudStackWhen the Cloud is a Rockin: High Availability in Apache CloudStack
When the Cloud is a Rockin: High Availability in Apache CloudStackJohn Burwell
 

What's hot (20)

Real time dashboards with Kafka and Druid
Real time dashboards with Kafka and DruidReal time dashboards with Kafka and Druid
Real time dashboards with Kafka and Druid
 
GPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And PythonGPU Computing With Apache Spark And Python
GPU Computing With Apache Spark And Python
 
Spark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed AwanSpark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed Awan
 
Building a derived data store using Kafka
Building a derived data store using KafkaBuilding a derived data store using Kafka
Building a derived data store using Kafka
 
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
Creating Connector to Bridge the Worlds of Kafka and gRPC at Wework (Anoop Di...
 
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
Enabling Insight to Support World-Class Supercomputing (Stefan Ceballos, Oak ...
 
Mobius: C# Language Binding For Spark
Mobius: C# Language Binding For SparkMobius: C# Language Binding For Spark
Mobius: C# Language Binding For Spark
 
Scaling spark on kubernetes at Lyft
Scaling spark on kubernetes at LyftScaling spark on kubernetes at Lyft
Scaling spark on kubernetes at Lyft
 
Ten reasons to choose Apache Pulsar over Apache Kafka for Event Sourcing_Robe...
Ten reasons to choose Apache Pulsar over Apache Kafka for Event Sourcing_Robe...Ten reasons to choose Apache Pulsar over Apache Kafka for Event Sourcing_Robe...
Ten reasons to choose Apache Pulsar over Apache Kafka for Event Sourcing_Robe...
 
Apache Kafka® at Dropbox
Apache Kafka® at DropboxApache Kafka® at Dropbox
Apache Kafka® at Dropbox
 
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache KafkaKafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
 
Hive & HBase For Transaction Processing
Hive & HBase For Transaction ProcessingHive & HBase For Transaction Processing
Hive & HBase For Transaction Processing
 
The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...
The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...
The New Way of Configuring Grace Periods for Windowed Operations in Kafka Str...
 
Mining public datasets using opensource tools: Zeppelin, Spark and Juju
Mining public datasets using opensource tools: Zeppelin, Spark and JujuMining public datasets using opensource tools: Zeppelin, Spark and Juju
Mining public datasets using opensource tools: Zeppelin, Spark and Juju
 
Removing performance bottlenecks with Kafka Monitoring and topic configuration
Removing performance bottlenecks with Kafka Monitoring and topic configurationRemoving performance bottlenecks with Kafka Monitoring and topic configuration
Removing performance bottlenecks with Kafka Monitoring and topic configuration
 
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
 
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
Apache Kafka, Tiered Storage and TensorFlow for Streaming Machine Learning wi...
 
Databus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture PipelineDatabus - LinkedIn's Change Data Capture Pipeline
Databus - LinkedIn's Change Data Capture Pipeline
 
Apache Superset at Airbnb
Apache Superset at AirbnbApache Superset at Airbnb
Apache Superset at Airbnb
 
When the Cloud is a Rockin: High Availability in Apache CloudStack
When the Cloud is a Rockin: High Availability in Apache CloudStackWhen the Cloud is a Rockin: High Availability in Apache CloudStack
When the Cloud is a Rockin: High Availability in Apache CloudStack
 

Similar to Migrating to Riak at Shareaholic

How to Make Hadoop Easy, Dependable and Fast
How to Make Hadoop Easy, Dependable and FastHow to Make Hadoop Easy, Dependable and Fast
How to Make Hadoop Easy, Dependable and FastMapR Technologies
 
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Chris Fregly
 
Scalable Stream Processing with Apache Samza
Scalable Stream Processing with Apache SamzaScalable Stream Processing with Apache Samza
Scalable Stream Processing with Apache SamzaPrateek Maheshwari
 
Riak at Engine Yard Cloud
Riak at Engine Yard CloudRiak at Engine Yard Cloud
Riak at Engine Yard CloudInes Sombra
 
Efficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesEfficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesJen Aman
 
Efficient State Management With Spark 2.x And Scale-Out Databases
Efficient State Management With Spark 2.x And Scale-Out DatabasesEfficient State Management With Spark 2.x And Scale-Out Databases
Efficient State Management With Spark 2.x And Scale-Out DatabasesSnappyData
 
Containerized Hadoop beyond Kubernetes
Containerized Hadoop beyond KubernetesContainerized Hadoop beyond Kubernetes
Containerized Hadoop beyond KubernetesDataWorks Summit
 
Handling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsHandling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsDirecti Group
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesshnkr_rmchndrn
 
Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015Mac Moore
 
DAT101 Understanding AWS Database Options - AWS re: Invent 2012
DAT101 Understanding AWS Database Options - AWS re: Invent 2012DAT101 Understanding AWS Database Options - AWS re: Invent 2012
DAT101 Understanding AWS Database Options - AWS re: Invent 2012Amazon Web Services
 
SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15SnappyData
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DBHeriyadi Janwar
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun JeongSpark Summit
 
Big Telco Real-Time Network Analytics
Big Telco Real-Time Network AnalyticsBig Telco Real-Time Network Analytics
Big Telco Real-Time Network AnalyticsYousun Jeong
 
SQL and NoSQL in SQL Server
SQL and NoSQL in SQL ServerSQL and NoSQL in SQL Server
SQL and NoSQL in SQL ServerMichael Rys
 

Similar to Migrating to Riak at Shareaholic (20)

How to Make Hadoop Easy, Dependable and Fast
How to Make Hadoop Easy, Dependable and FastHow to Make Hadoop Easy, Dependable and Fast
How to Make Hadoop Easy, Dependable and Fast
 
Understanding Database Options
Understanding Database OptionsUnderstanding Database Options
Understanding Database Options
 
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
Global Big Data Conference Sept 2014 AWS Kinesis Spark Streaming Approximatio...
 
Kafka & Hadoop in Rakuten
Kafka & Hadoop in RakutenKafka & Hadoop in Rakuten
Kafka & Hadoop in Rakuten
 
Glint with Apache Spark
Glint with Apache SparkGlint with Apache Spark
Glint with Apache Spark
 
High Performance Databases
High Performance DatabasesHigh Performance Databases
High Performance Databases
 
Scalable Stream Processing with Apache Samza
Scalable Stream Processing with Apache SamzaScalable Stream Processing with Apache Samza
Scalable Stream Processing with Apache Samza
 
Riak at Engine Yard Cloud
Riak at Engine Yard CloudRiak at Engine Yard Cloud
Riak at Engine Yard Cloud
 
Efficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out DatabasesEfficient State Management With Spark 2.0 And Scale-Out Databases
Efficient State Management With Spark 2.0 And Scale-Out Databases
 
Efficient State Management With Spark 2.x And Scale-Out Databases
Efficient State Management With Spark 2.x And Scale-Out DatabasesEfficient State Management With Spark 2.x And Scale-Out Databases
Efficient State Management With Spark 2.x And Scale-Out Databases
 
Containerized Hadoop beyond Kubernetes
Containerized Hadoop beyond KubernetesContainerized Hadoop beyond Kubernetes
Containerized Hadoop beyond Kubernetes
 
Handling Data in Mega Scale Systems
Handling Data in Mega Scale SystemsHandling Data in Mega Scale Systems
Handling Data in Mega Scale Systems
 
Navigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skiesNavigating NoSQL in cloudy skies
Navigating NoSQL in cloudy skies
 
Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015Scaling Spark Workloads on YARN - Boulder/Denver July 2015
Scaling Spark Workloads on YARN - Boulder/Denver July 2015
 
DAT101 Understanding AWS Database Options - AWS re: Invent 2012
DAT101 Understanding AWS Database Options - AWS re: Invent 2012DAT101 Understanding AWS Database Options - AWS re: Invent 2012
DAT101 Understanding AWS Database Options - AWS re: Invent 2012
 
SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15SnappyData overview NikeTechTalk 11/19/15
SnappyData overview NikeTechTalk 11/19/15
 
Microsoft Openness Mongo DB
Microsoft Openness Mongo DBMicrosoft Openness Mongo DB
Microsoft Openness Mongo DB
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
 
Big Telco Real-Time Network Analytics
Big Telco Real-Time Network AnalyticsBig Telco Real-Time Network Analytics
Big Telco Real-Time Network Analytics
 
SQL and NoSQL in SQL Server
SQL and NoSQL in SQL ServerSQL and NoSQL in SQL Server
SQL and NoSQL in SQL Server
 

Recently uploaded

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Recently uploaded (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Migrating to Riak at Shareaholic

  • 1. Riak @ Robby Grossman robby@shareaholic.com @freerobby
  • 2. Agenda Shareaholic: Product & Tech Why Riak: The Search for a Big Data Store Transitioning to Riak Riak Use Cases Deploying to EC2
  • 8. Monthly @ Thousands of developers hitting API Hundreds of thousands of publishers Tens of millions of shares & clicks Hundreds of millions of pageviews & events
  • 9. Tech @ JRuby on Rails (via Torquebox) MySQL (Master, Read Slave) Elastic MapReduce (similar to Hadoop) Redis Formerly Mongo, Now Riak
  • 10. Why Not Mongo? Working set needs to fit in memory Global write lock blocks all queries despite not having transactions/joins Standbys not “hot”
  • 12. Next @ Options: Goals: HBase Linear scalability Cassandra Full-text search Riak Flexible indexing Easier Devops
  • 13. HBase Pros Cons Battle tested Complex Architecture High performance SPOFs Requires Hive for Indexing/Querying Expensive to deploy at small scale
  • 14. Cassandra Pros Cons Native secondary Known users all indices domain experts Linear scalability Search requires Lucene Tunable CAP Heavy Weight MapReduce
  • 15. Riak Pros Cons Operationally simpler Multi-data center replication requires Linear scalability Enterprise product Integrated search leveldb puts high strain on CPU Secondary indices Tunable CAP Vector clocks solve time-sync problems
  • 17. Migration Goals No time where database goes “offline” Product parity throughout migration
  • 18. Migration Process 1. App writes to Mongo and Riak 2. Verify data integrity 3. Import historical data 4. App reads from Riak 5. Decommission Mongo
  • 20. Share API Save shared content Uses MapReduce to populate user dashboard
  • 21. Recommendations Sets of related pages Generated on-demand
  • 22. Publisher Analytics Generated nightly via Hadoop Typical stored “document” (JSON) 80kb-1Mb
  • 24. MapReduce Handy for querying Runs at “web page speed”. Easy to re-reduce for complex queries Easy to test via CURL
  • 25. Tunable CAP @ Replication: primary/secondary authority Read failure tolerance: speed/consistency Write failure tolerance
  • 26. Full Text Search Built on Lucene Make user content searchable Make arbitrary keys queryable “Just turn it on” Hiccup: corrupt merge indexes
  • 27. Query Example Who’s our oldest user who’s shared something in the last minute? curl -XPOST http://localhost:8098/mapred -H 'Content-Type: application/json' -d '{ "inputs": { "bucket":"links", "query":"timestamp:[1346350877 TO 1346350937}" //60 second period }, "query":[ {"map":{"language":"javascript","source":"function(riakObject) { return [[Riak.mapValuesJson(riakObject)[0].user_id]]; }"}}, {"reduce":{"language":"javascript", "name":"Riak.reduceMin" // [[2],[5],[9],[13]] => [[2]] }} ] }' [[2197]]
  • 29. In a Nutshell EC2 specs poorly proportioned for leveldb Multiple AZs in one location works well Scale vertically for better latency & consistency Scale horizontally for more throughput/$
  • 30. Benchmarks Top Graph: c1.medium (1.7G, 5 CPU) Middle: m1.large (7.5G, 4 CPU) Bottom: cc1.4xlarge (23G, 33.5 CPU)
  • 34. Calculations c1.medium (1.7G, 5 CPU) 1758 IOPS/$-hr Worst 1% of queries: 300ms/800ms m1.large (7.5G, 4 CPU) 1167 IOPS/$-hr Worst 1% of queries: 110ms/200ms cc1.4xlarge (23G, 33.5 CPU) 872 IOPS/$-hr Worst 1% of queries: 47ms/139ms
  • 35. Benchmark Takeaways You can’t go “by spec” IO is limiting factor RAM never limiting factor for 1% of keyspace to be in memory
  • 36. Fin. Questions? Thanks: We’re Hiring! Tom Santero Robby Grossman Justin Sheehy robby@shareaholic.com Ryan Zezeski @freerobby Reid Draper #freenode riak crew
  • 37. Fin.