©2013 DataStax Confidential. Do not distribute without consent.
@PatrickMcFadin
Patrick McFadin

Chief Evangelist
Time Series with Apache Cassandra
Quick intro to Cassandra
• Shared nothing
• Masterless peer-to-peer
• Based on Dynamo
Scaling
• Add nodes to scale
• Millions Ops/s
[Chart: throughput (ops/sec) for Cassandra, HBase, Redis, MySQL]
Uptime
• Built to replicate
• Resilient to failure
• Always on
NONE
Easy to use
• CQL is a familiar syntax
• Friendly to programmers
• Paxos for locking
CREATE TABLE users (
username varchar,
firstname varchar,
lastname varchar,
email list<varchar>,
password varchar,
created_date timestamp,
PRIMARY KEY (username)
);
INSERT INTO users (username, firstname, lastname,
email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
['patrick@datastax.com'],'ba27e03fd95e507daf2937c937d499ab',
'2011-06-20 13:50:00');
INSERT INTO users (username, firstname,
lastname, email, password, created_date)
VALUES ('pmcfadin','Patrick','McFadin',
['patrick@datastax.com'],
'ba27e03fd95e507daf2937c937d499ab',
'2011-06-20 13:50:00')
IF NOT EXISTS;
Time series in production
• It’s all about “What’s happening”
• Data is the new currency
Stackdriver
• AWS and Rackspace monitoring
• Quick indexes
• Batch rollup results
MyDrive
• Moved from Mongo to Cassandra
• Queue processing
• Bottleneck was storing the data
“One thing that is not at all obvious
from the graph is that the system was
also under massively heavier strain
after the switch to Cassandra because
of additional bulk processing going on
in the background.”
- Karl Matthias, MyDrive
Paddy Power
• Real-time product and pricing
• Much like stock tickers
• Active-active across two data centers
“Specifically for Cassandra and Datastax, the
ability to process time-series data is something
that lots of companies have done in the past, not
something that we were very aware of, and that was
one of the reasons why we chose this as the first
use case for Cassandra.”
- John Turner, Paddy Power
Internet Of Things
• 15B devices by 2015
• 40B devices by 2020
Why Cassandra for Time Series
Scales
Resilient
Good data model
Efficient Storage Model
What about that?
Example 1: Weather Station
• Weather station collects data
• Cassandra stores in sequence
• Application reads in sequence
Use case
• Store data per weather station
• Store time series in order: first to last
Needed Queries
• Get all data for one weather station
• Get data for a single date and time
• Get data for a range of dates and times
Data Model to support queries
Data Model
• Weather Station Id and Time are unique
• Store as many as needed
CREATE TABLE temperature (
weatherstation_id text,
event_time timestamp,
temperature text,
PRIMARY KEY (weatherstation_id,event_time)
);
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:01:00','72F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:02:00','73F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:03:00','73F');
INSERT INTO temperature(weatherstation_id,event_time,temperature)
VALUES ('1234ABCD','2013-04-03 07:04:00','74F');
Storage Model - Logical View
SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';
weatherstation_id  event_time           temperature
1234ABCD           2013-04-03 07:01:00  72F
1234ABCD           2013-04-03 07:02:00  73F
1234ABCD           2013-04-03 07:03:00  73F
1234ABCD           2013-04-03 07:04:00  74F
Storage Model - Disk Layout
SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD';
Merged, Sorted and Stored Sequentially
1234ABCD
2013-04-03 07:01:00  72F
2013-04-03 07:02:00  73F
2013-04-03 07:03:00  73F
2013-04-03 07:04:00  74F
2013-04-03 07:05:00  74F
2013-04-03 07:06:00  75F
Query patterns
• Range queries
• “Slice” operation on disk
SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time >= '2013-04-03 07:01:00'
AND event_time <= '2013-04-03 07:04:00';
Single seek on disk
1234ABCD
2013-04-03 07:01:00  72F
2013-04-03 07:02:00  73F
2013-04-03 07:03:00  73F
2013-04-03 07:04:00  74F
2013-04-03 07:05:00  74F
2013-04-03 07:06:00  75F
Query patterns
• Range queries
• “Slice” operation on disk
SELECT weatherstation_id,event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time >= '2013-04-03 07:01:00'
AND event_time <= '2013-04-03 07:04:00';
Programmers like this
Sorted by event_time
weatherstation_id  event_time           temperature
1234ABCD           2013-04-03 07:01:00  72F
1234ABCD           2013-04-03 07:02:00  73F
1234ABCD           2013-04-03 07:03:00  73F
1234ABCD           2013-04-03 07:04:00  74F
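The range query on this slide can be pictured as a binary search into rows that are already sorted by event_time, followed by a sequential read. The sketch below is only an in-memory illustration of that idea, not how Cassandra is implemented; the `slice_range` helper is a hypothetical name, and the rows are the ones shown in the deck.

```python
from bisect import bisect_left, bisect_right

# One partition (weatherstation_id='1234ABCD'), already sorted by event_time.
rows = [
    ("2013-04-03 07:01:00", "72F"),
    ("2013-04-03 07:02:00", "73F"),
    ("2013-04-03 07:03:00", "73F"),
    ("2013-04-03 07:04:00", "74F"),
    ("2013-04-03 07:05:00", "74F"),
    ("2013-04-03 07:06:00", "75F"),
]
event_times = [event_time for event_time, _ in rows]


def slice_range(start, end):
    """Return rows with start <= event_time <= end (hypothetical helper).

    The timestamps share one fixed-width format, so comparing them as
    strings gives the same order as comparing them as times.
    """
    lo = bisect_left(event_times, start)   # one "seek" to the first match
    hi = bisect_right(event_times, end)    # then read sequentially up to here
    return rows[lo:hi]


result = slice_range("2013-04-03 07:01:00", "2013-04-03 07:04:00")
```

Because the rows are stored in order, the slice is a contiguous run; no per-row lookup is needed.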
Additional help on the storage engine
SSTable seeks
• Each read minimum 1 seek
• Cache and bloom filter help minimize
Total seek time = Disk Latency * number of seeks
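As a quick worked example of the formula above, here is a small sketch. The 10 ms latency is an assumed, illustrative figure and not a measurement from the talk; `total_seek_time_ms` is a hypothetical helper name.

```python
def total_seek_time_ms(disk_latency_ms, seeks):
    """Total seek time = disk latency * number of seeks (formula from the slide)."""
    return disk_latency_ms * seeks


ASSUMED_LATENCY_MS = 10.0  # illustrative value only

one_sstable = total_seek_time_ms(ASSUMED_LATENCY_MS, 1)
five_sstables = total_seek_time_ms(ASSUMED_LATENCY_MS, 5)
```

Under that assumption, touching five SSTables costs five times the seek time of touching one, which is why the following slides focus on minimizing seeks.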
The key to speed
Use the first part of the primary key to get the node
(data localization)
Minimize seeks for SSTables
(Bloom Filter,Key Cache up-to-date)
Find the data fast in the SSTable
(Indexes)
Min/Max Value Hint
• New since 2.0
• Range index on primary key values per SSTable
• Minimizes seeks on range data
CASSANDRA-5514 if you are interested in details
SELECT event_time,temperature
FROM temperature
WHERE weatherstation_id='1234ABCD'
AND event_time >= '2013-04-03 07:01:00'
AND event_time <= '2013-04-03 07:04:00';
Which SSTable?
Row Key: 1234ABCD
Min event_time: 2013-04-01 00:00:00
Max event_time: 2013-04-04 23:59:59
(This one)
Row Key: 1234ABCD
Min event_time: 2013-04-05 00:00:00
Max event_time: 2013-04-09 23:59:59
Row Key: 1234ABCD
Min event_time: 2013-03-27 00:00:00
Max event_time: 2013-03-31 23:59:59
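The min/max hint can be sketched as a simple interval-overlap check: an SSTable only needs to be read if its [min, max] range intersects the queried range. This is a rough illustration under that assumption, not Cassandra's actual code; the `SSTableHint` type and the `sstable-N` names are hypothetical, while the dates come from the slide.

```python
from datetime import datetime
from typing import NamedTuple


class SSTableHint(NamedTuple):
    """Hypothetical per-SSTable metadata: min and max event_time for a row key."""
    name: str
    min_event_time: datetime
    max_event_time: datetime


def sstables_to_read(hints, start, end):
    """Keep only SSTables whose [min, max] range overlaps [start, end]."""
    return [h for h in hints
            if h.min_event_time <= end and h.max_event_time >= start]


hints = [
    SSTableHint("sstable-1", datetime(2013, 4, 1), datetime(2013, 4, 4, 23, 59, 59)),
    SSTableHint("sstable-2", datetime(2013, 4, 5), datetime(2013, 4, 9, 23, 59, 59)),
    SSTableHint("sstable-3", datetime(2013, 3, 27), datetime(2013, 3, 31, 23, 59, 59)),
]

# Range from the query on the slide: 07:01:00 through 07:04:00 on 2013-04-03.
candidates = sstables_to_read(
    hints, datetime(2013, 4, 3, 7, 1), datetime(2013, 4, 3, 7, 4)
)
```

Only the first SSTable overlaps the queried range, so the other two can be skipped without a seek.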
Ingestion models
• Apache Kafka
• Apache Flume
• Storm
• Spark Streaming
• Custom Applications
Apache Kafka
Your totally killer application
Kafka + Storm
• Kafka provides reliable queuing
• Storm processes (rollups, counts)
• Cassandra stores at the same speed
• Storm lookup on Cassandra
Queue (Apache Kafka) -> Process (Apache Storm) -> Store (Cassandra)
Flume
• Source accepts data
• Channel buffers data
• Sink processes and stores
• Popular for log processing
[Diagram: Source -> Channel -> Sink, with Application, Load Balancer, and Syslog]
Dealing with data at speed
• 1 million writes per second?
• 1 insert every microsecond
• Collisions?
• Primary Key determines node placement
• Random partitioning
• Special data type - TimeUUID
[Diagram: Your totally killer application writes weatherstation_id='1234ABCD' and weatherstation_id='5678EFGH']
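The arithmetic and the collision concern on this slide can be sketched with Python's standard `uuid` module. This only illustrates why a version 1 (time-based) UUID avoids collisions when many events share a timestamp; it says nothing about Cassandra's own generator, and the batch size of 10,000 is an arbitrary choice.

```python
import uuid

WRITES_PER_SECOND = 1_000_000
# 1,000,000 microseconds in a second, spread over 1,000,000 writes.
MICROSECONDS_PER_WRITE = 1_000_000 / WRITES_PER_SECOND

# A plain timestamp key would collide when two events arrive in the same
# instant. A version 1 UUID adds node and clock-sequence bits to the
# timestamp, so IDs generated back-to-back stay distinct.
ids = [uuid.uuid1() for _ in range(10_000)]
unique_ids = set(ids)
```

Every generated ID is distinct even though many were created within the same microsecond-scale window.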
How does data replicate?
Primary key determines placement*
Partitioning
PK      Columns
jim     age: 36  car: camaro  gender: M
carol   age: 37  car: subaru  gender: F
johnny  age: 12  gender: M
suzy    age: 10  gender: F
Key Hashing
PK      MD5 Hash
jim     5e02739678...
carol   a9a0198010...
johnny  f4eb27cea7...
suzy    78b421309e...
MD5* hash operation yields a 128-bit number for keys of any size.
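The key-hashing step can be sketched with Python's `hashlib`. This is an illustration of "any key size in, 128-bit number out" only. It treats the digest as an unsigned integer, which is a simplification, and MD5 applies to the RandomPartitioner; newer Cassandra versions default to a Murmur3-based partitioner. The `md5_token` helper is a hypothetical name.

```python
import hashlib


def md5_token(key):
    """Hash a partition key of any length to a 128-bit integer (sketch)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()  # always 16 bytes
    return int.from_bytes(digest, "big")


tokens = {name: md5_token(name) for name in ("jim", "carol", "johnny", "suzy")}

# A very long key still hashes to the same fixed size.
long_key_token = md5_token("x" * 10_000)
```

The same key always maps to the same token, which is what lets any node compute where a row lives.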
The Token Ring
[Ring diagram: Node A, Node B, Node C, Node D]
PK      MD5 Hash
jim     5e02739678...
carol   a9a0198010...
johnny  f4eb27cea7...
suzy    78b421309e...
Node  Start            End
A     0xc000000000..1  0x0000000000..0
B     0x0000000000..1  0x4000000000..0
C     0x4000000000..1  0x8000000000..0
D     0x8000000000..1  0xc000000000..0
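The lookup from hash to node can be sketched as a search over the range end points in the table above. To keep it short, this sketch uses only the 10 hex digits of each hash that the slide shows and scales the ring down to 40 bits, which is an assumption for illustration; the `owner` helper is a hypothetical name.

```python
from bisect import bisect_left

# (inclusive end of range, node), in ring order. Node A's range wraps
# around from just past 0xC000000000 back to 0x0000000000.
RING = [
    (0x0000000000, "A"),
    (0x4000000000, "B"),
    (0x8000000000, "C"),
    (0xC000000000, "D"),
]
RANGE_ENDS = [end for end, _ in RING]


def owner(token):
    """Return the node whose token range contains the token (sketch)."""
    index = bisect_left(RANGE_ENDS, token)
    # Tokens past the last end point wrap around to the first node.
    return RING[index % len(RING)][1]


hash_prefixes = {
    "jim": 0x5E02739678,
    "carol": 0xA9A0198010,
    "johnny": 0xF4EB27CEA7,
    "suzy": 0x78B421309E,
}
placement = {key: owner(token) for key, token in hash_prefixes.items()}
```

With these ranges, jim and suzy land on Node C, carol on Node D, and johnny wraps around to Node A.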
Replication
[Ring diagram: Node A, Node B, Node C, Node D; carol a9a0198010...]
Replication factor = 3
Consistency is a different topic for later
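One simple way to picture a replication factor of 3 is to start at the node that owns the token and keep walking around the ring. The sketch below assumes that simple clockwise placement, which the slides do not spell out; rack-aware or data-center-aware strategies choose replicas differently. The `replicas` helper is a hypothetical name.

```python
RING_ORDER = ["A", "B", "C", "D"]  # nodes in token order around the ring


def replicas(primary, replication_factor):
    """Return the primary node plus the next nodes clockwise (sketch)."""
    start = RING_ORDER.index(primary)
    return [RING_ORDER[(start + step) % len(RING_ORDER)]
            for step in range(replication_factor)]


# carol hashes into Node D's range in the token ring table.
carol_replicas = replicas("D", 3)
```

Under this assumption carol is stored on Node D and then on Node A and Node B, so the row survives the loss of any single node.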
TimeUUID
• Also known as a Version 1 UUID
• Sortable
• Reversible
Timestamp to Microsecond + UUID = TimeUUID
04d580b0-9412-11e3-baa8-0800200c9a66 = Wednesday, February 12, 2014 6:18:06 PM GMT
http://www.famkruithof.net/uuid/uuidgen
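The "reversible" property can be checked with Python's standard `uuid` module: a version 1 UUID carries a 60-bit count of 100-nanosecond intervals since 1582-10-15, which converts back to a wall-clock time. This is a small sketch using the UUID from the slide; `timeuuid_to_datetime` is a hypothetical helper name.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Number of 100 ns intervals between 1582-10-15 (UUID epoch) and 1970-01-01.
UUID_EPOCH_OFFSET_100NS = 0x01B21DD213814000


def timeuuid_to_datetime(value):
    """Recover the UTC timestamp embedded in a version 1 UUID (sketch)."""
    parsed = uuid.UUID(value)
    if parsed.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    microseconds = (parsed.time - UUID_EPOCH_OFFSET_100NS) // 10
    unix_epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return unix_epoch + timedelta(microseconds=microseconds)


decoded = timeuuid_to_datetime("04d580b0-9412-11e3-baa8-0800200c9a66")
```

The decoded value falls on February 12, 2014 at 18:18:06 UTC, matching the time shown on the slide.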
Example 2: Financial Transactions
• Trading of stocks
• When did they happen?
• Massive speeds and volumes
“Sirca, a non-profit university consortium based in Sydney, is the world’s biggest broker of
financial data, ingesting into its database 2 million pieces of information a second from every
major trading exchange.”*
* http://www.theage.com.au/it-pro/business-it/help-poverty-theres-an-app-for-that-20140120-hv948.html
Use case
• Store data per symbol and date
• Store time series in reverse order: last to first
• Make sure every transaction is unique
Needed Queries
• Get all trades for symbol and day
• Get trade for a single date and time
• Get last 10 trades for symbol and date
Data Model to support queries
Data Model
• date is int of days since epoch
• timeuuid keeps it unique
• Reverse the times for later queries
CREATE TABLE stock_ticks (
symbol text,
date int,
trade timeuuid,
trade_details text,
PRIMARY KEY ((symbol, date), trade)
) WITH CLUSTERING ORDER BY (trade DESC);
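The "date is int of days since epoch" bullet can be sketched as follows. It assumes the Unix epoch (1970-01-01) and UTC calendar days, which the slide does not state explicitly; both helper names are hypothetical.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)  # assumed epoch; the slide only says "epoch"


def days_since_epoch(day):
    """Convert a calendar date to the int used as the date bucket."""
    return (day - EPOCH).days


def from_days_since_epoch(days):
    """Convert the int bucket back to a calendar date."""
    return EPOCH + timedelta(days=days)


bucket = days_since_epoch(date(2014, 2, 12))
sample_day = from_days_since_epoch(340)  # 340 is the value used in the deck
```

Putting the day bucket in the partition key, as in PRIMARY KEY ((symbol, date), trade), keeps each symbol's partition bounded to one day of trades.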
INSERT INTO stock_ticks(symbol, date, trade, trade_details)
VALUES ('NFLX',340,04d580b0-1431-1e33-baf8-0833200c98a6,'BUY:2000');
INSERT INTO stock_ticks(symbol, date, trade, trade_details)
VALUES ('NFLX',340,05d580b0-6472-1ef3-a3a8-0430200c9a66,'BUY:300');
INSERT INTO stock_ticks(symbol, date, trade, trade_details)
VALUES ('NFLX',340,02d580b0-9412-d223-55a8-0976200c9a25,'SELL:450');
INSERT INTO stock_ticks(symbol, date, trade, trade_details)
VALUES ('NFLX',340,08d580b0-4482-11e3-5fd3-3421200c9a65,'SELL:3000');
Storage Model - Logical View
SELECT trade,trade_details
FROM stock_ticks
WHERE symbol='NFLX' AND date=340;
symbol:date  trade                                 trade_details
NFLX:340     08d580b0-4482-11e3-5fd3-3421200c9a65  SELL:3000
NFLX:340     02d580b0-9412-d223-55a8-0976200c9a25  SELL:450
NFLX:340     05d580b0-6472-1ef3-a3a8-0430200c9a66  BUY:300
NFLX:340     04d580b0-1431-1e33-baf8-0833200c98a6  BUY:2000
Storage Model - Disk Layout
SELECT trade,trade_details
FROM stock_ticks
WHERE symbol='NFLX' AND date=340;
Order is from last trade to first
NFLX:340
08d580b0-4482-11e3-5fd3-3421200c9a65  SELL:3000  (Last thing inserted)
02d580b0-9412-d223-55a8-0976200c9a25  SELL:450
05d580b0-6472-1ef3-a3a8-0430200c9a66  BUY:300
04d580b0-1431-1e33-baf8-0833200c98a6  BUY:2000   (First thing inserted)
Query patterns
• Limit queries
• Get last X trades
SELECT trade,trade_details
FROM stock_ticks
WHERE symbol='NFLX' AND date=340
LIMIT 3;
NFLX:340
08d580b0-4482-11e3-5fd3-3421200c9a65  SELL:3000  (From here)
02d580b0-9412-d223-55a8-0976200c9a25  SELL:450
05d580b0-6472-1ef3-a3a8-0430200c9a66  BUY:300    (to here)
04d580b0-1431-1e33-baf8-0833200c98a6  BUY:2000
Query patterns
• Limit queries
• Get last X trades
SELECT trade,trade_details
FROM stock_ticks
WHERE symbol='NFLX' AND date=340
LIMIT 3;
Reverse sorted by trade
Last 3 trades
symbol:date  trade                                 trade_details
NFLX:340     08d580b0-4482-11e3-5fd3-3421200c9a65  SELL:3000
NFLX:340     02d580b0-9412-d223-55a8-0976200c9a25  SELL:450
NFLX:340     05d580b0-6472-1ef3-a3a8-0430200c9a66  BUY:300
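Why LIMIT is cheap with a descending clustering order can be sketched in memory: if rows are kept newest-first, the last N trades are simply the first N rows. This is an illustration only. The `Partition` class is hypothetical, and plain sequence numbers 1 to 4 stand in for the timeuuid ordering of the four inserts in the deck.

```python
from bisect import insort


class Partition:
    """Hypothetical in-memory partition kept in descending clustering order."""

    def __init__(self):
        self._rows = []  # (negated sequence, details); ascending here = newest first

    def insert(self, sequence, details):
        insort(self._rows, (-sequence, details))

    def select(self, limit=None):
        rows = [(-key, details) for key, details in self._rows]
        return rows if limit is None else rows[:limit]


nflx_340 = Partition()
for sequence, details in [(1, "BUY:2000"), (2, "BUY:300"),
                          (3, "SELL:450"), (4, "SELL:3000")]:
    nflx_340.insert(sequence, details)

last_three = nflx_340.select(limit=3)
```

The result is SELL:3000, SELL:450, BUY:300, the same three rows the slide shows, read from the front of the partition without sorting at query time.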
Way more examples
• 5 minute interviews
• Use cases
• Free training
www.planetcassandra.org
Thank You!
Follow me for more updates all the time: @PatrickMcFadin

Cassandra EU - Data model on fire
Cassandra EU - Data model on fireCassandra EU - Data model on fire
Cassandra EU - Data model on fire
 
Cassandra 2.0 and timeseries
Cassandra 2.0 and timeseriesCassandra 2.0 and timeseries
Cassandra 2.0 and timeseries
 
Cassandra 2.0 better, faster, stronger
Cassandra 2.0   better, faster, strongerCassandra 2.0   better, faster, stronger
Cassandra 2.0 better, faster, stronger
 
Building Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache CassandraBuilding Antifragile Applications with Apache Cassandra
Building Antifragile Applications with Apache Cassandra
 
Cassandra at scale
Cassandra at scaleCassandra at scale
Cassandra at scale
 
The world's next top data model
The world's next top data modelThe world's next top data model
The world's next top data model
 
Become a super modeler
Become a super modelerBecome a super modeler
Become a super modeler
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data model
 
Cassandra Virtual Node talk
Cassandra Virtual Node talkCassandra Virtual Node talk
Cassandra Virtual Node talk
 

Recently uploaded

Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 

Recently uploaded (20)

Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 

Time series with Apache Cassandra - Long version

  • 1. Time Series with Apache Cassandra. Patrick McFadin (@PatrickMcFadin), Chief Evangelist. ©2013 DataStax Confidential. Do not distribute without consent.
  • 2. Quick intro to Cassandra • Shared nothing • Masterless peer-to-peer • Based on Dynamo
  • 3. Scaling • Add nodes to scale • Millions Ops/s [Chart: throughput (ops/sec) for Cassandra, HBase, Redis, MySQL]
  • 4. Uptime • Built to replicate • Resilient to failure • Always on
  • 5. Easy to use • CQL is a familiar syntax • Friendly to programmers • Paxos for locking
      CREATE TABLE users (
        username varchar,
        firstname varchar,
        lastname varchar,
        email list<varchar>,
        password varchar,
        created_date timestamp,
        PRIMARY KEY (username)
      );
      INSERT INTO users (username, firstname, lastname, email, password, created_date)
      VALUES ('pmcfadin','Patrick','McFadin',
        ['patrick@datastax.com'],'ba27e03fd95e507daf2937c937d499ab',
        '2011-06-20 13:50:00');
      INSERT INTO users (username, firstname, lastname, email, password, created_date)
      VALUES ('pmcfadin','Patrick','McFadin',
        ['patrick@datastax.com'],'ba27e03fd95e507daf2937c937d499ab',
        '2011-06-20 13:50:00')
      IF NOT EXISTS;
  • 6. Time series in production • It’s all about “What’s happening” • Data is the new currency
  • 7. Stack Driver • AWS and Rackspace monitoring • Quick indexes • Batch rollup results
  • 8. MyDrive • Moved from Mongo to Cassandra • Queue processing • Bound at the storing data “One thing that is not at all obvious from the graph is that the system was also under massively heavier strain after the switch to Cassandra because of additional bulk processing going on in the background.” - Karl Matthias, MyDrive
  • 9. Paddy Power • Real-time product and pricing • Much like stock tickers • Active-active across two data centers “Specifically for Cassandra and Datastax, the ability to process time-series data is something that lots of companies have done in the past, not something that we were very aware of, and that was one of the reasons why we chose this as the first use case for Cassandra.” - John Turner, Paddy Power
  • 10. Internet Of Things • 15B devices by 2015 • 40B devices by 2020!
  • 11. Why Cassandra for Time Series • Scales • Resilient • Good data model • Efficient storage model (What about that?)
  • 12. Example 1: Weather Station • Weather station collects data • Cassandra stores in sequence • Application reads in sequence
  • 13. Use case • Data model to support queries: store data per weather station; store time series in order, first to last • Needed queries: get all data for one weather station; get data for a single date and time; get data for a range of dates and times
  • 14. Data Model • Weather Station Id and Time are unique • Store as many as needed
      CREATE TABLE temperature (
        weatherstation_id text,
        event_time timestamp,
        temperature text,
        PRIMARY KEY (weatherstation_id,event_time)
      );
      INSERT INTO temperature(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2013-04-03 07:01:00','72F');
      INSERT INTO temperature(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2013-04-03 07:02:00','73F');
      INSERT INTO temperature(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2013-04-03 07:03:00','73F');
      INSERT INTO temperature(weatherstation_id,event_time,temperature) VALUES ('1234ABCD','2013-04-03 07:04:00','74F');
  • 15. Storage Model - Logical View
      SELECT weatherstation_id,event_time,temperature FROM temperature WHERE weatherstation_id='1234ABCD';
      weatherstation_id | event_time          | temperature
      1234ABCD          | 2013-04-03 07:01:00 | 72F
      1234ABCD          | 2013-04-03 07:02:00 | 73F
      1234ABCD          | 2013-04-03 07:03:00 | 73F
      1234ABCD          | 2013-04-03 07:04:00 | 74F
  • 16. Storage Model - Disk Layout • Merged, sorted and stored sequentially
      SELECT weatherstation_id,event_time,temperature FROM temperature WHERE weatherstation_id='1234ABCD';
      Row key 1234ABCD: [2013-04-03 07:01:00: 72F] [2013-04-03 07:02:00: 73F] [2013-04-03 07:03:00: 73F] [2013-04-03 07:04:00: 74F] [2013-04-03 07:05:00: 74F] [2013-04-03 07:06:00: 75F]
  • 17. Query patterns • Range queries • “Slice” operation on disk • Single seek on disk
      SELECT weatherstation_id,event_time,temperature FROM temperature
      WHERE weatherstation_id='1234ABCD'
        AND event_time >= '2013-04-03 07:01:00'
        AND event_time <= '2013-04-03 07:04:00';
      Row key 1234ABCD: [2013-04-03 07:01:00: 72F] [2013-04-03 07:02:00: 73F] [2013-04-03 07:03:00: 73F] [2013-04-03 07:04:00: 74F] [2013-04-03 07:05:00: 74F] [2013-04-03 07:06:00: 75F]
  • 18. Query patterns • Range queries • “Slice” operation on disk • Programmers like this • Sorted by event_time
      SELECT weatherstation_id,event_time,temperature FROM temperature
      WHERE weatherstation_id='1234ABCD'
        AND event_time >= '2013-04-03 07:01:00'
        AND event_time <= '2013-04-03 07:04:00';
      weatherstation_id | event_time          | temperature
      1234ABCD          | 2013-04-03 07:01:00 | 72F
      1234ABCD          | 2013-04-03 07:02:00 | 73F
      1234ABCD          | 2013-04-03 07:03:00 | 73F
      1234ABCD          | 2013-04-03 07:04:00 | 74F
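The "slice" idea in the range-query slides can be sketched outside Cassandra. This is only an illustration, not Cassandra's storage engine: it assumes one partition held as a Python list already sorted by event_time, and `slice_range` is a hypothetical helper name. Because the rows are sorted, two binary searches find both ends of the range and everything in between is contiguous.

```python
from bisect import bisect_left, bisect_right

# One partition (weatherstation_id='1234ABCD'), kept sorted by event_time,
# mirroring the rows shown in the slides.
partition = [
    ("2013-04-03 07:01:00", "72F"),
    ("2013-04-03 07:02:00", "73F"),
    ("2013-04-03 07:03:00", "73F"),
    ("2013-04-03 07:04:00", "74F"),
    ("2013-04-03 07:05:00", "74F"),
    ("2013-04-03 07:06:00", "75F"),
]


def slice_range(rows, start, end):
    """Return rows with start <= event_time <= end (hypothetical helper).

    Timestamps in this fixed format compare correctly as plain strings.
    """
    keys = [event_time for event_time, _ in rows]
    lo = bisect_left(keys, start)   # first row at or after start
    hi = bisect_right(keys, end)    # one past the last row at or before end
    return rows[lo:hi]


result = slice_range(partition, "2013-04-03 07:01:00", "2013-04-03 07:04:00")
for event_time, temperature in result:
    print(event_time, temperature)
```

The returned rows are one contiguous run of the sorted list, which is the property the slides rely on when they describe the range query as a single seek.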
  • 19. Additional help on the storage engine
  • 20. SSTable seeks • Each read minimum 1 seek • Cache and bloom filter help minimize Total seek time = Disk Latency * number of seeks
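The seek-time formula is simple arithmetic, and a small sketch shows how quickly it adds up. The 8 ms disk latency below is a made-up figure for illustration only, not a number from the slides, and `total_seek_time_ms` is a hypothetical helper name.

```python
def total_seek_time_ms(disk_latency_ms, seeks):
    """Total seek time = disk latency * number of seeks (formula from the slide)."""
    return disk_latency_ms * seeks


# Assumption: 8 ms per seek, chosen only to make the arithmetic concrete.
ASSUMED_LATENCY_MS = 8.0

for seeks in (1, 3, 10):
    print(seeks, "seek(s):", total_seek_time_ms(ASSUMED_LATENCY_MS, seeks), "ms")
```

Under that assumed latency, each extra SSTable a read has to touch costs another full seek, which is why the cache and Bloom filter matter.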
  • 21. The key to speed • Use the first part of the primary key to get the node (data localization) • Minimize seeks for SSTables (Bloom filter, key cache up to date) • Find the data fast in the SSTable (indexes)
  • 22. Min/Max Value Hint • New since 2.0 • Range index on primary key values per SSTable • Minimizes seeks on range data • See CASSANDRA-5514 if you are interested in details
      SELECT event_time,temperature FROM temperature
      WHERE weatherstation_id='1234ABCD'
        AND event_time >= '2013-04-03 07:01:00'
        AND event_time <= '2013-04-03 07:04:00';
      Which SSTable?
      Row Key: 1234ABCD  Min event_time: 2013-04-01 00:00:00  Max event_time: 2013-04-04 23:59:59  (this one)
      Row Key: 1234ABCD  Min event_time: 2013-04-05 00:00:00  Max event_time: 2013-04-09 23:59:59
      Row Key: 1234ABCD  Min event_time: 2013-03-27 00:00:00  Max event_time: 2013-03-31 23:59:59
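A rough sketch of how a min/max hint can rule out SSTables before any seek, using the three ranges from the slide. This is not Cassandra's implementation (see CASSANDRA-5514 for that). The SSTable names and the `candidates` helper are hypothetical, and the check is a plain interval-overlap test.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def ts(text):
    """Parse a timestamp in the format used on the slides."""
    return datetime.strptime(text, FMT)


# Min/max event_time per SSTable for row key 1234ABCD (values from the slide;
# the names are hypothetical).
sstables = [
    {"name": "sstable-1", "min": ts("2013-04-01 00:00:00"), "max": ts("2013-04-04 23:59:59")},
    {"name": "sstable-2", "min": ts("2013-04-05 00:00:00"), "max": ts("2013-04-09 23:59:59")},
    {"name": "sstable-3", "min": ts("2013-03-27 00:00:00"), "max": ts("2013-03-31 23:59:59")},
]


def candidates(tables, start, end):
    """Keep only SSTables whose [min, max] range overlaps [start, end]."""
    return [t["name"] for t in tables if t["min"] <= end and t["max"] >= start]


hits = candidates(sstables, ts("2013-04-03 07:01:00"), ts("2013-04-03 07:04:00"))
print(hits)
```

Only the SSTable covering 2013-04-01 through 2013-04-04 overlaps the queried range, so the other two can be skipped without touching disk.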
  • 23. Ingestion models • Apache Kafka • Apache Flume • Storm • Spark Streaming • Custom Applications [Diagram: Apache Kafka and "Your totally killer application"]
  • 24. Kafka + Storm • Kafka provides reliable queuing • Storm processes (rollups, counts) • Cassandra stores at the same speed • Storm lookup on Cassandra [Diagram: Apache Kafka (queue), Apache Storm (process), store]
  • 25. Flume • Source accepts data • Channel buffers data • Sink processes and stores • Popular for log processing [Diagram labels: Application, Load Balancer, Syslog, Source, Channel, Sink]
  • 26. Dealing with data at speed • 1 million writes per second? • 1 insert every microsecond • Collisions? • Primary Key determines node placement • Random partitioning • Special data type - TimeUUID [Diagram: "Your totally killer application" with weatherstation_id='1234ABCD' and weatherstation_id='5678EFGH']
  • 27. How does data replicate?
  • 28. Partitioning • Primary key determines placement*
      jim    | age: 36 | car: camaro | gender: M
      carol  | age: 37 | car: subaru | gender: F
      johnny | age: 12 | gender: M
      suzy   | age: 10 | gender: F
  • 30. The Token Ring [Diagram: ring of Node A, Node B, Node C, Node D]
  • 31. Hashed keys and token ranges
      jim    | 5e02739678...
      carol  | a9a0198010...
      johnny | f4eb27cea7...
      suzy   | 78b421309e...
      Node | Start           | End
      A    | 0xc000000000..1 | 0x0000000000..0
      B    | 0x0000000000..1 | 0x4000000000..0
      C    | 0x4000000000..1 | 0x8000000000..0
      D    | 0x8000000000..1 | 0xc000000000..0
  • 36. Replication [Diagram: ring of Node A, Node B, Node C, Node D with carol a9a0198010...]
  • 38. Replication • Replication factor = 3 • Consistency is a different topic for later [Diagram: ring of Node A, Node B, Node C, Node D with carol a9a0198010...]
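The token-range table and the replication slides can be tied together with a small sketch. Several things here are assumptions for illustration: the abbreviated hashes from the slides are zero-filled to 128 bits by a hypothetical `pad` helper, and replicas are placed on the next nodes walking around the ring, which is how Cassandra's SimpleStrategy behaves. The slide diagram itself does not survive in this transcript, so the replica list is derived from the ranges, not copied from it.

```python
from bisect import bisect_left


def pad(prefix):
    """Zero-fill an abbreviated hex value to a 128-bit integer (an assumption;
    the slides elide the remaining digits)."""
    return int(prefix.ljust(32, "0"), 16)


# Node name and the END of its token range, in ring order (from the slides).
RING = [("A", pad("00")), ("B", pad("40")), ("C", pad("80")), ("D", pad("c0"))]

# Hash prefixes printed on the slides for each partition key.
TOKENS = {
    "jim": pad("5e02739678"),
    "carol": pad("a9a0198010"),
    "johnny": pad("f4eb27cea7"),
    "suzy": pad("78b421309e"),
}


def replicas(token, replication_factor=3):
    """Owner is the first node whose range end is >= token (wrapping past D
    back to A); further replicas are the next nodes around the ring."""
    ends = [end for _, end in RING]
    first = bisect_left(ends, token) % len(RING)
    return [RING[(first + i) % len(RING)][0] for i in range(replication_factor)]


for key, token in TOKENS.items():
    print(key, replicas(token))
```

With these ranges, carol's token falls in node D's range, so under the stated placement assumption a replication factor of 3 puts her row on D, A, and B.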
  • 39. TimeUUID • Also known as a Version 1 UUID • Sortable • Reversible • Timestamp to Microsecond + UUID = TimeUUID • 04d580b0-9412-11e3-baa8-0800200c9a66 = Wednesday, February 12, 2014 6:18:06 PM GMT (http://www.famkruithof.net/uuid/uuidgen)
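The "reversible" property can be checked with Python's standard `uuid` module, which exposes the embedded 60-bit timestamp of a version 1 UUID. A minimal sketch: `timeuuid_to_datetime` is a hypothetical helper name, and it relies on the standard definition that version 1 timestamps count 100-nanosecond intervals since 1582-10-15.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count 100-nanosecond intervals since this date.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def timeuuid_to_datetime(value):
    """Recover the UTC timestamp embedded in a version 1 UUID string."""
    parsed = uuid.UUID(value)
    if parsed.version != 1:
        raise ValueError("not a version 1 (time-based) UUID")
    # parsed.time is in 100 ns units; integer-divide by 10 for microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=parsed.time // 10)


decoded = timeuuid_to_datetime("04d580b0-9412-11e3-baa8-0800200c9a66")
print(decoded.isoformat())

# Generating a new time-based UUID on the client side:
print(uuid.uuid1())
```

Decoding the UUID from the slide gives 18:18:06 UTC on 12 February 2014, which agrees with the date shown there.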
  • 40. Example 2: Financial Transactions • Trading of stocks • When did they happen? • Massive speeds and volumes “Sirca, a non-profit university consortium based in Sydney, is the world’s biggest broker of financial data, ingesting into its database 2 million pieces of information a second from every major trading exchange.”* (* http://www.theage.com.au/it-pro/business-it/help-poverty-theres-an-app-for-that-20140120-hv948.html)
  • 41. Use case • Data model to support queries: store data per symbol and date; store time series in reverse order, last to first; make sure every transaction is unique • Needed queries: get all trades for symbol and day; get trade for a single date and time; get last 10 trades for symbol and date
  • 42. Data Model • date is int of days since epoch • timeuuid keeps it unique • Reverse the times for later queries
      CREATE TABLE stock_ticks (
        symbol text,
        date int,
        trade timeuuid,
        trade_details text,
        PRIMARY KEY ((symbol, date), trade)
      ) WITH CLUSTERING ORDER BY (trade DESC);
      INSERT INTO stock_ticks(symbol, date, trade, trade_details) VALUES ('NFLX',340,04d580b0-1431-1e33-baf8-0833200c98a6,'BUY:2000');
      INSERT INTO stock_ticks(symbol, date, trade, trade_details) VALUES ('NFLX',340,05d580b0-6472-1ef3-a3a8-0430200c9a66,'BUY:300');
      INSERT INTO stock_ticks(symbol, date, trade, trade_details) VALUES ('NFLX',340,02d580b0-9412-d223-55a8-0976200c9a25,'SELL:450');
      INSERT INTO stock_ticks(symbol, date, trade, trade_details) VALUES ('NFLX',340,08d580b0-4482-11e3-5fd3-3421200c9a65,'SELL:3000');
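The "date is int of days since epoch" bullet amounts to a small client-side conversion before each write. A hedged sketch, assuming the Unix epoch (1970-01-01) is the one intended; `days_since_epoch` and `partition_key` are hypothetical helper names, and the value 340 in the slides reads as a placeholder, not a real date.

```python
from datetime import date

# Assumption: "epoch" means the Unix epoch.
EPOCH = date(1970, 1, 1)


def days_since_epoch(day):
    """Whole days between the epoch and the given calendar date."""
    return (day - EPOCH).days


def partition_key(symbol, day):
    """Composite (symbol, date) key, matching PRIMARY KEY ((symbol, date), trade)."""
    return (symbol, days_since_epoch(day))


key = partition_key("NFLX", date(2013, 4, 3))
print(key)
```

Bucketing by symbol and day keeps each partition bounded to one day of trades for one symbol, which is what lets the later queries address a single partition.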
  • 43. Storage Model - Logical View
      SELECT trade,trade_details FROM stock_ticks WHERE symbol='NFLX' AND date=340;
      symbol:date | trade                                | trade_details
      NFLX:340    | 08d580b0-4482-11e3-5fd3-3421200c9a65 | SELL:3000  (last thing inserted)
      NFLX:340    | 02d580b0-9412-d223-55a8-0976200c9a25 | SELL:450
      NFLX:340    | 05d580b0-6472-1ef3-a3a8-0430200c9a66 | BUY:300
      NFLX:340    | 04d580b0-1431-1e33-baf8-0833200c98a6 | BUY:2000   (first thing inserted)
  • 44. Storage Model - Disk Layout • Order is from last trade to first
      SELECT trade,trade_details FROM stock_ticks WHERE symbol='NFLX' AND date=340;
      Row key NFLX:340: [08d580b0-4482-11e3-5fd3-3421200c9a65: SELL:3000] [02d580b0-9412-d223-55a8: SELL:450] [05d580b0-6472-1ef3-a3a8-0430200c9a66: BUY:300] [04d580b0-1431-1e33-baf8-0833200c98a6: BUY:2000]
  • 45. Query patterns • Limit queries • Get last X trades
      SELECT trade,trade_details FROM stock_ticks WHERE symbol='NFLX' AND date=340 LIMIT 3;
      Row key NFLX:340: [08d580b0-4482-11e3-5fd3-3421200c9a65: SELL:3000] [02d580b0-9412-d223-55a8-0976200c9a25: SELL:450] [05d580b0-6472-1ef3-a3a8-0430200c9a66: BUY:300] [04d580b0-1431-1e33-baf8-0833200c98a6: BUY:2000]
      The read runs from the start of the row ("from here") through the third column ("to here").
  • 46. Query patterns • Limit queries • Get last X trades • Reverse sorted by trade • Last 3 trades
      SELECT trade,trade_details FROM stock_ticks WHERE symbol='NFLX' AND date=340 LIMIT 3;
      symbol:date | trade                                | trade_details
      NFLX:340    | 08d580b0-4482-11e3-5fd3-3421200c9a65 | SELL:3000
      NFLX:340    | 02d580b0-9412-d223-55a8-0976200c9a25 | SELL:450
      NFLX:340    | 05d580b0-6472-1ef3-a3a8-0430200c9a66 | BUY:300
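Why reverse clustering order makes "last X trades" cheap can be shown with a toy model. This is a sketch under assumptions, not Cassandra internals: a plain integer sequence number stands in for the timeuuid ordering, and `insert_trade` and `last_trades` are hypothetical helper names. Rows are kept newest-first at write time, so a LIMIT is just a read of the first few entries.

```python
from bisect import insort

# Rows of one partition (NFLX:340), kept sorted newest-first,
# like WITH CLUSTERING ORDER BY (trade DESC).
trades = []


def insert_trade(sequence, details):
    """Insert in sorted position. Negating the sequence number makes the
    ascending order maintained by insort equal to newest-first."""
    insort(trades, (-sequence, details))


def last_trades(limit):
    """Equivalent of LIMIT n: take the first n rows, no sorting at read time."""
    return [details for _, details in trades[:limit]]


# Same insertion order as the INSERT statements in the slides.
for seq, details in enumerate(["BUY:2000", "BUY:300", "SELL:450", "SELL:3000"], start=1):
    insert_trade(seq, details)

latest = last_trades(3)
print(latest)
```

The three newest trades come back in the same order as the table on the slide: SELL:3000, SELL:450, BUY:300.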
  • 47. Way more examples • 5 minute interviews • Use cases • Free training! www.planetcassandra.org
  • 49. Thank You! Follow me for more updates all the time: @PatrickMcFadin