SlideShare a Scribd company logo
1 of 52
Download to read offline
Intro to

Cassandra
  Tyler Hobbs
History


Dynamo                     BigTable
(clustering)               (data model)




               Cassandra
Users
Clustering

    Every node plays the same role
    – No masters, slaves, or special nodes
    – No single point of failure
Consistent Hashing

           0

     50          10




     40          20

           30
Consistent Hashing
                      Key: “www.google.com”
           0

     50          10




     40          20

           30
Consistent Hashing
                      Key: “www.google.com”
           0
                      md5(“www.google.com”)
     50          10

                               14

     40          20

           30
Consistent Hashing
                      Key: “www.google.com”
           0
                      md5(“www.google.com”)
     50          10

                               14

     40          20

           30
Consistent Hashing
                      Key: “www.google.com”
           0
                      md5(“www.google.com”)
     50          10

                               14

     40          20

           30
Consistent Hashing
                        Key: “www.google.com”
           0
                        md5(“www.google.com”)
     50          10

                                   14

     40          20

           30
                Replication Factor = 3
Clustering

    Client can talk to any node
Scaling

RF = 2             0


              50        10

The node at
50 owns the
red portion             20

                   30
Scaling

RF = 2               0


                50        10



   Add a new    40        20
   node at 40
                     30
Scaling

RF = 2               0


                50        10



   Add a new    40        20
   node at 40
                     30
Node Failures

RF = 2               0


                50        10

   Replicas
                40        20

                     30
Node Failures

RF = 2               0


                50        10

   Replicas
                40        20

                     30
Node Failures

RF = 2               0


                50        10




                40        20

                     30
Consistency, Availability

    Consistency
    – Can I read stale data?

    Availability
    – Can I write/read at all?

    Tunable Consistency
Consistency

    N = Total number of replicas

    R = Number of replicas read from
    – (before the response is returned)

    W = Number of replicas written to
    – (before the write is considered a success)
Consistency

    N = Total number of replicas

    R = Number of replicas read from
    – (before the response is returned)

    W = Number of replicas written to
    – (before the write is considered a success)


    W + R > N gives strong consistency
Consistency
 W + R > N gives strong consistency

 N=3
 W=2
 R=2

 2 + 2 > 3 ==> strongly consistent
Consistency
 W + R > N gives strong consistency

 N=3
 W=2
 R=2

 2 + 2 > 3 ==> strongly consistent

 Only 2 of the 3 replicas must be
 available.
Consistency

    Tunable Consistency
    – Specify N (Replication Factor) per data set
    – Specify R, W per operation
Consistency

    Tunable Consistency
    – Specify N (Replication Factor) per data set
    – Specify R, W per operation
    – Quorum: N/2 + 1
       • R = W = Quorum
       • Strong consistency
       • Tolerate the loss of N – Quorum replicas
    – R, W can also be 1 or N
Availability

    Can tolerate the loss of:
    – N – R replicas for reads
    – N – W replicas for writes
CAP Theorem
During node or network failure:



          100%
                                          Not
                                          Possible

   Availability
                     Possible




                     Consistency   100%
CAP Theorem
During node or network failure:



          100%
                                                 Not
                            Ca                   Possible
                              ss
                                an
                                   dr
   Availability                       a
                     Possible




                     Consistency          100%
Clustering

    No single point of failure

    Replication that works

    Scales linearly
    – 2x nodes = 2x performance
       • For both writes and reads
    – Up to 100's of nodes

    Operationally simple

    Multi-Datacenter Replication
Data Model

    Comes from Google BigTable

    Goals
    – Minimize disk seeks
    – High throughput
    – Low latency
    – Durable
Data Model

    Keyspace
    – A collection of Column Families
    – Controls replication settings

    Column Family
    – Kinda resembles a table
Column Families

    Static
    – Object data
    – Similar to a table in a relational database

    Dynamic
    – Pre-calculated query results
    – Materialized views
Static Column Families
                   Users
   zznate    password: *    name: Nate


   driftx    password: *   name: Brandon


   thobbs    password: *    name: Tyler


   jbellis   password: *   name: Jonathan   site: riptano.com
Dynamic Column Families

    Rows
    – Each row has a unique primary key
    – Sorted list of (name, value) tuples
       • Like a sorted map or dictionary
    – The (name, value) tuple is called a “column”
Dynamic Column Families
                     Following
zznate    driftx:   thobbs:


driftx


thobbs    zznate:


jbellis   driftx:   mdennis:   pcmanus   thobbs:   xedin:   zznate
Dynamic Column Families

    Column Timestamps
    – Each column (tuple) has a timestamp
    – In the case of a collision, the latest timestamp wins
    – Client specifies timestamp with write
    – Writes are idempotent
       • Infinite retries allowed
Dynamic Column Families

    Other Examples:
    – Timeline of tweets by a user
    – Timeline of tweets by all of the people a user is
      following
    – List of comments sorted by score
    – List of friends grouped by state
The Data API

    Two choices
    – RPC-based API
    – CQL
       • Cassandra Query Language
Inserting Data
 INSERT INTO users (KEY, “name”, “age”)
     VALUES (“thobbs”, “Tyler”, 24);
Updating Data
 Updates are the same as inserts:
 INSERT INTO users (KEY, “age”)
     VALUES (“thobbs”, 34);


 Or
 UPDATE users SET “age” = 34
     WHERE KEY = “thobbs”;
Fetching Data
 Whole row select:
 SELECT * FROM users WHERE KEY = “thobbs”;
Fetching Data
 Explicit column select:
 SELECT “name”, “age” FROM users
     WHERE KEY = “thobbs”;
Fetching Data
 Get a slice of columns
 UPDATE letters SET 1='a', 2='b', 3='c', 4='d', 5='e'
     WHERE KEY = “key”;

 SELECT 1..3 FROM letters WHERE KEY = “key”;


 Returns [(1, a), (2, b), (3, c)]
Fetching Data
 Get a slice of columns
 SELECT FIRST 2 FROM letters WHERE KEY = “key”;


 Returns [(1, a), (2, b)]

 SELECT FIRST 2 REVERSED FROM letters
     WHERE KEY = “key”;


 Returns [(5, e), (4, d)]
Fetching Data
 Get a slice of columns
 SELECT 3..'' FROM letters WHERE KEY = “key”;


 Returns [(3, c), (4, d), (5, e)]

 SELECT FIRST 2 REVERSED 4..'' FROM letters
     WHERE KEY = “key”;


 Returns [(4, d), (3, c)]
Deleting Data
 Delete a whole row:
 DELETE FROM users WHERE KEY = “thobbs”;

 Delete specific columns:
 DELETE “age” FROM users
     WHERE KEY = “thobbs”;
Secondary Indexes
 Builtin basic indexes
 CREATE INDEX ageIndex ON users (age);

 SELECT name FROM USERS
     WHERE age = 24 AND state = “TX”;
Performance

    Writes
    – 10k – 30k per second per node
    – Sub-millisecond latency

    Reads
    – 1k – 10k per second per node
    – Depends on data set, caching
    – Usually 0.1 to 10ms latency
Other Features

    Distributed Counters
    – Can support millions of high-volume counters

    Excellent Multi-datacenter Support
    – Disaster recovery
    – Locality

    Hadoop Integration
    – Isolation of resources
    – Hive and Pig drivers

    Compression
What Cassandra Can't Do

    Transactions
    – Unless you use a distributed lock
    – Atomicity, Isolation
    – These aren't needed as often as you'd think

    Limited support for ad-hoc queries
    – Know what you want to do with the data
Not One-size-fits-all

    Use alongside an RDBMS
    – Use the RDBMS for highly-transactional or highly-
      relational data
       • Usually a small set of data
    – Let Cassandra scale to handle the rest
Language Support

    Good:
    – Java
    – Python
    – Ruby
    – PHP
    – C#

    Coming Soon:
    – Everything else, now that we have CQL
Questions?

          Tyler Hobbs
               @tylhobbs
       tyler@datastax.com

More Related Content

What's hot

groovy databases
groovy databasesgroovy databases
groovy databasesPaul King
 
SunshinePHP 2017 - Making the most out of MySQL
SunshinePHP 2017 - Making the most out of MySQLSunshinePHP 2017 - Making the most out of MySQL
SunshinePHP 2017 - Making the most out of MySQLGabriela Ferrara
 
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRick Copeland
 
Cassandra 2.1
Cassandra 2.1Cassandra 2.1
Cassandra 2.1jbellis
 
MongoDB London 2013: Basic Replication in MongoDB presented by Marc Schwering...
MongoDB London 2013: Basic Replication in MongoDB presented by Marc Schwering...MongoDB London 2013: Basic Replication in MongoDB presented by Marc Schwering...
MongoDB London 2013: Basic Replication in MongoDB presented by Marc Schwering...MongoDB
 
Beyond PHP - it's not (just) about the code
Beyond PHP - it's not (just) about the codeBeyond PHP - it's not (just) about the code
Beyond PHP - it's not (just) about the codeWim Godden
 
Caching and tuning fun for high scalability
Caching and tuning fun for high scalabilityCaching and tuning fun for high scalability
Caching and tuning fun for high scalabilityWim Godden
 
اسلاید اول جلسه چهارم کلاس پایتون برای هکرهای قانونی
اسلاید اول جلسه چهارم کلاس پایتون برای هکرهای قانونیاسلاید اول جلسه چهارم کلاس پایتون برای هکرهای قانونی
اسلاید اول جلسه چهارم کلاس پایتون برای هکرهای قانونیMohammad Reza Kamalifard
 
The Ring programming language version 1.10 book - Part 56 of 212
The Ring programming language version 1.10 book - Part 56 of 212The Ring programming language version 1.10 book - Part 56 of 212
The Ring programming language version 1.10 book - Part 56 of 212Mahmoud Samir Fayed
 
BDD - Behavior Driven Development Webapps mit Groovy Spock und Geb
BDD - Behavior Driven Development Webapps mit Groovy Spock und GebBDD - Behavior Driven Development Webapps mit Groovy Spock und Geb
BDD - Behavior Driven Development Webapps mit Groovy Spock und GebChristian Baranowski
 
Getting started with replica set in MongoDB
Getting started with replica set in MongoDBGetting started with replica set in MongoDB
Getting started with replica set in MongoDBKishor Parkhe
 
The Ring programming language version 1.5.2 book - Part 45 of 181
The Ring programming language version 1.5.2 book - Part 45 of 181The Ring programming language version 1.5.2 book - Part 45 of 181
The Ring programming language version 1.5.2 book - Part 45 of 181Mahmoud Samir Fayed
 
Graph Connect: Importing data quickly and easily
Graph Connect: Importing data quickly and easilyGraph Connect: Importing data quickly and easily
Graph Connect: Importing data quickly and easilyMark Needham
 
Cassandra summit keynote 2014
Cassandra summit keynote 2014Cassandra summit keynote 2014
Cassandra summit keynote 2014jbellis
 
Cassandra introduction @ ParisJUG
Cassandra introduction @ ParisJUGCassandra introduction @ ParisJUG
Cassandra introduction @ ParisJUGDuyhai Doan
 
Introduction databases and MYSQL
Introduction databases and MYSQLIntroduction databases and MYSQL
Introduction databases and MYSQLNaeem Junejo
 

What's hot (20)

groovy databases
groovy databasesgroovy databases
groovy databases
 
SunshinePHP 2017 - Making the most out of MySQL
SunshinePHP 2017 - Making the most out of MySQLSunshinePHP 2017 - Making the most out of MySQL
SunshinePHP 2017 - Making the most out of MySQL
 
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and MingRapid and Scalable Development with MongoDB, PyMongo, and Ming
Rapid and Scalable Development with MongoDB, PyMongo, and Ming
 
Spock and Geb in Action
Spock and Geb in ActionSpock and Geb in Action
Spock and Geb in Action
 
Cassandra 2.1
Cassandra 2.1Cassandra 2.1
Cassandra 2.1
 
MongoDB London 2013: Basic Replication in MongoDB presented by Marc Schwering...
MongoDB London 2013: Basic Replication in MongoDB presented by Marc Schwering...MongoDB London 2013: Basic Replication in MongoDB presented by Marc Schwering...
MongoDB London 2013: Basic Replication in MongoDB presented by Marc Schwering...
 
Load Data Fast!
Load Data Fast!Load Data Fast!
Load Data Fast!
 
Beyond PHP - it's not (just) about the code
Beyond PHP - it's not (just) about the codeBeyond PHP - it's not (just) about the code
Beyond PHP - it's not (just) about the code
 
Caching and tuning fun for high scalability
Caching and tuning fun for high scalabilityCaching and tuning fun for high scalability
Caching and tuning fun for high scalability
 
اسلاید اول جلسه چهارم کلاس پایتون برای هکرهای قانونی
اسلاید اول جلسه چهارم کلاس پایتون برای هکرهای قانونیاسلاید اول جلسه چهارم کلاس پایتون برای هکرهای قانونی
اسلاید اول جلسه چهارم کلاس پایتون برای هکرهای قانونی
 
The Ring programming language version 1.10 book - Part 56 of 212
The Ring programming language version 1.10 book - Part 56 of 212The Ring programming language version 1.10 book - Part 56 of 212
The Ring programming language version 1.10 book - Part 56 of 212
 
BDD - Behavior Driven Development Webapps mit Groovy Spock und Geb
BDD - Behavior Driven Development Webapps mit Groovy Spock und GebBDD - Behavior Driven Development Webapps mit Groovy Spock und Geb
BDD - Behavior Driven Development Webapps mit Groovy Spock und Geb
 
Getting started with replica set in MongoDB
Getting started with replica set in MongoDBGetting started with replica set in MongoDB
Getting started with replica set in MongoDB
 
The Ring programming language version 1.5.2 book - Part 45 of 181
The Ring programming language version 1.5.2 book - Part 45 of 181The Ring programming language version 1.5.2 book - Part 45 of 181
The Ring programming language version 1.5.2 book - Part 45 of 181
 
The ABCs of OTP
The ABCs of OTPThe ABCs of OTP
The ABCs of OTP
 
Graph Connect: Importing data quickly and easily
Graph Connect: Importing data quickly and easilyGraph Connect: Importing data quickly and easily
Graph Connect: Importing data quickly and easily
 
Cassandra summit keynote 2014
Cassandra summit keynote 2014Cassandra summit keynote 2014
Cassandra summit keynote 2014
 
Materi my sql part 1
Materi my sql part 1Materi my sql part 1
Materi my sql part 1
 
Cassandra introduction @ ParisJUG
Cassandra introduction @ ParisJUGCassandra introduction @ ParisJUG
Cassandra introduction @ ParisJUG
 
Introduction databases and MYSQL
Introduction databases and MYSQLIntroduction databases and MYSQL
Introduction databases and MYSQL
 

Viewers also liked

SC 2015: Thinking Fast and Slow with Software Development
SC 2015: Thinking Fast and Slow with Software DevelopmentSC 2015: Thinking Fast and Slow with Software Development
SC 2015: Thinking Fast and Slow with Software DevelopmentDaniel Bryant
 
Detect all memory leaks with LeakCanary!
 Detect all memory leaks with LeakCanary! Detect all memory leaks with LeakCanary!
Detect all memory leaks with LeakCanary!Pierre-Yves Ricau
 
How Yelp Uses Sensu to Monitor Services in a SOA World
How Yelp Uses Sensu to Monitor Services in a SOA WorldHow Yelp Uses Sensu to Monitor Services in a SOA World
How Yelp Uses Sensu to Monitor Services in a SOA WorldKyle Anderson
 
Datomic – A Modern Database - StampedeCon 2014
Datomic – A Modern Database - StampedeCon 2014Datomic – A Modern Database - StampedeCon 2014
Datomic – A Modern Database - StampedeCon 2014StampedeCon
 
7 Common Mistakes in Go (2015)
7 Common Mistakes in Go (2015)7 Common Mistakes in Go (2015)
7 Common Mistakes in Go (2015)Steven Francia
 
How to name things: the hardest problem in programming
How to name things: the hardest problem in programmingHow to name things: the hardest problem in programming
How to name things: the hardest problem in programmingPeter Hilton
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Patterns for building resilient and scalable microservices platform on AWS
Patterns for building resilient and scalable microservices platform on AWSPatterns for building resilient and scalable microservices platform on AWS
Patterns for building resilient and scalable microservices platform on AWSBoyan Dimitrov
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraDataStax
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data modelPatrick McFadin
 

Viewers also liked (11)

SC 2015: Thinking Fast and Slow with Software Development
SC 2015: Thinking Fast and Slow with Software DevelopmentSC 2015: Thinking Fast and Slow with Software Development
SC 2015: Thinking Fast and Slow with Software Development
 
Detect all memory leaks with LeakCanary!
 Detect all memory leaks with LeakCanary! Detect all memory leaks with LeakCanary!
Detect all memory leaks with LeakCanary!
 
How Yelp Uses Sensu to Monitor Services in a SOA World
How Yelp Uses Sensu to Monitor Services in a SOA WorldHow Yelp Uses Sensu to Monitor Services in a SOA World
How Yelp Uses Sensu to Monitor Services in a SOA World
 
Evolving the Netflix API
Evolving the Netflix APIEvolving the Netflix API
Evolving the Netflix API
 
Datomic – A Modern Database - StampedeCon 2014
Datomic – A Modern Database - StampedeCon 2014Datomic – A Modern Database - StampedeCon 2014
Datomic – A Modern Database - StampedeCon 2014
 
7 Common Mistakes in Go (2015)
7 Common Mistakes in Go (2015)7 Common Mistakes in Go (2015)
7 Common Mistakes in Go (2015)
 
How to name things: the hardest problem in programming
How to name things: the hardest problem in programmingHow to name things: the hardest problem in programming
How to name things: the hardest problem in programming
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Patterns for building resilient and scalable microservices platform on AWS
Patterns for building resilient and scalable microservices platform on AWSPatterns for building resilient and scalable microservices platform on AWS
Patterns for building resilient and scalable microservices platform on AWS
 
Understanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache CassandraUnderstanding Data Partitioning and Replication in Apache Cassandra
Understanding Data Partitioning and Replication in Apache Cassandra
 
The data model is dead, long live the data model
The data model is dead, long live the data modelThe data model is dead, long live the data model
The data model is dead, long live the data model
 

Similar to Intro to Cassandra

Cassandra for Ruby/Rails Devs
Cassandra for Ruby/Rails DevsCassandra for Ruby/Rails Devs
Cassandra for Ruby/Rails DevsTyler Hobbs
 
Cassandra and Rails at LA NoSQL Meetup
Cassandra and Rails at LA NoSQL MeetupCassandra and Rails at LA NoSQL Meetup
Cassandra and Rails at LA NoSQL MeetupMichael Wynholds
 
Slide presentation pycassa_upload
Slide presentation pycassa_uploadSlide presentation pycassa_upload
Slide presentation pycassa_uploadRajini Ramesh
 
Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelAndrey Lomakin
 
Ben Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectBen Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectMorningstar Tech Talks
 
The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...
The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...
The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...DataStax
 
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandrarantav
 
Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0Russell Spitzer
 
Rich placement constraints: Who said YARN cannot schedule services?
Rich placement constraints: Who said YARN cannot schedule services?Rich placement constraints: Who said YARN cannot schedule services?
Rich placement constraints: Who said YARN cannot schedule services?DataWorks Summit
 
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Boris Yen
 
Renegotiating the boundary between database latency and consistency
Renegotiating the boundary between database latency  and consistencyRenegotiating the boundary between database latency  and consistency
Renegotiating the boundary between database latency and consistencyScyllaDB
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentationMurat Çakal
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to CassandraHanborq Inc.
 
Cassandra Tutorial
Cassandra TutorialCassandra Tutorial
Cassandra Tutorialmubarakss
 
The Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore EnvironmentsThe Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore EnvironmentsUniversity of Washington
 
Cassandra 2012 scandit
Cassandra 2012 scanditCassandra 2012 scandit
Cassandra 2012 scanditCharlie Zhu
 
Netcetera
NetceteraNetcetera
NetceteraScandit
 

Similar to Intro to Cassandra (20)

Cassandra for Ruby/Rails Devs
Cassandra for Ruby/Rails DevsCassandra for Ruby/Rails Devs
Cassandra for Ruby/Rails Devs
 
Cassandra and Rails at LA NoSQL Meetup
Cassandra and Rails at LA NoSQL MeetupCassandra and Rails at LA NoSQL Meetup
Cassandra and Rails at LA NoSQL Meetup
 
Slide presentation pycassa_upload
Slide presentation pycassa_uploadSlide presentation pycassa_upload
Slide presentation pycassa_upload
 
NoSQL Smackdown!
NoSQL Smackdown!NoSQL Smackdown!
NoSQL Smackdown!
 
Apache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data modelApache Cassandra, part 1 – principles, data model
Apache Cassandra, part 1 – principles, data model
 
Ben Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra ProjectBen Coverston - The Apache Cassandra Project
Ben Coverston - The Apache Cassandra Project
 
The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...
The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...
The Best and Worst of Cassandra-stress Tool (Christopher Batey, The Last Pick...
 
NOSQL and Cassandra
NOSQL and CassandraNOSQL and Cassandra
NOSQL and Cassandra
 
Taming Cassandra
Taming CassandraTaming Cassandra
Taming Cassandra
 
Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0Cassandra Fundamentals - C* 2.0
Cassandra Fundamentals - C* 2.0
 
NoSql Database
NoSql DatabaseNoSql Database
NoSql Database
 
Rich placement constraints: Who said YARN cannot schedule services?
Rich placement constraints: Who said YARN cannot schedule services?Rich placement constraints: Who said YARN cannot schedule services?
Rich placement constraints: Who said YARN cannot schedule services?
 
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012Introduce Apache Cassandra - JavaTwo Taiwan, 2012
Introduce Apache Cassandra - JavaTwo Taiwan, 2012
 
Renegotiating the boundary between database latency and consistency
Renegotiating the boundary between database latency  and consistencyRenegotiating the boundary between database latency  and consistency
Renegotiating the boundary between database latency and consistency
 
Scaling web applications with cassandra presentation
Scaling web applications with cassandra presentationScaling web applications with cassandra presentation
Scaling web applications with cassandra presentation
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Cassandra Tutorial
Cassandra TutorialCassandra Tutorial
Cassandra Tutorial
 
The Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore EnvironmentsThe Other HPC: High Productivity Computing in Polystore Environments
The Other HPC: High Productivity Computing in Polystore Environments
 
Cassandra 2012 scandit
Cassandra 2012 scanditCassandra 2012 scandit
Cassandra 2012 scandit
 
Netcetera
NetceteraNetcetera
Netcetera
 

Recently uploaded

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your QueriesExploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your QueriesSanjay Willie
 

Recently uploaded (20)

New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your QueriesExploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
Exploring ChatGPT Prompt Hacks To Maximally Optimise Your Queries
 

Intro to Cassandra

  • 1. Intro to Cassandra Tyler Hobbs
  • 2. History Dynamo BigTable (clustering) (data model) Cassandra
  • 4. Clustering  Every node plays the same role – No masters, slaves, or special nodes – No single point of failure
  • 5. Consistent Hashing 0 50 10 40 20 30
  • 6. Consistent Hashing Key: “www.google.com” 0 50 10 40 20 30
  • 7. Consistent Hashing Key: “www.google.com” 0 md5(“www.google.com”) 50 10 14 40 20 30
  • 8. Consistent Hashing Key: “www.google.com” 0 md5(“www.google.com”) 50 10 14 40 20 30
  • 9. Consistent Hashing Key: “www.google.com” 0 md5(“www.google.com”) 50 10 14 40 20 30
  • 10. Consistent Hashing Key: “www.google.com” 0 md5(“www.google.com”) 50 10 14 40 20 30 Replication Factor = 3
  • 11. Clustering  Client can talk to any node
  • 12. Scaling RF = 2 0 50 10 The node at 50 owns the red portion 20 30
  • 13. Scaling RF = 2 0 50 10 Add a new 40 20 node at 40 30
  • 14. Scaling RF = 2 0 50 10 Add a new 40 20 node at 40 30
  • 15. Node Failures RF = 2 0 50 10 Replicas 40 20 30
  • 16. Node Failures RF = 2 0 50 10 Replicas 40 20 30
  • 17. Node Failures RF = 2 0 50 10 40 20 30
  • 18. Consistency, Availability  Consistency – Can I read stale data?  Availability – Can I write/read at all?  Tunable Consistency
  • 19. Consistency  N = Total number of replicas  R = Number of replicas read from – (before the response is returned)  W = Number of replicas written to – (before the write is considered a success)
  • 20. Consistency  N = Total number of replicas  R = Number of replicas read from – (before the response is returned)  W = Number of replicas written to – (before the write is considered a success) W + R > N gives strong consistency
  • 21. Consistency W + R > N gives strong consistency N=3 W=2 R=2 2 + 2 > 3 ==> strongly consistent
  • 22. Consistency W + R > N gives strong consistency N=3 W=2 R=2 2 + 2 > 3 ==> strongly consistent Only 2 of the 3 replicas must be available.
  • 23. Consistency  Tunable Consistency – Specify N (Replication Factor) per data set – Specify R, W per operation
  • 24. Consistency  Tunable Consistency – Specify N (Replication Factor) per data set – Specify R, W per operation – Quorum: N/2 + 1 • R = W = Quorum • Strong consistency • Tolerate the loss of N – Quorum replicas – R, W can also be 1 or N
  • 25. Availability  Can tolerate the loss of: – N – R replicas for reads – N – W replicas for writes
  • 26. CAP Theorem During node or network failure: 100% Not Possible Availability Possible Consistency 100%
  • 27. CAP Theorem During node or network failure: 100% Not Ca Possible ss an dr Availability a Possible Consistency 100%
  • 28. Clustering  No single point of failure  Replication that works  Scales linearly – 2x nodes = 2x performance • For both writes and reads – Up to 100's of nodes  Operationally simple  Multi-Datacenter Replication
  • 29. Data Model  Comes from Google BigTable  Goals – Minimize disk seeks – High throughput – Low latency – Durable
  • 30. Data Model  Keyspace – A collection of Column Families – Controls replication settings  Column Family – Kinda resembles a table
  • 31. Column Families  Static – Object data – Similar to a table in a relational database  Dynamic – Pre-calculated query results – Materialized views
  • 32. Static Column Families Users zznate password: * name: Nate driftx password: * name: Brandon thobbs password: * name: Tyler jbellis password: * name: Jonathan site: riptano.com
  • 33. Dynamic Column Families  Rows – Each row has a unique primary key – Sorted list of (name, value) tuples • Like a sorted map or dictionary – The (name, value) tuple is called a “column”
  • 34. Dynamic Column Families Following zznate driftx: thobbs: driftx thobbs zznate: jbellis driftx: mdennis: pcmanus thobbs: xedin: zznate
  • 35. Dynamic Column Families  Column Timestamps – Each column (tuple) has a timestamp – In the case of a collision, the latest timestamp wins – Client specifies timestamp with write – Writes are idempotent • Infinite retries allowed
  • 36. Dynamic Column Families  Other Examples: – Timeline of tweets by a user – Timeline of tweets by all of the people a user is following – List of comments sorted by score – List of friends grouped by state
  • 37. The Data API  Two choices – RPC-based API – CQL • Cassandra Query Language
  • 38. Inserting Data INSERT INTO users (KEY, “name”, “age”) VALUES (“thobbs”, “Tyler”, 24);
  • 39. Updating Data Updates are the same as inserts: INSERT INTO users (KEY, “age”) VALUES (“thobbs”, 34); Or UPDATE users SET “age” = 34 WHERE KEY = “thobbs”;
  • 40. Fetching Data Whole row select: SELECT * FROM users WHERE KEY = “thobbs”;
  • 41. Fetching Data Explicit column select: SELECT “name”, “age” FROM users WHERE KEY = “thobbs”;
  • 42. Fetching Data Get a slice of columns UPDATE letters SET 1='a', 2='b', 3='c', 4='d', 5='e' WHERE KEY = “key”; SELECT 1..3 FROM letters WHERE KEY = “key”; Returns [(1, a), (2, b), (3, c)]
  • 43. Fetching Data Get a slice of columns SELECT FIRST 2 FROM letters WHERE KEY = “key”; Returns [(1, a), (2, b)] SELECT FIRST 2 REVERSED FROM letters WHERE KEY = “key”; Returns [(5, e), (4, d)]
  • 44. Fetching Data Get a slice of columns SELECT 3..'' FROM letters WHERE KEY = “key”; Returns [(3, c), (4, d), (5, e)] SELECT FIRST 2 REVERSED 4..'' FROM letters WHERE KEY = “key”; Returns [(4, d), (3, c)]
  • 45. Deleting Data Delete a whole row: DELETE FROM users WHERE KEY = “thobbs”; Delete specific columns: DELETE “age” FROM users WHERE KEY = “thobbs”;
  • 46. Secondary Indexes Builtin basic indexes CREATE INDEX ageIndex ON users (age); SELECT name FROM USERS WHERE age = 24 AND state = “TX”;
  • 47. Performance  Writes – 10k – 30k per second per node – Sub-millisecond latency  Reads – 1k – 10k per second per node – Depends on data set, caching – Usually 0.1 to 10ms latency
  • 48. Other Features  Distributed Counters – Can support millions of high-volume counters  Excellent Multi-datacenter Support – Disaster recovery – Locality  Hadoop Integration – Isolation of resources – Hive and Pig drivers  Compression
  • 49. What Cassandra Can't Do  Transactions – Unless you use a distributed lock – Atomicity, Isolation – These aren't needed as often as you'd think  Limited support for ad-hoc queries – Know what you want to do with the data
  • 50. Not One-size-fits-all  Use alongside an RDBMS – Use the RDBMS for highly-transactional or highly- relational data • Usually a small set of data – Let Cassandra scale to handle the rest
  • 51. Language Support  Good: – Java – Python – Ruby – PHP – C#  Coming Soon: – Everything else, now that we have CQL
  • 52. Questions? Tyler Hobbs @tylhobbs tyler@datastax.com