SlideShare a Scribd company logo
1 of 25
Download to read offline
Apache Cassandra
    Present and Future
    Jonathan Ellis




Monday, October 24, 2011
History


   Bigtable, 2006                                    Dynamo, 2007



                                             OSS, 2008




                                                  TLP, 2010
                           Incubator, 2009
                                             1.0, October 2011

Monday, October 24, 2011
Why people choose Cassandra

    ✤    Multi-master, multi-DC
    ✤    Linearly scalable
    ✤    Larger-than-memory datasets
    ✤    High performance
    ✤    Full durability
    ✤    Integrated caching
    ✤    Tuneable consistency



Monday, October 24, 2011
Cassandra users

    ✤    Financial
    ✤    Social Media
    ✤    Advertising
    ✤    Entertainment
    ✤    Energy
    ✤    E-tail
    ✤    Health care
    ✤    Government

Monday, October 24, 2011
Road to 1.0

    ✤    Storage engine improvements
    ✤    Specialized replication modes
    ✤    Performance and other real-world considerations
    ✤    Building an ecosystem




Monday, October 24, 2011
Compaction

                           Size-Tiered




                           Leveled




Monday, October 24, 2011
Differences in Cassandra’s leveling

    ✤    ColumnFamily instead of key/value
    ✤    Multithreading (experimental)
    ✤    Optional throttling (16MB/s by default)
    ✤    Per-sstable bloom filter for row keys
    ✤    Larger data files (5MB by default)
    ✤    Does not block writes if compaction falls behind




Monday, October 24, 2011
Column (“secondary”) indexing

cqlsh> CREATE INDEX state_idx ON users(state);
cqlsh> INSERT INTO users (uname, state, birth_date)
                VALUES (‘bsanderson’, ‘UT’, 1975)


    users                                       state_idx
                           state   birth_date
         bsanderson        UT        1975            UT     bsanderson   htayler
           prothfuss        WI       1973            WI      prothfuss
             htayler       UT        1968




Monday, October 24, 2011
Querying


cqlsh> SELECT * FROM users
                WHERE state='UT' AND birth_date > 1970
                ORDER BY tokenize(key);




Monday, October 24, 2011
More sophisticated indexing?

    ✤    Want to:
          ✤   Support more operators
          ✤   Support user-defined ordering
          ✤   Support high-cardinality values
    ✤    But:
          ✤   Scatter/gather scales poorly
          ✤   If it’s not node local, we can’t guarantee atomicity
          ✤   If we can’t guarantee atomicity, doublechecking across nodes is a
              huge penalty


Monday, October 24, 2011
Other storage engine improvements

        ✤    Compression
        ✤    Expiring columns
        ✤    Bounded worst-case reads by re-writing fragmented
             rows




Monday, October 24, 2011
Eventually-consistent counters

    ✤    Counter is partitioned by replica; each replica is “master”
         of its own partition




Monday, October 24, 2011
13

Monday, October 24, 2011
14

Monday, October 24, 2011
15

Monday, October 24, 2011
16

Monday, October 24, 2011
The interesting parts

    ✤    Tuneable consistency
    ✤    Avoiding contention on local increments
          ✤   Store increments, not full value; merge on read + compaction
    ✤    Renewing counter id to deal with data loss
    ✤    Interaction with tombstones




Monday, October 24, 2011
(What about version vectors?)

    ✤    Version vectors allow detecting conflict, but do not give
         enough information to resolve it except in special cases
          ✤   In the counters case, we’d need to hold onto all previous versions
              until we can be sure no new conflict with them can occur
          ✤   Jeff Darcy has a longer discussion at http://pl.atyp.us/
              wordpress/?p=2601




Monday, October 24, 2011
Performance




     A single four-core machine; one million inserts + one million updates



Monday, October 24, 2011
Dealing with the JVM

    ✤    JNA
          ✤   mlockall()
          ✤   posix_fadvise()
          ✤   link()
    ✤    Memory
          ✤   Move cache off-heap
          ✤   In-heap arena allocation for memtables, bloom filters
          ✤   Move compaction to a separate process?



Monday, October 24, 2011
The Cassandra ecosystem

    ✤    Replication into Cassandra
          ✤   Gigaspaces
          ✤   Drizzle
    ✤    Solandra: Cassandra + Solr search
    ✤    DataStax Enterprise: Cassandra + Hadoop analytics




Monday, October 24, 2011
DataStax Enterprise




Monday, October 24, 2011
Operations

    ✤    “Vanilla” Hadoop
          ✤   8+ services to setup, monitor, backup, and recover
              (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, Zookeeper,
              Metastore, ...)
          ✤   Single points of failure
          ✤   Can't separate online and offline processing

    ✤    DataStax Enterprise
          ✤   Single, simplified component
          ✤   Peer to peer
          ✤   JobTracker failover
          ✤   No additional cassandra config



Monday, October 24, 2011
What’s next

    ✤    Ease Of Use
    ✤    CQL: “Native” transport, prepared statements
    ✤    Triggers
    ✤    Entity groups
    ✤    Smarter range queries enabling Hive predicate push-
         down
    ✤    Blue sky: streaming / CEP




Monday, October 24, 2011
Questions?

    ✤    jbellis@datastax.com




    ✤    DataStax is hiring!
          ✤   ~15 engineers (35 employees), want to double in 2012
               ✤    Austin, Burlingame, NYC, France, Japan, Belarus
          ✤   100+ customers
          ✤   $11M Series B funding

Monday, October 24, 2011

More Related Content

Similar to Cassandra at High Performance Transaction Systems 2011

Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Amazon Web Services
 
Migrating Social Content to Drupal
Migrating Social Content to DrupalMigrating Social Content to Drupal
Migrating Social Content to DrupalAcquia
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed DatabaseEric Evans
 
Spotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great SuccessSpotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great SuccessNick Barkas
 
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016DataStax
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... CassandraInstaclustr
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databasesjbellis
 
SRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftSRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftAmazon Web Services
 
Outside The Box With Apache Cassnadra
Outside The Box With Apache CassnadraOutside The Box With Apache Cassnadra
Outside The Box With Apache CassnadraEric Evans
 
Storage Systems for High Scalable Systems Presentation
Storage Systems for High Scalable Systems PresentationStorage Systems for High Scalable Systems Presentation
Storage Systems for High Scalable Systems Presentationandyman3000
 
Cassandra Tutorial
Cassandra TutorialCassandra Tutorial
Cassandra Tutorialmubarakss
 
An Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User GroupAn Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User GroupCarlos Juzarte Rolo
 
SQL Server In-Memory OLTP introduction (Hekaton)
SQL Server In-Memory OLTP introduction (Hekaton)SQL Server In-Memory OLTP introduction (Hekaton)
SQL Server In-Memory OLTP introduction (Hekaton)Shy Engelberg
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Amazon Web Services
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftAmazon Web Services
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An OverviewC. Scyphers
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftAmazon Web Services
 

Similar to Cassandra at High Performance Transaction Systems 2011 (20)

Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
Best practices for Data warehousing with Amazon Redshift - AWS PS Summit Canb...
 
Migrating Social Content to Drupal
Migrating Social Content to DrupalMigrating Social Content to Drupal
Migrating Social Content to Drupal
 
The Cassandra Distributed Database
The Cassandra Distributed DatabaseThe Cassandra Distributed Database
The Cassandra Distributed Database
 
Spotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great SuccessSpotify: Horizontal Scalability for Great Success
Spotify: Horizontal Scalability for Great Success
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
 
Everyday I’m scaling... Cassandra
Everyday I’m scaling... CassandraEveryday I’m scaling... Cassandra
Everyday I’m scaling... Cassandra
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
 
6269441.ppt
6269441.ppt6269441.ppt
6269441.ppt
 
SRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon RedshiftSRV405 Deep Dive on Amazon Redshift
SRV405 Deep Dive on Amazon Redshift
 
Outside The Box With Apache Cassnadra
Outside The Box With Apache CassnadraOutside The Box With Apache Cassnadra
Outside The Box With Apache Cassnadra
 
Storage Systems for High Scalable Systems Presentation
Storage Systems for High Scalable Systems PresentationStorage Systems for High Scalable Systems Presentation
Storage Systems for High Scalable Systems Presentation
 
Deep Dive on Amazon Redshift
Deep Dive on Amazon RedshiftDeep Dive on Amazon Redshift
Deep Dive on Amazon Redshift
 
Cassandra Tutorial
Cassandra TutorialCassandra Tutorial
Cassandra Tutorial
 
An Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User GroupAn Introduction to Cassandra - Oracle User Group
An Introduction to Cassandra - Oracle User Group
 
SQL Server In-Memory OLTP introduction (Hekaton)
SQL Server In-Memory OLTP introduction (Hekaton)SQL Server In-Memory OLTP introduction (Hekaton)
SQL Server In-Memory OLTP introduction (Hekaton)
 
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
Best Practices for Data Warehousing with Amazon Redshift | AWS Public Sector ...
 
Data Warehousing with Amazon Redshift
Data Warehousing with Amazon RedshiftData Warehousing with Amazon Redshift
Data Warehousing with Amazon Redshift
 
Big Data Platforms: An Overview
Big Data Platforms: An OverviewBig Data Platforms: An Overview
Big Data Platforms: An Overview
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 

More from jbellis

Data day texas: Cassandra and the Cloud
Data day texas: Cassandra and the CloudData day texas: Cassandra and the Cloud
Data day texas: Cassandra and the Cloudjbellis
 
Cassandra Summit 2015
Cassandra Summit 2015Cassandra Summit 2015
Cassandra Summit 2015jbellis
 
Cassandra summit keynote 2014
Cassandra summit keynote 2014Cassandra summit keynote 2014
Cassandra summit keynote 2014jbellis
 
Cassandra 2.1
Cassandra 2.1Cassandra 2.1
Cassandra 2.1jbellis
 
Cassandra Summit EU 2013
Cassandra Summit EU 2013Cassandra Summit EU 2013
Cassandra Summit EU 2013jbellis
 
London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0jbellis
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynotejbellis
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012jbellis
 
Top five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solutionTop five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solutionjbellis
 
State of Cassandra 2012
State of Cassandra 2012State of Cassandra 2012
State of Cassandra 2012jbellis
 
Massively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache CassandraMassively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache Cassandrajbellis
 
Cassandra 1.1
Cassandra 1.1Cassandra 1.1
Cassandra 1.1jbellis
 
Pycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from JavaPycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from Javajbellis
 
Apache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterpriseApache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterprisejbellis
 
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)jbellis
 
What python can learn from java
What python can learn from javaWhat python can learn from java
What python can learn from javajbellis
 
State of Cassandra, 2011
State of Cassandra, 2011State of Cassandra, 2011
State of Cassandra, 2011jbellis
 
Brisk: more powerful Hadoop powered by Cassandra
Brisk: more powerful Hadoop powered by CassandraBrisk: more powerful Hadoop powered by Cassandra
Brisk: more powerful Hadoop powered by Cassandrajbellis
 
PyCon 2010 SQLAlchemy tutorial
PyCon 2010 SQLAlchemy tutorialPyCon 2010 SQLAlchemy tutorial
PyCon 2010 SQLAlchemy tutorialjbellis
 
Cassandra 0.7, Los Angeles High Scalability Group
Cassandra 0.7, Los Angeles High Scalability GroupCassandra 0.7, Los Angeles High Scalability Group
Cassandra 0.7, Los Angeles High Scalability Groupjbellis
 

More from jbellis (20)

Data day texas: Cassandra and the Cloud
Data day texas: Cassandra and the CloudData day texas: Cassandra and the Cloud
Data day texas: Cassandra and the Cloud
 
Cassandra Summit 2015
Cassandra Summit 2015Cassandra Summit 2015
Cassandra Summit 2015
 
Cassandra summit keynote 2014
Cassandra summit keynote 2014Cassandra summit keynote 2014
Cassandra summit keynote 2014
 
Cassandra 2.1
Cassandra 2.1Cassandra 2.1
Cassandra 2.1
 
Cassandra Summit EU 2013
Cassandra Summit EU 2013Cassandra Summit EU 2013
Cassandra Summit EU 2013
 
London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0London + Dublin Cassandra 2.0
London + Dublin Cassandra 2.0
 
Cassandra Summit 2013 Keynote
Cassandra Summit 2013 KeynoteCassandra Summit 2013 Keynote
Cassandra Summit 2013 Keynote
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012
 
Top five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solutionTop five questions to ask when choosing a big data solution
Top five questions to ask when choosing a big data solution
 
State of Cassandra 2012
State of Cassandra 2012State of Cassandra 2012
State of Cassandra 2012
 
Massively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache CassandraMassively Scalable NoSQL with Apache Cassandra
Massively Scalable NoSQL with Apache Cassandra
 
Cassandra 1.1
Cassandra 1.1Cassandra 1.1
Cassandra 1.1
 
Pycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from JavaPycon 2012 What Python can learn from Java
Pycon 2012 What Python can learn from Java
 
Apache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterpriseApache Cassandra: NoSQL in the enterprise
Apache Cassandra: NoSQL in the enterprise
 
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
Dealing with JVM limitations in Apache Cassandra (Fosdem 2012)
 
What python can learn from java
What python can learn from javaWhat python can learn from java
What python can learn from java
 
State of Cassandra, 2011
State of Cassandra, 2011State of Cassandra, 2011
State of Cassandra, 2011
 
Brisk: more powerful Hadoop powered by Cassandra
Brisk: more powerful Hadoop powered by CassandraBrisk: more powerful Hadoop powered by Cassandra
Brisk: more powerful Hadoop powered by Cassandra
 
PyCon 2010 SQLAlchemy tutorial
PyCon 2010 SQLAlchemy tutorialPyCon 2010 SQLAlchemy tutorial
PyCon 2010 SQLAlchemy tutorial
 
Cassandra 0.7, Los Angeles High Scalability Group
Cassandra 0.7, Los Angeles High Scalability GroupCassandra 0.7, Los Angeles High Scalability Group
Cassandra 0.7, Los Angeles High Scalability Group
 

Recently uploaded

UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncObject Automation
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Will Schroeder
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIUdaiappa Ramachandran
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URLRuncy Oommen
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Adtran
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioChristian Posta
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.francesco barbera
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemAsko Soukka
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxYounusS2
 
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServicePicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServiceRenan Moreira de Oliveira
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfinfogdgmi
 

Recently uploaded (20)

UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
GenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation IncGenAI and AI GCC State of AI_Object Automation Inc
GenAI and AI GCC State of AI_Object Automation Inc
 
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
Apres-Cyber - The Data Dilemma: Bridging Offensive Operations and Machine Lea...
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AI
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
Designing A Time bound resource download URL
Designing A Time bound resource download URLDesigning A Time bound resource download URL
Designing A Time bound resource download URL
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™Meet the new FSP 3000 M-Flex800™
Meet the new FSP 3000 M-Flex800™
 
Comparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and IstioComparing Sidecar-less Service Mesh from Cilium and Istio
Comparing Sidecar-less Service Mesh from Cilium and Istio
 
Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.Digital magic. A small project for controlling smart light bulbs.
Digital magic. A small project for controlling smart light bulbs.
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
Bird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystemBird eye's view on Camunda open source ecosystem
Bird eye's view on Camunda open source ecosystem
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Babel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptxBabel Compiler - Transforming JavaScript for All Browsers.pptx
Babel Compiler - Transforming JavaScript for All Browsers.pptx
 
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer ServicePicPay - GenAI Finance Assistant - ChatGPT for Customer Service
PicPay - GenAI Finance Assistant - ChatGPT for Customer Service
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
Videogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdfVideogame localization & technology_ how to enhance the power of translation.pdf
Videogame localization & technology_ how to enhance the power of translation.pdf
 

Cassandra at High Performance Transaction Systems 2011

  • 1. Apache Cassandra Present and Future Jonathan Ellis Monday, October 24, 2011
  • 2. History Bigtable, 2006 Dynamo, 2007 OSS, 2008 TLP, 2010 Incubator, 2009 1.0, October 2011 Monday, October 24, 2011
  • 3. Why people choose Cassandra ✤ Multi-master, multi-DC ✤ Linearly scalable ✤ Larger-than-memory datasets ✤ High performance ✤ Full durability ✤ Integrated caching ✤ Tuneable consistency Monday, October 24, 2011
  • 4. Cassandra users ✤ Financial ✤ Social Media ✤ Advertising ✤ Entertainment ✤ Energy ✤ E-tail ✤ Health care ✤ Government Monday, October 24, 2011
  • 5. Road to 1.0 ✤ Storage engine improvements ✤ Specialized replication modes ✤ Performance and other real-world considerations ✤ Building an ecosystem Monday, October 24, 2011
  • 6. Compaction Size-Tiered Leveled Monday, October 24, 2011
  • 7. Differences in Cassandra’s leveling ✤ ColumnFamily instead of key/value ✤ Multithreading (experimental) ✤ Optional throttling (16MB/s by default) ✤ Per-sstable bloom filter for row keys ✤ Larger data files (5MB by default) ✤ Does not block writes if compaction falls behind Monday, October 24, 2011
  • 8. Column (“secondary”) indexing cqlsh> CREATE INDEX state_idx ON users(state); cqlsh> INSERT INTO users (uname, state, birth_date) VALUES (‘bsanderson’, ‘UT’, 1975) users state_idx state birth_date bsanderson UT 1975 UT bsanderson htayler prothfuss WI 1973 WI prothfuss htayler UT 1968 Monday, October 24, 2011
  • 9. Querying cqlsh> SELECT * FROM users WHERE state='UT' AND birth_date > 1970 ORDER BY tokenize(key); Monday, October 24, 2011
  • 10. More sophisticated indexing? ✤ Want to: ✤ Support more operators ✤ Support user-defined ordering ✤ Support high-cardinality values ✤ But: ✤ Scatter/gather scales poorly ✤ If it’s not node local, we can’t guarantee atomicity ✤ If we can’t guarantee atomicity, doublechecking across nodes is a huge penalty Monday, October 24, 2011
  • 11. Other storage engine improvements ✤ Compression ✤ Expiring columns ✤ Bounded worst-case reads by re-writing fragmented rows Monday, October 24, 2011
  • 12. Eventually-consistent counters ✤ Counter is partitioned by replica; each replica is “master” of its own partition Monday, October 24, 2011
  • 17. The interesting parts ✤ Tuneable consistency ✤ Avoiding contention on local increments ✤ Store increments, not full value; merge on read + compaction ✤ Renewing counter id to deal with data loss ✤ Interaction with tombstones Monday, October 24, 2011
  • 18. (What about version vectors?) ✤ Version vectors allow detecting conflict, but do not give enough information to resolve it except in special cases ✤ In the counters case, we’d need to hold onto all previous versions until we can be sure no new conflict with them can occur ✤ Jeff Darcy has a longer discussion at http://pl.atyp.us/ wordpress/?p=2601 Monday, October 24, 2011
  • 19. Performance A single four-core machine; one million inserts + one million updates Monday, October 24, 2011
  • 20. Dealing with the JVM ✤ JNA ✤ mlockall() ✤ posix_fadvise() ✤ link() ✤ Memory ✤ Move cache off-heap ✤ In-heap arena allocation for memtables, bloom filters ✤ Move compaction to a separate process? Monday, October 24, 2011
  • 21. The Cassandra ecosystem ✤ Replication into Cassandra ✤ Gigaspaces ✤ Drizzle ✤ Solandra: Cassandra + Solr search ✤ DataStax Enterprise: Cassandra + Hadoop analytics Monday, October 24, 2011
  • 23. Operations ✤ “Vanilla” Hadoop ✤ 8+ services to setup, monitor, backup, and recover (NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker, Zookeeper, Metastore, ...) ✤ Single points of failure ✤ Can't separate online and offline processing ✤ DataStax Enterprise ✤ Single, simplified component ✤ Peer to peer ✤ JobTracker failover ✤ No additional cassandra config Monday, October 24, 2011
  • 24. What’s next ✤ Ease Of Use ✤ CQL: “Native” transport, prepared statements ✤ Triggers ✤ Entity groups ✤ Smarter range queries enabling Hive predicate push- down ✤ Blue sky: streaming / CEP Monday, October 24, 2011
  • 25. Questions? ✤ jbellis@datastax.com ✤ DataStax is hiring! ✤ ~15 engineers (35 employees), want to double in 2012 ✤ Austin, Burlingame, NYC, France, Japan, Belarus ✤ 100+ customers ✤ $11M Series B funding Monday, October 24, 2011