Lessons learned by Restlet when deploying DataStax Enterprise search with APISpark. Presentation by Jerome Louvel and Guillaume Blondeau at the Cassandra Summit 2015. Includes 7 challenges and solutions when deploying DataStax.
Cassandra Summit 2015 - Building a multi-tenant API PaaS with DataStax Enterprise Search
2. Agenda
1. Introduction
2. Persistence needs of an API PaaS
3. Selecting DataStax Enterprise Search
4. Main challenges and solutions
5. Conclusion
6. Q&A
4. About the Speakers
● Jérôme Louvel
○ founder & CTO of Restlet, Web API platform vendor
○ created Restlet Framework, first REST framework in 2004
○ contributor to “RESTful Web Services” (O’Reilly, 2007)
○ member of the JAX-RS 1.0 expert group (2007 - 2009)
○ co-author of “Restlet in Action” (Manning, 2012)
○ InfoQ editor covering Web APIs since 2014
● Guillaume Blondeau
○ DevOps engineer at Restlet
○ working on APISpark cloud platform
○ Cassandra Administrator certified by DataStax
6. About APISpark
● Key features
○ visual creation & deployment of data APIs
○ operation of APIs & their local data sources
○ management of any API
● Benefits
○ accessible via web browser, no technical expertise required
○ companies of any size can become API providers
○ get started for free, then pay when the API generates traffic
10. High Scalability & Elasticity
● For API traffic
○ concurrent calls
○ workload types
○ peaks handling
● For data storage
○ number of stores
○ volume of data ...
...
...
...
12. High Multi-tenant Density
● Balance between
○ data isolation
○ low cost
● Many customers & projects
○ sharing persistence
infrastructure
○ isolated data stores
● Many users & groups
○ personal data
○ shared group data
14. Step 1: Prototyping with AWS NoSQL
● Started with SimpleDB
○ zero ops, highly available & low latency
○ mono-region & limited query capabilities
● Upgraded to DynamoDB
○ better scalability & predictability
○ not really for multi-tenant use cases (soft limits)
○ not very elastic (provisioned throughput)
● Other limitations
○ unable to develop and test locally (MySQL mode)
○ strong AWS lock-in
15. Step 2: Moving to Apache Cassandra
● For APISpark beta version
○ increasing multi-tenancy needs
○ increasing cost concerns
● Benefits
○ fully open source & free (vendor support)
○ on-premise deployments possible
○ proven scalability on AWS (Netflix)
○ richer query capabilities
○ natively multi-region
16. Step 3: Upgrading to DataStax Enterprise
● For APISpark GA
○ DataStax certified stack
○ production support
● Improved capabilities
○ much richer query capabilities with Solr integration
○ administration console
○ command line tooling
○ comprehensive documentation
● Still open source foundation
○ limited vendor lock-in
○ mature open source components
19. I. Deploying Across Multiple Regions
● Using Ec2MultiRegionSnitch
● 1 Entity Store = 1 Keyspace
○ Each keyspace can set its own replication policy
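The "one keyspace per Entity Store, each with its own replication policy" idea can be sketched as a small helper that builds the CQL statement. This is a minimal sketch, not the statement APISpark actually issues: the function name, the keyspace name and the replica counts are hypothetical, and the datacenter names assume the region-based naming that the EC2 snitches typically use (for example `us-east`, `eu-west`).

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement with per-datacenter replication.

    `replicas_per_dc` maps a datacenter name to a replica count, e.g.
    {"us-east": 3, "eu-west": 2}. Names and counts are illustrative only.
    """
    if not replicas_per_dc:
        raise ValueError("at least one datacenter is required")
    # NetworkTopologyStrategy lets each keyspace choose replicas per datacenter.
    parts = ["'class': 'NetworkTopologyStrategy'"]
    for dc, count in sorted(replicas_per_dc.items()):
        parts.append("'{}': {}".format(dc, int(count)))
    return "CREATE KEYSPACE IF NOT EXISTS {} WITH replication = {{{}}};".format(
        keyspace, ", ".join(parts)
    )


# Hypothetical Entity Store replicated in two regions.
print(create_keyspace_cql("store_a", {"us-east": 3, "eu-west": 2}))
```

A store that only needs to live in one region would simply pass a single datacenter, which keeps its data out of the other regions.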
20. II. Isolating Customer Data & Keeping Cost Low
● 1 Entity Store = 1 Keyspace
○ Data isolated in file system and memory
● Complementary benefit
○ ACL per keyspace
[Diagram labels: Keyspace, Table]
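The "ACL per keyspace" benefit can be illustrated with keyspace-level GRANT statements. The slides do not show the statements actually used, so this is only a sketch: the helper name, the user name and the default permission set are assumptions, and it presumes that authentication and authorization are enabled on the cluster.

```python
# Permission names accepted by the CQL GRANT statement.
CQL_PERMISSIONS = {"ALTER", "AUTHORIZE", "CREATE", "DROP", "MODIFY", "SELECT"}


def keyspace_grants(keyspace, user, permissions=("SELECT", "MODIFY")):
    """Return GRANT statements limiting `user` to a single keyspace.

    The default permissions (read and write, no schema changes) are a
    hypothetical choice, not something stated in the presentation.
    """
    statements = []
    for permission in permissions:
        if permission not in CQL_PERMISSIONS:
            raise ValueError("unknown permission: {}".format(permission))
        statements.append(
            "GRANT {} ON KEYSPACE {} TO {};".format(permission, keyspace, user)
        )
    return statements


# Hypothetical tenant account that may only touch its own Entity Store.
for statement in keyspace_grants("store_a", "tenant_a"):
    print(statement)
```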
22. IV. Dealing with Dynamic Schema Changes (1/3)
[Diagram labels: ALTER TABLE DROP, ALTER TABLE ADD]
23. IV. Dealing with Dynamic Schema Changes (2/3)
User action on Entity Store | Action performed in DB
Create Entity | CQL: “CREATE TABLE <tableName>” + Solr core creation
Delete Entity | CQL: “DROP TABLE <tableName>”
Create Property | CQL: “ALTER TABLE <tableName> ADD <columnName> <type>” + Solr core schema update
Delete Property | CQL: “ALTER TABLE <tableName> DROP <columnName>” + Solr core schema update
Add Property in composite | Java: alter JSON for all rows
Delete Property in composite | Java: alter JSON for all rows
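The mapping from user actions to database actions can be sketched as follows. This is an illustration under stated assumptions rather than the APISpark implementation: the action names, the helper names and the `id text PRIMARY KEY` column are hypothetical, the Solr core steps are left out, and the JSON rewrite that the slide attributes to Java is shown here in Python on in-memory rows.

```python
import json


def ddl_for_action(action, table, column=None, cql_type=None):
    """Return the CQL statement for a schema-level user action."""
    if action == "create_entity":
        # Assumption: a single text primary key; the real column list is not shown.
        return "CREATE TABLE {} (id text PRIMARY KEY);".format(table)
    if action == "delete_entity":
        return "DROP TABLE {};".format(table)
    if action == "create_property":
        return "ALTER TABLE {} ADD {} {};".format(table, column, cql_type)
    if action == "delete_property":
        return "ALTER TABLE {} DROP {};".format(table, column)
    raise ValueError("unsupported action: {}".format(action))


def add_composite_property(rows, column, prop, default=None):
    """Add `prop` to the JSON document stored in `column` of every row."""
    updated = []
    for row in rows:
        doc = json.loads(row[column])
        doc.setdefault(prop, default)  # keep an existing value if present
        updated.append(dict(row, **{column: json.dumps(doc, sort_keys=True)}))
    return updated


def drop_composite_property(rows, column, prop):
    """Remove `prop` from the JSON document stored in `column` of every row."""
    updated = []
    for row in rows:
        doc = json.loads(row[column])
        doc.pop(prop, None)  # ignore rows that never had the property
        updated.append(dict(row, **{column: json.dumps(doc, sort_keys=True)}))
    return updated


print(ddl_for_action("create_property", "contact", "email", "text"))
rows = [{"id": "1", "address": '{"city": "Paris"}'}]
print(add_composite_property(rows, "address", "zip"))
```

Composite properties have no column of their own, which is why changing them means touching every row instead of issuing a single ALTER TABLE.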
24. IV. Dealing with Dynamic Schema Changes (3/3)
● Advantages
○ flexibility compared to RDBMS
■ no lock
○ available actions
■ add / drop / rename column
■ change type of column
● Limitations
○ schema deployment can take time
○ in some edge cases, columns can’t be recreated
25. V. High Multi-tenant Density (1/2)
Schema deployment time with growing # of tables
26. V. High Multi-tenant Density (2/2)
● Challenge
○ large number of C* tables & Solr cores
○ memory usage (e.g. 1 C* table takes more than 1 MB of heap)
● Solutions
○ adjust JVM memory settings
○ create additional clusters when needed
○ deprovision unused Entity Stores
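The memory challenge lends itself to a back-of-the-envelope estimate. The sketch below takes the "more than 1 MB of heap per table" figure as a lower bound and asks how many clusters a given table count would need. The function names and the heap budget are hypothetical, and Solr core overhead is ignored, so real numbers would be higher.

```python
import math


def heap_needed_mb(table_count, mb_per_table=1.0):
    """Lower-bound heap (in MB) consumed by table metadata on one node."""
    return table_count * mb_per_table


def clusters_needed(table_count, heap_budget_mb, mb_per_table=1.0):
    """Number of clusters needed so that no node exceeds `heap_budget_mb`.

    `heap_budget_mb` is the hypothetical share of the JVM heap reserved
    for per-table overhead on each node.
    """
    tables_per_cluster = int(heap_budget_mb // mb_per_table)
    if tables_per_cluster < 1:
        raise ValueError("heap budget is too small for a single table")
    return math.ceil(table_count / tables_per_cluster)


# Hypothetical figures: 5000 tables and 2 GB of heap reserved for table overhead.
print(heap_needed_mb(5000))
print(clusters_needed(5000, 2048))
```

Deprovisioning unused Entity Stores lowers `table_count`, which is why it appears next to adding clusters as a solution.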
32. Conclusion
● Special use case of DataStax Enterprise
○ not a lot of shared knowledge about it
○ great support from DataStax
○ DSE is a good fit despite some challenges
● Looking forward to DSE 4.8!
○ User Defined Types with Solr indexing
○ live indexing of C* data into Solr
○ improved overall performance