SlideShare a Scribd company logo
1 of 39
The Game of Big Data!
Analytics Infrastructure at KIXEYE
Randy Shoup 
@randyshoup
linkedin.com/in/randyshoup

QCon New York, June 13 2014
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
http://www.infoq.com/presentations
/kixeye-big-data
Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Free-to-Play Real-time Strategy
Games
 •  Web and mobile
•  Strategy and tactics
•  Really real-time J
•  Deep relationships with
players
•  Constantly evolving
gameplay, feature set,
economy, balance
•  >500 employees worldwide
Intro: Analytics at KIXEYE
User Acquisition

Game Analytics

Retention and Monetization

Analytic Requirements
User Acquisition
Goal: ELTV > acquisition cost
•  User’s estimated lifetime value is more than it
costs to acquire that user

Mechanisms
•  Publisher Campaigns
•  On-Platform Recommendations
Game Analytics
Goal: Measure and Optimize “Fun”
•  Difficult to define 
•  Includes gameplay, feature set, performance, bugs
•  All metrics are just proxies for fun (!)

Mechanisms
•  Game balance
•  Match balance
•  Economy management
•  Player typology
Retention and Monetization
Goal: Sustainable Business
•  Monetization drivers
•  Revenue recognition

Mechanisms
•  Pricing and Bundling
•  Tournament (“Event”) Design
•  Recommendations
Analytic Requirements
•  Data Integrity and Availability
•  Cohorting
•  Controlled experiments
•  Deep ad-hoc analysis
“Deep Thought”
V1 Analytic System

Goals

Core Capabilities

Implementation
“Deep Thought”
V1 Analytic System

Goals

Core Capabilities

Implementation
V1 Analytic System
Grew Organically
•  Built originally for user acquisition
•  Progressively grown to much more

Idiosyncratic mix of languages, systems, tools
•  Log files -> Chukwa -> Hadoop -> Hive -> MySQL
•  PHP for reports and ETL
•  Single massive table with everything
V1 Analytic System
Many Issues
•  Very slow to query
•  No data standardization or validation
•  Very difficult to add a new game, report, ETL
•  Extremely difficult to backfill on error or outage
•  Difficult for analysts to use; impossible for PMs,
designers, etc.
… but we survived (!)
“Deep Thought”
V1 Analytic System

Goals

Core Capabilities

Implementation
Goals of Deep Thought
Independent Scalability
•  Logically separate, independently scalable tiers
Stability and Outage Recovery
•  Tiers can completely fail with no data loss
•  Every step idempotent and replayable
Standardization
•  Standardized event types, fields, queries, reports
Goals of Deep Thought
In-Stream Event Processing
•  Sessionalization, Dimensionalization, Cohorting
Queryability
•  Structures are simple to reason about
•  Simple things are simple
•  Analysts, Data Scientists, PMs, Game Designers,
etc.
Extensibility
•  Easy to add new games, events, fields, reports
“Deep Thought”
V1 Analytic System

Goals

Core Capabilities

Implementation
Core Capabilities
•  Sessionalization
•  Dimensionalization
•  Cohorting
Sessionalization
All events are part of a “session”
•  Explicit start event, optional stop event
•  Game-defined semantics
Event Batching
•  Events arrive in batch, associated with session
•  Pipeline computes batch-level metrics,
disaggregates events
•  Can optionally attach batch-level metrics to each
event
Sessionalization
Time-Series Aggregations
•  Configurable metrics
•  1-day X, 7-day X, lifetime X
•  Total attacks, total time played
•  Accumulated in-stream 
•  V1 aggregate + batch delta
•  Faster to calculate in-stream vs. Map-Reduce
Dimensionalization
Pipeline assigns unique numeric id to string enums
•  E.g., “twigs” resource  id 1234

Automatic mapping and assignment
•  Games log strings
•  Pipeline generates and maps ids
•  No configuration necessary

Fast dimensional queries
•  Join on integers, not strings
Dimensionalization
Metadata enumeration and manipulation
•  Easily enumerate all values for a field
•  Merge multiple values 
•  “TWIGS” == “Twigs” == “twigs”
Metadata tagging
•  Can assign arbitrary tags to metadata
•  E.g., “Panzer 05” is {tank, mechanized infantry, event prize}
•  Enables custom views
Cohorting
Group players along any dimension / metric
•  Well beyond classic age-based cohorts

Core analytical building block
•  Experiment groups
•  User acquisition campaign tracking
•  Prospective modeling
•  Retrospective analysis
Cohorting
Set-based
•  Overlapping groups: >100, >200, etc.
•  Exclusive groups: (100-200), (200-500), etc.

Time-based
•  E.g., people who played in last 3 days
•  E.g., “whale” == ($$ > X) in last N days
•  Autoexpire from a group without explicit
intervention
“Deep Thought”
V1 Analytic System

Goals

Core Capabilities

Implementation
Implementation of Pipeline
•  Ingestion
•  Event Log
•  Transformation
•  Data Storage
•  Analysis and Visualization
Logging	
  Service	
  
Ka.a	
  
Importer	
  /	
  Session	
  Store	
  
Hadoop	
  2	
  
Hive	
  /	
  Redshi:	
  
Ingestion: Logging Service
HTTP / JSON Endpoint
•  Play framework
•  Non-blocking, event-driven

Responsibilities
•  Message integrity via checksums
•  Durability via local disk persistence
•  Async batch writes to Kafka topics 
•  {valid, invalid, unauth} 
Logging	
  Service	
  
Ka.a	
  
Importer	
  /	
  Session	
  Store	
  
Hadoop	
  2	
  
Hive	
  /	
  Redshi:	
  
Event Log: Kafka
Persistent, replayable pipe of events
•  Events stored for 7 days

Responsibilities
•  Durability via replication and local
disk streaming
•  Replayability via commit log
•  Scalability via partitioned brokers
•  Segment data for different types of
processing
Logging	
  Service	
  
Ka.a	
  
Importer	
  /	
  Session	
  Store	
  
Hadoop	
  2	
  
Hive	
  /	
  Redshi:	
  
Transformation: Importer
Consume Kafka topics, rebroadcast
•  E.g., consume batches, rebroadcast
events

Responsibilities
•  Batch validation against JSON schema
•  Syntactic validation
•  Semantic validation (is this event
possible?)
•  Batches -> events
Logging	
  Service	
  
Ka.a	
  
Importer	
  /	
  Session	
  Store	
  
Hadoop	
  2	
  
Hive	
  /	
  Redshi:	
  
Transformation: Importer
Responsibilities (cont.)
•  Sessionalization
•  Assign event to session
•  Calculate time-series aggregates
•  Dimensionalization
•  String enum -> numeric id
•  Merge / coalesce different string
representations into single id
•  Player metadata
•  Join player metadata from session
store
Logging	
  Service	
  
Ka.a	
  
Importer	
  /	
  Session	
  Store	
  
Hadoop	
  2	
  
Hive	
  /	
  Redshi:	
  
Transformation: Importer
Responsibilities (cont.)
•  Cohorting
•  Process enter-cohort, exit-cohort
events
•  Process A / B testing events
•  Evaluate cohort rules (e.g., spend
thresholds)
•  Decorate events with cohort tags
Logging	
  Service	
  
Ka.a	
  
Importer	
  /	
  Session	
  Store	
  
Hadoop	
  2	
  
Hive	
  /	
  Redshi:	
  
Transformation: Session Store
Key-value store (Couchbase)
•  Fast, constant-time access to sessions,
players

Responsibilities
•  Store Sessions, Players, Dimensions,
Config
•  Lookup
•  Idempotent update
•  Store accumulated session-level metrics
•  Store player history
Logging	
  Service	
  
Ka.a	
  
Importer	
  /	
  Session	
  Store	
  
Hadoop	
  2	
  
Hive	
  /	
  Redshi:	
  
Storage: Hadoop 2
Camus MR
•  Kafka -> HDFS every 3 minutes

append_events table
•  Append-only log of events
•  Each event has session-version for
deduplication
Logging	
  Service	
  
Ka.a	
  
Importer	
  /	
  Session	
  Store	
  
Hadoop	
  2	
  
Hive	
  /	
  Redshi:	
  
Storage: Hadoop 2
append_events -> base_events MR
•  Logical update of base_events
•  Update events with new metadata
•  Swap old partition for new partition
•  Replayable from beginning without
duplication
Logging	
  Service	
  
Ka.a	
  
Importer	
  /	
  Session	
  Store	
  
Hadoop	
  2	
  
Hive	
  /	
  Redshi:	
  
Storage: Hadoop 2
base_events table
•  Denormalized table of all events
•  Stores original JSON + decoration
•  Custom Serdes to query / extract
JSON fields without materializing
entire rows
•  Standardized event types  lots of
functionality for free
Logging	
  Service	
  
Ka.a	
  
Importer	
  /	
  Session	
  Store	
  
Hadoop	
  2	
  
Hive	
  /	
  Redshi:	
  
Analysis and Visualization
Hive Warehouse
•  Normalized event-specific, game-
specific stores
•  Aggregate metric data for reporting,
analysis
•  Maintained through custom ETL 
•  MR
•  Hive queries

Logging	
  Service	
  
Ka.a	
  
Importer	
  /	
  Session	
  Store	
  
Hadoop	
  2	
  
Hive	
  /	
  Redshi:	
  
Analysis and Visualization
Amazon Redshift
•  Fast ad-hoc querying

Tableau
•  Simple, powerful reporting
Logging	
  Service	
  
Ka.a	
  
Importer	
  /	
  Session	
  Store	
  
Hadoop	
  2	
  
Hive	
  /	
  Redshi:	
  
Come Join Us!
KIXEYE is hiring in 

SF, Seattle, Victoria,
Brisbane, Amsterdam

rshoup@kixeye.com
@randyshoup
Deep Thought Team:
•  Mark Weaver
•  Josh McDonald
•  Ben Speakmon
•  Snehal Nagmote
•  Mark Roberts
•  Kevin Lee
•  Woo Chan Kim
•  Tay Carpenter
•  Tim Ellis
•  Kazue Watanabe
•  Erica Chan
•  Jessica Cox
•  Casey DeWitt
•  Steve Morin
•  Lih Chen
•  Neha Kumari
Watch the video with slide synchronization on
InfoQ.com!
http://www.infoq.com/presentations/kixeye-
big-data

More Related Content

More from C4Media

Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideC4Media
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsC4Media
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechC4Media
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/awaitC4Media
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaC4Media
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayC4Media
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?C4Media
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseC4Media
 

More from C4Media (20)

Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 
Navigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery TeamsNavigating Complexity: High-performance Delivery and Discovery Teams
Navigating Complexity: High-performance Delivery and Discovery Teams
 
High Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in AdtechHigh Performance Cooperative Distributed Systems in Adtech
High Performance Cooperative Distributed Systems in Adtech
 
Rust's Journey to Async/await
Rust's Journey to Async/awaitRust's Journey to Async/await
Rust's Journey to Async/await
 
Opportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven UtopiaOpportunities and Pitfalls of Event-Driven Utopia
Opportunities and Pitfalls of Event-Driven Utopia
 
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/DayDatadog: a Real-Time Metrics Database for One Quadrillion Points/Day
Datadog: a Real-Time Metrics Database for One Quadrillion Points/Day
 
Are We Really Cloud-Native?
Are We Really Cloud-Native?Are We Really Cloud-Native?
Are We Really Cloud-Native?
 
CockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL DatabaseCockroachDB: Architecture of a Geo-Distributed SQL Database
CockroachDB: Architecture of a Geo-Distributed SQL Database
 

Recently uploaded

Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 

Recently uploaded (20)

Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 

The Game of Big Data: Scalable, Reliable Analytics Infrastructure at KIXEYE

  • 1. The Game of Big Data! Analytics Infrastructure at KIXEYE Randy Shoup @randyshoup linkedin.com/in/randyshoup QCon New York, June 13 2014
  • 2. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations /kixeye-big-data
  • 3. Presented at QCon New York www.qconnewyork.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  • 4. Free-to-Play Real-time Strategy Games •  Web and mobile •  Strategy and tactics •  Really real-time J •  Deep relationships with players •  Constantly evolving gameplay, feature set, economy, balance •  >500 employees worldwide
  • 5. Intro: Analytics at KIXEYE User Acquisition Game Analytics Retention and Monetization Analytic Requirements
  • 6. User Acquisition Goal: ELTV > acquisition cost •  User’s estimated lifetime value is more than it costs to acquire that user Mechanisms •  Publisher Campaigns •  On-Platform Recommendations
  • 7. Game Analytics Goal: Measure and Optimize “Fun” •  Difficult to define •  Includes gameplay, feature set, performance, bugs •  All metrics are just proxies for fun (!) Mechanisms •  Game balance •  Match balance •  Economy management •  Player typology
  • 8. Retention and Monetization Goal: Sustainable Business •  Monetization drivers •  Revenue recognition Mechanisms •  Pricing and Bundling •  Tournament (“Event”) Design •  Recommendations
  • 9. Analytic Requirements •  Data Integrity and Availability •  Cohorting •  Controlled experiments •  Deep ad-hoc analysis
  • 10. “Deep Thought” V1 Analytic System Goals Core Capabilities Implementation
  • 11. “Deep Thought” V1 Analytic System Goals Core Capabilities Implementation
  • 12. V1 Analytic System Grew Organically •  Built originally for user acquisition •  Progressively grown to much more Idiosyncratic mix of languages, systems, tools •  Log files -> Chukwa -> Hadoop -> Hive -> MySQL •  PHP for reports and ETL •  Single massive table with everything
  • 13. V1 Analytic System Many Issues •  Very slow to query •  No data standardization or validation •  Very difficult to add a new game, report, ETL •  Extremely difficult to backfill on error or outage •  Difficult for analysts to use; impossible for PMs, designers, etc. … but we survived (!)
  • 14. “Deep Thought” V1 Analytic System Goals Core Capabilities Implementation
  • 15. Goals of Deep Thought Independent Scalability •  Logically separate, independently scalable tiers Stability and Outage Recovery •  Tiers can completely fail with no data loss •  Every step idempotent and replayable Standardization •  Standardized event types, fields, queries, reports
  • 16. Goals of Deep Thought In-Stream Event Processing •  Sessionalization, Dimensionalization, Cohorting Queryability •  Structures are simple to reason about •  Simple things are simple •  Analysts, Data Scientists, PMs, Game Designers, etc. Extensibility •  Easy to add new games, events, fields, reports
  • 17. “Deep Thought” V1 Analytic System Goals Core Capabilities Implementation
  • 18. Core Capabilities •  Sessionalization •  Dimensionalization •  Cohorting
  • 19. Sessionalization All events are part of a “session” •  Explicit start event, optional stop event •  Game-defined semantics Event Batching •  Events arrive in batch, associated with session •  Pipeline computes batch-level metrics, disaggregates events •  Can optionally attach batch-level metrics to each event
  • 20. Sessionalization Time-Series Aggregations •  Configurable metrics •  1-day X, 7-day X, lifetime X •  Total attacks, total time played •  Accumulated in-stream •  V1 aggregate + batch delta •  Faster to calculate in-stream vs. Map-Reduce
  • 21. Dimensionalization Pipeline assigns unique numeric id to string enums •  E.g., “twigs” resource  id 1234 Automatic mapping and assignment •  Games log strings •  Pipeline generates and maps ids •  No configuration necessary Fast dimensional queries •  Join on integers, not strings
  • 22. Dimensionalization Metadata enumeration and manipulation •  Easily enumerate all values for a field •  Merge multiple values •  “TWIGS” == “Twigs” == “twigs” Metadata tagging •  Can assign arbitrary tags to metadata •  E.g., “Panzer 05” is {tank, mechanized infantry, event prize} •  Enables custom views
  • 23. Cohorting Group players along any dimension / metric •  Well beyond classic age-based cohorts Core analytical building block •  Experiment groups •  User acquisition campaign tracking •  Prospective modeling •  Retrospective analysis
  • 24. Cohorting Set-based •  Overlapping groups: >100, >200, etc. •  Exclusive groups: (100-200), (200-500), etc. Time-based •  E.g., people who played in last 3 days •  E.g., “whale” == ($$ > X) in last N days •  Autoexpire from a group without explicit intervention
  • 25. “Deep Thought” V1 Analytic System Goals Core Capabilities Implementation
  • 26. Implementation of Pipeline •  Ingestion •  Event Log •  Transformation •  Data Storage •  Analysis and Visualization Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 27. Ingestion: Logging Service HTTP / JSON Endpoint •  Play framework •  Non-blocking, event-driven Responsibilities •  Message integrity via checksums •  Durability via local disk persistence •  Async batch writes to Kafka topics •  {valid, invalid, unauth} Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 28. Event Log: Kafka Persistent, replayable pipe of events •  Events stored for 7 days Responsibilities •  Durability via replication and local disk streaming •  Replayability via commit log •  Scalability via partitioned brokers •  Segment data for different types of processing Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 29. Transformation: Importer Consume Kafka topics, rebroadcast •  E.g., consume batches, rebroadcast events Responsibilities •  Batch validation against JSON schema •  Syntactic validation •  Semantic validation (is this event possible?) •  Batches -> events Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 30. Transformation: Importer Responsibilities (cont.) •  Sessionalization •  Assign event to session •  Calculate time-series aggregates •  Dimensionalization •  String enum -> numeric id •  Merge / coalesce different string representations into single id •  Player metadata •  Join player metadata from session store Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 31. Transformation: Importer Responsibilities (cont.) •  Cohorting •  Process enter-cohort, exit-cohort events •  Process A / B testing events •  Evaluate cohort rules (e.g., spend thresholds) •  Decorate events with cohort tags Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 32. Transformation: Session Store Key-value store (Couchbase) •  Fast, constant-time access to sessions, players Responsibilities •  Store Sessions, Players, Dimensions, Config •  Lookup •  Idempotent update •  Store accumulated session-level metrics •  Store player history Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 33. Storage: Hadoop 2 Camus MR •  Kafka -> HDFS every 3 minutes append_events table •  Append-only log of events •  Each event has session-version for deduplication Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 34. Storage: Hadoop 2 append_events -> base_events MR •  Logical update of base_events •  Update events with new metadata •  Swap old partition for new partition •  Replayable from beginning without duplication Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 35. Storage: Hadoop 2 base_events table •  Denormalized table of all events •  Stores original JSON + decoration •  Custom Serdes to query / extract JSON fields without materializing entire rows •  Standardized event types  lots of functionality for free Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 36. Analysis and Visualization Hive Warehouse •  Normalized event-specific, game- specific stores •  Aggregate metric data for reporting, analysis •  Maintained through custom ETL •  MR •  Hive queries Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 37. Analysis and Visualization Amazon Redshift •  Fast ad-hoc querying Tableau •  Simple, powerful reporting Logging  Service   Ka.a   Importer  /  Session  Store   Hadoop  2   Hive  /  Redshi:  
  • 38. Come Join Us! KIXEYE is hiring in SF, Seattle, Victoria, Brisbane, Amsterdam rshoup@kixeye.com @randyshoup Deep Thought Team: •  Mark Weaver •  Josh McDonald •  Ben Speakmon •  Snehal Nagmote •  Mark Roberts •  Kevin Lee •  Woo Chan Kim •  Tay Carpenter •  Tim Ellis •  Kazue Watanabe •  Erica Chan •  Jessica Cox •  Casey DeWitt •  Steve Morin •  Lih Chen •  Neha Kumari
  • 39. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/kixeye- big-data