Appboy Analytics
Jon Hyman
NY MongoDB User Group, November 19, 2013
eBay NYC

@appboy @jon_hyman
A LITTLE BIT ABOUT
US & APPBOY
(who we are and what we do)

Appboy is a mobile relationship
management platform for apps
Jon Hyman
CIO :: @jon_hyman

Harvard
Bridgewater
Appboy improves
engagement by helping you
understand your app users
• IDENTIFY - Understand demographics, social and behavioral data
• SEGMENT - Organize customers into groups based on behaviors, events, user attributes, and location
• ENGAGE - Message users through push notifications, emails, and multiple forms of in-app messages
Use Case: Customer engagement begins with onboarding

Urban Outfitters

textPlus

Shape Magazine
Agenda
• How to quickly store time series data in MongoDB using flexible schemas
• Learn how flexible schemas can easily provide breakdowns across dimensions
• Counting quickly: statistical analysis on top of MongoDB queries
What kinds of analytics does Appboy track?
• Lots of time series data
  • App opens over time
  • Events over time
  • Revenue over time
  • Marketing campaign stats and efficacy over time
What kinds of analytics does Appboy track?
• Breakdowns*
  • Device types
  • Device OS versions
  • Screen resolutions
  • Revenue by product

* We also care about this over time!
What kinds of analytics does Appboy track?
• User segment membership
  • How many users are in each segment?
  • How many can be emailed or reached via push notifications?
  • What is the average revenue per user in the segment?
    • Per paying user?
Pre-aggregated Analytics:

APP OPENS OVER TIME
Typical time series collection
Log a new row for each open received

{
  timestamp: 2013-11-14 00:00:00 UTC,
  app_id: App identifier
}

db.app_opens.find({app_id: A, timestamp: {$gte: date}})

Pro: Really, really simple. Easy to add attribution to users.
Con: You need to aggregate the data before
drawing the chart; lots of documents read into
memory, lots of dirty pages
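
To make the "Con" concrete, here is a minimal sketch in plain JavaScript (no database involved) of the aggregation step a chart needs when every open is its own document. The `rawOpens` array and the `opensPerHour` helper are hypothetical stand-ins, not part of the original slides.

```javascript
// Hypothetical raw events, one per app open (shape follows the slide).
const rawOpens = [
  { app_id: "A", timestamp: new Date("2013-11-14T00:05:00Z") },
  { app_id: "A", timestamp: new Date("2013-11-14T00:40:00Z") },
  { app_id: "A", timestamp: new Date("2013-11-14T13:15:00Z") },
  { app_id: "B", timestamp: new Date("2013-11-14T13:20:00Z") },
];

// Bucket one app's opens by UTC hour. Every matching event has to be read
// before a single point can be drawn on the chart.
function opensPerHour(events, appId, since) {
  const counts = new Array(24).fill(0);
  for (const e of events) {
    if (e.app_id === appId && e.timestamp >= since) {
      counts[e.timestamp.getUTCHours()] += 1;
    }
  }
  return counts;
}

const hourly = opensPerHour(rawOpens, "A", new Date("2013-11-14T00:00:00Z"));
console.log(hourly[0], hourly[13]);
```

The work grows with the number of opens, which is the cost the pre-aggregated iterations that follow try to avoid.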
Fewer documents with pre-aggregation iteration 1
Create a document that groups by the time period

{
  app_id: App identifier,
  date: Date of the document,
  hour: 0-23 based hour this document represents,
  opens: Number of opens this hour
}

db.app_opens.update({date: D, app_id: A, hour: 0}, {$inc: {opens: 1}})
Pro: Really easy to draw histograms
Con: We never care about an hour by itself. We lose attribution.
Fewer documents with pre-aggregation iteration 2
Create a document by day and have each hour be a field

{
  app_id: App identifier,
  date: Date of the document,
  total_opens: Total number of opens this day,
  0: Number of opens at midnight,
  1: Number of opens at 1am,
  ...
  23: Number of opens at 11pm
}

db.app_opens.update(
  {date: D, app_id: A},
  {$inc: {"0": 1, total_opens: 1}}
)

Pro: Document count is low, easy to use aggregation framework
for longer spans, fast: document should be in working set
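
The effect of the one-document-per-day layout can be sketched in plain JavaScript. The `dailyDocs` map is an in-memory stand-in for the collection and `recordOpen` imitates an upserting `$inc`; both names are assumptions for illustration, not Appboy's code.

```javascript
// In-memory stand-in for the app_opens collection, keyed by app and day.
const dailyDocs = new Map();

// Imitates an upserting {$inc: {"<hour>": 1, total_opens: 1}} update.
function recordOpen(appId, timestamp) {
  const date = timestamp.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
  const key = `${appId}|${date}`;
  if (!dailyDocs.has(key)) {
    dailyDocs.set(key, { app_id: appId, date, total_opens: 0 });
  }
  const doc = dailyDocs.get(key);
  const hour = String(timestamp.getUTCHours());
  doc[hour] = (doc[hour] || 0) + 1; // a missing hour field counts as 0
  doc.total_opens += 1;
  return doc;
}

recordOpen("A", new Date("2013-11-14T00:05:00Z"));
recordOpen("A", new Date("2013-11-14T00:40:00Z"));
const day = recordOpen("A", new Date("2013-11-14T23:59:00Z"));

// One document now holds the whole day's histogram.
const histogram = Array.from({ length: 24 }, (_, h) => day[String(h)] || 0);
console.log(day.total_opens, histogram[0], histogram[23]);
```

Drawing a day's chart then means reading a single document, however many opens were recorded.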
Fewer documents with pre-aggregation iteration 2
• What about looking at different dimensions?
  • App opens by device type (e.g., how do iPads compare to iPhones?)
  • Demographics (gender, age group)
Solution!

FLEXIBLE SCHEMAS!
Fewer documents with pre-aggregation iteration 3
Dynamically add dimensions in the document


{
  app_id: App identifier,
  date: Date of the document,
  totals: {
    app_opens: Total number of opens this day,
    devices: {
      "iPad Air": Total number of opens on the iPad Air,
      "iPhone 4": Total number of opens on the iPhone 4
    },
    genders: {
      male: Total number of opens from male users,
      female: Total number of opens from female users
    },
    ...
  },
  0: {
    app_opens: Number of opens at midnight,
    devices: {
      "iPad Air": Number of opens on the iPad Air at midnight,
      "iPhone 4": Number of opens on the iPhone 4 at midnight
    },
    ...
  },
  ...
}

// e.g., one open on an iPad Air at midnight
db.app_opens.update(
  {date: D, app_id: A},
  {$inc: {
    "totals.app_opens": 1,
    "totals.devices.iPad Air": 1,
    "0.app_opens": 1,
    "0.devices.iPad Air": 1
  }}
)
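
Because every counter in this schema lives at a dotted path, the `$inc` document can be generated from whatever dimensions are known for an open. The helper below is a hypothetical sketch (`buildOpenIncrement` is not from the slides); it only builds the update document and does not talk to a database.

```javascript
// Build the $inc document for one app open.
// `hour` is 0-23; `dimensions` maps a dimension name (e.g. "devices")
// to the value observed for this open (e.g. "iPad Air").
function buildOpenIncrement(hour, dimensions) {
  const inc = {};
  // Increment both the daily totals and the sub-document for this hour.
  for (const scope of ["totals", String(hour)]) {
    inc[`${scope}.app_opens`] = 1;
    for (const [dimension, value] of Object.entries(dimensions)) {
      inc[`${scope}.${dimension}.${value}`] = 1;
    }
  }
  return { $inc: inc };
}

const update = buildOpenIncrement(0, { devices: "iPad Air", genders: "female" });
console.log(JSON.stringify(update, null, 2));
```

Adding a new dimension is then just another key in `dimensions`. One caveat of this sketch: a value containing a `.` would be read as a nested path, so real values would need sanitizing first.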
Pre-aggregated analytics
Pros

• Easily extensible to add other dimensions
• Still only using one document, therefore you can create charts very quickly
• You get breakdowns over a time period for free

Cons

• Pre-aggregated data has no attribution
• Have to know questions ahead of time

Follow up: What if we wanted to look at a graph by age group?
Pre-aggregated analytics summary
• Get started tracking time series data quickly
• You get breakdowns for free
• Adding dimensions is super simple
• No attribution, need to know questions ahead of time
• Don’t just rely on pre-aggregated analytics
Counting quickly:

USER SEGMENTATION &
STATISTICAL ANALYSIS
User Segmentation
• A group of users who match some set of filters
Counting quickly
Appboy shows you segment membership in real-time as you add/edit/remove filters.

How do we do it quickly?

We estimate the population sizes of segments when using our web UI.
Counting quickly

Goal: Quickly get the count() of an arbitrary query

Problem: MongoDB counts are slow, especially unindexed ones
Counting quickly
10 million documents that represent people:
{
  favorite_color: "blue",
  age: 27,
  gender: "M",
  favorite_food: "pizza",
  city: "NYC",
  shoe_size: 11,
  attractiveness: 10,
  ...
}
• How many people like blue?
• How many live in NYC and love pizza?
• How many men have a shoe size less than 10?
Big Question: How do you estimate counts?

Answer: The same way news networks do it. With confidence.
Counting quickly
Add a random number in a known range to each document. Say,
between 0 and 9999.
{
  random: 4583,
  favorite_color: "blue",
  age: 27,
  gender: "M",
  favorite_food: "pizza",
  city: "NYC",
  shoe_size: 11,
  attractiveness: 10,
  ...
}

Add an index on the random number:

db.users.ensureIndex({random: 1})
Counting quickly
Step 1: Get a random sample

I have 10 million documents. Of my 10,000 random “buckets”, I should expect each “bucket” to hold about 1,000 users.

E.g.,

db.users.find({random: 123}).count()   // ~1,000
db.users.find({random: 9043}).count()  // ~1,000
db.users.find({random: 4982}).count()  // ~1,000
Counting quickly
Step 1: Get a random sample

Let’s take a random 100,000 users. Grab a random range that “holds” those users. These all work:

db.users.find({random: {$gt: 0, $lt: 101}})
db.users.find({random: {$gt: 503, $lt: 604}})
db.users.find({random: {$gt: 8938, $lt: 9039}})
db.users.find({$or: [
  {random: {$gt: 9955}},
  {random: {$lt: 56}}
]})
Tip: Limit $maxScan to 100,000 just to be safe
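
Choosing the range, including the wrap-around case from the last query, could be sketched as below. `randomRangeFilter` and `pickRange` are hypothetical helpers; they only build the filter document and assume integer `random` values from 0 to 9999 as described in the slides.

```javascript
const BUCKETS = 10000; // random values are integers in [0, 9999]

// Build a filter selecting `width` consecutive bucket values starting at
// `start`, wrapping past 9999 back to 0 when needed.
function randomRangeFilter(start, width) {
  const end = start + width - 1; // last bucket included
  if (end < BUCKETS) {
    return { random: { $gte: start, $lte: end } };
  }
  return {
    $or: [
      { random: { $gte: start } },
      { random: { $lte: end - BUCKETS } },
    ],
  };
}

// Pick a random starting bucket for a sample that is `width` buckets wide.
function pickRange(width) {
  const start = Math.floor(Math.random() * BUCKETS);
  return randomRangeFilter(start, width);
}

console.log(JSON.stringify(randomRangeFilter(1, 100)));
console.log(JSON.stringify(randomRangeFilter(9956, 100)));
```

With 10 million users and 10,000 buckets, a width of 100 buckets corresponds to roughly 100,000 users. The two printed filters select the same buckets as the first and last queries on the slide, written with inclusive bounds.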
Counting quickly
Step 2: Learn about that random sample

db.users.find({
  random: {$gt: 0, $lt: 101},
  gender: "M",
  favorite_color: "blue",
  shoe_size: {$gt: 10}
})._addSpecial("$maxScan", 100000).explain()

Explain Result:

{
  nscannedObjects: 100000,
  n: 11302,
  ...
}
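
Steps 1 and 2 can be tried out without a database. The sketch below builds a small synthetic population in plain JavaScript (all values are made up, and the population is 200,000 rather than 10 million to keep it quick), samples a range of buckets, and reports the counterparts of `nscannedObjects` and `n`. Unlike the indexed query, the in-memory loop visits every document; only the counting logic is the point here.

```javascript
// Synthetic population: each user gets a random bucket in [0, 9999] and a
// favorite color; about 25% like blue.
const POPULATION = 200000;
const colors = ["blue", "red", "green", "yellow"];
const users = [];
for (let i = 0; i < POPULATION; i++) {
  users.push({
    random: Math.floor(Math.random() * 10000),
    favorite_color: colors[Math.floor(Math.random() * colors.length)],
  });
}

// Count how many documents fall in the sampled buckets (scanned) and how
// many of those also satisfy the predicate (matched).
function sampleCount(docs, low, high, predicate) {
  let scanned = 0;
  let matched = 0;
  for (const doc of docs) {
    if (doc.random > low && doc.random < high) {
      scanned += 1;
      if (predicate(doc)) matched += 1;
    }
  }
  return { scanned, matched };
}

const sample = sampleCount(users, 0, 101, (u) => u.favorite_color === "blue");
const estimated = Math.round((sample.matched / sample.scanned) * POPULATION);
const actual = users.filter((u) => u.favorite_color === "blue").length;
console.log({ ...sample, estimated, actual });
```

Running it a few times shows the estimate landing near the actual count while touching only about 1% of the population.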
Counting quickly
Step 3: Do the math
Population: 10,000,000

Sample size: 100,000

Num matches: 11,302

Percentage of users who matched: 11.3%

Estimated total count: 1,130,000 +/- 0.2% with 95% confidence
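
The arithmetic can be reproduced with a short sketch. It assumes the usual normal approximation for a sample proportion with z = 1.96 for 95% confidence; the slides do not say which formula was used, so treat `estimateCount` as one plausible reading rather than Appboy's exact method.

```javascript
// Scale a sample count up to the population and attach a margin of error
// using the normal approximation to the binomial (an assumption here).
function estimateCount(population, sampleSize, matches, z = 1.96) {
  const p = matches / sampleSize; // sample proportion
  const standardError = Math.sqrt((p * (1 - p)) / sampleSize);
  const margin = z * standardError; // expressed as a proportion
  return {
    proportion: p,
    estimate: Math.round(p * population),
    marginOfError: margin,
    low: Math.round((p - margin) * population),
    high: Math.round((p + margin) * population),
  };
}

const result = estimateCount(10000000, 100000, 11302);
console.log(result);
```

Under that reading, the margin comes out just under 0.002, i.e. about 0.2 percentage points of the population, which matches the "+/- 0.2%" on the slide and corresponds to roughly 19,600 users either side of the estimate.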
Counting quickly
Step 4: Optimize
• Limit $maxScan to (100,000/numShards) to be even faster
• Cache the random range for a few hours
• Add more RAM (or shards)
• Cache results to not hit the database for the same query
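
Both caching bullets can be served by a small time-based cache. The sketch below is one hypothetical way to do it in plain JavaScript; `createTtlCache` and `runEstimate` are illustrative names, and the three-hour lifetime is only an example of "a few hours".

```javascript
// Minimal time-based cache. `now` is injectable so the expiry logic can be
// exercised without waiting.
function createTtlCache(ttlMs, now = () => Date.now()) {
  const entries = new Map();
  return {
    get(key, compute) {
      const hit = entries.get(key);
      if (hit && now() - hit.storedAt < ttlMs) return hit.value;
      const value = compute(); // only runs on a miss or after expiry
      entries.set(key, { value, storedAt: now() });
      return value;
    },
  };
}

// Hypothetical usage: key on the serialized filter. `runEstimate` stands in
// for the sampled query and simply counts how often it is called.
const cache = createTtlCache(3 * 60 * 60 * 1000);
let databaseHits = 0;
const runEstimate = () => {
  databaseHits += 1;
  return 1130200;
};
const key = JSON.stringify({ gender: "M", favorite_color: "blue" });
cache.get(key, runEstimate);
cache.get(key, runEstimate);
console.log(databaseHits);
```

The same structure could hold the chosen random range under a fixed key, so that repeated queries within the window hit the same sampled documents.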
Counting quickly
Step 5: Improve
• Get more than one count: use the aggregation framework on top of the population’s sample size
• Work around all sorts of Mongo bugs :-(
Summarize
• Pre-aggregated analytics
  • Create a document that represents event occurrences in some time period
  • Takes full advantage of MongoDB’s flexible schemas
  • Not a catch-all for analytics, you should still store event data
Summarize
• Counting quickly
  • Estimate results of arbitrary queries using population sample sizes
  • Depending on your app, this could be a great way to keep response time predictable as you scale
Thanks! Questions?
jon@appboy.com

@appboy @jon_hyman

The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 

Appboy analytics - NYC MUG 11/19/13

  • 1. Appboy Analytics Jon Hyman NY MongoDB User Group, November 19, 2013 eBay NYC @appboy @jon_hyman
  • 2. A LITTLE BIT ABOUT US & APPBOY (who we are and what we do) Appboy is a mobile relationship management platform for apps. Jon Hyman, CIO :: @jon_hyman (Harvard, Bridgewater)
  • 3. Appboy improves engagement by helping you understand your app users • IDENTIFY - Understand demographics, social and behavioral data • SEGMENT - Organize customers into groups based on behaviors, events, user attributes, and location • ENGAGE - Message users through push notifications, emails, and multiple forms of in-app messages
  • 4. Use Case: Customer engagement begins with onboarding Urban Outfitters textPlus Shape Magazine
  • 5. Agenda • How to quickly store time series data in MongoDB using flexible schemas
 • Learn how flexible schemas can easily provide breakdowns across dimensions
 • Counting quickly: statistical analysis on top of MongoDB queries
  • 6. What kinds of analytics does Appboy track? • Lots of time series data • App opens over time • Events over time • Revenue over time • Marketing campaign stats and efficacy over time
  • 7. What kinds of analytics does Appboy track? • Breakdowns* • Device types • Device OS versions • Screen resolutions • Revenue by product * We also care about this over time!
  • 8. What kinds of analytics does Appboy track? • User segment membership • How many users are in each segment? • How many can be emailed or reached via push notifications? • What is the average revenue per user in the segment? • Per paying user?
  • 10. Typical time series collection. Log a new row for each open received:
        {
          timestamp: 2013-11-14 00:00:00 UTC,
          app_id: App identifier
        }
        db.app_opens.find({app_id: A, timestamp: {$gte: date}})
    Pro: Really, really simple. Easy to add attribution to users.
    Con: You need to aggregate the data before drawing the chart; lots of documents read into memory, lots of dirty pages.
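To make the "Con" concrete, here is a minimal sketch of the aggregation step the application would have to run before charting when every open is its own document. The sample events and the `opensByHour` helper are hypothetical and only mirror the schema above.

```javascript
// Hypothetical raw open documents, shaped like the schema on the slide.
const opens = [
  { app_id: "A", timestamp: new Date("2013-11-14T00:05:00Z") },
  { app_id: "A", timestamp: new Date("2013-11-14T00:40:00Z") },
  { app_id: "A", timestamp: new Date("2013-11-14T13:15:00Z") },
];

// Bucket raw events into 24 hourly counts (UTC) for a single day.
// Every event document has to be read to produce the chart data.
function opensByHour(events) {
  const counts = new Array(24).fill(0);
  for (const event of events) {
    counts[event.timestamp.getUTCHours()] += 1;
  }
  return counts;
}

const histogram = opensByHour(opens);
```

The work grows with the number of opens, which is the cost the pre-aggregated designs try to avoid.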
  • 11. Fewer documents with pre-aggregation iteration 1. Create a document that groups by the time period:
        {
          app_id: App identifier,
          date: Date of the document,
          hour: 0-23 based hour this document represents,
          opens: Number of opens this hour
        }
        db.app_opens.update({date: D, app_id: A, hour: 0}, {$inc: {opens: 1}})
    Pro: Really easy to draw histograms.
    Con: We never care about an hour by itself. We lose attribution.
  • 12. Fewer documents with pre-aggregation iteration 2. Create a document by day and have each hour be a field:
        {
          app_id: App identifier,
          date: Date of the document,
          total_opens: Total number of opens this day,
          0: Number of opens at midnight,
          1: Number of opens at 1am,
          ...
          23: Number of opens at 11pm
        }
        db.app_opens.update(
          {date: D, app_id: A},
          {$inc: {"0": 1, total_opens: 1}}
        )
    Pro: Document count is low; easy to use the aggregation framework for longer spans; fast, since the document should be in the working set.
  • 13. Fewer documents with pre-aggregation iteration 2 • What about looking at different dimensions? • App opens by device type (e.g., how do iPads compare to iPhones?) • Demographics (gender, age group)
  • 15. Fewer documents with pre-aggregation iteration 3. Dynamically add dimensions in the document:
        {
          app_id: App identifier,
          date: Date of the document,
          totals: {
            app_opens: Total number of opens this day,
            devices: {
              "iPad Air": Total number of opens on the iPad Air,
              "iPhone 4": Total number of opens on the iPhone 4
            },
            genders: {
              male: Total number of opens from male users,
              female: Total number of opens from female users
            },
            ...
          },
          0: {
            app_opens: Number of opens at midnight,
            devices: {
              "iPad Air": Number of opens on the iPad Air at midnight,
              "iPhone 4": Number of opens on the iPhone 4 at midnight
            },
            ...
          },
          ...
        }
        db.app_opens.update(
          {date: D, app_id: A},
          {$inc: {"totals.app_opens": 1, "0.app_opens": 1, "totals.devices.iPad Air": 1, "0.devices.iPad Air": 1}}
        )
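One way to build the `$inc` document for this schema is with dot-notation paths, one for the daily totals and one for the hour, per dimension. The sketch below is an assumption about how that could look, not the code from the talk; `buildOpenUpdate` and its argument shape are hypothetical, and it assumes dimension values contain no `.` characters.

```javascript
// Build the filter and $inc update for a single app open.
// `dimensions` maps a dimension name (e.g. "devices") to the value seen on this open.
function buildOpenUpdate(appId, timestamp, dimensions) {
  const hour = String(timestamp.getUTCHours());
  // Midnight UTC of the day of the open identifies the daily document.
  const date = new Date(Date.UTC(
    timestamp.getUTCFullYear(), timestamp.getUTCMonth(), timestamp.getUTCDate()
  ));

  // Always count the open itself, for the day and for the hour.
  const inc = { "totals.app_opens": 1, [`${hour}.app_opens`]: 1 };
  // Add one daily and one hourly counter per dimension.
  for (const [dimension, value] of Object.entries(dimensions)) {
    inc[`totals.${dimension}.${value}`] = 1;
    inc[`${hour}.${dimension}.${value}`] = 1;
  }
  return { filter: { app_id: appId, date }, update: { $inc: inc } };
}

const op = buildOpenUpdate(
  "A",
  new Date("2013-11-14T00:30:00Z"),
  { devices: "iPad Air", genders: "female" }
);
```

The result could then be passed to something like `db.app_opens.update(op.filter, op.update, {upsert: true})`; the upsert option is an assumption here, used so that the first open of a day creates the document.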
  • 16. Pre-aggregated analytics
    Pros:
      • Easily extensible to add other dimensions
      • Still only using one document, therefore you can create charts very quickly
      • You get breakdowns over a time period for free
    Cons:
      • Pre-aggregated data has no attribution
      • Have to know questions ahead of time
    Follow up: What if we wanted to look at a graph by age group?
  • 17. Pre-aggregated analytics summary • Get started tracking time series data quickly • You get breakdowns for free • Adding dimensions is super simple • No attribution, need to know questions ahead of time • Don’t just rely on pre-aggregated analytics
  • 18. Counting quickly: USER SEGMENTATION & STATISTICAL ANALYSIS
  • 19. User Segmentation • A group of users who match some set of filters
  • 20. Counting quickly. Appboy shows you segment membership in real-time as you add/edit/remove filters. How do we do it quickly? We estimate the population sizes of segments when using our web UI.
  • 21. Counting quickly. Goal: Quickly get the count() of an arbitrary query. Problem: MongoDB counts are slow, especially unindexed ones.
  • 22. Counting quickly. 10 million documents that represent people:
        {
          favorite_color: "blue",
          age: 27,
          gender: "M",
          favorite_food: "pizza",
          city: "NYC",
          shoe_size: 11,
          attractiveness: 10,
          ...
        }
  • 23. Counting quickly. 10 million documents that represent people:
        {
          favorite_color: "blue",
          age: 27,
          gender: "M",
          favorite_food: "pizza",
          city: "NYC",
          shoe_size: 11,
          attractiveness: 10,
          ...
        }
      • How many people like blue?
      • How many live in NYC and love pizza?
      • How many men have a shoe size less than 10?
  • 24. Big Question: How do you estimate counts? Answer: The same way news networks do it. With confidence.
  • 25. Counting quickly. Add a random number in a known range to each document. Say, between 0 and 9999.
        {
          random: 4583,
          favorite_color: "blue",
          age: 27,
          gender: "M",
          favorite_food: "pizza",
          city: "NYC",
          shoe_size: 11,
          attractiveness: 10,
          ...
        }
    Add an index on the random number:
        db.users.ensureIndex({random: 1})
  • 26. Counting quickly. Step 1: Get a random sample. I have 10 million documents. Of my 10,000 random "buckets", I should expect each "bucket" to hold about 1,000 users. E.g.,
        db.users.find({random: 123}).count()    // about 1,000
        db.users.find({random: 9043}).count()   // about 1,000
        db.users.find({random: 4982}).count()   // about 1,000
  • 27. Counting quickly. Step 1: Get a random sample. Let's take a random 100,000 users. Grab a random range that "holds" those users. These all work:
        db.users.find({random: {$gt: 0, $lt: 101}})
        db.users.find({random: {$gt: 503, $lt: 604}})
        db.users.find({random: {$gt: 8938, $lt: 9039}})
        db.users.find({$or: [
          {random: {$gt: 9955}},
          {random: {$lt: 56}}
        ]})
    Tip: Limit $maxScan to 100,000 just to be safe.
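The range selection in Step 1, including the wrap-around case, can be sketched as a small helper. This is a minimal illustration assuming integer `random` values in 0..9999 as described earlier; `sampleRangeQuery` and `BUCKETS` are hypothetical names, and it uses inclusive bounds (`$gte`/`$lte`) rather than the exclusive bounds shown on the slide.

```javascript
const BUCKETS = 10000; // random values are integers in 0..9999

// Return a query document selecting `width` consecutive buckets starting at
// `start`, wrapping past 9999 back to 0 when the range runs off the end.
function sampleRangeQuery(start, width) {
  const end = start + width - 1; // last bucket included, before wrapping
  if (end < BUCKETS) {
    return { random: { $gte: start, $lte: end } };
  }
  return {
    $or: [
      { random: { $gte: start } },
      { random: { $lte: end - BUCKETS } },
    ],
  };
}

// 100 buckets of roughly 1,000 users each is roughly 100,000 users.
const start = Math.floor(Math.random() * BUCKETS);
const query = sampleRangeQuery(start, 100);
```

For example, `sampleRangeQuery(9956, 100)` selects the same buckets as the slide's `$or` query (9956 through 9999, then 0 through 55).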
  • 28. Counting quickly. Step 2: Learn about that random sample.
        db.users.find({
          random: {$gt: 0, $lt: 101},
          gender: "M",
          favorite_color: "blue",
          shoe_size: {$gt: 10}
        })
        ._addSpecial("$maxScan", 100000)
        .explain()
    Explain result:
        {
          nscannedObjects: 100000,
          n: 11302,
          ...
        }
  • 29. Counting quickly. Step 3: Do the math.
        Population: 10,000,000
        Sample size: 100,000
        Num matches: 11,302
        Percentage of users who matched: 11.3%
        Estimated total count: 1,130,000 +/- 0.2% of the population (about 20,000 users) with 95% confidence
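The arithmetic in Step 3 can be reproduced with the standard normal-approximation interval for a proportion. The sketch below assumes that is the method behind the slide's figure (the talk does not say), uses z = 1.96 for 95% confidence, and ignores any finite-population correction; `estimateCount` is a hypothetical helper.

```javascript
// Estimate a population count from a sample, with a normal-approximation
// confidence interval for the matching proportion.
function estimateCount(population, sampleSize, matches, z = 1.96) {
  const p = matches / sampleSize;
  const standardError = Math.sqrt((p * (1 - p)) / sampleSize);
  const margin = z * standardError; // a proportion: 0.002 means 0.2 points
  return {
    proportion: p,
    margin,
    estimate: Math.round(p * population),
    low: Math.round((p - margin) * population),
    high: Math.round((p + margin) * population),
  };
}

// Numbers from the slide: 10M users, 100,000 sampled, 11,302 matches.
const result = estimateCount(10000000, 100000, 11302);
```

With the slide's numbers the margin comes out just under 0.002, which matches the quoted +/- 0.2% when read as percentage points of the population.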
  • 30. Counting quickly. Step 4: Optimize
      • Limit $maxScan to (100,000/numShards) to be even faster
      • Cache the random range for a few hours
      • Add more RAM (or shards)
      • Cache results to not hit the database for the same query
  • 31. Counting quickly. Step 5: Improve
      • Get more than one count: use the aggregation framework on top of the population's sample size
      • Work around all sorts of Mongo bugs :-(
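As an illustration of getting several counts from one sample, a `$match` on the random range followed by a `$group` yields per-group sample counts that can be scaled to the population. This is a sketch under assumptions: the pipeline, the `scaleGroups` helper, and the sample output values are all hypothetical, not taken from the talk.

```javascript
// Pipeline that could be run over the sampled range to count users per gender.
const pipeline = [
  { $match: { random: { $gte: 1, $lte: 100 } } },
  { $group: { _id: "$gender", count: { $sum: 1 } } },
];

// Scale per-group sample counts up to the full population.
function scaleGroups(groups, sampleSize, population) {
  const factor = population / sampleSize;
  return groups.map((g) => ({
    _id: g._id,
    estimate: Math.round(g.count * factor),
  }));
}

// Made-up aggregation output for a 100,000-user sample, for illustration only.
const sampleGroups = [
  { _id: "M", count: 48000 },
  { _id: "F", count: 52000 },
];
const estimates = scaleGroups(sampleGroups, 100000, 10000000);
```

In the shell, the pipeline would be passed to `db.users.aggregate(pipeline)`; each scaled estimate carries its own margin of error, computed the same way as for a single count.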
  • 32. Summarize • Pre-aggregated analytics • Create a document that represents event occurrences in some time period • Takes full advantage of MongoDB’s flexible schemas • Not a catch-all for analytics, you should still store event data
  • 33. Summarize • Counting quickly • Estimate results of arbitrary queries using population sample sizes • Depending on your app, this could be a great way to keep response time predictable as you scale