This document discusses how large datasets are needed to model rare behaviors and complex interactions, and to build confidence in simpler models. While many models perform well with only 100k records, billions of records provide enough training data to address rare events, and make it possible to compare simple models against more complex "high bar" models to verify the simple models are accurate enough. A case study on spawn prediction showed that while a complex grid-based model benefited from large data, a simpler distance-based model reached peak accuracy with far less data. Overall, big data helps with rare events, complex feature interactions, and validating simpler models, but the actual information needs of a given model must be investigated.
Leveraging Large Scale Datasets to Make the Most of In-game Decisions by Dylan Rogerson
1. The 100k Question: Leveraging Large Scale Datasets to Make The Most of In-Game Decisions
Dylan Rogerson
Senior Data Scientist
2. Facing the Starkness of Reality…
OMG Vlad with our new environment we can build models off of BILLIONS of records!!!
…ok, but why? Almost every model I build barely needs more than 100k records.
But…but…I…BILLIONS!!! <long uncomfortable sullen silence>
Alright I’ll do some research.
3. Table of Contents
Are we actually leveraging our big data?
We need data to address rare behavior.
Big data can build confidence / Beyond Accuracy
Complex interactions & weird configurations.
Case Study: Predicting Spawn Behavior / When you fundamentally need big data.
4. Are we actually leveraging our big data?
Here’s a churn model for one of our titles.
5. Are we actually leveraging our big data?
Here’s a completely different model (predicting survey responses).
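The learning curves behind these two slides can be reproduced in miniature: hold out a test set, train on progressively larger slices, and watch accuracy plateau. Below is a self-contained sketch with synthetic 1-D data and a nearest-centroid "model" standing in for the real churn and survey models (the data, classifier, and sizes are all illustrative assumptions, not the actual models from the talk):

```python
import random

random.seed(0)

def make_point(label):
    # Two noisy 1-D clusters: class 0 near 0.0, class 1 near 1.0.
    return (random.gauss(float(label), 0.7), label)

data = [make_point(i % 2) for i in range(20_000)]
test = [make_point(i % 2) for i in range(2_000)]

def accuracy(train):
    # Nearest-centroid "model": threshold halfway between class means.
    mean0 = sum(x for x, y in train if y == 0) / sum(1 for _, y in train if y == 0)
    mean1 = sum(x for x, y in train if y == 1) / sum(1 for _, y in train if y == 1)
    thresh = (mean0 + mean1) / 2
    return sum((x > thresh) == (y == 1) for x, y in test) / len(test)

# Accuracy vs. training-set size: the curve flattens long before all
# available records are used -- the "100k question" in miniature.
curve = [(n, accuracy(data[:n])) for n in (10, 100, 1_000, 10_000, 20_000)]
for n, acc in curve:
    print(n, round(acc, 3))
```

The exact plateau point depends on the model and features, which is why the slides report it empirically per model.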
6. We Can Address Rare Behavior
To capture and predict rare events we need ‘enough’ training data.
Examples: Fraud (Boosting) Detection, Outlier Detection, Spawn Outliers…
Example of boosting.
7. Focusing on Fraud Detection
Many different types (regular boosting, reverse boosting, challenge boosting, leaderboard boosting).
Not just rare: difficult to detect because their features are not innately obvious.
Ok, I cheated! You need big data to make the dataset but not to train the model.
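One way to read the "big data to make the dataset" point is rare-event downsampling: scan an enormous table to find every positive, but keep only a sample of the negatives. A minimal sketch with a simulated record stream (the labels, rates, and 10:1 ratio are illustrative assumptions, not the actual detection pipeline):

```python
import random

random.seed(1)

def record_stream(n):
    # Stand-in for scanning a huge telemetry table: roughly 0.1%
    # positives (e.g. confirmed boosters), the rest negatives.
    for i in range(n):
        yield {"player": i, "label": random.random() < 0.001}

NEG_RATIO = 10  # keep roughly 10 negatives per positive
positives, negatives = [], []

for rec in record_stream(1_000_000):
    if rec["label"]:
        positives.append(rec)          # keep every rare positive
    elif random.random() < 0.01:
        negatives.append(rec)          # sample ~1% of the negatives

# Big data built the dataset; the model trains on a small slice of it.
train = positives + negatives[: NEG_RATIO * len(positives)]
print(len(positives), len(train))
```

A million records get scanned, but the resulting training set is only on the order of 10k rows.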
8. Big Data Can Build Confidence
Let’s go back to the churn model. Start off with lovingly crafted, intuition-driven variables.
Most predictive power comes from player lifetime variables. Boss: Can we brute force lifetime data?
Approach: Build a dataset with the number of hours played per day for each day in the observation window. Create a ‘Complex Model’.
[Chart: Player Hours Per Day over a 30-day observation period, hours played on the y-axis (example purposes only).]
9. Big Data Can Build Confidence
Result: We can’t implement it. The ‘Complex’ model is bulky and requires a lot of data, but it is a very accurate ‘high bar’.
Inspecting variable importance in the complex model led to a new feature: # days played in the preceding week.
Adding the new feature to the simple model -> nearly the same AUC as the complex model!
We can be confident that our ‘simple’ models (built on small data) are good enough by comparing them to feature-rich ‘complex’ models (which require more data).
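The derived feature itself is trivial to compute once the per-day hours exist; the 30-day window and the sample values below are hypothetical:

```python
def days_played_last_week(hours_per_day):
    """Count days with any playtime in the last 7 days of the
    observation window (the feature surfaced by the complex model)."""
    last_week = hours_per_day[-7:]
    return sum(1 for h in last_week if h > 0)

# Hypothetical 30-day observation window for one player.
hours = [0, 2, 1, 0, 0, 3, 0] * 4 + [1, 0]
print(days_played_last_week(hours))  # 3
```

The point of the exercise is that the big-data "high bar" model told us *which* cheap feature to add, and the cheap feature did almost all the work.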
10. Beyond Accuracy
Execution time, complexity, and interpretability may force simpler models. In-game models have to be lean.
Rare behavior: false positive rates matter.
We also need to understand how our models develop over time.
Digging into any of these requires more data.
11. Complex Interactions & Configurations
What’s the best match composition (e.g., for a low quit rate)?
Teams vs. Solo Players: The Eternal Struggle.
Many permutations. Solution: dummy encode the composition.
This approach will take more data.
You’ll need to pay attention to big parties since they’re more engaged players, and big parties are rare events (more data).
For Example Purposes Only
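Dummy-encoding a match composition might look like the sketch below: one row per match, one count per party size. The party-size cap and the example lobby are hypothetical:

```python
from collections import Counter

MAX_PARTY = 6  # assumed cap on party size for this sketch

def encode_composition(party_sizes):
    """One feature per party size: a lobby of [1, 1, 2, 4] becomes
    two solos, one duo, one quad -> [2, 1, 0, 1, 0, 0]."""
    counts = Counter(party_sizes)
    return [counts.get(size, 0) for size in range(1, MAX_PARTY + 1)]

print(encode_composition([1, 1, 2, 4]))  # [2, 1, 0, 1, 0, 0]
```

Each distinct composition is a point in this space, and the rare big-party corners of it are exactly where lots of data is needed.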
12. Case Study: Predicting Spawn Behavior
A fun side project.
Design wants to be able to control the spawn experience with even greater accuracy.
Can we predict short spawns (< 3 seconds) or long spawns (> 30 seconds)?
Data shown in this section is for an older title and may not be representative of current games.
14. Case Study: Predicting Spawn Behavior
First stab at the data: 30MM observations, of which fewer than 1MM were targeted spawns.
Initial data was all positional: team and enemy coordinates.
Built 2 models and compared predictive power.
First chance to leverage our big data architecture (WOOO)!
15. Case Study: Predicting Spawn Behavior
We created models in Spark using a Zeppelin notebook written in either Scala or PySpark.
Different models were tried: Logistic Regression, Random Forest, and GBM.
Language hierarchy: Scala > PySpark > SparkR. Learn Scala.
16. Case Study: Predicting Spawn Behavior
Gridded Model: Made a grid of 200 x 200-unit squares and counted how many enemies or allies were in each grid square.
Allows for complex positional interactions (cover, team spacing, high ground).
Requires a significant amount of data.
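A minimal Python sketch of this gridded featurization (the actual models ran in Spark; the map size, coordinate conventions, and cell indexing here are assumptions for illustration):

```python
GRID_UNITS = 200  # cell size from the slide

def grid_features(allies, enemies, map_size=2000):
    """Count allies and enemies per 200 x 200 cell. Returns ally-cell
    counts followed by enemy-cell counts: a wide, mostly-zero vector
    that captures position but needs a lot of data to learn from."""
    cells = map_size // GRID_UNITS

    def cell_counts(points):
        grid = [0] * (cells * cells)
        for x, y in points:
            cx = min(int(x // GRID_UNITS), cells - 1)
            cy = min(int(y // GRID_UNITS), cells - 1)
            grid[cy * cells + cx] += 1
        return grid

    return cell_counts(allies) + cell_counts(enemies)

feats = grid_features(allies=[(510, 480)], enemies=[(1900, 100)])
print(len(feats), sum(feats))  # 200 features for one 2000x2000 map
```

Even this toy 2000 x 2000 map yields 200 features, most of them zero for any single spawn, which is why the gridded approach is so data-hungry.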
17. Case Study: Predicting Spawn Behavior
Distance Model: Bucket enemy and ally distances by 200 units and count.
Very simple approach: just checking to see how far away enemies and allies are.
Requires much less data.
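The same spawn situation collapses into a much smaller vector under the distance model. A sketch, where the number of buckets and the overflow cap are assumptions:

```python
import math

BUCKET_UNITS = 200  # bucket width from the slide
NUM_BUCKETS = 10    # assumed cap; farther players fall in the last bucket

def distance_features(spawn, allies, enemies):
    """Count allies and enemies per 200-unit distance ring around a
    candidate spawn point: 20 features instead of hundreds of grid
    cells, so far less data is needed to fit a model."""
    def ring_counts(points):
        buckets = [0] * NUM_BUCKETS
        for p in points:
            d = math.dist(spawn, p)
            buckets[min(int(d // BUCKET_UNITS), NUM_BUCKETS - 1)] += 1
        return buckets

    return ring_counts(allies) + ring_counts(enemies)

feats = distance_features((0, 0), allies=[(100, 0)], enemies=[(700, 0)])
print(feats)
```

It throws away direction and geometry (no cover, no high ground), trading expressiveness for data efficiency, which is exactly the tradeoff the next slide measures.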
18. Case Study: Predicting Spawn Behavior
End result: Distance Model > Gridded Model.
Why? The Distance Model hit peak accuracy early; the Gridded Model still had room to grow.
Verdict: For the Gridded (big data) approach we don’t have enough data.
This was only for 1 map; ~500MM data points across all maps.
19. Final Thoughts
We need big data to build confidence in simple models: complex models to compare against our simple assumptions.
Rare events require a lot of data to understand (and model).
Complex interactions: combinatorial, acoustic, and spatial problems require a lot of data.
Even so, thoughtful exploration of the data and feature creation can make up for small datasets (you just won’t know it without the comparison).
Investigate how much information your model actually needs.
Editor's Notes
We begin our story a few years ago when I was tasked to work on our new model development pipeline (in Spark). After getting super hyped about all the data we could process I turned to our most expert data scientist and this conversation transpired…
We’re collecting a ton of data, but are we actually leveraging it to build better models and make fundamentally better decisions? This is a churn model from one of our older titles, but it’s a story I’ve seen time and time again. Here we see the learning curve for the model, showing that a dataset only needs to be so big for the model to reach a plateau in accuracy. For this logistic regression that happens around 10k observations. To be safe we can say 100k.
Same story, different model. This GBM needs around 10k (maybe 50k - 100k) observations to build a competent initial model. So when do you need a lot of data?
I cheated. You need big data to make the dataset but you still may only need 100k observations to train the model.
Here’s where we at Activision find a good deal of value in big data.
On the bottom is an example of what the previously mentioned dataset might look like for an individual player.
For us in boosting detection false positive rates are very important. You don’t want to falsely accuse someone of cheating in your game!
In regards to how models change over time, we’ve had to change churn models throughout the year to account for different behavior.
Now for some weird considerations, stuff that might only be answerable with large coverage from a vast amount of data.
Here we see two examples of spawns. Red triangles are enemy players and green triangles are ally players. The player in question is a white triangle. Circles are some possible spawn points. Can you easily tell which one is good or bad? Example 2 might be better simply because teammates are closer. Additionally the alley viewing the enemy team is narrower and it’s easier to hop to cover. Positioning data is subtle.
Oh btw…all of those calculations were only for a single map, so feel free to multiply your problem by 12, more if you include DLC maps, or get mode specific, or…. (~500MM data points for full models).
And now I guess I’ll turn it over to you. Please do look into how much data your model actually needs and ask yourself if you’re making the most of the data you have available. The results might surprise you.