This document summarizes work on an improved player ranking system for the strategy video game Populous: The Beginning. Various machine learning models and techniques were tested to rank players more accurately and to predict the outcomes of future games from gameplay data. The best performing single change was a weighted Elo rating system that scaled the K update using features like "shamans killed" for the winning team, improving prediction accuracy over baseline Elo by around one percentage point (0.740 to 0.750 on the full dataset). Experiments were also conducted to measure how useful different types of games are for improving rankings.
1. Team
Keilin Bickar
Ayush K Singh
Monisha Singh
Populous Player Rankings
Machine Learning Course Project
CS 6140 Spring ‘16
Instructor
Lu Wang
2. Problem Description
Populous: The Beginning is a strategy, god-style video game where teams of players create settlements and battle to destroy the opposition.
People play 1 vs 1 and 2 vs 2 games via a matchmaking service.
Fair games are fun games, so accurate rankings are needed to make fair matches.
3. Problem Description - Goal
● Create league table and rankings based on game results
● Predict the results of future games based on rankings
● Improve accuracy of game predictions
4. Problem Description - Inputs
Data from “Populous: Reincarnated” game database
- games.csv
- One row per game played
- General information concerning game such as map and number of players
- game_details.csv
- One row per player per game played (two to four per game)
- Stats for how well a player did such as enemy buildings destroyed and fights lost
- users.csv
- One row per user, not much besides name
- game_pops.csv
- Populations of each player taken every 15 minutes of each game
6. Problem Description - Outputs
● Ranking of players ordered by skill level
● Point value assigned to each player to be used for predictions
● Ranking system that can iteratively rank new inputs
Rank | Name    | Points
#1   | Alice   | 64
#2   | Bob     | 53
#3   | Charlie | 29
#4   | Dan     | 15
7. Related Work
Traditional Rating System
● Players on winning team gain 1 point
● Ratings of the losing team remain unaffected
● This method is currently used in Populous
● Only requires winners to report games
Simple Ranking
● Players on winning team gain 1 point
● Players on losing team lose 1 point
● Intuitive system, widely used
8. Related Work
Elo Rating System
● Rating process takes into account prior ratings of players
● Subtracts X points from loser and gives X points to winner
● Very widely used for 1 vs 1 matchups such as Chess
● Update calculations are very fast
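The per-game Elo update behind these bullets can be sketched as follows (a minimal 1 vs 1 version with the common K=32 default; the project's K values and 2 vs 2 extension differ):

```python
# Minimal Elo update for a decisive 1 vs 1 game; K=32 is a common default.
def expected_score(rating_a, rating_b):
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(winner, loser, k=32):
    """Move K * (1 - expected winner score) points from loser to winner."""
    delta = k * (1 - expected_score(winner, loser))
    return winner + delta, loser - delta

elo_update(1500, 1500)  # evenly matched: winner gains 16, loser drops 16
```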
Glicko2 Rating System
● Rating process takes into account prior ratings of players and their experience
● Stores 𝜇 (rating), 𝜎 (volatility), and 𝜙 (rating deviation) for each player
● More recently created and used in 1 vs 1 matchups
9. Related Work
TrueSkill Rating System:
● Rating process uses Bayesian inference to compare two team distributions
● Every time a player plays a game, the system accordingly changes the perceived skill of the player
and acquires more confidence about this perception
● Stores 𝜇 (rating) and 𝜎 (uncertainty) for each player
● The extent of actual updates depends on how "surprising" the outcome is to the system
● Designed to support teams of variable size along with ties; update cost grows polynomially with the number of players
● Assumes team performance is the sum of performance of the players
● Developed by Microsoft Research and used in Xbox matchmaking system
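A simplified, self-contained sketch of a TrueSkill-style 1 vs 1 update (ignoring draws and the full factor-graph machinery; constants follow the common mu=25, sigma=25/3, beta=25/6 defaults, not Microsoft's implementation):

```python
import math

def _pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def _cdf(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def trueskill_1v1(winner, loser, beta=25 / 6):
    """One TrueSkill-style update for a decisive 1 vs 1 game.

    winner/loser are (mu, sigma) pairs. The winner's mu rises, both sigmas
    shrink, and the size of the change depends on how surprising the
    outcome was given the prior ratings.
    """
    (mu_w, sig_w), (mu_l, sig_l) = winner, loser
    c = math.sqrt(2 * beta ** 2 + sig_w ** 2 + sig_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)      # additive correction to the means
    w = v * (v + t)            # multiplicative shrinkage of the variances
    def upd(mu, sig, sign):
        return (mu + sign * sig ** 2 / c * v,
                sig * math.sqrt(max(1 - sig ** 2 / c ** 2 * w, 1e-9)))
    return upd(mu_w, sig_w, +1), upd(mu_l, sig_l, -1)
```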
10. Our Work
Address shortcomings and find optimal model
● TrueSkill suffers from “rich get richer” problem with unbalanced teams (auto rating boost)
● TrueSkill handles ties in a naive fashion ignoring the complexity of the system
● We experimented with different models:
○ numerous values of K for Elo
○ same and separate feature factors in Weighted Elo
○ swapped the constant used to standardize the logistic function in Glicko
○ selected TrueSkill and further experimented with Weighted TrueSkill
● Find the best model, use it to rate players and predict result of future games
11. Methodology - Preprocessing
● Data contained around 300,000 games
● Removed games with irregularities, e.g. players crashing
● Removed games with incomplete data, e.g. 4-player games with data from only 3 players
● Some games had spectators but were still valid; these were converted from 4-player games with 2 spectators to 2-player 1 vs 1 games
12. Methodology - Preprocessing
● Mixture of 1v1, 2v2, 1v3, etc - stripped everything but 1v1 and 2v2
○ Unbalanced games hard to rate in complex ranking systems
○ Skills for 3 vs 1 game don’t translate to 1 vs 1 or 2 vs 2 games
● 136k remaining games:
○ 50k - 1 vs 1 games
○ 86k - 2 vs 2 games
● 3 datasets stored separately for faster loading to run experiments
○ Disk IO was the main contributor to load times, so smaller sets were better
● After preprocessing, prediction accuracy increased from 69% to 76%
13. Experiments: Models and Features
● Ranking System
○ Traditional Ranking
○ Simple Ranking
○ Elo rating - Modified to support 2v2
○ Glicko2 Rating - Modified to support 2v2
○ TrueSkill Rating
● Features
○ Feature selection based on Info Gain, Gain Ratio, and Correlation Feature Selection
○ Feature weights based on the Perceptron learning algorithm, SMO, Multilayer Perceptron with backpropagation, and Logistic Regression
14. Experiments: Evaluation metrics
Ranking
● Traditional, Simple, and Elo systems use their native Points
● Glicko and TrueSkill use:
○ Points = 𝜇(rating) - 3𝜎(uncertainty)
● Players sorted by Points highest to lowest
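The conservative Points formula can be illustrated as follows (player names and values are hypothetical):

```python
# Conservative skill estimate used to sort Glicko2/TrueSkill players: the
# displayed Points is the rating minus three rating deviations, so new,
# high-uncertainty players start near the bottom of the league table.
def points(mu, sigma):
    return mu - 3 * sigma

# A stable rating can outrank a higher but more uncertain one.
players = {"Alice": (30.0, 1.0), "Bob": (33.0, 6.0)}
table = sorted(players, key=lambda name: points(*players[name]), reverse=True)
# Alice (30 - 3 = 27) ranks above Bob (33 - 18 = 15)
```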
15. Experiments: Evaluation metrics
Future Game Predictions
● Winning team predicted by selecting team with higher sum of Points
● Accuracy of Predictions
○ Accuracy = Correct Predictions / Total number of instances
○ Ratings are computed iteratively, so every game serves as test data before being trained on
○ Order matters, so cross-validation cannot be used
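The evaluation loop above can be sketched as (with `points` and `update` standing in for any of the rating systems; the names and signatures are illustrative):

```python
# Iterative evaluation: predict each game from the ratings learned so far,
# then train on it, so every game serves as test data exactly once and the
# chronological order of games is preserved.
def iterative_accuracy(games, points, update):
    """games: list of (team_a, team_b, winner); points: name -> Points;
    update: per-game rating update function (hypothetical signature)."""
    correct = 0
    for team_a, team_b, winner in games:
        pred = team_a if (sum(points[p] for p in team_a)
                          >= sum(points[p] for p in team_b)) else team_b
        correct += (pred == winner)
        update(points, team_a, team_b, winner)  # learn from the game afterwards
    return correct / len(games)
```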
16. Experiments: Baselines
Prediction Accuracies
League      | Full (136,144) | 1 vs 1 (50,134) | 2 vs 2 (86,010)
Traditional | 0.678164296627 | 0.683827342722  | 0.66064411115
Simple      | 0.64075537666  | 0.651813140783  | 0.643448436228
Elo         | 0.739577212363 | 0.718574221087  | 0.737774677363
Glicko      | 0.72718592079  | 0.714664698608  | 0.714463434484
TrueSkill   | 0.756955870255 | 0.742968843499  | 0.758051389373
17. Experiment - Weighted Elo
● Elo is close in score to TrueSkill, but runs much faster
● Uses “K” value to decide how many points to move between teams
● K was weighted based on features in game details
● Features and factors were selected one at a time by increasing/decreasing each factor until accuracy reached a maximum
● Weighting was capped to prevent small/large values from exploding ranks
18. Experiment - Weighted Elo
● Tested raw feature vs. ratio of winning team/losing team
○ Ratio was better
● Tested inverting ratio for winning/losing and losing/winning
○ Mixed results
● Tested adding factors to K vs. multiplying K by factors
○ Multiplying worked better
● Tested assigning different weight for winners and losers
○ Improved accuracy!
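One way the multiplicative, capped K weighting described above might look (the feature ratio, the 0.1 factor, and the 0.5-2.0 cap are illustrative assumptions, not the project's tuned constants):

```python
# Weighted-Elo K sketch: K is multiplied by a factor derived from the
# winners' / losers' ratio of a game-detail feature (e.g. shamans_killed),
# and the resulting weight is capped so extreme games cannot explode ranks.
def weighted_k(base_k, winner_stat, loser_stat, factor=0.1, lo=0.5, hi=2.0):
    ratio = winner_stat / max(loser_stat, 1)   # guard against division by zero
    weight = min(max(1 + factor * (ratio - 1), lo), hi)
    return base_k * weight
```

The weighted K then replaces the fixed K in the normal Elo point transfer, with separate factors possible for the winning and losing teams.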
19. Experiment - Weighted Elo
Notable improvements in score
Best feature was “shamans_killed” of winning team
League       | Full (136,144) | 1 vs 1 (50,134) | 2 vs 2 (86,010)
Baseline Elo | 0.739577212363 | 0.718574221087  | 0.737774677363
Weighted Elo | 0.750146903279 | 0.725136633821  | 0.749098941983
20. Experiment - Weighted TrueSkill
● TrueSkill starts out more accurate than Elo
● Has a built-in weight for an update, ranging from 0.0 to 1.0
● Using same feature/factor as Elo resulted in negligible improvements
● Running the same process to find new features/factors also resulted in negligible improvements
21. Experiment - Weighted TrueSkill
Tested skewing the results to give more weight to the player doing the most work in 2 vs 2 games. Accuracy of the top four features after weighting:
Feature             | Score
Unweighted          | 0.758051389373
followers_killed    | 0.758783862342
fights_won          | 0.758714103011
shamans_killed      | 0.758109522149
buildings_destroyed | 0.757923497268
Results overall were positive, but small.
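One simple way to apply a 0.0-1.0 update weight is to blend the pre- and post-game ratings; this is a generic sketch, not the TrueSkill library's internal partial-play mechanism:

```python
# Blend a fully updated rating back toward the previous one; weight=1.0
# applies the full update, weight=0.0 keeps the old rating unchanged.
def partial_update(old_mu, new_mu, weight):
    return old_mu + weight * (new_mu - old_mu)

partial_update(25.0, 29.0, 0.5)  # halfway between old and new: 27.0
```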
22. Experiment - Value of Games
Each game has a hidden property that is hard to calculate: how helpful the game is for ranking players.
● Tested 1 vs 1 games using Elo (for speed)
● Removed one game from the set at a time and compared accuracy to the baseline
● Resulting change very small, but enough to see positive/negative
● Values normalized and stored as boolean
● Can run algorithms to classify games based on value
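The leave-one-out value computation can be sketched as (with `run_elo` a hypothetical helper that runs the Elo system over a game list and returns prediction accuracy):

```python
# A game's value is whether removing it from the run hurts accuracy:
# True means the game was helpful for ranking (dropping it lowered accuracy).
def game_values(games, run_elo):
    baseline = run_elo(games)
    return [run_elo(games[:i] + games[i + 1:]) < baseline
            for i in range(len(games))]
```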
24. Experiment - Value of Games
Tested weighting 1 vs 1 games in TrueSkill using the values found; this shows some positive results:

1v1 Base                 | 0.742968843499
Weighting on fights_lost | 0.743287988192

Tested weighting all games in TrueSkill using the values found; results against the full dataset were slightly negative:

All Base                              | 0.756955870255
Weighting on fights_lost              | 0.756698789517
Weighting on fights_lost only for 1v1 | 0.756882418616
25. Results
● Experimenting with different parameters yielded only small quantitative accuracy gains
● Overall we were able to predict outcome 8% better than the traditional system
● Found some features to be more important in the gameplay than others
● The model takes all priors into account, so a player's first game is also part of their rating
Thank You!