BigML Late Summer 2014 Release Webinar - Anomaly Detection!

Today’s Webinar
• Speaker:
• Poul Petersen, CIO
• Moderator:
• Andrew Shikiar, VP Business Development
• Enter questions into chat box – we’ll answer some via text; others at the end of the session
• For direct follow-up, email us at info@bigml.com
BigML Inc 2

Agenda
1 What’s New
2 Anomaly Detection
3 Coming Soon
4 Questions

Model Clusters
Use models to discover rules that describe clusters
[Figure: clusters numbered 1 to 7]

Spicy  Body  Nutty
5.1    3.5   1.4
5.7    2.6   3.5
6.7    2.5   5.8
…      …     …

In Cluster 5?

Spicy  Body  Nutty  In 5?
5.1    3.5   1.4    TRUE
5.7    2.6   3.5    FALSE
6.7    2.5   5.8    TRUE
…      …     …      …
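The labeling step above can be sketched in a few lines of Python. This is only an illustration, not BigML's implementation: the helper name `label_cluster` and the cluster assignments are assumptions (the slide only shows that the second row is not in cluster 5).

```python
def label_cluster(rows, assignments, target_cluster):
    """Append a TRUE/FALSE field: is this row in target_cluster?"""
    return [row + [cluster == target_cluster]
            for row, cluster in zip(rows, assignments)]

# Flavor scores from the slide (Spicy, Body, Nutty).
rows = [[5.1, 3.5, 1.4], [5.7, 2.6, 3.5], [6.7, 2.5, 5.8]]

# Hypothetical cluster assignments; only "in 5 or not" comes from the slide.
assignments = [5, 2, 5]

labeled = label_cluster(rows, assignments, target_cluster=5)
for row in labeled:
    print(row)
```

The new boolean field can then serve as the objective of a supervised model, whose rules describe what distinguishes cluster 5 from the rest.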

Model Clusters
• Dataset of 86 whiskies
• Each whisky is scored on a scale from 0 to 4 for each of 12 possible flavor characteristics.
GOAL: Cluster the whiskies by flavor profile, then discover rules that distinguish the clusters from each other.

Missing Splits
Real World Data … is messy
• Define missing tokens: N/A, Null, etc
• Filter out missing values
• Add a new feature to replace missing values
• Default numeric values in cluster
• Proportional prediction for missing input data
• Allow splits on missing values
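A minimal Python sketch of the first three options (missing tokens, filtering, and a replacement feature). The token list, the helper names, and the reading of "add a new feature" as a 0/1 indicator column are assumptions for illustration, not BigML's behavior.

```python
# Hypothetical set of tokens to treat as missing.
MISSING_TOKENS = {"", "N/A", "NA", "Null", "null"}

def parse_cell(text):
    """Return a float, or None when the cell holds a missing token."""
    text = text.strip()
    if text in MISSING_TOKENS:
        return None
    return float(text)

def drop_missing(rows):
    """Filter out every row that contains a missing value."""
    return [row for row in rows if None not in row]

def add_missing_flag(rows, column, default):
    """Fill missing values in `column` with `default` and append a 0/1 flag."""
    out = []
    for row in rows:
        is_missing = row[column] is None
        filled = list(row)
        if is_missing:
            filled[column] = default
        out.append(filled + [1 if is_missing else 0])
    return out

raw = [["5.1", "3.5", "1.4"], ["5.7", "N/A", "3.5"], ["6.7", "2.5", "Null"]]
rows = [[parse_cell(cell) for cell in line] for line in raw]

complete = drop_missing(rows)
flagged = add_missing_flag(rows, column=1, default=0.0)
```

Filtering is the simplest choice but discards data; the flag keeps every row and lets a model learn from the fact that a value was missing.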

Online Predictions
• Single predictions
• Computed in real-time using browser JS
• JS will be open sourced
• Available for models, ensembles, and clusters

Fast(er) Ensembles
Fetch Dataset (“F” secs) -> Transform Dataset (“T” secs) -> Model Dataset (“M” secs) -> Store Model (“S” secs)

Insight: if the dataset fits in memory, we can perform the fetch and transform steps once and model quickly in memory

Time for number of models “n”:
Old:      n * [ F + T + M + S ]
New:      F + T + n * [ M + S ]
Savings:  ( n - 1 ) * [ F + T ]
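The timing formulas above can be checked with a short Python sketch. The step durations used here are made-up numbers chosen only to illustrate the arithmetic.

```python
def old_time(n, fetch, transform, model, store):
    """Every model repeats all four steps: n * [F + T + M + S]."""
    return n * (fetch + transform + model + store)

def new_time(n, fetch, transform, model, store):
    """Fetch and transform once, then model and store n times."""
    return fetch + transform + n * (model + store)

def savings(n, fetch, transform):
    """Time saved: (n - 1) * [F + T]."""
    return (n - 1) * (fetch + transform)

# Hypothetical durations in seconds for a 10-model ensemble.
n, F, T, M, S = 10, 30, 20, 5, 1

old = old_time(n, F, T, M, S)
new = new_time(n, F, T, M, S)
print(old, new, old - new, savings(n, F, T))
```

The savings grow with the number of models and with the cost of fetching and transforming, so large ensembles on slow-to-load datasets benefit most.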

Anomaly Detection
An unsupervised algorithm to find unusual data quickly and easily

Learning Tasks
Trees (Supervised Learning)
Provide: labeled data
Learning Task: be able to predict label
Cluster (Unsupervised Learning)
Provide: unlabeled data
Learning Task: group data by similarity
Anomalies (Unsupervised Learning)
Provide: unlabeled data
Learning Task: rank data by dissimilarity

Learning Tasks
sepal length  sepal width  petal length  petal width  species
5.1           3.5          1.4           0.2          setosa
5.7           2.6          3.5           1.0          versicolor
6.7           2.5          5.8           1.8          virginica
…             …            …             …            …
Inputs “X”: sepal length, sepal width, petal length, petal width; “Y”: species
Learning Task: Find function “f” such that: f(X)≈Y

sepal length  sepal width  petal length  petal width
5.1           3.5          1.4           0.2
5.7           2.6          3.5           1.0
6.7           2.5          5.8           1.8
…             …            …             …
Learning Task: Find “k” clusters such that the data in each cluster is self similar

sepal length  sepal width  petal length  petal width
5.1           3.5          1.4           0.2
5.7           2.6          3.5           1.0
6.7           2.5          5.8           1.8
…             …            …             …
Learning Task: Assign value from 0 (similar) to 1 (dissimilar) to each instance.

Anomalies
Isolation Forest:
Grow a random decision tree until each instance is in its own leaf
• “easy” to isolate: small Depth
• “hard” to isolate: large Depth
Now repeat the process several times and use average Depth to compute anomaly score: 0 (similar) -> 1 (dissimilar)
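A minimal pure-Python sketch of the idea, under stated assumptions. It follows one random path per instance instead of building shared trees, uses no subsampling, and converts average depth to a score with the normalization from the original Isolation Forest paper (2 ** (-mean_depth / c(n))). The slides do not say how BigML maps depth to a score, so treat that formula and all function names as illustrative.

```python
import math
import random

def isolation_depth(point, data, rng, max_depth=50):
    """Number of random splits needed until `point` is alone in its partition."""
    depth = 0
    while len(data) > 1 and depth < max_depth:
        # Only fields that still vary can separate the remaining rows.
        fields = [f for f in range(len(point))
                  if min(r[f] for r in data) != max(r[f] for r in data)]
        if not fields:
            break
        field = rng.choice(fields)
        lo = min(r[field] for r in data)
        hi = max(r[field] for r in data)
        split = rng.uniform(lo, hi)
        # Keep the rows that fall on the same side of the split as `point`.
        data = [r for r in data if (r[field] < split) == (point[field] < split)]
        depth += 1
    return depth

def average_path_length(n):
    """Expected depth c(n) used to normalize scores (Isolation Forest paper)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def anomaly_scores(data, n_trees=100, seed=0):
    """Score each instance: near 0 is similar, near 1 is dissimilar."""
    if len(data) < 2:
        raise ValueError("need at least two instances")
    rng = random.Random(seed)
    c = average_path_length(len(data))
    scores = []
    for point in data:
        total = sum(isolation_depth(point, data, rng) for _ in range(n_trees))
        scores.append(2.0 ** (-(total / n_trees) / c))
    return scores

# Twenty nearby points plus one obvious outlier (made-up data).
data = [(i * 0.1, (i % 5) * 0.1) for i in range(20)] + [(10.0, 10.0)]
scores = anomaly_scores(data)
print(max(scores), scores.index(max(scores)))
```

The outlier is usually cut off by the first or second split, so its average depth is small and its score is the highest in the dataset.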

Workflow
• Clusters: cluster, centroid, batchcentroid
• Anomalies: anomaly, anomalyscore, batchanomalyscore
[Diagram: parallel Clusters and Anomalies workflows. Clusters: DATASET -> CLUSTER; INSTANCE + CLUSTER -> CENTROID; DATASET + CLUSTER -> DATASET + CSV. Anomalies: DATASET -> ANOMALY; INSTANCE + ANOMALY -> ANOMALYSCORE; DATASET + ANOMALY -> DATASET + CSV.]

Use Cases
• Unusual instance discovery
• Intrusion Detection
• Fraud
• Identify Incorrect Data
• Remove Outliers
• Model Competence / Input Data Drift
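As a small illustration of the "Remove Outliers" use case, the Python sketch below drops instances whose anomaly score reaches a cutoff. The helper name, the sample scores, and the 0.6 threshold are assumptions; a suitable cutoff depends on the dataset.

```python
def remove_outliers(rows, scores, threshold=0.6):
    """Keep only the rows whose anomaly score is below `threshold`."""
    return [row for row, score in zip(rows, scores) if score < threshold]

# Made-up rows and anomaly scores (0 = similar, 1 = dissimilar).
rows = [[5.1, 3.5], [5.7, 2.6], [99.0, -4.0], [6.7, 2.5]]
scores = [0.41, 0.44, 0.83, 0.47]

cleaned = remove_outliers(rows, scores)
print(cleaned)
```

Training on the cleaned rows can keep a handful of extreme or incorrect instances from distorting a model.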

Anomalies
• High dimensions - 10,000 fields
• Mixed data:
• numerical: 3.4
• categorical: red, green, blue
• date time: 2014-05-14T12:34:56
• unstructured text: “The quick brown fox…” (coming)
• Computing anomaly score for new data
• Using anomaly detectors programmatically

Coming Soon
• Config panel for anomaly detection
• Project Management
• In-memory sample server
• Dynamic scatterplots

Get Started Today!
RESOURCES: Join us for future webinars & hangouts
FEEDBACK: info@bigml.com
TWITTER: @bigmlcom
