BigML Inc
Today’s Webinar 
• Speaker: 
• Poul Petersen, CIO 
• Moderator: 
• Andrew Shikiar, VP Business Development 
• Enter questions into the chat box; we’ll answer some via text and others at the end of the session
• For direct follow-up, email us at info@bigml.com 
Agenda 
1 What’s New
2 Anomaly Detection
3 Coming Soon
4 Questions
Model Clusters 
Use models to discover rules that describe clusters 
[Figure: cluster diagram with clusters numbered 1 to 7]

Spicy  Body  Nutty
5.1    3.5   1.4
5.7    2.6   3.5
6.7    2.5   5.8
…      …     …

Spicy  Body  Nutty  In Cluster 5?
5.1    3.5   1.4    TRUE
5.7    2.6   3.5    FALSE
6.7    2.5   5.8    TRUE
…      …     …      …
Model Clusters 
• Dataset of 86 whiskies
• Each whisky scored on a scale from 0 to 4 for each of 12 possible flavor characteristics
GOAL: Cluster the whiskies by flavor profile, then discover rules that distinguish the clusters from each other.
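The labeling step behind "In Cluster 5?" can be sketched roughly as follows. This is a minimal illustration, assuming each row already carries a cluster id from an earlier clustering run; the field names, the sample rows, and the helper `label_cluster_membership` are hypothetical and not part of BigML's API. A supervised model trained on the resulting boolean field is what would yield the readable rules.

```python
from typing import Dict, List


def label_cluster_membership(rows: List[Dict], cluster_field: str, target: int) -> List[Dict]:
    """Return copies of rows where the cluster id is replaced by a boolean 'In <target>?' field."""
    label = f"In {target}?"
    labeled = []
    for row in rows:
        # Drop the raw cluster id so a model cannot simply read the answer from it.
        new_row = {k: v for k, v in row.items() if k != cluster_field}
        new_row[label] = row[cluster_field] == target
        labeled.append(new_row)
    return labeled


# Hypothetical rows: flavor scores plus an assumed cluster assignment.
rows = [
    {"Spicy": 5.1, "Body": 3.5, "Nutty": 1.4, "cluster": 5},
    {"Spicy": 5.7, "Body": 2.6, "Nutty": 3.5, "cluster": 2},
    {"Spicy": 6.7, "Body": 2.5, "Nutty": 5.8, "cluster": 5},
]
labeled = label_cluster_membership(rows, "cluster", 5)
```

Repeating the call with a different `target` gives one labeled dataset, and so one descriptive model, per cluster.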
Missing Splits 
Real-world data … is messy
• Define missing tokens: N/A, Null, etc.
• Filter out missing values 
• Add a new feature to replace missing values 
• Default numeric values in cluster 
• Proportional prediction for missing input data 
• Allow splits on missing values 
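Two of the options above, defining missing tokens and allowing splits on missing values, can be sketched as below. The token list, the function names, and the `missing_goes_left` flag are assumptions made for illustration; they are not taken from BigML's implementation.

```python
from typing import Optional

# Assumed set of strings to treat as "missing"; compared case-insensitively.
MISSING_TOKENS = {"", "n/a", "na", "null", "none", "?"}


def parse_numeric(raw: str) -> Optional[float]:
    """Convert a raw field to float, mapping the assumed missing tokens to None."""
    if raw.strip().lower() in MISSING_TOKENS:
        return None
    return float(raw)


def split_goes_left(value: Optional[float], threshold: float, missing_goes_left: bool) -> bool:
    """Evaluate a split of the form 'x <= threshold', with an explicit branch for missing x."""
    if value is None:
        # The split itself decides where missing values go, instead of dropping the row.
        return missing_goes_left
    return value <= threshold


values = [parse_numeric(raw) for raw in ["3.5", "N/A", "Null", "1.0"]]
branches = [split_goes_left(v, 2.0, missing_goes_left=True) for v in values]
```

Treating "missing" as a branch of its own lets a tree use the absence of a value as a signal rather than filtering those rows out.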
Online Predictions 
• Single predictions 
• Computed in real-time using browser JS 
• JS will be open sourced 
• Available for models, ensembles, and clusters 
Fast(er) Ensembles 
Fetch Dataset (“F” secs) -> Transform Dataset (“T” secs) -> Model Dataset (“M” secs) -> Store Model (“S” secs)
Insight: if the dataset fits in memory, we can perform the fetch and transform steps once and model quickly in memory
Time for “n” models:
Old:      n * [ F + T + M + S ]
New:      F + T + n * [ M + S ]
Savings:  ( n - 1 ) * [ F + T ]
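The timing formulas can be checked with a small calculation. The function name and the sample timings below are made up for illustration; only the three formulas come from the slide.

```python
from typing import Tuple


def ensemble_times(n: int, fetch: float, transform: float, model: float, store: float) -> Tuple[float, float, float]:
    """Return (old, new, savings) in seconds for an ensemble of n models."""
    old = n * (fetch + transform + model + store)   # every model repeats all four steps
    new = fetch + transform + n * (model + store)   # fetch and transform happen once
    return old, new, old - new


# Hypothetical timings: F=30s, T=20s, M=5s, S=1s, for 10 models.
old, new, savings = ensemble_times(10, fetch=30, transform=20, model=5, store=1)
```

With these numbers the old approach takes 560 seconds and the new one 110, a saving of 450 seconds, which matches ( n - 1 ) * [ F + T ] = 9 * 50.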
Anomaly Detection 
An unsupervised algorithm to find unusual data quickly and easily
Learning Tasks 
Trees (Supervised Learning)
Provide: labeled data
Learning Task: be able to predict label
Cluster (Unsupervised Learning)
Provide: unlabeled data
Learning Task: group data by similarity
Anomalies (Unsupervised Learning)
Provide: unlabeled data
Learning Task: rank data by dissimilarity
Learning Tasks 
sepal length  sepal width  petal length  petal width  species
5.1           3.5          1.4           0.2          setosa
5.7           2.6          3.5           1.0          versicolor
6.7           2.5          5.8           1.8          virginica
…             …            …             …            …
Inputs “X”: the four measurements; “Y”: species
Learning Task: Find function “f” such that f(X) ≈ Y

sepal length  sepal width  petal length  petal width
5.1           3.5          1.4           0.2
5.7           2.6          3.5           1.0
6.7           2.5          5.8           1.8
…             …            …             …
Learning Task: Find “k” clusters such that the data in each cluster is self-similar

sepal length  sepal width  petal length  petal width
5.1           3.5          1.4           0.2
5.7           2.6          3.5           1.0
6.7           2.5          5.8           1.8
…             …            …             …
Learning Task: Assign a value from 0 (similar) to 1 (dissimilar) to each instance.
Anomalies 
Isolation Forest:
Grow a random decision tree until each instance is in its own leaf
[Figure: tree with a Depth axis; shallow leaves are “easy” to isolate, deep leaves are “hard” to isolate]
Now repeat the process several times and use the average Depth to compute an anomaly score: 0 (similar) -> 1 (dissimilar)
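The procedure can be sketched in plain Python as below. This is an illustrative sketch under stated assumptions, not BigML's implementation: it handles numeric fields only, grows full trees on all the data as the slide describes (the published Isolation Forest algorithm subsamples and caps tree height), and maps average depth to a 0 to 1 score with the normalization from the original Isolation Forest paper, which the slide does not spell out. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def isolation_depths(points: List[Point], rng: random.Random) -> List[int]:
    """Grow one random tree and return the depth at which each point ends up alone."""
    depths = [0] * len(points)

    def grow(indices: List[int], depth: int) -> None:
        # Collect the fields whose values still differ among the remaining points.
        splittable = []
        for d in range(len(points[0])):
            values = [points[i][d] for i in indices]
            if min(values) < max(values):
                splittable.append((d, min(values), max(values)))
        if len(indices) <= 1 or not splittable:
            # Leaf: a single point, or exact duplicates that cannot be separated.
            for i in indices:
                depths[i] = depth
            return
        d, lo, hi = rng.choice(splittable)
        while True:
            # Random cut on a random field; retry in the rare case one side is empty.
            cut = rng.uniform(lo, hi)
            left = [i for i in indices if points[i][d] <= cut]
            right = [i for i in indices if points[i][d] > cut]
            if left and right:
                break
        grow(left, depth + 1)
        grow(right, depth + 1)

    grow(list(range(len(points))), 0)
    return depths


def average_path_length(n: int) -> float:
    """Normalizer c(n): expected path length of an unsuccessful search in a binary search tree."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def anomaly_scores(points: List[Point], n_trees: int = 100, seed: int = 0) -> List[float]:
    """Average isolation depth over n_trees random trees, mapped so that shallow means near 1."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    totals = [0.0] * len(points)
    for _ in range(n_trees):
        for i, depth in enumerate(isolation_depths(points, rng)):
            totals[i] += depth
    norm = average_path_length(len(points))
    return [2.0 ** (-(total / n_trees) / norm) for total in totals]


# Hypothetical data: eight points close together and one far away (index 8).
points = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.15, 0.25), (0.25, 0.15),
          (0.05, 0.3), (0.3, 0.05), (0.2, 0.2), (10.0, 10.0)]
scores = anomaly_scores(points, n_trees=100, seed=0)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

The far-away point is usually separated by the very first cut, so its average depth is small and its score is the highest of the nine.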
Workflow
Clusters (resources: cluster, centroid, batchcentroid)
• DATASET -> CLUSTER
• INSTANCE + CLUSTER -> CENTROID
• DATASET + CLUSTER -> DATASET + CSV
Anomalies (resources: anomaly, anomalyscore, batchanomalyscore)
• DATASET -> ANOMALY
• INSTANCE + ANOMALY -> ANOMALYSCORE
• DATASET + ANOMALY -> DATASET + CSV
Use Cases 
• Unusual instance discovery 
• Intrusion Detection 
• Fraud 
• Identify Incorrect Data 
• Remove Outliers 
• Model Competence / Input Data Drift 
Anomalies 
• High dimensions - 10,000 fields 
• Mixed data: 
• numerical: 3.4 
• categorical: red, green, blue 
• date time: 2014-05-14T12:34:56 
• unstructured text (coming): “The quick brown fox…”
• Computing anomaly score for new data 
• Using anomaly detectors programmatically 
Coming Soon 
• Config panel for anomaly detection 
• Project Management 
• In-memory sample server 
• Dynamic scatterplots 
Get Started Today! 
RESOURCES: Join us for future webinars & hangouts
FEEDBACK: info@bigml.com
TWITTER: @bigmlcom

BigML Late Summer 2014 Release Webinar - Anomaly Detection!
