SlideShare a Scribd company logo
1 of 55
Download to read offline
Practical Issues in Machine Learning
-how to create an effective ML model?
Prof. Shou-de Lin (林守德)
CSIE, National Taiwan University
sdlin@csie.ntu.edu.tw
Diagnose a Machine Learning Solution
-Why My ML Model Doesn’t Work?
Prof. Shou-de Lin (林守德)
CSIE, National Taiwan University
sdlin@csie.ntu.edu.tw
Machine Discovery and Social Network Mining Lab,
CSIE, NTU
• PI: Shou-de Lin
– B.S. in NTUEE
– M.S. in EECS, UM
– M.S. in Computational Linguistics, USC
– Ph.D. in CS, USC
– Postdoc in Los Alamos National Lab
• Courses:
– Machine Learning and Data Mining-
Theory and Practice
– Machine Discovery
– Social network Analysis
– Technical Writing and Research
Method
– Probabilistic Graphical Model
• Awards:
– All-time ACM KDD Cup Champion
(2008, 2010, 2011, 2012, 2013)
– Google Research Award 2008
– Microsoft Research Award 2009
– Best Paper Award WI2003, TAAI 2010,
ASONAM 2011, TAAI 2014
– US Areospace AROAD Research Grant
Award 2011, 2013, 2014, 2015, 2016
Machine
Learning
with Big
Data
Machine
Discovery
Learning IoT
Data
Applications in
NLP&SNA&
Recommender
Practical
Issues in
Learning
3
Talk Materials Based on Hands-On Experience
in Solving ML Tasks, Including
 Participating ACM KDD Cup for 6 years
 >20 Industrial collaboration
 Visiting Scholar in Microsoft from 2015~2016
Team NTU’s Performance on ACM KDD Cup
KDD Cups 2008 2009 2010 2011 2012 2013
Organizer Siemens Orange
PSLC
Datashop
Yahoo! Tencent Microsoft
Topic
Breast Cancer
Prediction
User Behavior
Prediction
Learner
Performance
Prediction
Recommendation
Internet
advertising
(track 2)
Author-paper &
Author name
Identification
Data Type Medical Telcom Education Music
Search Engine
Log
Academic Search
Data
Challenge
Imbalance
Data
Heterogeneous
Data
Time-
dependent
instances
Large Scale
Temporal +
Taxonomy Info
Click through
rate prediction
Alias in names
# of
records
0.2M 0.1M 30M 300M 155M
250K Authors,
2.5M papers
# of teams >200 >400 >100 >1000 >170 >700
Our Record Champion 3rd
place Champion Champion Champion Champion
5
Industrial Collaboration
企業產學合作
異質探勘技術 Google 2009
結合社群模型與遊戲激
勵機制
IBM 2015
空氣品質自動偵測 Microsoft 2014
感測網路內容分析 INTEL 2011-2015
分散式學習 US Airforce 2011-2017
收視率預測以及交通狀
況預測
資策會 2011-2015
音樂歌詞情緒辨識 中華電信 2014
銀行履歷推薦分析 104人力銀行 2011-2013
客服自動系統 趨勢科技 2016-
大數據徵信及生命週期
預估
KPMG安侯建業 2015
資料探勘顧問 台達電子 2015
技術轉移與技術服務
ranked-based matrix
factorization 工研院資通所 2013
Multi-task Learning Models 工研院巨資中心 2015
rankNet 工研院資通所 2013
Multi-label Classification
Tools
工研院巨資中心 2013
社群網路技術服務
MobiApp 2012
商品推薦服務技術合作
i-True 2012
What is Machine Learning?
General Def of Machine Learning (ML)
• ML tries to optimize a performance criterion
using example data or past experience.
• Mathematically speaking: given data X, we want
to learn a function mapping f(X) for certain
purpose, e.g.
– f(x)= a label y  classification
– f(x)= a set Y in X  clustering
– f(x)=p(x)  distribution estimation
– …
• ML techniques tell us how to produce high
quality f(x), given certain objective and
evaluation metrics
8
Why My Machine Learning Models Fail
(meaning prediction accuracy is low)?
A series of analyses are required to understand why
The ML Diagnose Tree
10
Is it ML?
NoTry other
solutions
Correct ML
Scenario?
NoTry Other
Scenario
Have you
identified a
suitable model?’
Yes Do you
have enough
data?
Yes  Is your
model too
complicated?
Right Feature?
Yes  Performed
feature
engineering?
Suitable
Evaluation
Metrics?
Proper validation
set?
Diagnose 1: Is it an ML task?
• Are you sure Machine Learning is the best
solution for your task?
To ML or not to ML, that is the question !!
11
• X come from a close set with limited
variation
– simply memorize all possible XY mappings
– E.g. Word translation using dictionary
• F(x) can be easily derived by writing rules
– E.g. compression/de-compression
• X is (sort of) independent of Y
– E.g. X<ID, name, wealth>, YHeight
• f(x) is not smooth, or f(x+X) y+ y
Tasks Doubtful for ML
Too hard!!
Too easy!!
12
OK, my task belongs to ML-solvable, but I
still failed
The ML Diagnose Tree
Is it ML?
NoTry other
solutions
Correct ML
Scenario?
NoTry Other
Scenario
Have you
identified a
suitable model?’
Yes Do you
have enough
data?
Yes  Is your
model too
complicated?
Right Feature?
Yes  Performed
feature
engineering?
Suitable
Evaluation
Metrics?
Proper validation
set?
14
Diagnose 2: Which ML Scenario?
• Have you modeled your task into the right ML
scenario?
– ML  Classification/Regression  SVM, DNN, DT
• Which ML toolbox should you choose?
Machine Learning
Supervised
Learning
Classification
SVM
15
Let’s Talk About….. Understanding
Machine Learning in 10 Mins
What
Can Do
A variety of ML Scenarios
Supervised
Learning
Classification
& Regression
Multi-label
learning
Multi-
instance
learning
Cost-sensitive
leering
Semi-supervised
learning
Active
learning
Unsupervised
Learning
Clustering
Learning Data
Distribution
Pattern
Learning
Reinforcement
learning
Variations
Transfer
learning
Online
Learning
Distributed
Learning
17
Supervised Learning
• Given: a set of <input X, output Y> pairs
• Goal: given an unseen input, predict the corresponding
output
• There are two kinds of outputs
– Categorical: classification problem
• Binary classification vs. Multiclass classification
– Real values: regression problem
• Example:
1. Binary classification: the sensor information, output:
whether such sensor is broken (INTEL)
2. Multi-class classification: Lyric of a song, output: the mood:
happy/sad/surprise/angry (CHT)
3. Regression: the weather/traffic/air condition, output: PM
2.5 value (Microsoft Research Asia) 18
Cost-sensitive Learning
• A classification task with non-uniform cost for
different types of classification error.
• Goal: To predict the class C* that minimizes the
expected cost rather than the misclassification rate
• An example cost matrix L : medical diagnosis
Ljk
Actual Cancer Actual Normal
Predict Cancer 0 1
Predict Normal 10000 0
19
A variety of ML Scenarios
Supervised
Learning
Classification
& Regression
Multi-label
learning
Multi-
instance
learning
Cost-sensitive
leering
Semi-supervised
learning
Active
learning
Unsupervised
Learning
Clustering
Learning Data
Distribution
Pattern
Learning
Reinforcement
learning
Variations
Transfer
learning
Online
Learning
Distributed
Learning
20
Unsupervised Learning
• Learning without teachers (presumably harder than
supervised learning)
– Learning “what normally happens”
– E.g. babies learn their first language (unsupervised) vs.
how people learn their 2nd language (supervised).
• Given: a bunch of input X (there is no output Y)
• Goal: depending on the tasks, for example
– Estimate P(X)  then we can find augmax P(X) Bayesian
– Finding P(X2|X1) we can know the dependency between
inputs  Association Rule, Probabilistic Graphical Model
– Finding Sim(X1,X2)  then we can group similar X’s 
clustering
21
A variety of ML Scenarios
Supervised
Learning
Classification
& Regression
Multi-label
learning
Multi-
instance
learning
Cost-sensitive
leering
Semi-supervised
learning
Active
learning
Unsupervised
Learning
Clustering
Learning Data
Distribution
Pattern
Learning
Reinforcement
learning
Variations
Transfer
learning
Online
Learning
Distributed
Learning
22
Reinforcement Learning (RL)
• RL is a “decision
making” process.
– How an agent should
make decision to
maximize the long-
term rewards
• RL is associated with a
sequence of states X
and actions Y (i.e.
think about Markov
Decision Process) with
certain “rewards”.
• It’s goal is to find an
optimal policy to
guide the decision. Figure from Mark Chang
23
• 1st Stage:天下棋手為我師 (Supervised Learning)
– Data: 過去棋譜
– Learning: f(X)=Y, X: 盤面, Y:next move
– Results: AI can play Go now, but not an expert
• 2nd Stage:超越過去的自己(Reinforcement
Learning)
– Data: generating from playing with 1st Stage AI
– Learning: Observation盤面, rewardif win,
action next move
AlphaGo: SL+RL
24
Key: Finding which ML Scenario
Best Suits the Current Task
• CTR: the ratio that users click into a displayed
advertisement
• It looks like a regression task, but indeed it is
not.
– CTR= #click/#view  3/10 =? 300/1000
– Better solution: treating it as a binary classification,
transfer 3/10 to 3 positive cases and 7 negative
cases
Case Study: Click Through Rate
Prediction (KDD Cup 2012)
26
FAQ: I have a lot of Data, but they are
not labelled
• This is by far the most common question I have
been asked.
• My honest answer: try to get them labelled
because you anyway need the ground truth for
evaluation.
• In several cases, labelling are too costly
– Semi-supervised solution
– Transfer learning (using labelled data in other domains)
– Active learning (query a subset of labels)
28
A variety of ML Scenarios
Supervised
Learning
Classification
& Regression
Multi-label
learning
Multi-
instance
learning
Cost-sensitive
leering
Semi-supervised
learning
Active
learning
Unsupervised
Learning
Clustering
Learning Data
Distribution
Pattern
Learning
Reinforcement
learning
Variations
Transfer
learning
Online
Learning
Distributed
Learning
29
Semi-supervised Learning
• Only a small
portion of data
are annotated
(usually due to
high annotation
cost)
• Leverage the
unlabeled data
for better
performance
[Zhu2008]
30
Transfer Learning
Learning Process of
Traditional ML
Learning Process of
Transfer Learning
training items
Learning System Learning System
training items
Learning System
Learning SystemKnowledge
Improving a learning task via incorporating
knowledge from learning tasks in other
domains (with different feature space and data
distribution).
More labeled
data
Fewer labeled
data
31Pan et al.
A Label for that Example
Request for the Label of another Example
A Label for that Example
Request for the Label of an Example
Active Learning
Unlabeled
Data
...
Algorithm outputs a classifier
Learning
Algorithm
Expert / Oracle
• Achieves better learning with fewer labeled training data via
actively selecting a subset of unlabeled data to be annotated
32
OK, I have identified a suitable ML
scenario, but my model still doesn’t work
The ML Diagnose Tree
Is it ML?
NoTry other
solutions
Correct ML Scenario?
NoTry Other
Scenario
Have you
identified a
suitable model?’
Yes Do you
have enough
data?
Yes  Is your
model too
complicated?
Right Feature?
Yes  Performed
feature
engineering?
Suitable
Evaluation
Metrics?
Proper validation
set?
34
Did you choose a proper model?
• A proper model considers
– The size of data
• Small data  linear (or simpler) model
• Large data  linear model or non-linear model
– The sparsity of data
• Sparse data  more tricks to perform better and faster
• Dense data  requires light algorithm that consumes less
memory
– The balance condition of data
• Imbalanced data  special treatment for minority class
– The quality of data (whether there are noise, missing
values, etc)
• Some loss function (e.g. 0/1 loss or L2) are more robust to
noise than others (e.g. hinge loss or exponential loss)
35
Do you have enough data to train the
given model?
• Draw the learning curve to understand
whether your data is sufficient
Data Size
Accuracy Enough
Data
Need
more
data!!
36
Is your model too complicated?
• Increasing the model complexity and check for
the training/testing performance
– E.g. increasing the latent dimension k in matrix
factorization
Model Complexity
37
FAQ: How to avoid overfitting?
– Occam's Razor: simpler model first
– Regularization: a way to constraint the complexity
of the model
– Carefully design your experiment (see later)
– Being aware of the drifting phenomenon
Overfitting: the Biggest Enemy of ML !!
38
I still cannot do well given a reasonable
model is identified
The ML Diagnose Tree
Is it ML?
NoTry other
solutions
Correct ML Scenario?
NoTry Other
Scenario
Have you
identified a
suitable model?’
Yes Do you
have enough
data?
Yes  Is your
model too
complicated?
Right Feature?
Yes  Performed
feature
engineering?
Suitable
Evaluation
Metrics?
Proper validation
set?
40
• Have you identified all potentially useful
features (X)?
Diagnose : Quality of Features X
41
• Use domain knowledge and human
judgement to determine which features to
obtain for training.
• The rule of thumb: If you don’t know whether
a feature is useful, then try it!!
– Human judgement can be misleading
Which Information shall be extracted
as Features?
Let ML algorithms (e.g. feature selection) determine
whether certain information is useful !!
42
Feature Engineering
• Feature engineering is arguably the best
strategy to improve the performance.
• The goal is to explicitly reveal important
information to the model
– domain knowledge might or might not be useful
• Original features  different encoding of the
features  combined features
43
FAQ: What shall I do if I absolutely
need to boost the accuracy
• Combining a variety of models:
– If the models are diverse enough with similar
performance, there is a high chance the
performance will be boosted.
• Additional validation data are needed to learn
the ensemble weights.
44
I am really happy that my ML model
FINALLY works so well!!
The ML Diagnose Tree
Is it ML?
NoTry other
solutions
Correct ML Scenario?
NoTry Other
Scenario
Have you
identified a
suitable model?’
Yes Do you
have enough
data?
Yes  Is your
model too
complicated?
Right Feature?
Yes  Performed
feature
engineering?
Suitable
Evaluation
Metrics?
Proper validation
set?
46
Is your model evaluated correctly?
• Accuracy is NOT always the best way to
evaluate a machine learning model
47
Case Study: An 99.999% accurate system
in Detecting Malicious Personnel
• Randomly picked person  not likely a
terrorist.
• Thus, a model that always guess ‘non-terrorist’
will achieve very high accuracy
– But it is useless !!
• “Area under ROC Curve” (or AUC) is generally
used to evaluate such system.
48
Have the Right Evaluation Data Been
Used?
• We normally divide labeled data (or ground truth
data) into three parts: training, validation (or
heldout), and testing
• Performance on training data is obviously biased
since the model was constructed to fit this data.
– Accuracy must be evaluated on an independent
(usually disjoint) test set.
– Cannot peak the test set labels!!
• Use validation set to adjust hyper-parameters
How the validation set is chosen can affect the performance of the model!!
Training Validation Testing
49
Case Study: KDD Cup 2008 (identify
potential cancer patients)
• Training set: a set of positive and negative
instances (each contains 118 features)
– Each positive patient contain a set of negative
instances (i.e. an ROI in the X-ray) and at least one
positive instances.
– ALL instances in a negative patient are negative
– It’s a multi-instance classification problem.
• Random division for CV:
– training: 90%, testing: 72%  peeking the testing
• Patient-based CV:
– training: 80%, testing: 77%
50
1. Determine whether it shall be model as an
ML task
2. Model the task into a suitable ML scenario
3. Given the known scenario, determine a
suitable model
4. Feature Engineering
5. Parameter learning
6. Evaluation (e.g. cross validation)
7. Apply to unseen data
Take Home Point: How to Solve My
Task Using ML?
51
Final Remark: The (weak) AI Era Has Arrived
Big
Data
Knowledge
Representation
Machine
Learning
AI
科學人2015
52
So, What’s Next for ML & AI?
-My three bold predictions about the future
Halle Tecco
On AI Revolution
Machine will take over some non-trivial jobs
that are used to be conducted by human beings
– Those jobs (unfortunately?) include data analytics
54
On Decision Making
• The decision makers will gradually shift from
humans to machines
55科學人2015
On the Next of Machine Learning
• The evolution of human intelligence:
memorizing
learning
discovering
• Machine Discovery (Shou-De Lin, Fall 2016,
NTU CSIE)
Machine
Machine
Machine
56
Thank you for your attention !!

More Related Content

What's hot

A Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningA Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningHaptik
 
“Introducing Machine Learning and How to Teach Machines to See,” a Presentati...
“Introducing Machine Learning and How to Teach Machines to See,” a Presentati...“Introducing Machine Learning and How to Teach Machines to See,” a Presentati...
“Introducing Machine Learning and How to Teach Machines to See,” a Presentati...Edge AI and Vision Alliance
 
Machine learning the next revolution or just another hype
Machine learning   the next revolution or just another hypeMachine learning   the next revolution or just another hype
Machine learning the next revolution or just another hypeJorge Ferrer
 
Moving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow ExtendedMoving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow ExtendedJonathan Mugan
 
Introduction to machine learning and applications (1)
Introduction to machine learning and applications (1)Introduction to machine learning and applications (1)
Introduction to machine learning and applications (1)Manjunath Sindagi
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLPaco Nathan
 
Deep Learning For Practitioners, lecture 2: Selecting the right applications...
Deep Learning For Practitioners,  lecture 2: Selecting the right applications...Deep Learning For Practitioners,  lecture 2: Selecting the right applications...
Deep Learning For Practitioners, lecture 2: Selecting the right applications...ananth
 
Machine Learning Overview
Machine Learning OverviewMachine Learning Overview
Machine Learning OverviewMykhailo Koval
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning AnalyticsXavier Ochoa
 
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...Aseda Owusua Addai-Deseh
 
Machine learning on Hadoop data lakes
Machine learning on Hadoop data lakesMachine learning on Hadoop data lakes
Machine learning on Hadoop data lakesDataWorks Summit
 
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...Natalia Díaz Rodríguez
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onDony Riyanto
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learningPruet Boonma
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksBICA Labs
 
ML DL AI DS BD - An Introduction
ML DL AI DS BD - An IntroductionML DL AI DS BD - An Introduction
ML DL AI DS BD - An IntroductionDony Riyanto
 
Big-data analytics: challenges and opportunities
Big-data analytics: challenges and opportunitiesBig-data analytics: challenges and opportunities
Big-data analytics: challenges and opportunities台灣資料科學年會
 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models ananth
 
Machine learning with Big Data power point presentation
Machine learning with Big Data power point presentationMachine learning with Big Data power point presentation
Machine learning with Big Data power point presentationDavid Raj Kanthi
 

What's hot (20)

A Friendly Introduction to Machine Learning
A Friendly Introduction to Machine LearningA Friendly Introduction to Machine Learning
A Friendly Introduction to Machine Learning
 
“Introducing Machine Learning and How to Teach Machines to See,” a Presentati...
“Introducing Machine Learning and How to Teach Machines to See,” a Presentati...“Introducing Machine Learning and How to Teach Machines to See,” a Presentati...
“Introducing Machine Learning and How to Teach Machines to See,” a Presentati...
 
Machine learning the next revolution or just another hype
Machine learning   the next revolution or just another hypeMachine learning   the next revolution or just another hype
Machine learning the next revolution or just another hype
 
Moving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow ExtendedMoving Your Machine Learning Models to Production with TensorFlow Extended
Moving Your Machine Learning Models to Production with TensorFlow Extended
 
Introduction to machine learning and applications (1)
Introduction to machine learning and applications (1)Introduction to machine learning and applications (1)
Introduction to machine learning and applications (1)
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
 
Deep Learning For Practitioners, lecture 2: Selecting the right applications...
Deep Learning For Practitioners,  lecture 2: Selecting the right applications...Deep Learning For Practitioners,  lecture 2: Selecting the right applications...
Deep Learning For Practitioners, lecture 2: Selecting the right applications...
 
Machine Learning Overview
Machine Learning OverviewMachine Learning Overview
Machine Learning Overview
 
Multimodal Learning Analytics
Multimodal Learning AnalyticsMultimodal Learning Analytics
Multimodal Learning Analytics
 
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
 
Machine learning on Hadoop data lakes
Machine learning on Hadoop data lakesMachine learning on Hadoop data lakes
Machine learning on Hadoop data lakes
 
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
Introduction to machine learning
Introduction to machine learningIntroduction to machine learning
Introduction to machine learning
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural Networks
 
ML DL AI DS BD - An Introduction
ML DL AI DS BD - An IntroductionML DL AI DS BD - An Introduction
ML DL AI DS BD - An Introduction
 
Big-data analytics: challenges and opportunities
Big-data analytics: challenges and opportunitiesBig-data analytics: challenges and opportunities
Big-data analytics: challenges and opportunities
 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models
 
Machine learning with Big Data power point presentation
Machine learning with Big Data power point presentationMachine learning with Big Data power point presentation
Machine learning with Big Data power point presentation
 
Icml2017 overview
Icml2017 overviewIcml2017 overview
Icml2017 overview
 

Similar to Here are some suggestions for diagnosing why your machine learning model is still not working well even after identifying the right ML scenario:- Do you have enough labeled training data? Many ML algorithms require large amounts of high quality labeled data to learn effectively. Lack of sufficient data is a common cause of poor performance. - Is your model too complex for the amount of data you have? Complex models like deep neural networks need huge datasets to avoid overfitting. Simpler models may generalize better with limited data.- Are you using the right features? Make sure the features you are using as input to the model are relevant to the prediction task and capture the underlying patterns in the data. Feature engineering is important

Chapter01.ppt
Chapter01.pptChapter01.ppt
Chapter01.pptbutest
 
Week_1 Machine Learning introduction.pptx
Week_1 Machine Learning introduction.pptxWeek_1 Machine Learning introduction.pptx
Week_1 Machine Learning introduction.pptxmuhammadsamroz
 
Engineering Intelligent Systems using Machine Learning
Engineering Intelligent Systems using Machine Learning Engineering Intelligent Systems using Machine Learning
Engineering Intelligent Systems using Machine Learning Saurabh Kaushik
 
Machine Learning - Lecture1.pptx.pdf
Machine Learning - Lecture1.pptx.pdfMachine Learning - Lecture1.pptx.pdf
Machine Learning - Lecture1.pptx.pdfNsitTech
 
An introduction to machine learning and statistics
An introduction to machine learning and statisticsAn introduction to machine learning and statistics
An introduction to machine learning and statisticsSpotle.ai
 
Overview of machine learning
Overview of machine learning Overview of machine learning
Overview of machine learning SolivarLabs
 
Data Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world ChallengesData Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world ChallengesYuchen Zhao
 
Lessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systemsLessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systemsXavier Amatriain
 
The Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureThe Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureIvo Andreev
 
Lecture 09(introduction to machine learning)
Lecture 09(introduction to machine learning)Lecture 09(introduction to machine learning)
Lecture 09(introduction to machine learning)Jeet Das
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningSSSSSS354882
 
PyData SF 2016 --- Moving forward through the darkness
PyData SF 2016 --- Moving forward through the darknessPyData SF 2016 --- Moving forward through the darkness
PyData SF 2016 --- Moving forward through the darknessChia-Chi Chang
 
Chapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdfChapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdfAschalewAyele2
 

Similar to Here are some suggestions for diagnosing why your machine learning model is still not working well even after identifying the right ML scenario:- Do you have enough labeled training data? Many ML algorithms require large amounts of high quality labeled data to learn effectively. Lack of sufficient data is a common cause of poor performance. - Is your model too complex for the amount of data you have? Complex models like deep neural networks need huge datasets to avoid overfitting. Simpler models may generalize better with limited data.- Are you using the right features? Make sure the features you are using as input to the model are relevant to the prediction task and capture the underlying patterns in the data. Feature engineering is important (20)

LR2. Summary Day 2
LR2. Summary Day 2LR2. Summary Day 2
LR2. Summary Day 2
 
Chapter01.ppt
Chapter01.pptChapter01.ppt
Chapter01.ppt
 
Week_1 Machine Learning introduction.pptx
Week_1 Machine Learning introduction.pptxWeek_1 Machine Learning introduction.pptx
Week_1 Machine Learning introduction.pptx
 
Engineering Intelligent Systems using Machine Learning
Engineering Intelligent Systems using Machine Learning Engineering Intelligent Systems using Machine Learning
Engineering Intelligent Systems using Machine Learning
 
Machine_Learning.pptx
Machine_Learning.pptxMachine_Learning.pptx
Machine_Learning.pptx
 
Machine Learning - Lecture1.pptx.pdf
Machine Learning - Lecture1.pptx.pdfMachine Learning - Lecture1.pptx.pdf
Machine Learning - Lecture1.pptx.pdf
 
An introduction to machine learning and statistics
An introduction to machine learning and statisticsAn introduction to machine learning and statistics
An introduction to machine learning and statistics
 
Overview of machine learning
Overview of machine learning Overview of machine learning
Overview of machine learning
 
Data Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world ChallengesData Science in Industry - Applying Machine Learning to Real-world Challenges
Data Science in Industry - Applying Machine Learning to Real-world Challenges
 
Lessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systemsLessons learned from building practical deep learning systems
Lessons learned from building practical deep learning systems
 
Lec1 intoduction.pptx
Lec1 intoduction.pptxLec1 intoduction.pptx
Lec1 intoduction.pptx
 
Week 1.pdf
Week 1.pdfWeek 1.pdf
Week 1.pdf
 
Machine Learning - Deep Learning
Machine Learning - Deep LearningMachine Learning - Deep Learning
Machine Learning - Deep Learning
 
The Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureThe Machine Learning Workflow with Azure
The Machine Learning Workflow with Azure
 
Lecture 09(introduction to machine learning)
Lecture 09(introduction to machine learning)Lecture 09(introduction to machine learning)
Lecture 09(introduction to machine learning)
 
ML
MLML
ML
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
PyData SF 2016 --- Moving forward through the darkness
PyData SF 2016 --- Moving forward through the darknessPyData SF 2016 --- Moving forward through the darkness
PyData SF 2016 --- Moving forward through the darkness
 
MachineLearning_AishwaryaCR
MachineLearning_AishwaryaCRMachineLearning_AishwaryaCR
MachineLearning_AishwaryaCR
 
Chapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdfChapter 4 Classification in data sience .pdf
Chapter 4 Classification in data sience .pdf
 

More from 台灣資料科學年會

[台灣人工智慧學校] 人工智慧技術發展與應用
[台灣人工智慧學校] 人工智慧技術發展與應用[台灣人工智慧學校] 人工智慧技術發展與應用
[台灣人工智慧學校] 人工智慧技術發展與應用台灣資料科學年會
 
[台灣人工智慧學校] 執行長報告
[台灣人工智慧學校] 執行長報告[台灣人工智慧學校] 執行長報告
[台灣人工智慧學校] 執行長報告台灣資料科學年會
 
[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰
[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰
[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰台灣資料科學年會
 
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機台灣資料科學年會
 
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機台灣資料科學年會
 
[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話
[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話
[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話台灣資料科學年會
 
[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇
[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇
[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇台灣資料科學年會
 
[TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察
[TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察 [TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察
[TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察 台灣資料科學年會
 
[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵
[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵
[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵台灣資料科學年會
 
[台灣人工智慧學校] 從經濟學看人工智慧產業應用
[台灣人工智慧學校] 從經濟學看人工智慧產業應用[台灣人工智慧學校] 從經濟學看人工智慧產業應用
[台灣人工智慧學校] 從經濟學看人工智慧產業應用台灣資料科學年會
 
[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告
[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告
[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告台灣資料科學年會
 
[台中分校] 第一期結業典禮 - 執行長談話
[台中分校] 第一期結業典禮 - 執行長談話[台中分校] 第一期結業典禮 - 執行長談話
[台中分校] 第一期結業典禮 - 執行長談話台灣資料科學年會
 
[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人
[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人
[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人台灣資料科學年會
 
[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維
[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維
[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維台灣資料科學年會
 
[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察
[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察
[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察台灣資料科學年會
 
[TOxAIA新竹分校] 深度學習與Kaggle實戰
[TOxAIA新竹分校] 深度學習與Kaggle實戰[TOxAIA新竹分校] 深度學習與Kaggle實戰
[TOxAIA新竹分校] 深度學習與Kaggle實戰台灣資料科學年會
 
[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT
[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT
[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT台灣資料科學年會
 
[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達
[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達
[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達台灣資料科學年會
 
[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳
[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳
[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳台灣資料科學年會
 

More from 台灣資料科學年會 (20)

[台灣人工智慧學校] 人工智慧技術發展與應用
[台灣人工智慧學校] 人工智慧技術發展與應用[台灣人工智慧學校] 人工智慧技術發展與應用
[台灣人工智慧學校] 人工智慧技術發展與應用
 
[台灣人工智慧學校] 執行長報告
[台灣人工智慧學校] 執行長報告[台灣人工智慧學校] 執行長報告
[台灣人工智慧學校] 執行長報告
 
[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰
[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰
[台灣人工智慧學校] 工業 4.0 與智慧製造的發展趨勢與挑戰
 
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
 
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
[台灣人工智慧學校] 開創台灣產業智慧轉型的新契機
 
[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話
[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話
[台灣人工智慧學校] 台北總校第三期結業典禮 - 執行長談話
 
[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇
[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇
[TOxAIA台中分校] AI 引爆新工業革命,智慧機械首都台中轉型論壇
 
[TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察
[TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察 [TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察
[TOxAIA台中分校] 2019 台灣數位轉型 與產業升級趨勢觀察
 
[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵
[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵
[TOxAIA台中分校] 智慧製造成真! 產線導入AI的致勝關鍵
 
[台灣人工智慧學校] 從經濟學看人工智慧產業應用
[台灣人工智慧學校] 從經濟學看人工智慧產業應用[台灣人工智慧學校] 從經濟學看人工智慧產業應用
[台灣人工智慧學校] 從經濟學看人工智慧產業應用
 
[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告
[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告
[台灣人工智慧學校] 台中分校第二期開學典禮 - 執行長報告
 
台灣人工智慧學校成果發表會
台灣人工智慧學校成果發表會台灣人工智慧學校成果發表會
台灣人工智慧學校成果發表會
 
[台中分校] 第一期結業典禮 - 執行長談話
[台中分校] 第一期結業典禮 - 執行長談話[台中分校] 第一期結業典禮 - 執行長談話
[台中分校] 第一期結業典禮 - 執行長談話
 
[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人
[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人
[TOxAIA新竹分校] 工業4.0潛力新應用! 多模式對話機器人
 
[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維
[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維
[TOxAIA新竹分校] AI整合是重點! 竹科的關鍵轉型思維
 
[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察
[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察
[TOxAIA新竹分校] 2019 台灣數位轉型與產業升級趨勢觀察
 
[TOxAIA新竹分校] 深度學習與Kaggle實戰
[TOxAIA新竹分校] 深度學習與Kaggle實戰[TOxAIA新竹分校] 深度學習與Kaggle實戰
[TOxAIA新竹分校] 深度學習與Kaggle實戰
 
[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT
[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT
[台灣人工智慧學校] Bridging AI to Precision Agriculture through IoT
 
[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達
[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達
[2018 台灣人工智慧學校校友年會] 產業經驗分享: 如何用最少的訓練樣本,得到最好的深度學習影像分析結果,減少一半人力,提升一倍品質 / 李明達
 
[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳
[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳
[2018 台灣人工智慧學校校友年會] 啟動物聯網新關鍵 - 未來由你「喚」醒 / 沈品勳
 

Recently uploaded

Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 

Recently uploaded (20)

Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 

Here are some suggestions for diagnosing why your machine learning model is still not working well even after identifying the right ML scenario:- Do you have enough labeled training data? Many ML algorithms require large amounts of high quality labeled data to learn effectively. Lack of sufficient data is a common cause of poor performance. - Is your model too complex for the amount of data you have? Complex models like deep neural networks need huge datasets to avoid overfitting. Simpler models may generalize better with limited data.- Are you using the right features? Make sure the features you are using as input to the model are relevant to the prediction task and capture the underlying patterns in the data. Feature engineering is important

  • 1. Practical Issues in Machine Learning -how to create an effective ML model? Prof. Shou-de Lin (林守德) CSIE, National Taiwan University sdlin@csie.ntu.edu.tw
  • 2. Diagnose a Machine Learning Solution -Why My ML Model Doesn’t Work? Prof. Shou-de Lin (林守德) CSIE, National Taiwan University sdlin@csie.ntu.edu.tw
  • 3. Machine Discovery and Social Network Mining Lab, CSIE, NTU • PI: Shou-de Lin – B.S. in NTUEE – M.S. in EECS, UM – M.S. in Computational Linguistics, USC – Ph.D. in CS, USC – Postdoc in Los Alamos National Lab • Courses: – Machine Learning and Data Mining- Theory and Practice – Machine Discovery – Social network Analysis – Technical Writing and Research Method – Probabilistic Graphical Model • Awards: – All-time ACM KDD Cup Champion (2008, 2010, 2011, 2012, 2013) – Google Research Award 2008 – Microsoft Research Award 2009 – Best Paper Award WI2003, TAAI 2010, ASONAM 2011, TAAI 2014 – US Areospace AROAD Research Grant Award 2011, 2013, 2014, 2015, 2016 Machine Learning with Big Data Machine Discovery Learning IoT Data Applications in NLP&SNA& Recommender Practical Issues in Learning 3
  • 4. Talk Materials Based on Hands-On Experience in Solving ML Tasks, Including  Participating ACM KDD Cup for 6 years  >20 Industrial collaboration  Visiting Scholar in Microsoft from 2015~2016
  • 5. Team NTU’s Performance on ACM KDD Cup KDD Cups 2008 2009 2010 2011 2012 2013 Organizer Siemens Orange PSLC Datashop Yahoo! Tencent Microsoft Topic Breast Cancer Prediction User Behavior Prediction Learner Performance Prediction Recommendation Internet advertising (track 2) Author-paper & Author name Identification Data Type Medical Telcom Education Music Search Engine Log Academic Search Data Challenge Imbalance Data Heterogeneous Data Time- dependent instances Large Scale Temporal + Taxonomy Info Click through rate prediction Alias in names # of records 0.2M 0.1M 30M 300M 155M 250K Authors, 2.5M papers # of teams >200 >400 >100 >1000 >170 >700 Our Record Champion 3rd place Champion Champion Champion Champion 5
  • 6. Industrial Collaboration 企業產學合作 異質探勘技術 Google 2009 結合社群模型與遊戲激 勵機制 IBM 2015 空氣品質自動偵測 Microsoft 2014 感測網路內容分析 INTEL 2011-2015 分散式學習 US Airforce 2011-2017 收視率預測以及交通狀 況預測 資策會 2011-2015 音樂歌詞情緒辨識 中華電信 2014 銀行履歷推薦分析 104人力銀行 2011-2013 客服自動系統 趨勢科技 2016- 大數據徵信及生命週期 預估 KPMG安侯建業 2015 資料探勘顧問 台達電子 2015 技術轉移與技術服務 ranked-based matrix factorization 工研院資通所 2013 Multi-task Learning Models 工研院巨資中心 2015 rankNet 工研院資通所 2013 Multi-label Classification Tools 工研院巨資中心 2013 社群網路技術服務 MobiApp 2012 商品推薦服務技術合作 i-True 2012
  • 7. What is Machine Learning?
  • 8. General Def of Machine Learning (ML) • ML tries to optimize a performance criterion using example data or past experience. • Mathematically speaking: given data X, we want to learn a function mapping f(X) for certain purpose, e.g. – f(x)= a label y  classification – f(x)= a set Y in X  clustering – f(x)=p(x)  distribution estimation – … • ML techniques tell us how to produce high quality f(x), given certain objective and evaluation metrics 8
  • 9. Why My Machine Learning Models Fail (meaning prediction accuracy is low)? A series of analyses are required to understand why
  • 10. The ML Diagnose Tree 10 Is it ML? NoTry other solutions Correct ML Scenario? NoTry Other Scenario Have you identified a suitable model?’ Yes Do you have enough data? Yes  Is your model too complicated? Right Feature? Yes  Performed feature engineering? Suitable Evaluation Metrics? Proper validation set?
  • 11. Diagnose 1: Is it an ML task? • Are you sure Machine Learning is the best solution for your task? To ML or not to ML, that is the question !! 11
  • 12. • X come from a close set with limited variation – simply memorize all possible XY mappings – E.g. Word translation using dictionary • F(x) can be easily derived by writing rules – E.g. compression/de-compression • X is (sort of) independent of Y – E.g. X<ID, name, wealth>, YHeight • f(x) is not smooth, or f(x+X) y+ y Tasks Doubtful for ML Too hard!! Too easy!! 12
  • 13. OK, my task belongs to ML-solvable, but I still failed
  • 14. The ML Diagnose Tree Is it ML? NoTry other solutions Correct ML Scenario? NoTry Other Scenario Have you identified a suitable model?’ Yes Do you have enough data? Yes  Is your model too complicated? Right Feature? Yes  Performed feature engineering? Suitable Evaluation Metrics? Proper validation set? 14
  • 15. Diagnose 2: Which ML Scenario? • Have you modeled your task into the right ML scenario? – ML  Classification/Regression  SVM, DNN, DT • Which ML toolbox should you choose? Machine Learning Supervised Learning Classification SVM 15
  • 16. Let’s Talk About….. Understanding Machine Learning in 10 Mins What Can Do
  • 17. A variety of ML Scenarios Supervised Learning Classification & Regression Multi-label learning Multi- instance learning Cost-sensitive leering Semi-supervised learning Active learning Unsupervised Learning Clustering Learning Data Distribution Pattern Learning Reinforcement learning Variations Transfer learning Online Learning Distributed Learning 17
  • 18. Supervised Learning • Given: a set of <input X, output Y> pairs • Goal: given an unseen input, predict the corresponding output • There are two kinds of outputs – Categorical: classification problem • Binary classification vs. Multiclass classification – Real values: regression problem • Example: 1. Binary classification: the sensor information, output: whether such sensor is broken (INTEL) 2. Multi-class classification: Lyric of a song, output: the mood: happy/sad/surprise/angry (CHT) 3. Regression: the weather/traffic/air condition, output: PM 2.5 value (Microsoft Research Asia) 18
  • 19. Cost-sensitive Learning • A classification task with non-uniform cost for different types of classification error. • Goal: To predict the class C* that minimizes the expected cost rather than the misclassification rate • An example cost matrix L : medical diagnosis Ljk Actual Cancer Actual Normal Predict Cancer 0 1 Predict Normal 10000 0 19
  • 20. A variety of ML Scenarios Supervised Learning Classification & Regression Multi-label learning Multi- instance learning Cost-sensitive leering Semi-supervised learning Active learning Unsupervised Learning Clustering Learning Data Distribution Pattern Learning Reinforcement learning Variations Transfer learning Online Learning Distributed Learning 20
  • 21. Unsupervised Learning • Learning without teachers (presumably harder than supervised learning) – Learning “what normally happens” – E.g. babies learn their first language (unsupervised) vs. how people learn their 2nd language (supervised). • Given: a bunch of input X (there is no output Y) • Goal: depending on the tasks, for example – Estimate P(X)  then we can find augmax P(X) Bayesian – Finding P(X2|X1) we can know the dependency between inputs  Association Rule, Probabilistic Graphical Model – Finding Sim(X1,X2)  then we can group similar X’s  clustering 21
  • 22. A variety of ML Scenarios Supervised Learning Classification & Regression Multi-label learning Multi- instance learning Cost-sensitive leering Semi-supervised learning Active learning Unsupervised Learning Clustering Learning Data Distribution Pattern Learning Reinforcement learning Variations Transfer learning Online Learning Distributed Learning 22
  • 23. Reinforcement Learning (RL) • RL is a “decision making” process. – How an agent should make decision to maximize the long- term rewards • RL is associated with a sequence of states X and actions Y (i.e. think about Markov Decision Process) with certain “rewards”. • It’s goal is to find an optimal policy to guide the decision. Figure from Mark Chang 23
  • 24. • 1st Stage:天下棋手為我師 (Supervised Learning) – Data: 過去棋譜 – Learning: f(X)=Y, X: 盤面, Y:next move – Results: AI can play Go now, but not an expert • 2nd Stage:超越過去的自己(Reinforcement Learning) – Data: generating from playing with 1st Stage AI – Learning: Observation盤面, rewardif win, action next move AlphaGo: SL+RL 24
  • 25. Key: Finding which ML Scenario Best Suits the Current Task
  • 26. • CTR: the ratio that users click into a displayed advertisement • It looks like a regression task, but indeed it is not. – CTR= #click/#view  3/10 =? 300/1000 – Better solution: treating it as a binary classification, transfer 3/10 to 3 positive cases and 7 negative cases Case Study: Click Through Rate Prediction (KDD Cup 2012) 26
  • 27. FAQ: I have a lot of Data, but they are not labelled • This is by far the most common question I have been asked. • My honest answer: try to get them labelled because you anyway need the ground truth for evaluation. • In several cases, labelling are too costly – Semi-supervised solution – Transfer learning (using labelled data in other domains) – Active learning (query a subset of labels) 28
  • 28. A variety of ML Scenarios Supervised Learning Classification & Regression Multi-label learning Multi- instance learning Cost-sensitive leering Semi-supervised learning Active learning Unsupervised Learning Clustering Learning Data Distribution Pattern Learning Reinforcement learning Variations Transfer learning Online Learning Distributed Learning 29
  • 29. Semi-supervised Learning • Only a small portion of data are annotated (usually due to high annotation cost) • Leverage the unlabeled data for better performance [Zhu2008] 30
  • 30. Transfer Learning Learning Process of Traditional ML Learning Process of Transfer Learning training items Learning System Learning System training items Learning System Learning SystemKnowledge Improving a learning task via incorporating knowledge from learning tasks in other domains (with different feature space and data distribution). More labeled data Fewer labeled data 31Pan et al.
  • 31. A Label for that Example Request for the Label of another Example A Label for that Example Request for the Label of an Example Active Learning Unlabeled Data ... Algorithm outputs a classifier Learning Algorithm Expert / Oracle • Achieves better learning with fewer labeled training data via actively selecting a subset of unlabeled data to be annotated 32
  • 32. OK, I have identified a suitable ML scenario, but my model still doesn’t work
  • 33. The ML Diagnose Tree Is it ML? NoTry other solutions Correct ML Scenario? NoTry Other Scenario Have you identified a suitable model?’ Yes Do you have enough data? Yes  Is your model too complicated? Right Feature? Yes  Performed feature engineering? Suitable Evaluation Metrics? Proper validation set? 34
  • 34. Did you choose a proper model? • A proper model considers – The size of data • Small data  linear (or simpler) model • Large data  linear model or non-linear model – The sparsity of data • Sparse data  more tricks to perform better and faster • Dense data  requires light algorithm that consumes less memory – The balance condition of data • Imbalanced data  special treatment for minority class – The quality of data (whether there are noise, missing values, etc) • Some loss function (e.g. 0/1 loss or L2) are more robust to noise than others (e.g. hinge loss or exponential loss) 35
  • 35. Do you have enough data to train the given model? • Draw the learning curve to understand whether your data is sufficient Data Size Accuracy Enough Data Need more data!! 36
  • 36. Is your model too complicated? • Increasing the model complexity and check for the training/testing performance – E.g. increasing the latent dimension k in matrix factorization Model Complexity 37
  • 37. FAQ: How to avoid overfitting? – Occam's Razor: simpler model first – Regularization: a way to constraint the complexity of the model – Carefully design your experiment (see later) – Being aware of the drifting phenomenon Overfitting: the Biggest Enemy of ML !! 38
  • 38. I still cannot do well given a reasonable model is identified
  • 39. The ML Diagnose Tree Is it ML? NoTry other solutions Correct ML Scenario? NoTry Other Scenario Have you identified a suitable model?’ Yes Do you have enough data? Yes  Is your model too complicated? Right Feature? Yes  Performed feature engineering? Suitable Evaluation Metrics? Proper validation set? 40
  • 40. • Have you identified all potentially useful features (X)? Diagnose : Quality of Features X 41
  • 41. • Use domain knowledge and human judgement to determine which features to obtain for training. • The rule of thumb: If you don’t know whether a feature is useful, then try it!! – Human judgement can be misleading Which Information shall be extracted as Features? Let ML algorithms (e.g. feature selection) determine whether certain information is useful !! 42
  • 42. Feature Engineering • Feature engineering is arguably the best strategy to improve the performance. • The goal is to explicitly reveal important information to the model – domain knowledge might or might not be useful • Original features  different encoding of the features  combined features 43
  • 43. FAQ: What shall I do if I absolutely need to boost the accuracy • Combining a variety of models: – If the models are diverse enough with similar performance, there is a high chance the performance will be boosted. • Additional validation data are needed to learn the ensemble weights. 44
  • 44. I am really happy that my ML model FINALLY works so well!!
  • 45. The ML Diagnose Tree Is it ML? NoTry other solutions Correct ML Scenario? NoTry Other Scenario Have you identified a suitable model?’ Yes Do you have enough data? Yes  Is your model too complicated? Right Feature? Yes  Performed feature engineering? Suitable Evaluation Metrics? Proper validation set? 46
  • 46. Is your model evaluated correctly? • Accuracy is NOT always the best way to evaluate a machine learning model 47
  • 47. Case Study: An 99.999% accurate system in Detecting Malicious Personnel • Randomly picked person  not likely a terrorist. • Thus, a model that always guess ‘non-terrorist’ will achieve very high accuracy – But it is useless !! • “Area under ROC Curve” (or AUC) is generally used to evaluate such system. 48
  • 48. Have the Right Evaluation Data Been Used? • We normally divide labeled data (or ground truth data) into three parts: training, validation (or heldout), and testing • Performance on training data is obviously biased since the model was constructed to fit this data. – Accuracy must be evaluated on an independent (usually disjoint) test set. – Cannot peak the test set labels!! • Use validation set to adjust hyper-parameters How the validation set is chosen can affect the performance of the model!! Training Validation Testing 49
  • 49. Case Study: KDD Cup 2008 (identify potential cancer patients) • Training set: a set of positive and negative instances (each contains 118 features) – Each positive patient contain a set of negative instances (i.e. an ROI in the X-ray) and at least one positive instances. – ALL instances in a negative patient are negative – It’s a multi-instance classification problem. • Random division for CV: – training: 90%, testing: 72%  peeking the testing • Patient-based CV: – training: 80%, testing: 77% 50
  • 50. 1. Determine whether it shall be model as an ML task 2. Model the task into a suitable ML scenario 3. Given the known scenario, determine a suitable model 4. Feature Engineering 5. Parameter learning 6. Evaluation (e.g. cross validation) 7. Apply to unseen data Take Home Point: How to Solve My Task Using ML? 51
  • 51. Final Remark: The (weak) AI Era Has Arrived Big Data Knowledge Representation Machine Learning AI 科學人2015 52
  • 52. So, What’s Next for ML & AI? -My three bold predictions about the future Halle Tecco
  • 53. On AI Revolution Machine will take over some non-trivial jobs that are used to be conducted by human beings – Those jobs (unfortunately?) include data analytics 54
  • 54. On Decision Making • The decision makers will gradually shift from humans to machines 55科學人2015
  • 55. On the Next of Machine Learning • The evolution of human intelligence: memorizing learning discovering • Machine Discovery (Shou-De Lin, Fall 2016, NTU CSIE) Machine Machine Machine 56 Thank you for your attention !!