This document summarizes a talk on practical machine learning issues. It discusses identifying the right machine learning scenario for a given task, such as classification, regression, clustering, or reinforcement learning. It also addresses common reasons why machine learning models may fail, such as using the wrong evaluation metrics, not having enough labeled training data, or not performing proper feature engineering. The document emphasizes the importance of choosing the appropriate machine learning model, having sufficient high-quality data, and selecting useful features.
1. Practical Issues in Machine Learning
- How to create an effective ML model?
Prof. Shou-de Lin (林守德)
CSIE, National Taiwan University
sdlin@csie.ntu.edu.tw
2. Diagnose a Machine Learning Solution
-Why My ML Model Doesn’t Work?
Prof. Shou-de Lin (林守德)
CSIE, National Taiwan University
sdlin@csie.ntu.edu.tw
3. Machine Discovery and Social Network Mining Lab,
CSIE, NTU
• PI: Shou-de Lin
– B.S. in NTUEE
– M.S. in EECS, UM
– M.S. in Computational Linguistics, USC
– Ph.D. in CS, USC
– Postdoc in Los Alamos National Lab
• Courses:
– Machine Learning and Data Mining:
Theory and Practice
– Machine Discovery
– Social Network Analysis
– Technical Writing and Research
Method
– Probabilistic Graphical Model
• Awards:
– All-time ACM KDD Cup Champion
(2008, 2010, 2011, 2012, 2013)
– Google Research Award 2008
– Microsoft Research Award 2009
– Best Paper Award WI2003, TAAI 2010,
ASONAM 2011, TAAI 2014
– US Aerospace AOARD Research Grant
Award 2011, 2013, 2014, 2015, 2016
[Diagram of lab research areas: Machine Learning with Big Data, Machine Discovery, Learning IoT Data, Applications in NLP & SNA & Recommender Systems, Practical Issues in Learning]
4. Talk Materials Based on Hands-On Experience
in Solving ML Tasks, Including
Participating in the ACM KDD Cup for 6 years
>20 industrial collaborations
Visiting Scholar at Microsoft, 2015-2016
5. Team NTU's Performance on ACM KDD Cup

| KDD Cup | 2008 | 2009 | 2010 | 2011 | 2012 | 2013 |
|---|---|---|---|---|---|---|
| Organizer | Siemens | Orange | PSLC Datashop | Yahoo! | Tencent | Microsoft |
| Topic | Breast cancer prediction | User behavior prediction | Learner performance prediction | Recommendation | Internet advertising (track 2) | Author-paper & author name identification |
| Data type | Medical | Telecom | Education | Music | Search engine log | Academic search data |
| Challenge | Imbalanced data | Heterogeneous data | Time-dependent instances | Large scale, temporal + taxonomy info | Click-through rate prediction | Aliases in names |
| # of records | 0.2M | 0.1M | 30M | 300M | 155M | 250K authors, 2.5M papers |
| # of teams | >200 | >400 | >100 | >1000 | >170 | >700 |
| Our record | Champion | 3rd place | Champion | Champion | Champion | Champion |
8. General Definition of Machine Learning (ML)
• ML tries to optimize a performance criterion using example data or past experience.
• Mathematically speaking: given data X, we want to learn a function mapping f(X) for a certain purpose, e.g.
– f(x) = a label y → classification
– f(x) = a set Y in X → clustering
– f(x) = p(x) → distribution estimation
– …
• ML techniques tell us how to produce a high-quality f(x), given certain objectives and evaluation metrics
9. Why Do My Machine Learning Models Fail
(meaning prediction accuracy is low)?
A series of analyses is required to understand why.
10. The ML Diagnose Tree
[Flowchart]
Is it ML? (No → try other solutions)
→ Correct ML scenario? (No → try another scenario)
→ Have you identified a suitable model?
   (Do you have enough data? Is your model too complicated?)
→ Right features? (Have you performed feature engineering?)
→ Suitable evaluation metrics? Proper validation set?
11. Diagnose 1: Is it an ML task?
• Are you sure Machine Learning is the best
solution for your task?
To ML or not to ML, that is the question!!
12. Tasks Doubtful for ML
Too easy!!
• X comes from a closed set with limited variation
– Simply memorize all possible X→Y mappings
– E.g. word translation using a dictionary
• f(x) can easily be derived by writing rules
– E.g. compression/de-compression
Too hard!!
• X is (sort of) independent of Y
– E.g. X = <ID, name, wealth>, Y = height
• f(x) is not smooth, i.e. f(x+Δx) ↛ y+Δy: a small change in the input does not map to a small change in the output
13. OK, my task is ML-solvable, but I still failed
14. The ML Diagnose Tree
[Flowchart]
Is it ML? (No → try other solutions)
→ Correct ML scenario? (No → try another scenario)
→ Have you identified a suitable model?
   (Do you have enough data? Is your model too complicated?)
→ Right features? (Have you performed feature engineering?)
→ Suitable evaluation metrics? Proper validation set?
15. Diagnose 2: Which ML Scenario?
• Have you modeled your task into the right ML scenario?
– ML → Classification/Regression → SVM, DNN, DT
• Which ML toolbox should you choose?
[Diagram: Machine Learning → Supervised Learning → Classification → SVM]
17. A Variety of ML Scenarios
• Supervised Learning: Classification & Regression, Multi-label learning, Multi-instance learning, Cost-sensitive learning, Semi-supervised learning, Active learning
• Unsupervised Learning: Clustering, Learning Data Distribution, Pattern Learning
• Reinforcement Learning
• Variations: Transfer learning, Online Learning, Distributed Learning
18. Supervised Learning
• Given: a set of <input X, output Y> pairs
• Goal: given an unseen input, predict the corresponding output
• There are two kinds of outputs
– Categorical: classification problem
• Binary classification vs. multiclass classification
– Real-valued: regression problem
• Examples:
1. Binary classification: input: sensor readings, output: whether the sensor is broken (INTEL)
2. Multi-class classification: input: the lyrics of a song, output: the mood: happy/sad/surprised/angry (CHT)
3. Regression: input: the weather/traffic/air condition, output: PM2.5 value (Microsoft Research Asia)
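To make the two output types concrete, here is a minimal scikit-learn sketch (my own illustration with synthetic data, not from the talk) that fits a classifier and a regressor:

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: categorical output (e.g. broken / not broken)
Xc, yc = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression().fit(Xc, yc)
print(clf.predict(Xc[:3]))    # predicted labels

# Regression: real-valued output (e.g. a PM2.5 value)
Xr, yr = make_regression(n_samples=200, noise=5.0, random_state=0)
reg = LinearRegression().fit(Xr, yr)
print(reg.predict(Xr[:3]))    # predicted real values
```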
19. Cost-sensitive Learning
• A classification task with non-uniform costs for different types of classification error.
• Goal: predict the class C* that minimizes the expected cost rather than the misclassification rate
• An example cost matrix L (medical diagnosis):

| L_jk | Actual cancer | Actual normal |
|---|---|---|
| Predict cancer | 0 | 1 |
| Predict normal | 10000 | 0 |
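As a small sketch (my own; the probability 0.02 is an assumed model output), picking the expected-cost-minimizing class under this matrix looks like:

```python
import numpy as np

# Cost matrix L[j, k]: cost of predicting class j when the actual class is k
# (rows/cols: 0 = cancer, 1 = normal), as in the slide above
L = np.array([[0, 1],
              [10000, 0]])
p = np.array([0.02, 0.98])       # model's estimated P(actual class | x)
expected_cost = L @ p            # cost of each possible prediction
print(expected_cost)             # [0.98, 200.0]
print(expected_cost.argmin())    # 0: predict "cancer" even at 2% probability
```

Even though "normal" is far more likely, the huge cost of a missed cancer makes "cancer" the expected-cost-minimizing prediction.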
20. A Variety of ML Scenarios
• Supervised Learning: Classification & Regression, Multi-label learning, Multi-instance learning, Cost-sensitive learning, Semi-supervised learning, Active learning
• Unsupervised Learning: Clustering, Learning Data Distribution, Pattern Learning
• Reinforcement Learning
• Variations: Transfer learning, Online Learning, Distributed Learning
21. Unsupervised Learning
• Learning without teachers (presumably harder than supervised learning)
– Learning "what normally happens"
– E.g. how babies learn their first language (unsupervised) vs. how people learn their 2nd language (supervised)
• Given: a bunch of inputs X (there is no output Y)
• Goal: depends on the task, for example
– Estimate P(X), then we can find argmax P(X) → Bayesian methods
– Find P(X2|X1), then we know the dependency between inputs → association rules, probabilistic graphical models
– Find Sim(X1, X2), then we can group similar X's → clustering
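A minimal clustering sketch (my own illustration with synthetic blobs) for the last goal, grouping similar X's without any labels:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# A bunch of inputs X, no output Y
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels[:10])   # cluster assignments discovered from X alone
```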
22. A Variety of ML Scenarios
• Supervised Learning: Classification & Regression, Multi-label learning, Multi-instance learning, Cost-sensitive learning, Semi-supervised learning, Active learning
• Unsupervised Learning: Clustering, Learning Data Distribution, Pattern Learning
• Reinforcement Learning
• Variations: Transfer learning, Online Learning, Distributed Learning
23. Reinforcement Learning (RL)
• RL is a "decision making" process
– How an agent should make decisions to maximize the long-term reward
• RL is associated with a sequence of states X and actions Y (think of a Markov Decision Process) with certain "rewards"
• Its goal is to find an optimal policy to guide the decisions
(Figure from Mark Chang)
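As a toy illustration (mine, not from the talk), tabular Q-learning on a 5-state chain MDP where the agent earns a reward by reaching the rightmost state:

```python
import numpy as np

# Toy chain MDP: states 0..4, actions 0 = left, 1 = right; reward 1 at state 4
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.5, 0.9, 0.1    # learning rate, discount, exploration
rng = np.random.default_rng(0)
for _ in range(500):
    s = 0
    while s != 4:
        a = rng.integers(n_actions) if rng.random() < eps else Q[s].argmax()
        s_next = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s_next == 4 else 0.0
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
print(Q.argmax(axis=1))   # greedy policy: states 0-3 all choose "right"
```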
24. AlphaGo: SL + RL
• 1st stage: learn from all the world's Go players (Supervised Learning)
– Data: past game records
– Learning: f(X) = Y, X: board position, Y: next move
– Result: the AI can play Go, but not at an expert level
• 2nd stage: surpass its past self (Reinforcement Learning)
– Data: generated by playing against the 1st-stage AI
– Learning: observation → board position, reward → whether it wins, action → next move
26. Case Study: Click-Through Rate Prediction (KDD Cup 2012)
• CTR: the ratio of users who click on a displayed advertisement
• It looks like a regression task, but indeed it is not
– CTR = #clicks/#views, but 3/10 =? 300/1000 (the same ratio carries very different amounts of evidence)
– Better solution: treat it as binary classification, transforming 3/10 into 3 positive cases and 7 negative cases
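A sketch of that transformation (my own; the ad features and counts are hypothetical), expanding aggregated (clicks, views) counts into per-impression binary examples:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical aggregated data: (ad features, #clicks, #views)
ads = [([1.0, 0.0], 3, 10),        # CTR 3/10
       ([0.0, 1.0], 300, 1000)]    # CTR 300/1000: same ratio, more evidence
X, y = [], []
for features, clicks, views in ads:
    X += [features] * views                     # one row per impression
    y += [1] * clicks + [0] * (views - clicks)  # clicked vs. not clicked
model = LogisticRegression().fit(np.array(X), np.array(y))
print(model.predict_proba(np.array([[1.0, 0.0]]))[:, 1])  # estimated CTR
```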
27. FAQ: I have a lot of data, but it is not labelled
• This is by far the most common question I am asked.
• My honest answer: try to get the data labelled, because you will need ground truth for evaluation anyway.
• In several cases labelling is too costly; then consider
– Semi-supervised learning
– Transfer learning (using labelled data from other domains)
– Active learning (query a subset of labels)
28. A Variety of ML Scenarios
• Supervised Learning: Classification & Regression, Multi-label learning, Multi-instance learning, Cost-sensitive learning, Semi-supervised learning, Active learning
• Unsupervised Learning: Clustering, Learning Data Distribution, Pattern Learning
• Reinforcement Learning
• Variations: Transfer learning, Online Learning, Distributed Learning
29. Semi-supervised Learning
• Only a small portion of the data is annotated (usually due to high annotation cost)
• Leverage the unlabeled data for better performance
[Zhu2008]
30. Transfer Learning
• Improving a learning task by incorporating knowledge from learning tasks in other domains (with different feature spaces and data distributions)
[Diagram: traditional ML trains a separate learning system per task from its own training items (more labeled data needed); transfer learning passes knowledge from a source task's learning system to the target task's learning system (fewer labeled data needed)]
(Figure from Pan et al.)
31. Active Learning
• Achieves better learning with fewer labeled training data by actively selecting a subset of unlabeled data to be annotated
[Diagram: the learning algorithm repeatedly requests the label of an example from the unlabeled data; an expert/oracle returns a label for that example; finally the algorithm outputs a classifier]
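A minimal pool-based sketch of this query loop (my own; uncertainty sampling is one common selection rule, and the data is synthetic):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
labeled = list(range(10))                      # small initial labeled set
pool = list(range(10, 500))                    # unlabeled pool
for _ in range(20):                            # 20 queries to the oracle
    model = LogisticRegression().fit(X[labeled], y[labeled])
    probs = model.predict_proba(X[pool])[:, 1]
    i = int(np.argmin(np.abs(probs - 0.5)))    # most uncertain example
    labeled.append(pool.pop(i))                # oracle reveals its label
print(model.score(X, y))
```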
32. OK, I have identified a suitable ML
scenario, but my model still doesn’t work
33. The ML Diagnose Tree
[Flowchart]
Is it ML? (No → try other solutions)
→ Correct ML scenario? (No → try another scenario)
→ Have you identified a suitable model?
   (Do you have enough data? Is your model too complicated?)
→ Right features? (Have you performed feature engineering?)
→ Suitable evaluation metrics? Proper validation set?
34. Did you choose a proper model?
• A proper model considers
– The size of the data
• Small data → linear (or simpler) models
• Large data → linear or non-linear models
– The sparsity of the data
• Sparse data → more tricks needed to perform better and faster
• Dense data → requires a light algorithm that consumes less memory
– The balance condition of the data
• Imbalanced data → special treatment for the minority class
– The quality of the data (whether there is noise, missing values, etc.)
• Some loss functions (e.g. 0/1 loss or L2) are more robust to noise than others (e.g. hinge loss or exponential loss)
35. Do you have enough data to train the given model?
• Draw the learning curve to understand whether your data is sufficient (a sketch follows)
[Plot: accuracy vs. data size; if accuracy is still climbing as data grows, you need more data; once it plateaus, you have enough]
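A minimal sketch (my own illustration on a built-in dataset) of drawing such a curve with scikit-learn's learning_curve:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = load_digits(return_X_y=True)
sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 5), cv=5)
for n, s in zip(sizes, val_scores.mean(axis=1)):
    print(n, round(s, 3))   # still rising at the largest size? get more data
```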
36. Is your model too complicated?
• Increase the model complexity and check the training/testing performance
– E.g. increase the latent dimension k in matrix factorization
[Plot: training/testing performance vs. model complexity]
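A hedged sketch of such a complexity sweep (mine; decision-tree depth stands in for the latent dimension k of the slide's matrix-factorization example):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for depth in [2, 4, 8, 16, 32]:    # increasing model complexity
    m = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, m.score(X_tr, y_tr), m.score(X_te, y_te))
# Training accuracy keeps climbing; once test accuracy flattens or drops,
# the extra complexity is only fitting noise (overfitting)
```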
37. FAQ: How to avoid overfitting?
Overfitting: the biggest enemy of ML!!
– Occam's razor: try simpler models first
– Regularization: a way to constrain the complexity of the model
– Carefully design your experiments (see later)
– Be aware of the drifting phenomenon
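A small sketch (my own; the synthetic setup with far more features than samples is an assumption) of regularization at work, where a stronger penalty often generalizes better:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# 50 samples, 100 features: a recipe for overfitting
X, y = make_regression(n_samples=50, n_features=100, noise=10, random_state=0)
for alpha in [0.01, 1.0, 100.0]:   # increasing regularization strength
    score = cross_val_score(Ridge(alpha=alpha), X, y, cv=5).mean()
    print(alpha, round(score, 3))
```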
38. I still cannot do well even after a reasonable model is identified
39. The ML Diagnose Tree
[Flowchart]
Is it ML? (No → try other solutions)
→ Correct ML scenario? (No → try another scenario)
→ Have you identified a suitable model?
   (Do you have enough data? Is your model too complicated?)
→ Right features? (Have you performed feature engineering?)
→ Suitable evaluation metrics? Proper validation set?
40. Diagnose: Quality of Features X
• Have you identified all potentially useful features (X)?
41. Which information shall be extracted as features?
• Use domain knowledge and human judgement to decide which features to obtain for training.
• The rule of thumb: if you don't know whether a feature is useful, try it!!
– Human judgement can be misleading
Let ML algorithms (e.g. feature selection) determine whether certain information is useful!!
42. Feature Engineering
• Feature engineering is arguably the best strategy for improving performance.
• The goal is to explicitly reveal important information to the model
– Domain knowledge may or may not be useful
• Original features → different encodings of the features → combined features (a sketch follows)
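A minimal sketch of that pipeline (my own; the column names and values are hypothetical): re-encode a categorical feature and combine two raw features into a new one:

```python
import pandas as pd

# Hypothetical raw features
df = pd.DataFrame({"city": ["taipei", "tokyo", "taipei"],
                   "income": [50, 90, 60],
                   "debt": [10, 30, 5]})
encoded = pd.get_dummies(df, columns=["city"])      # different encoding
encoded["debt_ratio"] = df["debt"] / df["income"]   # combined feature
print(encoded)
```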
43. FAQ: What shall I do if I absolutely need to boost the accuracy?
• Combine a variety of models:
– If the models are diverse enough and have similar performance, there is a high chance the combined performance will be better.
• Additional validation data are needed to learn the ensemble weights.
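A hedged sketch (mine; the base models are arbitrary choices) using scikit-learn's StackingClassifier, which learns how to combine diverse base models on held-out folds:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
ensemble = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000))  # learns the weights
print(cross_val_score(ensemble, X, y, cv=5).mean())
```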
44. I am really happy that my ML model
FINALLY works so well!!
45. The ML Diagnose Tree
[Flowchart]
Is it ML? (No → try other solutions)
→ Correct ML scenario? (No → try another scenario)
→ Have you identified a suitable model?
   (Do you have enough data? Is your model too complicated?)
→ Right features? (Have you performed feature engineering?)
→ Suitable evaluation metrics? Proper validation set?
46. Is your model evaluated correctly?
• Accuracy is NOT always the best way to
evaluate a machine learning model
47. Case Study: A 99.999% Accurate System for Detecting Malicious Personnel
• A randomly picked person is not likely to be a terrorist.
• Thus, a model that always guesses 'non-terrorist' will achieve very high accuracy
– But it is useless!!
• "Area under the ROC Curve" (AUC) is generally used to evaluate such systems.
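A tiny sketch (my own; the 1-in-100,000 rate is an illustrative assumption matching the slide's 99.999%) showing how accuracy flatters the useless constant model while AUC exposes it:

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# 1 terrorist among 100,000 people; model always says "non-terrorist"
y_true = np.zeros(100_000, dtype=int)
y_true[0] = 1
y_score = np.zeros(100_000)                 # constant score, no signal
y_pred = (y_score > 0.5).astype(int)
print(accuracy_score(y_true, y_pred))       # 0.99999: looks great
print(roc_auc_score(y_true, y_score))       # 0.5: no better than chance
```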
48. Have the Right Evaluation Data Been Used?
• We normally divide the labeled (ground-truth) data into three parts: training, validation (or held-out), and testing
• Performance on training data is obviously biased, since the model was constructed to fit this data
– Accuracy must be evaluated on an independent (usually disjoint) test set
– You cannot peek at the test set labels!!
• Use the validation set to adjust hyper-parameters
How the validation set is chosen can affect the performance of the model!!
[Diagram: data split into Training | Validation | Testing]
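A minimal sketch (mine; the 60/20/20 proportions are an illustrative choice) of such a three-way split:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
# First carve off 40%, then split it in half: 60% train, 20% val, 20% test
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)
# Tune hyper-parameters against (X_val, y_val); touch (X_test, y_test) once
```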
49. Case Study: KDD Cup 2008 (Identify Potential Cancer Patients)
• Training set: a set of positive and negative instances (each with 118 features)
– Each positive patient contains a set of negative instances (i.e. ROIs in the X-ray) and at least one positive instance
– All instances of a negative patient are negative
– It is a multi-instance classification problem
• Random division for CV: training 90%, testing 72% → the gap comes from peeking at the test patients (instances of the same patient fall on both sides of the split)
• Patient-based CV: training 80%, testing 77%
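A sketch of patient-based splitting (my own; the instance counts and random labels are hypothetical) with scikit-learn's GroupKFold, which keeps all of a patient's instances on one side:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical instances: a few ROIs per patient, 118 features each
rng = np.random.default_rng(0)
X = rng.random((12, 118))
y = rng.integers(0, 2, 12)
patients = np.array([0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 4, 4])
for train_idx, test_idx in GroupKFold(n_splits=3).split(X, y, groups=patients):
    # No patient appears in both sides of a fold: no peeking
    assert set(patients[train_idx]).isdisjoint(patients[test_idx])
```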
50. Take-Home Point: How to Solve My Task Using ML?
1. Determine whether it should be modeled as an ML task
2. Model the task into a suitable ML scenario
3. Given the scenario, determine a suitable model
4. Feature engineering
5. Parameter learning
6. Evaluation (e.g. cross-validation)
7. Apply to unseen data
51. Final Remark: The (Weak) AI Era Has Arrived
[Diagram: Big Data + Knowledge Representation + Machine Learning → AI]
(Figure from 科學人, 2015)
52. So, What’s Next for ML & AI?
-My three bold predictions about the future
Halle Tecco
53. On the AI Revolution
Machines will take over some non-trivial jobs that used to be done by human beings
– Those jobs (unfortunately?) include data analytics
54. On Decision Making
• The decision makers will gradually shift from humans to machines
(Figure from 科學人, 2015)
55. On What's Next for Machine Learning
• The evolution of human intelligence: memorizing → learning → discovering
• Likewise for machines: machine memorizing → machine learning → machine discovery
• Machine Discovery (Shou-De Lin, Fall 2016, NTU CSIE)
Thank you for your attention!!