Important Classification
and Regression Metrics
By chode Amarnath
Important Links referred
1) https://www.analyticsvidhya.com/blog/2020/09/precision-recall-machine-learning/
2) https://www.javatpoint.com/confusion-matrix-in-machine-learning
3) https://medium.com/analytics-vidhya/confusion-matrix-accuracy-precision-recall-f1-score-ade299cf63cd
4) https://www.freecodecamp.org/news/evaluation-metrics-for-regression-problems-machine-learning/
Why do we use different evaluation metrics
There are plenty of ways to measure the quality of an algorithm, and each company
decides for itself
→ what the most appropriate measure is for its particular problem.
Example:
Let’s say an online shop is trying to maximize the effectiveness of its website.
→ We need to formalize what effectiveness means.
→ We need to define a metric for how effectiveness is measured.
→ It could be the number of times the website was visited, or the number of times
something was ordered through it.
→ So the company usually decides for itself which quantity is most important.
When assessing how well a model fits a dataset, we use RMSE more often because
it is measured in the same units as the response variable.
Regression & Classification Metrics
1) Regression
a) MSE
b) RMSE
c) R-squared
d) MAE
e) RMSPE,MAPE
2) Classification
a) Confusion Matrix
b) Accuracy
c) Precision
d) Recall
e) F1 Score
f) AUC
Regression Metrics - Mean Squared Error (MSE)
MSE is the mean (average) of the squared differences between the actual and estimated values:
MSE = (1/n) Σ (yᵢ - ŷᵢ)²
A high value of MSE means that the model is not performing well,
whereas an MSE of 0 would mean that you have a perfect model that predicts the
target without any error.
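As a minimal sketch, MSE is a one-liner in numpy; the values below reuse the actual/predicted arrays from the RMSE example later in this deck:

import numpy as np

def mse(actual, predicted):
    # Mean of the squared differences between actual and predicted values
    actual, predicted = np.asarray(actual, dtype=float), np.asarray(predicted, dtype=float)
    return np.mean((actual - predicted) ** 2)

print(mse([2, 4, 6, 8], [4, 6, 8, 10]))  # 4.0 -- every error is 2, squared to 4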
Example :
Why we square the difference
Example : Model Comparison
When we compare Model A with Model B, where Model B has extreme errors, squaring
punishes those large errors more heavily.
Advantages & Disadvantages
Advantages of using MSE
Easy to calculate in Python
Simple to understand calculation for end users
Designed to punish large errors
Disadvantages of using MSE
Error value not given in terms of the target
Difficult to interpret
Not comparable across use cases
RMSE
RMSE is the square root of the mean of the squared errors: RMSE = √MSE
→ RMSE has the benefit of penalizing large errors more, so it can be more
appropriate in some cases.
→ On the other hand, one distinct advantage of RMSE over MAE is that RMSE
avoids taking the absolute value, which is undesirable in many mathematical
calculations (the absolute value is not differentiable at zero).
Example :
Let’s understand the above statement with two examples:
Case 1: Actual Values = [2,4,6,8], Predicted Values = [4,6,8,10]
Case 2: Actual Values = [2,4,6,8], Predicted Values = [4,6,8,12]
MAE for Case 1 = 2.0, RMSE for Case 1 = 2.0
MAE for Case 2 = 2.5, RMSE for Case 2 = 2.65
From the above example,
→ we can see that RMSE penalizes the last prediction more heavily than
MAE. Generally, RMSE will be higher than or equal to MAE.
→ The only case where it equals MAE is when all the differences are equal or zero
(true for Case 1, where the difference between actual and predicted is 2 for all
observations).
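A minimal sketch that reproduces the two cases above (numpy only; the helper names are illustrative):

import numpy as np

def mae(actual, predicted):
    # Average magnitude of the errors, direction ignored
    return np.mean(np.abs(np.asarray(actual) - np.asarray(predicted)))

def rmse(actual, predicted):
    # Square root of the mean squared error
    return np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2))

actual = [2, 4, 6, 8]
print(mae(actual, [4, 6, 8, 10]), rmse(actual, [4, 6, 8, 10]))  # 2.0 2.0
print(mae(actual, [4, 6, 8, 12]), rmse(actual, [4, 6, 8, 12]))  # 2.5 2.6457...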
Mean Absolute Error (MAE)
MAE is the average of the absolute differences between the predicted values and
observed values: MAE = (1/n) Σ |yᵢ - ŷᵢ|
→ All the individual differences are weighted equally in the average.
What are the disadvantages of using mean absolute error?
It doesn’t tell you whether your model tends to overestimate or underestimate,
→ since any direction information is destroyed by taking the absolute value.
Example :
MAE is the sum of absolute differences between actual and predicted values. It doesn’t
consider the direction, that is, positive or negative.
→ When we do consider direction, the measure is called Mean Bias Error (MBE),
the mean of the signed errors (differences).
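A small sketch of the MAE/MBE distinction, with illustrative values chosen so the signed errors cancel (sign conventions for MBE vary across sources; predicted minus actual is assumed here):

import numpy as np

actual    = np.array([2, 4, 6, 8])
predicted = np.array([4, 2, 8, 6])         # errors: +2, -2, +2, -2

mae = np.mean(np.abs(predicted - actual))  # 2.0 -- direction discarded
mbe = np.mean(predicted - actual)          # 0.0 -- over- and underestimates cancel
print(mae, mbe)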
So which one should you choose and why?
Well, MAE is easy to understand and interpret because it directly takes the average of
the offsets,
whereas RMSE penalizes larger differences more than MAE.
Residual
→ Residuals are the differences between the actual and predicted values; you can
think of a residual as a distance.
→ The closer the residuals are to zero, the better the model performs in making its
predictions.
R2 Score
The R2 score is a statistical measure that tells us how well our model is making
predictions, typically on a scale of 0 to 1 (it can be negative for models that fit
worse than simply predicting the mean).
→ R2 is computed from the residuals: R2 = 1 - (sum of squared residuals) / (total sum of squares)
R-Squared
R-squared is a goodness-of-fit measure for linear regression models. This statistic
indicates the percentage of the variance in the dependent variable that the
independent variables explain collectively.
When to use the R2 score
You can use the R2 score to express your model's goodness of fit on a percentage
scale, that is 0 - 100, loosely analogous to accuracy in a classification model.
Adjusted R2
Adjusted R2 is the better measure when you compare models that have different
numbers of variables.
→ The logic behind it is that R2 always increases when the number of variables
increases, meaning that even if you add a useless variable to your model, your R2
will still increase. To balance that out, you should always compare models with
different numbers of independent variables using adjusted R2:
Adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors.
→ Adjusted R2 only increases if the new variable improves the model more than
would be expected by chance.
→ When you compare models, use adjusted R2. When you only look at one model,
report R2, as it is the unadjusted measure of how much variance is explained by
your model.
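A minimal sketch with scikit-learn on synthetic data (illustrative, not from the slides), applying the standard adjusted-R2 formula above:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 3 predictors, one of them useless
y = 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=100)

model = LinearRegression().fit(X, y)
r2 = r2_score(y, model.predict(X))

n, p = X.shape                                 # observations, predictors
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)  # penalizes extra variables
print(r2, adj_r2)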
Classification Metrics
→ Confusion Matrix
→ Accuracy
→ Precision
→ Recall
→ F1 score
→ AUC(Area under ROC Curve)
TP, TN, FP, FN
We represent predictions as Positive (P) or Negative (N), and truth values as True (T) or
False (F).
→ Representing truth and predicted values together, we get True Positive (TP), True
Negative (TN), False Positive (FP), and False Negative (FN).
Example : True Positive (TP)
Example : True Negative (TN)
Example : False Positive (FP)
Example : False Negative(FN)
Confusion Matrix
The confusion matrix is used to determine the performance of a classification model.
→ It can only be determined if the true values for the test data are known.
→ It shows the errors in model performance in the form of a matrix.
Need for a confusion matrix
→ It evaluates the performance of the classification model when it makes
predictions on test data, and tells how good your model is.
→ With the help of the confusion matrix we can calculate different parameters of the
model, such as Accuracy, Precision, and Recall.
Example :
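The example matrix on the slide is an image; as a stand-in, here is a minimal scikit-learn sketch with illustrative labels:

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # illustrative actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # illustrative predictions

# For binary labels {0, 1}, rows are actual and columns are predicted:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)               # 3 1 1 3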
Accuracy
Accuracy is the quintessential classification metric. It is easy to understand and is
suited for binary as well as multiclass classification problems.
Accuracy = (TP+TN)/(TP+FP+FN+TN)
Accuracy is the proportion of correct results among the total number of cases examined.
When to use?
Accuracy is a valid choice of evaluation metric for classification problems which are well
balanced and not skewed, i.e. with no class imbalance.
Accuracy
"What percentage of my predictions are correct?"
True Positives (TP): should be TRUE, you predicted TRUE. These are cases in
which we predicted yes (they have the disease), and they do have the disease.
True Negatives (TN): should be FALSE, you predicted FALSE. We predicted no,
and they don't have the disease.
False Positives (FP): should be FALSE, you predicted TRUE. We predicted yes,
but they don't actually have the disease. (Also known as a "Type I error.")
False Negatives (FN): should be TRUE, you predicted FALSE. We predicted no,
but they actually do have the disease. (Also known as a "Type II error.")
Caveats
Let us say that our target class is very sparse. Do we want accuracy as a metric of our
model performance? What if we are predicting whether an asteroid will hit the earth? Just
say "No" all the time, and you will be 99% accurate. The model can be reasonably accurate,
but not at all valuable.
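A quick sketch of this caveat (illustrative class balance, a "model" that always predicts No):

import numpy as np
from sklearn.metrics import accuracy_score

# 1000 asteroids, only 10 actually hit: a very sparse positive class
y_true = np.array([1] * 10 + [0] * 990)
y_pred = np.zeros(1000, dtype=int)       # always predict "No"

print(accuracy_score(y_true, y_pred))    # 0.99 -- high accuracy, zero value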
Example :
→ When a search engine returns 30 pages, only 20 of which are relevant, while
failing to return 40 additional relevant pages, its precision is 20/30 = 2/3,
→ which tells us how valid the results are, while its recall is 20/60 = 1/3, which tells
us how complete the results are.
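The search-engine arithmetic checks out directly:

relevant_returned = 20             # relevant pages among the 30 returned
returned = 30
relevant_total = 20 + 40           # relevant pages returned plus the 40 missed

precision = relevant_returned / returned        # 20/30 = 0.667 -- validity
recall    = relevant_returned / relevant_total  # 20/60 = 0.333 -- completeness
print(precision, recall)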
Precision
Let’s start with precision, which answers the following question: what proportion of
predicted Positives is truly Positive?
Precision = (TP)/(TP+FP)
What is the precision of your model?
→ It is 0.843: when the model predicts that a patient has heart disease, it is
correct around 84% of the time.
When to use?
Precision is a valid choice of evaluation metric when we want to be very sure of our
prediction.
For example:
If we are building a system to predict whether we should decrease the credit limit on
a particular account, we want to be very sure about our prediction, or it may result in
customer dissatisfaction.
Caveats
Being very precise means our model will leave a lot of credit defaulters untouched and
hence lose money.
Recall
Another very useful measure is recall, which answers a different question: what
proportion of actual Positives is correctly classified?
Recall = (TP)/(TP+FN)
For your model, Recall = 0.86; recall gives a measure of how accurately your model is
able to identify the relevant data.
Precision
"Of the points that I predicted TRUE, how many are actually TRUE?"
Good for multi-label / multi-class classification and information retrieval
Good for unbalanced datasets
Recall
"Of all the points that are actually TRUE, how many did I correctly predict?"
Good for multi-label / multi-class classification and information retrieval
Good for unbalanced datasets
Precision / Recall
Let’s say we are evaluating a classifier on the test set.
→ The actual class of each example in the test set is either “1” or “0”,
→ since this is a binary classification problem.
→ High precision would be good.
→ High recall would also be a good thing.
True Positive
Your algorithm predicted positive (1) and in reality the example is
positive.
True Negative
Your learning algorithm predicted the negative class, “zero,” and the
actual class is “zero”; this is called a true negative.
False Positive
If our learning algorithm predicts that the class is positive (1) but the actual
class is negative (0), that’s called a false positive.
False Negative
The algorithm predicted negative (0), but the actual class is positive (1).
Suppose we want to predict that a patient has cancer only if we’re very confident that
they really do,
→ so maybe we want to tell someone that we think they have cancer only if we are
very confident.
One way to do this would be to modify the algorithm so that instead of setting the
threshold at 0.5, we raise it to 0.7.
→ Then you’re predicting someone has cancer only when you’re more
confident.
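A minimal sketch of raising the threshold, using a logistic regression on synthetic data (illustrative; any classifier exposing predict_proba would work the same way):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X, y)

proba = clf.predict_proba(X)[:, 1]          # P(class = 1) for each example
default_pred = (proba >= 0.5).astype(int)   # the usual 0.5 threshold
strict_pred  = (proba >= 0.7).astype(int)   # predict positive only when more confident

print(default_pred.sum(), strict_pred.sum())  # fewer positives at the stricter threshold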
How to compare precision/recall numbers?
When we are trying to compare Algorithm 1, Algorithm 2, and Algorithm 3, we don’t
have a single real-number evaluation metric.
→ If we had a single real-number evaluation metric, a number that just tells us
whether algorithm 1 or algorithm 2 is better,
→ that would help us decide much more quickly which algorithm to go with.
F1 Score
The F1 score is a single metric that balances precision and recall: it is their harmonic
mean, F1 = 2 × (Precision × Recall) / (Precision + Recall).
→ Gives equal weight to precision and recall
→ Good for unbalanced datasets
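A sketch reusing the illustrative labels from the confusion-matrix example (TP=3, FP=1, FN=1):

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

p  = precision_score(y_true, y_pred)  # 3 / (3 + 1) = 0.75
r  = recall_score(y_true, y_pred)     # 3 / (3 + 1) = 0.75
f1 = f1_score(y_true, y_pred)         # 2*p*r / (p + r) = 0.75
print(p, r, f1)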
What is the AUC - ROC Curve?
The AUC - ROC curve is a performance measurement for classification problems at various
threshold settings.
→ It tells how capable the model is of distinguishing between classes.
→ The higher the AUC, the better the model is at predicting 0s as 0s and 1s as 1s.
ROC Curve
The Receiver Operating Characteristic curve is a probability graph that shows the
performance of a classification model at different threshold levels, plotting:
1) True Positive Rate (TPR)
2) False Positive Rate (FPR)
An excellent model has an AUC near 1, which means it has a good measure of
separability.
A poor model has an AUC near 0, which means it has the worst measure of separability.
In fact, it means it is reciprocating the result:
→ it is predicting 0s as 1s and 1s as 0s.
→ And when the AUC is 0.5, the model has no class separation capacity
whatsoever.
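A minimal sketch with illustrative labels and scores (not from the slides):

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = np.array([0, 0, 1, 1, 0, 1, 0, 1])                    # illustrative labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9])   # predicted P(class = 1)

print(roc_auc_score(y_true, y_score))              # 0.875 for these scores
fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points along the ROC curve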
https://towardsdatascience.com/understanding-auc-roc-curve-68b2303cc9c5
As we know, the ROC curve is built from probabilities, so let’s plot the distributions of
those probabilities:
→ the red distribution curve is for the positive class and the green distribution curve
is for the negative class.
Example : AUC = 0.7
Example : AUC = 0.5
Example : AUC = 0
When to Use ROC vs. Precision-Recall Curves?
Generally, the use of ROC curves and precision-recall curves is as follows:
● ROC curves should be used when there are roughly equal numbers of observations for each class.
● Precision-Recall curves should be used when there is a moderate to large class imbalance.
The reason for this recommendation is that ROC curves present an optimistic picture of the model on datasets with a class
imbalance.
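Both curves come straight from scikit-learn; a sketch reusing the illustrative scores from above:

from sklearn.metrics import precision_recall_curve, roc_curve

# On imbalanced data, inspect the precision-recall curve alongside
# (or instead of) the ROC curve
y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.9]

precision, recall, pr_thresholds = precision_recall_curve(y_true, y_score)
fpr, tpr, roc_thresholds = roc_curve(y_true, y_score)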
