DA 5230 – Statistical & Machine Learning
Lecture 7 – Bias, Variance and Regularization
Maninda Edirisooriya
manindaw@uom.lk
ML Process
• You split your dataset into two parts
  • A large proportion for training and the rest for testing
• Then train a model on the training dataset with a suitable learning algorithm
• Once trained, evaluate that model on the test set and record the performance numbers (e.g. accuracy)
• Repeat the Data Collection, EDA, ML Algorithm Selection and Training phases iteratively until you reach the expected level of performance
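A minimal sketch of this workflow, assuming Python with scikit-learn and a synthetic dataset standing in for real collected data (all names here are illustrative):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic dataset standing in for the collected data
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Split: a large proportion for training, the rest for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train a model with a suitable learning algorithm
model = LinearRegression().fit(X_train, y_train)

# Evaluate on the unseen test set and record the performance number
print("Test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```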
Model Fit
• The same training dataset can be fitted differently by different learning algorithms
• Even for a given algorithm, how well the model explains the given dataset can differ depending on,
  • The number of model parameters
  • The amount of data used for training
  • The number of iterations used for training
  • The regularization techniques used (will discuss later)
Model Fit
Source: https://www.mathworks.com/discovery/overfitting.html
(Figure: three model fits – an overfitted model that explains too much, an underfitted model that explains too little, and a good fit that explains well)
Bias and Variance
• When an ML model cannot make correct predictions due to the simplicity of the model, it is known as a Bias Problem
• When an ML model becomes very good at making predictions on its training dataset but bad (larger error) on real world data (data unseen during training), it is a Variance Problem
  • As test data represents data unseen by the model, this causes higher error on the test set
• A good ML model should reduce both the Bias and the Variance to an acceptable level
Bias and Variance as Forms of Error
Source: https://towardsdatascience.com/bias-and-variance-in-linear-models-e772546e0c30
Bias – Variance Comparison

Underfitting (i.e. Bias Problem):
• Can happen when the model is not complex enough to understand the dataset (i.e. a small number of parameters for a larger dataset)
• Can be due to undertraining (i.e. training for too few iterations)
• Results in lower performance (e.g. lower accuracy)
• The problem is lower accuracy

Overfitting (i.e. Variance Problem):
• Can happen when the model is too complex for the dataset (i.e. a large number of parameters for a smaller dataset)
• Can be due to overtraining
• Results in higher performance on the training dataset but much lower performance on the testing dataset
• The problem is the model not generalizing to real world data
Analogous to Humans
Higher Bias People (Low IQ) vs. Higher Variance People (Overthinking)
Bias
• Bias is caused by the model not learning enough of the insights in the dataset
  • Either due to the lower expressive power of the model (i.e. a lower number of parameters)
  • Or due to a smaller training dataset that does not contain enough information about the data distribution
• When training with an iterative method like Gradient Descent, this may be due to finishing the training process before completion (i.e. before the cost is reduced sufficiently)
Bias
• Bias is defined as,
  $\mathrm{Bias}[\hat{f}(X)] = E[\hat{Y}] - Y$
• Bias can be reduced by,
  • Using a better ML algorithm
  • Using a larger model (i.e. with more parameters)
  • Training for more iterations if training was stopped early
  • Using a larger training dataset
  • Reducing regularization (if present)
• Example of high bias,
  • Using a straight line to model a quadratic polynomial distribution
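A small sketch of this example, assuming NumPy and scikit-learn: a straight line fitted to quadratic data keeps a large error that a model with enough capacity removes (the data and names are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 100).reshape(-1, 1)
y = X.ravel() ** 2 + rng.normal(0, 0.3, 100)  # quadratic distribution plus noise

# Straight line: too simple for this data, hence high bias
line = LinearRegression().fit(X, y)
print("Linear fit MSE:   ", mean_squared_error(y, line.predict(X)))

# Quadratic model: matches the data-generating process
quad = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
print("Quadratic fit MSE:", mean_squared_error(y, quad.predict(X)))
```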
Variance
• Variance is introduced when the model fits too closely to the training dataset
• The model can become highly optimized for the dataset it is trained on, including its noise
• As the model fits closely to the noise in the dataset, it will perform poorly on real world data that differ from the training set
Variance
• Variance is defined as,
  $\mathrm{Var}[\hat{f}(X)] = E\big[(\hat{Y} - E[\hat{Y}])^2\big]$
• Variance can be reduced by,
  • Using a larger training dataset, as the errors tend to cancel out
  • Reducing the number of parameters, as less significant insights (like noise) will not be included in the model
    • Dimensionality Reduction and Feature Selection can be used here (will be discussed in future)
  • Using Early Stopping to stop training at an optimal point
  • Dropout is used in Deep Learning models (not relevant to our subject module ☺)
  • Increasing (or introducing, if not present) regularization
• Example of high variance,
  • Using an 8th-degree polynomial to model a linear distribution
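A matching sketch for the high-variance example, under the same assumptions: an 8th-degree polynomial fitted to linear data scores much better on its own training points than on held-out ones:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = np.linspace(-3, 3, 40).reshape(-1, 1)
y = 2 * X.ravel() + rng.normal(0, 1.0, 40)  # linear distribution plus noise

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=1)

# 8th-degree polynomial: too complex, so it fits the training noise
model = make_pipeline(PolynomialFeatures(degree=8), LinearRegression()).fit(X_train, y_train)
print("Train MSE:", mean_squared_error(y_train, model.predict(X_train)))
print("Test MSE: ", mean_squared_error(y_test, model.predict(X_test)))
```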
Error Composition
• Mean Square Error,
  $\mathrm{MSE}\{\hat{f}(x)\} = \big[\mathrm{Bias}\{\hat{Y}\}\big]^2 + \mathrm{Var}\{\hat{Y}\}$
• Error in prediction,
  $E\big[(\hat{Y} - Y)^2\big] = \mathrm{MSE}\{\hat{f}(x)\} + \sigma^2$
  where $\sigma^2$ is the irreducible error
Source: https://www.geeksforgeeks.org/bias-vs-variance-in-machine-learning/
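This decomposition can be checked numerically. The sketch below, assuming NumPy and an illustrative setup, simulates many training sets, fits a straight line to each, and compares bias² + variance against the MSE of the prediction at one fixed test point:

```python
import numpy as np

rng = np.random.default_rng(2)
x0 = 1.5                              # fixed test point
true_f = lambda x: x ** 2             # true (quadratic) function

preds = []
for _ in range(2000):                 # many independent training sets
    X = rng.uniform(-3, 3, 30)
    y = true_f(X) + rng.normal(0, 0.5, 30)
    a, b = np.polyfit(X, y, deg=1)    # fit a straight line (a high-bias model)
    preds.append(a * x0 + b)

preds = np.array(preds)
bias_sq = (preds.mean() - true_f(x0)) ** 2
variance = preds.var()
mse = ((preds - true_f(x0)) ** 2).mean()
print(f"Bias^2 + Var = {bias_sq + variance:.4f}  vs  MSE = {mse:.4f}")
```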
Bias-Variance Tradeoff
• The ML algorithm, the number of model parameters, the amount of data, the number of training iterations and regularization can all be tuned to try to reduce both bias and variance
• But this is not fully possible: when bias is reduced variance increases, and when variance is reduced bias increases
• This is known as the Bias-Variance Tradeoff
• Therefore, a good balance between bias and variance is sought to create a better model
Early Stopping
• When training with iterative methods like Gradient Descent,
  • Training error reduces monotonically due to increased fitting
  • But testing error reduces up to a certain level and then starts to increase again due to the increased variance
• Training the model can be stopped at the point where the test error is minimum
• This is known as Early Stopping
Source: https://pub.towardsai.net/keras-earlystopping-callback-to-train-the-neural-networks-perfectly-2a3f865148f7
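A minimal sketch of early stopping wrapped around a gradient descent loop, assuming NumPy; the patience rule and all names are illustrative choices, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(0, 0.1, 100)
X_tr, y_tr, X_val, y_val = X[:70], y[:70], X[70:], y[70:]

w = np.zeros(3)
lr = 0.01
best_val, best_w, patience, wait = np.inf, w.copy(), 20, 0
for step in range(10_000):
    grad = X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)   # MSE gradient on the training set
    w -= lr * grad
    val_err = np.mean((X_val @ w - y_val) ** 2)
    if val_err < best_val:                          # held-out error still falling
        best_val, best_w, wait = val_err, w.copy(), 0
    else:                                           # started rising: count toward stopping
        wait += 1
        if wait >= patience:
            break
print(f"Stopped at step {step}, best validation MSE {best_val:.4f}")
```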
Regularization
• When high variance is observed we may try to find more data to train on. But that may be expensive
• Then we may try to reduce the number of parameters in the model instead. But identifying which parameters to remove may not be obvious
• The next best option is to apply Regularization during the training process
• Regularization is a technique that penalizes some information in the model, assuming it is related to the noise
Regularization
• For regularization, a penalty is added to the loss
  $\mathrm{Loss} := \mathrm{Loss} + \lambda \sum_{i=1}^{n} |\beta_i|^k$
• Where βᵢ is the iᵗʰ parameter of the model, and k is 1 or 2 in general
• This penalty (or regularization term) has a factor λ known as the Regularization Strength
• The best value for λ is found using Cross Validation (will learn in future)
• There are 2 common regularization techniques, L1 (Lasso Regression) and L2 (Ridge Regression)
• For L1 take k=1, and for L2 take k=2
L1 (Lasso) Regression
• The loss function will be, $\mathrm{Loss} := \mathrm{Loss} + \lambda \sum_{i=1}^{n} |\beta_i|$
• The penalty is proportional to the sum of the absolute values of the parameters
• Selects Features: most less-significant parameters end up exactly zero
• Used when only a few of the existing parameters are believed to be relevant to the model, and the other parameters should be eliminated from it
• As λ gets larger, more features become zero
• When λ is very large, only the bias β₀ remains non-zero
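A short sketch of this feature-selection effect, assuming scikit-learn's Lasso, whose alpha argument plays the role of λ here (synthetic data):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 10 features, but only 3 actually carry information
X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

for lam in (0.1, 1.0, 10.0):
    coefs = Lasso(alpha=lam).fit(X, y).coef_
    print(f"lambda={lam:5}: {np.sum(coefs != 0)} non-zero coefficients")
```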
L2 (Ridge) Regression
• The loss function will be, $\mathrm{Loss} := \mathrm{Loss} + \lambda \sum_{i=1}^{n} \beta_i^2$
• The penalty is proportional to the sum of the squares of the parameters
• Weight Decay: reduces the weights of parameters with higher values
• Used when all the parameters are believed to contribute to the model, so the weights of excessively large parameters need to be reduced significantly
• As λ gets larger, all the parameters βᵢ shrink but never become exactly zero
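A matching sketch of weight decay, assuming scikit-learn's Ridge with alpha again standing in for λ:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=100, n_features=10, noise=5.0, random_state=0)

for lam in (0.1, 10.0, 1000.0):
    coefs = Ridge(alpha=lam).fit(X, y).coef_
    # Weights shrink toward zero but stay non-zero
    print(f"lambda={lam:7}: max |coef| = {np.abs(coefs).max():.3f}, "
          f"non-zero = {np.sum(coefs != 0)}")
```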
Elastic Net Regression
• Both the L1 and L2 penalties can be combined by weighting each of them, which results in Elastic Net Regression
• This will bring some small parameters to zero (due to the L1 effect) and shrink some larger parameters (due to the L2 effect)
• Select α to adjust the balance between the L1 and L2 effects
  $\mathrm{Loss} := \mathrm{Loss} + \alpha \lambda \sum_{i=1}^{n} |\beta_i| + (1 - \alpha) \lambda \sum_{j=1}^{m} \beta_j^2$, where $0 < \alpha < 1$
• Equivalently,
  $\mathrm{Loss} := \mathrm{Loss} + \lambda \Big[ \alpha \sum_{i=1}^{n} |\beta_i| + (1 - \alpha) \sum_{j=1}^{m} \beta_j^2 \Big]$
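A brief sketch assuming scikit-learn's ElasticNet, where l1_ratio corresponds to α above and alpha to λ (up to scikit-learn's own scaling of the two penalty terms):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=100, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

# l1_ratio ~ alpha in the slide's notation: the balance between L1 and L2
model = ElasticNet(alpha=1.0, l1_ratio=0.5).fit(X, y)
print("Non-zero coefficients:", (model.coef_ != 0).sum())
```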
Linear Regression with L1
As the cost (total loss) function for Linear Regression is the Mean Square Error (MSE), after L1 regularization,

$J(\beta) = \mathrm{MSE} + \lambda \sum_{j=1}^{m} |\beta_j|$

$J(\beta) = \frac{1}{2} \sum_{i=1}^{n} \big(Y_i - \hat{Y}_i\big)^2 + \lambda \sum_{j=1}^{m} |\beta_j|$

$\dfrac{\partial J(\beta)}{\partial \beta_j} = \sum_{i=1}^{n} \big(\hat{Y}_i - Y_i\big)\, X_{i,j} + \lambda \operatorname{sign}(\beta_j)$

(note that the derivative of $|\beta_j|$ is $\operatorname{sign}(\beta_j)$ for $\beta_j \neq 0$)
Linear Regression with L2
As the cost function for Linear Regression is the Mean Square Error (MSE), after L2 regularization,

$J(\beta) = \mathrm{MSE} + \frac{\lambda}{2} \sum_{j=1}^{m} \beta_j^2$

$J(\beta) = \frac{1}{2} \sum_{i=1}^{n} \big(Y_i - \hat{Y}_i\big)^2 + \frac{\lambda}{2} \sum_{j=1}^{m} \beta_j^2$

$\dfrac{\partial J(\beta)}{\partial \beta_j} = \sum_{i=1}^{n} \big(\hat{Y}_i - Y_i\big)\, X_{i,j} + \lambda \beta_j$
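A hedged NumPy sketch of gradient descent implementing the two derivatives above (the L1 branch uses the sign-based subgradient); the data and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 4))
Y = X @ np.array([3.0, -1.0, 0.0, 0.0]) + rng.normal(0, 0.1, 100)

def fit(penalty, lam=0.5, lr=0.01, steps=2000):
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (X @ beta - Y)          # sum_i (Y_hat_i - Y_i) * X_ij
        if penalty == "l1":
            grad += lam * np.sign(beta)      # + lambda * sign(beta_j)
        else:
            grad += lam * beta               # + lambda * beta_j
        beta -= lr * grad / len(Y)
    return beta

print("L1:", np.round(fit("l1"), 3))   # pushes irrelevant weights to ~0
print("L2:", np.round(fit("l2"), 3))   # shrinks all weights
```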
Logistic Regression with L1 & L2
Though the cost function for Logistic Regression is the Cross Entropy function, the regularized gradients take exactly the same form as for linear regression; the difference lies in $\hat{Y} = \hat{f}(X)$, which is the sigmoid function for logistic regression.

L1: $J(\beta) = -\sum_{i=1}^{n} \big[ Y_i \ln \hat{Y}_i + (1 - Y_i) \ln (1 - \hat{Y}_i) \big] + \lambda \sum_{j=1}^{m} |\beta_j|$, with
$\dfrac{\partial J(\beta)}{\partial \beta_j} = \sum_{i=1}^{n} \big(\hat{Y}_i - Y_i\big)\, X_{i,j} + \lambda \operatorname{sign}(\beta_j)$

L2: $J(\beta) = -\sum_{i=1}^{n} \big[ Y_i \ln \hat{Y}_i + (1 - Y_i) \ln (1 - \hat{Y}_i) \big] + \dfrac{\lambda}{2} \sum_{j=1}^{m} \beta_j^2$, with
$\dfrac{\partial J(\beta)}{\partial \beta_j} = \sum_{i=1}^{n} \big(\hat{Y}_i - Y_i\big)\, X_{i,j} + \lambda \beta_j$
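A small sketch of the same L2 update applied to logistic regression, assuming NumPy; only the prediction line changes to the sigmoid (illustrative data and names):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 3))
Y = (X @ np.array([2.0, -1.0, 0.5]) + rng.normal(0, 0.5, 200) > 0).astype(float)

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

beta, lam, lr = np.zeros(3), 0.1, 0.1
for _ in range(1000):
    Y_hat = sigmoid(X @ beta)               # the only line that differs from linear regression
    grad = X.T @ (Y_hat - Y) + lam * beta   # same form: sum (Y_hat - Y) X + lambda * beta
    beta -= lr * grad / len(Y)

accuracy = ((sigmoid(X @ beta) > 0.5) == Y).mean()
print("Training accuracy:", accuracy)
```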
One Hour Homework
• Officially we have one more hour of work after the end of the lectures
• Therefore, for this week's extra hour you have a homework
  • Bias and Variance are very important concepts in ML, and regularization is widely used, especially in Deep Learning
  • Go through the slides and get a clear understanding of the Bias-Variance concept and become familiar with regularization
  • Refer to external sources to clarify any remaining ambiguities
• Good Luck!
Questions?