Deepak George
Senior Data Scientist – Machine Learning
Decision Tree Ensembles
Bagging, Random Forest & Gradient Boosting Machines
December 2015
 Education
 Computer Science Engineering – College Of Engineering Trivandrum
 Business Analytics & Intelligence – Indian Institute Of Management Bangalore
 Career
 Mu Sigma
 Accenture Analytics
 Data Science
 1st Prize Best Data Science Project (BAI 5) – IIM Bangalore
 Top 10% finish (out of 1100) in the Kaggle Coupon Purchase Prediction competition (Recommender System)
 SAS Certified Statistical Business Analyst: Regression and Modeling Credentials
 Statistical Learning – Stanford University
 Passion
 Photography, Football, Data Science, Machine Learning
 Contact
 Deepak.george14@iimb.ernet.in
 linkedin.com/in/deepakgeorge7
About Me
Bias-Variance Tradeoff
Expected test MSE (the bias-variance decomposition is given below)
 Bias
   Error introduced by approximating a complicated relationship by a much simpler model.
   Difference between the truth and what you expect to learn.
   Underfitting
 Variance
   Amount by which the model would change if we estimated it using a different training data set.
   If a model has high variance, then small changes in the training data can result in large changes in the model.
   Overfitting
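The decomposition itself appears only as a formula image in the original slide. For reference, the standard bias-variance decomposition of the expected test MSE at a point x₀ is

E[(y₀ − f̂(x₀))²] = Var(f̂(x₀)) + [Bias(f̂(x₀))]² + Var(ε)

where Var(ε) is the irreducible error of the noise term.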
Bias-Variance Tradeoff
[Figure: Underfitting vs. Ideal Learner vs. Overfitting]
 Problem: Decision trees have low bias but suffer from high variance.
 Goal: Reduce the variance of decision trees.
 Hint: Given a set of n independent observations Z1, . . . , Zn, each with variance σ², the variance of the mean of the observations is σ²/n.
 In other words, averaging a set of observations reduces variance.
 Theoretically: Take multiple independent samples S1, S2, …, Sn from the population.
   Fit "bushy"/deep decision trees on each of S1, S2, …, Sn.
   Trees are grown deep and are not pruned.
   Variance reduces linearly & bias remains unchanged.
 Practically: We only have one sample/training set, not the population.
   So take bootstrap samples, i.e. multiple samples drawn with replacement from the single sample.
   Variance reduces sub-linearly & bias often increases slightly because the bootstrap samples are correlated.
 Final Classifier: Average of predictions for regression or majority vote for classification (a minimal R sketch follows this list).
 The high variance of deep decision trees is mitigated by averaging the predictions of the individual trees.
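As a minimal sketch of the idea (illustrative R code, not taken from the deck; it reuses the Boston data that appears later in the slides): grow B deep, unpruned trees on bootstrap samples and average their predictions.

library(rpart)
library(MASS)   # Boston dataframe, also used later in the deck
set.seed(1861)
B <- 100                                   # number of bagged trees
n <- nrow(Boston)
trees <- vector("list", B)
for (b in 1:B) {
  idx <- sample(n, n, replace = TRUE)      # bootstrap sample of the training set
  trees[[b]] <- rpart(medv ~ ., data = Boston[idx, ],
                      control = rpart.control(cp = 0, minsplit = 2))   # deep, unpruned tree
}
# Bagged prediction = average of the B individual tree predictions
bag_pred <- rowMeans(sapply(trees, predict, newdata = Boston))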
Bagging
[Illustration: multiple independent samples S1, S2, …, Sn drawn from the population (theoretical setting) versus bootstrap samples S1, S2, …, Sn drawn with replacement from the single available sample (practical setting); the figure also shows the expected loss L(h) = E(x,y)~P(x,y)[ f(h(x), y) ].]
Bootstrap sampling
 A bootstrap sample should have the same sample size as the original sample.
 Sampling with replacement results in repetition of values.
 On average, a bootstrap sample uses only about 2/3 of the distinct observations in the original sample (see the quick check below).
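A quick numerical check of the "about 2/3" claim (illustrative code, not part of the deck): the expected fraction of distinct observations in a bootstrap sample is 1 − (1 − 1/n)ⁿ ≈ 1 − 1/e ≈ 0.632.

set.seed(1)
n <- 10000
frac_unique <- replicate(100, length(unique(sample(n, n, replace = TRUE))) / n)
mean(frac_unique)   # ≈ 0.632, i.e. roughly 2/3 of the original observations appear in a bootstrap sample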
Random Forest
 Problem: Bagging still has relatively high variance, because the bagged trees are correlated.
 Goal: Reduce the variance of bagging.
 Solution: Along with sampling the data as in bagging, sample the features too!
 In other words, when building a random forest, only a random subset of the features is considered at each split in the tree, instead of all the features.
 This de-correlates the trees.
 A common rule of thumb is that √(number of predictors) is a good approximate value for the predictor subset size (mtry/max_features).
 Evaluation: A bootstrap sample uses only approximately 2/3 of the observations of the original sample.
 The remaining training data (out-of-bag, OOB) are used to estimate the test error and variable importance (see the sketch below).
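As an assumed illustration of the OOB idea (not code from the deck), randomForest reports an out-of-bag error estimate and OOB-based variable importance without a separate validation set:

library(randomForest)
library(MASS)
set.seed(1861)
rf <- randomForest(medv ~ ., data = Boston, ntree = 500, importance = TRUE)
print(rf)        # "Mean of squared residuals" is the OOB error estimate
importance(rf)   # OOB permutation importance (%IncMSE) and node-purity importance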
 Hyperparameters are the knobs that control the bias-variance tradeoff of a machine learning algorithm.
 Key hyperparameters
   Max features – de-correlates the trees.
   Number of trees in the forest – a higher number reduces variance further.
Random Forest - Key Hyperparameters
Random Forest – R Implementation
library(randomForest)
library(MASS)   # Contains the Boston dataframe
library(caret)
View(Boston)
# Cross-validation setup
cv.ctrl <- trainControl(method = "repeatedcv", repeats = 2, number = 5, allowParallel = T)
# Grid search over mtry
rf.grid <- expand.grid(mtry = 2:13)
set.seed(1861)  # make reproducible here, but not if generating many random samples
# Hyperparameter tuning
rf_tune <- train(medv ~ .,
                 data = Boston,
                 method = "rf",
                 trControl = cv.ctrl,
                 tuneGrid = rf.grid,
                 ntree = 1000,
                 importance = TRUE)
# Cross-validation results
rf_tune
plot(rf_tune)
# Variable importance
varImp(rf_tune)
plot(varImp(rf_tune), top = 10)
Boosting
 Intuition: Ensemble many "weak" classifiers (typically decision trees) to produce a final "strong" classifier.
 Weak classifier: error rate is only slightly better than random guessing.
 Boosting is a forward stagewise additive model.
 Boosting sequentially applies the weak classifiers, one by one, to repeatedly reweighted versions of the data.
 Each new weak learner in the sequence tries to correct the misclassifications/errors made by the previous weak learners.
 Initially, all of the weights are set to wi = 1/N.
 At each successive step, the observation weights are individually modified and a new weak learner is fit to the reweighted observations.
 At step m, the observations that were misclassified by the classifier Gm−1(x) induced at the previous step have their weights increased, whereas the weights are decreased for those that were classified correctly (the update formulas are given below).
 The final "strong" classifier is based on a weighted vote of the weak classifiers.
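For reference, the standard AdaBoost.M1 update (not written out on the slide): with weak learner Gₘ and indicator I(·),

errₘ = Σᵢ wᵢ · I(yᵢ ≠ Gₘ(xᵢ)) / Σᵢ wᵢ
αₘ = log((1 − errₘ) / errₘ)
wᵢ ← wᵢ · exp(αₘ · I(yᵢ ≠ Gₘ(xᵢ)))   (then renormalize the weights)

Final classifier: G(x) = sign(Σₘ αₘ · Gₘ(x)).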
[Toy dataset plotted on axes X1 and X2]
AdaBoost – Illustration
Step 1
Input Data
Initially, all observations are assigned equal weight (1/N).
Observations that are misclassified in the i-th iteration are given higher weights in the (i+1)-th iteration.
Observations that are correctly classified in the i-th iteration are given lower weights in the (i+1)-th iteration.
Step 2
Step 3
AdaBoost – Illustration
Final Ensemble/Model
AdaBoost – Illustration
AdaBoost - Algorithm
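The algorithm on this slide is an image in the original deck. Below is a minimal AdaBoost.M1 sketch in R using decision stumps (rpart with maxdepth = 1); it assumes labels coded as −1/+1 and is an illustration, not the deck's own code.

library(rpart)
# X: data frame of predictors, y: numeric labels in {-1, +1}, M: number of boosting rounds
adaboost_m1 <- function(X, y, M = 50) {
  n <- nrow(X)
  w <- rep(1 / n, n)                        # Step 1: equal initial weights
  stumps <- vector("list", M)
  alpha  <- numeric(M)
  dat <- data.frame(X, y = factor(y))
  for (m in 1:M) {
    stumps[[m]] <- rpart(y ~ ., data = dat, weights = w,
                         control = rpart.control(maxdepth = 1, cp = 0, minsplit = 2))
    pred <- ifelse(predict(stumps[[m]], dat, type = "class") == "1", 1, -1)
    err  <- sum(w * (pred != y)) / sum(w)   # weighted misclassification error
    alpha[m] <- log((1 - err) / err)        # weight of this weak learner
    w <- w * exp(alpha[m] * (pred != y))    # up-weight misclassified observations
    w <- w / sum(w)                         # renormalize
  }
  # (sketch omits edge cases such as err == 0 or err >= 0.5)
  list(stumps = stumps, alpha = alpha)
}
# Final "strong" classifier: sign of the weighted vote of the stumps
predict_adaboost <- function(model, X) {
  votes <- sapply(seq_along(model$stumps), function(m) {
    model$alpha[m] * ifelse(predict(model$stumps[[m]], X, type = "class") == "1", 1, -1)
  })
  sign(rowSums(votes))
}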
 Generalizing AdaBoost to work with arbitrary loss functions resulted in GBM.
Gradient Boosting = Gradient Descent + Boosting
 GBM uses the gradient descent algorithm, which can optimize any differentiable loss function.
 In AdaBoost, "shortcomings" are identified by high-weight data points.
 In Gradient Boosting, "shortcomings" are identified by negative gradients (also called pseudo-residuals).
 In GBM, instead of the reweighting used in AdaBoost, each new tree is fit to the negative gradients of the loss with respect to the current model's predictions.
 Each tree in GBM is a successive gradient descent step.
Gradient Boosting Machines
 AdaBoost is equivalent to forward stagewise additive modeling using the
exponential loss function.
Gradient Boosting - Algorithm
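The algorithm on this slide is likewise an image. As an assumed illustration of "each new tree is fit to the negative gradients", here is a minimal gradient-boosting sketch for squared-error loss, where the negative gradient is simply the residual y − F(x):

library(rpart)
library(MASS)
set.seed(1861)
M  <- 200                                    # number of boosting rounds / trees
nu <- 0.1                                    # learning rate (shrinkage)
Fm <- rep(mean(Boston$medv), nrow(Boston))   # F0: initial constant prediction
trees <- vector("list", M)
for (m in 1:M) {
  dat <- Boston
  dat$r <- Boston$medv - Fm                  # pseudo-residuals = negative gradient of squared loss
  trees[[m]] <- rpart(r ~ . - medv, data = dat,
                      control = rpart.control(maxdepth = 3))   # shallow tree limits interactions
  Fm <- Fm + nu * predict(trees[[m]], Boston)   # shrunken gradient-descent step
}
mean((Boston$medv - Fm)^2)   # in-sample MSE after M shrunken steps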
 GBM has 3 types of hyperparameters:
 Tree structure
   Max depth of the trees – controls the degree of feature interactions.
   Min samples per leaf – minimum number of samples in a leaf node.
 Number of trees
 Shrinkage
   Learning rate – slows learning by shrinking the tree predictions.
   Unlike fitting a single large decision tree to the data, which amounts to fitting the data hard and potentially overfitting, the boosting approach instead learns slowly.
 Stochastic gradient boosting
   Subsample: fit each tree on a random subset of the training set rather than the complete training data.
   Max features: select a random subset of features for each tree.
GBM – Key Hyperparameters
Tree Ensembles – Interpretation
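The interpretation slide is an image in the original deck. As an assumed illustration, partial dependence plots are a common way to interpret tree ensembles, alongside the variable importance plots produced in the code that follows:

library(randomForest)
library(MASS)
set.seed(1861)
rf <- randomForest(medv ~ ., data = Boston, ntree = 500, importance = TRUE)
# Partial dependence: marginal effect of one feature on the ensemble's prediction
partialPlot(rf, pred.data = Boston, x.var = "lstat")
partialPlot(rf, pred.data = Boston, x.var = "rm")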
library(xgboost)
library(MASS)   # Contains the Boston dataframe
library(caret)
# Cross-validation setup
cv.ctrl <- trainControl(method = "repeatedcv", repeats = 2, number = 5, allowParallel = T)
# Grid search over learning rate (eta) and tree depth
xgb.grid <- expand.grid(nrounds = 1000, eta = c(0.005, 0.01, 0.05, 0.1), max_depth = c(4, 5, 6, 7, 8))
set.seed(1860)
# Model training
xgb_tune <- train(medv ~ .,
                  data = Boston,
                  method = "xgbTree",
                  trControl = cv.ctrl,
                  tuneGrid = xgb.grid,
                  importance = TRUE,
                  subsample = 0.8)
# Cross-validation results
xgb_tune
plot(xgb_tune)
# Variable importance
plot(varImp(xgb_tune), top = 10)
GBM – R Implementation
End
Questions ?