Linear models with R
Steve Hoang, PhD
UVA R Users Meetup
April 25, 2018
preamble
• Likely a wide range of expertise in
the audience
• This is a deep topic, and I’ll only
scratch the surface. Warning: if
you’re in the far LHS of the
distribution, this talk will be just
enough for you to be a danger to
yourself and others.
• The goal is to provide tools for
interpreting LMs, and a basic
vocabulary for pursuing deeper
topics.
overview
• What are LMs?
• Fitting and interpreting LMs
• Transforming data
• Hypothesis testing
• Mixed-effect models
what is a linear model?
• regression, ANOVA
• multiple regression, multi-way ANOVA
• mtcars: Let’s pretend that we would like to model 1/4 mile time (Y, the “response”) as a function of horsepower (X, the “predictor”) plus random noise
$$Y = f(X) + e$$
The LM:
$$y_i = b_0 + x_i b_1 + e_i$$
• Now, our task becomes a search for the parameters that minimize the sum of the squared residuals
• The R function that does this magic is lm()
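A minimal sketch of that fit, assuming the built-in mtcars data, where qsec is the 1/4 mile time and hp is horsepower. The object names (fit, rss, rss_other) are only for illustration.

```r
# Fit 1/4 mile time (qsec) as a linear function of horsepower (hp)
fit <- lm(qsec ~ hp, data = mtcars)

# Estimated intercept (b0) and slope (b1)
coef(fit)

# The quantity lm() minimizes: the sum of squared residuals
rss <- sum(resid(fit)^2)
rss

# Any other line does worse, e.g. after nudging the slope slightly
b <- coef(fit)
rss_other <- sum((mtcars$qsec - (b[1] + (b[2] + 0.001) * mtcars$hp))^2)
rss_other > rss
```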
In $y_i = b_0 + x_i b_1 + e_i$, $b_0$ is the intercept, $b_1$ is the slope, and the $e_i$ are the residuals.
In matrix notation:
$$Y = Xb + e$$
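As a rough illustration of the matrix form, the sketch below builds X by hand for the mtcars example and solves the normal equations. The normal equations are a standard least-squares result rather than something stated on the slide, and lm() itself uses a QR decomposition internally, so this is a cross-check and not a description of how lm() works.

```r
# Design matrix X (a column of 1s for the intercept, then hp) and response Y
X <- cbind(`(Intercept)` = 1, hp = mtcars$hp)
Y <- mtcars$qsec

# Least-squares estimate of b from the normal equations: (X'X) b = X'Y
b_hat <- solve(t(X) %*% X, t(X) %*% Y)

# Compare with lm()
fit <- lm(qsec ~ hp, data = mtcars)
cbind(normal_equations = as.numeric(b_hat), lm = coef(fit))
```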
• Quick note: the “linear” in linear model refers to the fact that the model is linear in the parameters; the predictors themselves may be transformed nonlinearly
✔ valid: $y = b_0 + \log(x) b_1 + e$
✗ not valid: $y = b_0 + x^{b_1} + e$
✔ valid: $y = b_0 + (x^2 + \tanh(x)) b_1 + e$
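A small sketch on simulated data (all values here are made up for illustration) showing that transformed predictors pose no problem for lm(), as long as the parameters enter linearly:

```r
set.seed(1)
x <- runif(100, min = 1, max = 10)

# Linear in the parameters, even though the predictor is log-transformed
y1 <- 2 + 3 * log(x) + rnorm(100, sd = 0.2)
fit_log <- lm(y1 ~ log(x))

# Also linear in the parameters; I() protects the arithmetic inside the formula
y2 <- 2 + 0.5 * (x^2 + tanh(x)) + rnorm(100, sd = 0.2)
fit_mix <- lm(y2 ~ I(x^2 + tanh(x)))

coef(fit_log)  # should be close to the simulated values 2 and 3
coef(fit_mix)  # should be close to the simulated values 2 and 0.5

# A model such as y = b0 + x^b1 + e is not linear in b1, so lm() cannot fit it;
# a nonlinear least-squares routine such as nls() would be needed instead.
```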
regression ANOVA
two flavors, one function: lm()
$$Y = Xb + e$$
• X is the “design matrix”, accessible through model.matrix()
• b holds the estimated parameters, accessible through coef() or coefficients()
• e holds the residuals, accessible through resid() or residuals()
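A short sketch of these accessors, assuming the same hypothetical mtcars fit (qsec ~ hp) used for illustration:

```r
fit <- lm(qsec ~ hp, data = mtcars)

X <- model.matrix(fit)  # design matrix: intercept column plus hp
b <- coef(fit)          # estimated parameters
e <- resid(fit)         # residuals

head(X, 3)

# The pieces fit together as Y = Xb + e
all.equal(as.numeric(X %*% b + e), mtcars$qsec)
```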
regression
[Figure: annotated model output highlighting the function call, summary stats for residuals, summary stats for fitted coefficients, and global model statistics]
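The four annotated parts correspond to what summary() prints for an lm fit. A sketch, assuming the mtcars example (qsec ~ hp), that pulls each part out individually:

```r
fit <- lm(qsec ~ hp, data = mtcars)
s <- summary(fit)

s                     # prints all four parts together

s$call                # function call
quantile(resid(fit))  # summary stats for residuals
s$coefficients        # estimates, standard errors, t values, p-values
s$r.squared           # global model statistics: R-squared ...
s$fstatistic          # ... and the overall F statistic
```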
ANOVA
[Figure: ANOVA fit annotated with 0, coef 1, coef 2, and coef 3, shown with the default parameterization and again with “no intercept”]
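The data behind the figure are not recoverable, so the sketch below assumes a stand-in: mtcars with the number of cylinders (three groups) as the categorical predictor. It contrasts the default parameterization with the no-intercept one.

```r
# Treat the number of cylinders (4, 6, 8) as a categorical predictor
mt <- transform(mtcars, cyl = factor(cyl))

# Default (treatment) coding: the intercept is the mean of the reference group
# (4 cylinders); the other coefficients are differences from that reference
fit_int <- lm(qsec ~ cyl, data = mt)
coef(fit_int)

# No intercept: each coefficient is simply a group mean
fit_noint <- lm(qsec ~ 0 + cyl, data = mt)
coef(fit_noint)

# Group means for comparison
group_means <- tapply(mt$qsec, mt$cyl, mean)
group_means
```

Both fits describe the same model; only the meaning of the coefficients changes.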
ANOVA
The broom package tidies your LMs
• Summarize model outputs into
tidy data frames: tidy()
• Quickly view model-scale
summaries: glance()
• See the original data augmented
with model statistics: augment()
• There’s more to broom, so have a
look for yourself.
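A minimal sketch of the three functions, assuming the mtcars example fit. broom is not part of base R, so the calls are guarded in case it is not installed.

```r
fit <- lm(qsec ~ hp, data = mtcars)

# broom is an add-on package, so only run this if it is installed
if (requireNamespace("broom", quietly = TRUE)) {
  print(broom::tidy(fit))           # one row per coefficient
  print(broom::glance(fit))         # one-row, model-level summary
  print(head(broom::augment(fit)))  # original data plus fitted values, residuals, ...
}
```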
some things to be aware of
• LMs make several assumptions about your data; look them up. You want to be sure your data meets those assumptions reasonably well.
– Homoscedasticity (constant variance) and normality of the residuals are the only assumptions we will discuss.
• Look into “generalized linear models” (GLMs) and/or quantile regression for non-normally distributed data.
NOT HOMOSCEDASTIC!
testing for heteroscedasticity
The ‘car’ package (Companion to Applied Regression) is your friend.
Use car::ncvTest() to check for heteroscedasticity using the Breusch-Pagan test (ncv = Non-Constant Variance).
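A sketch of the call, assuming the mtcars example fit (the slide's own model is not recoverable). car is an add-on package, so the call is guarded.

```r
fit <- lm(qsec ~ hp, data = mtcars)

# car is not part of base R, so only run this if it is installed
if (requireNamespace("car", quietly = TRUE)) {
  # Null hypothesis: constant error variance.
  # A small p-value suggests heteroscedasticity.
  print(car::ncvTest(fit))
}
```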
variance-stabilizing transformations
• Variance-stabilizing transformations make the variance of Y approximately independent of its mean value.
• Take the Poisson distribution: its mean is equal to its variance. The square root is the variance-stabilizing transformation of a Poisson RV.
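A quick simulation sketch of the Poisson case. The means and sample size are arbitrary choices for illustration, and the limiting variance of about 1/4 for the square root is a standard delta-method result rather than something stated on the slide.

```r
set.seed(42)
means <- c(10, 50, 200)

# Simulate Poisson counts at several means and compare variances
sim <- sapply(means, function(m) {
  y <- rpois(1e5, lambda = m)
  c(mean = m, var_raw = var(y), var_sqrt = var(sqrt(y)))
})
round(t(sim), 2)
# var_raw grows with the mean; var_sqrt stays roughly constant (near 1/4)
```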
the Box-Cox transformation
• Helps alleviate non-normality
and heteroscedasticity of
residuals
• Find a lambda that normalizes
the data (maximum likelihood
estimation)
$$y^{(\lambda)} = \begin{cases} \dfrac{y^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \log(y) & \text{if } \lambda = 0 \end{cases}$$
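One way to estimate lambda by maximum likelihood is MASS::boxcox(), which profiles the log-likelihood over a grid. The sketch below assumes the mtcars example fit; the helper box_cox() and the grid spacing are illustrative choices, not taken from the slides.

```r
fit <- lm(qsec ~ hp, data = mtcars)

# Profile the log-likelihood over a grid of lambda values (no plot)
bc <- MASS::boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat

# Hypothetical helper implementing the transformation defined above
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Refit with the transformed response
fit_bc <- lm(box_cox(qsec, lambda_hat) ~ hp, data = mtcars)
```

In practice a nearby "round" value of lambda (such as 0 or 0.5) is often preferred over the exact maximizer because it is easier to interpret.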
transformations for “curvy” data
• You can often use linear models to fit “curvy” data; you
just need to transform the predictors, the responses, or
both.
exponential model:
$$\log(Y) = Xb + e$$
$$Y = \exp(Xb + e)$$
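A sketch of the exponential model on simulated data (the true coefficients 0.5 and 0.8 and the noise level are made up for illustration):

```r
set.seed(7)
x <- seq(0, 5, length.out = 80)

# Simulated "curvy" data: Y = exp(b0 + b1 * x + e), i.e. multiplicative noise
y <- exp(0.5 + 0.8 * x + rnorm(80, sd = 0.1))

# Taking logs of the response makes the model linear in b0 and b1
fit <- lm(log(y) ~ x)
coef(fit)  # should be close to the simulated values 0.5 and 0.8

# Back-transform the fitted values to the original scale
y_hat <- exp(fitted(fit))
```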
additional thoughts
• Not everything can be transformed to be normal / homoscedastic, and not everything necessarily needs to be.
– Consider nonparametric methods or GLMs.
– ANOVA is somewhat robust to heteroscedasticity when n and/or effect size is relatively large.
• Use QQ plots to assess normality with qqnorm(); the Shapiro-Wilk test is available as shapiro.test()
• The poly() function in conjunction with lm() can be used to fit n-degree polynomials.
– Generally want to use raw = FALSE with poly() (the default, which gives orthogonal polynomials)
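A sketch of these checks, assuming the mtcars example (qsec ~ hp); the quadratic degree is an arbitrary choice for illustration.

```r
fit <- lm(qsec ~ hp, data = mtcars)

# Normality of the residuals: QQ plot plus Shapiro-Wilk test
qqnorm(resid(fit))
qqline(resid(fit))
shapiro.test(resid(fit))

# Quadratic fit using orthogonal polynomials (raw = FALSE is the default)
fit_poly <- lm(qsec ~ poly(hp, degree = 2), data = mtcars)
summary(fit_poly)$coefficients

# raw = TRUE gives the same fitted values, but the polynomial terms are
# correlated, so the individual coefficients are harder to interpret
fit_raw <- lm(qsec ~ poly(hp, degree = 2, raw = TRUE), data = mtcars)
all.equal(fitted(fit_poly), fitted(fit_raw))
```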
multiple comparisons problem
p-value = 0.04
handling multiple comparisons
• The p.adjust() function is useful
– method = “bonferroni” controls the “familywise error rate” (FWER)
– method = “BH” controls the “false discovery rate” (FDR)
• The multcomp package provides a general framework for simultaneous hypothesis testing
– Simultaneous Inference in General Parametric Models, Hothorn et al., Biometrical Journal, 2008.
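A small sketch of p.adjust() on hypothetical p-values. Note that p.adjust() expects the method name in lowercase for the Bonferroni correction.

```r
# Hypothetical raw p-values from five tests
p <- c(0.01, 0.02, 0.03, 0.04, 0.05)

p_fwer <- p.adjust(p, method = "bonferroni")  # controls the FWER
p_fdr  <- p.adjust(p, method = "BH")          # controls the FDR

data.frame(raw = p, bonferroni = p_fwer, BH = p_fdr)
```

The Bonferroni adjustment multiplies each p-value by the number of tests (capped at 1), which is why it is the more conservative of the two.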
the multcomp package
p-value = 0.2
• Can specify contrasts with shortcuts, e.g., “Dunnett” and “Tukey”
• Can specify contrasts as strings, e.g., “tx 7 - ctl = 0”
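A sketch of both ways of specifying contrasts. The slide's own data set is not recoverable, so this assumes the built-in PlantGrowth data (groups ctrl, trt1, trt2) as a stand-in. multcomp is an add-on package, so the calls are guarded.

```r
# PlantGrowth has one control group ("ctrl") and two treatments ("trt1", "trt2")
fit <- lm(weight ~ group, data = PlantGrowth)

# multcomp is not part of base R, so only run this if it is installed
if (requireNamespace("multcomp", quietly = TRUE)) {
  # Shortcut: every treatment vs. the control (the first factor level)
  dunnett <- multcomp::glht(fit, linfct = multcomp::mcp(group = "Dunnett"))
  print(summary(dunnett))

  # Shortcut: all pairwise comparisons
  tukey <- multcomp::glht(fit, linfct = multcomp::mcp(group = "Tukey"))
  print(summary(tukey))

  # Contrast written out as a string
  custom <- multcomp::glht(fit, linfct = multcomp::mcp(group = "trt1 - ctrl = 0"))
  print(summary(custom))
}
```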
multcomp example: superadditivity
• Are any of the drugs synergistic?
Do any of them antagonize each
other?
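The slide's data and code are not recoverable, so the following is only a base-R sketch of one superadditivity contrast on simulated data, with made-up group means. It tests whether the effect of a combination (AB) exceeds the sum of the single-drug effects. With several drug pairs, the same kind of contrast vectors could be passed to multcomp::glht() as rows of a matrix so that the p-values are adjusted jointly.

```r
set.seed(123)
n <- 20  # hypothetical number of replicates per group

# Simulated responses: control, drug A alone, drug B alone, and A + B together.
# The combination is built to exceed the sum of the single-drug effects.
d <- data.frame(
  group = factor(rep(c("ctl", "A", "B", "AB"), each = n),
                 levels = c("ctl", "A", "B", "AB")),
  y = c(rnorm(n, 10), rnorm(n, 12), rnorm(n, 13), rnorm(n, 18))
)

# Cell-means model: one coefficient per group
fit <- lm(y ~ 0 + group, data = d)

# Superadditivity contrast: (AB - ctl) - (A - ctl) - (B - ctl) = AB - A - B + ctl
k <- c(ctl = 1, A = -1, B = -1, AB = 1)
est <- sum(k * coef(fit))
se <- sqrt(drop(t(k) %*% vcov(fit) %*% k))
t_stat <- est / se
p_val <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)

c(estimate = est, se = se, t = t_stat, p = p_val)
# A clearly positive estimate points to synergy, a clearly negative one to antagonism
```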
lots of glaring omissions
• Experimental designs
• Interaction terms
• Model parameterization
• Variable selection
• Confidence intervals
• ANCOVA models
• Random effects vs fixed effects
• Much more…
resources
• MOOCs: Lots of good LM courses out there
• Books:
– Linear models with R – Julian Faraway
– Extending the linear model with R – Julian Faraway
– Mixed-Effects Models in S and S-PLUS – Jose Pinheiro & Doug Bates
– Mixed Effects Models and Extensions in Ecology with R – Alain Zuur et al.
• http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html
– Ben Bolker’s GLMM FAQ (Bolker is a co-author of lme4)

