R meetup lm

1. Linear models with R Steve Hoang, PhD UVA R Users Meetup April 25, 2018

2. preamble • Likely a wide range of expertise in the audience • This is a deep topic, and I’ll only scratch the surface. Warning: if you’re in the far LHS of the distribution, this talk will be just enough for you to be a danger to yourself and others. • The goal is to provide tools for interpreting LMs, and a basic vocabulary for pursuing deeper topics.

3. overview • What are LMs? • Fitting and interpreting LMs • Transforming data • Hypothesis testing • Mixed-effect models

6. what is a linear model? regression ANOVA

7. what is a linear model? multiple regression multi-way ANOVA

8. what is a linear model? • mtcars: Let’s pretend that we would like to model 1/4 mile time (Y, the “response”) as a function of horsepower (X, the “predictor”) plus random noise: $Y = f(X) + \varepsilon$

9. what is a linear model? • mtcars: Let’s pretend that we would like to model 1/4 mile time (Y, the “response”) as a function of horsepower (X, the “predictor”) plus random noise: $Y = f(X) + \varepsilon$ • The LM: $y_i = \beta_0 + x_i \beta_1 + \varepsilon_i$

10. what is a linear model? $y_i = \beta_0 + x_i \beta_1 + \varepsilon_i$ • Now, our task becomes a search for the parameters that minimize the sum of the squared residuals • The R function that does this magic is lm()
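To make the "search for parameters" concrete, here is a minimal sketch that fits the mtcars example with lm() and checks it against the textbook closed-form least-squares solution for a single predictor. The variable names (fit, b0, b1, rss) are just illustrative.

```r
# Fit quarter-mile time (qsec) as a linear function of horsepower (hp).
fit <- lm(qsec ~ hp, data = mtcars)

# Closed-form least-squares estimates for a single predictor.
b1 <- cov(mtcars$hp, mtcars$qsec) / var(mtcars$hp)  # slope
b0 <- mean(mtcars$qsec) - b1 * mean(mtcars$hp)      # intercept

# The quantity being minimized: the sum of squared residuals.
rss <- sum(resid(fit)^2)

# Nudging the slope away from the estimate can only increase that sum.
rss_alt <- sum((mtcars$qsec - (b0 + (b1 + 0.001) * mtcars$hp))^2)

print(coef(fit))
print(c(intercept = b0, slope = b1))
print(c(rss = rss, rss_alt = rss_alt))
```

The two sets of estimates should agree up to floating-point error.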

11. what is a linear model? $y_i = \beta_0 + x_i \beta_1 + \varepsilon_i$ • $\beta_0$: intercept • $\beta_1$: slope • $\varepsilon_i$: residuals

12. what is a linear model? • mtcars: Let’s pretend that we would like to model 1/4 mile time (Y, the “response”) as a function of horsepower (X, the “predictor”) plus random noise: $Y = f(X) + \varepsilon$ • The LM: $y_i = \beta_0 + x_i \beta_1 + \varepsilon_i$ • In matrix notation: $Y = X\beta + \varepsilon$

13. what is a linear model? • Quick note: the “linear” in linear model refers to the fact that the function is linear in the parameters, not necessarily in the predictors • ✔ valid: $y = \beta_0 + \log(x)\beta_1 + \varepsilon$ • ✗ not valid: $y = \beta_0 + x^{\beta_1} + \varepsilon$ • ✔ valid: $y = \beta_0 + (x^2 + \tanh(x))\beta_1 + \varepsilon$
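As a quick sketch of that point, transformed predictors can go straight into an lm() formula because the model is still linear in the coefficients. Using mtcars here is an assumption carried over from the earlier slides; the object names are illustrative.

```r
# Transformed predictor, but still one intercept and one slope.
fit_log <- lm(qsec ~ log(hp), data = mtcars)

# I() makes the arithmetic inside the formula literal; the whole
# expression is a single predictor with a single coefficient.
fit_odd <- lm(qsec ~ I(hp^2 + tanh(hp)), data = mtcars)

print(coef(fit_log))
print(coef(fit_odd))
```

A model with a parameter sitting in an exponent cannot be written this way; that is the kind of case lm() does not cover.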

15. regression ANOVA $Y = X\beta + \varepsilon$ two flavors, one function: lm()

17. regression ANOVA $Y = X\beta + \varepsilon$ • $X$: the “design matrix”, accessible through model.matrix()

18. regression ANOVA $Y = X\beta + \varepsilon$ • $\beta$: the estimated parameters, accessible through coef() or coefficients()

19. regression ANOVA $Y = X\beta + \varepsilon$ • $\varepsilon$: the residuals, accessible through resid() or residuals()
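The three accessors can be put back together to recover the response, which is a handy sanity check on what each piece means. A small sketch, again assuming the mtcars qsec ~ hp model from earlier:

```r
fit <- lm(qsec ~ hp, data = mtcars)

X <- model.matrix(fit)  # design matrix: a column of 1s plus hp
b <- coef(fit)          # estimated parameters
e <- resid(fit)         # residuals

# Rebuild the response from its parts: Y = X b + e.
y_rebuilt <- as.vector(X %*% b + e)

print(head(X))
print(all.equal(y_rebuilt, mtcars$qsec))
```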

20. regression

21. regression function call

22. regression summary stats for residuals

23. regression summary stats for fitted coefficients

24. regression global model statistics
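The four pieces called out on the preceding slides (function call, residual summary, coefficient table, global statistics) are all reachable from the object returned by summary(). A sketch, assuming the same mtcars fit; the exact model shown on the original slides is not recoverable from this transcript.

```r
fit <- lm(qsec ~ hp, data = mtcars)
s <- summary(fit)

print(s$call)                # the function call
print(summary(resid(fit)))   # summary stats for the residuals
print(s$coefficients)        # estimates, std. errors, t values, p-values

# Global model statistics.
print(c(r.squared = s$r.squared,
        adj.r.squared = s$adj.r.squared,
        sigma = s$sigma))
print(s$fstatistic)          # F statistic with its degrees of freedom
```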

25. ANOVA

26. ANOVA

27. ANOVA 0 coef 1 coef 2 coef 3

28. ANOVA no intercept

29. ANOVA 0 coef 1 coef 2 coef 3 no intercept
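Since the ANOVA figures did not survive in this transcript, here is a stand-in sketch of the intercept versus no-intercept parameterizations. Treating cylinder count in mtcars as the grouping factor is my assumption, not necessarily the example used in the talk.

```r
# Assumption: use number of cylinders as a three-level grouping factor.
mt <- transform(mtcars, cyl = factor(cyl))

# Default parameterization: intercept = mean of the first group,
# remaining coefficients = differences from that baseline.
fit_int <- lm(qsec ~ cyl, data = mt)

# "0 +" drops the intercept: each coefficient is a group mean.
fit_noint <- lm(qsec ~ 0 + cyl, data = mt)

group_means <- tapply(mt$qsec, mt$cyl, mean)

print(coef(fit_int))
print(coef(fit_noint))
print(group_means)
```

Both fits describe the same model; only the meaning of the coefficients changes.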

30. ANOVA The broom package tidies your LMs • Summarize model outputs into tidy data frames: tidy() • Quickly view model-scale summaries: glance() • See the original data augmented with model statistics: augment() • There’s more to broom, so have a look for yourself.
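A minimal sketch of the three broom verbs on a small fit. The mtcars model is an assumption for illustration, and the calls are guarded so the script still runs where broom is not installed.

```r
fit <- lm(qsec ~ hp, data = mtcars)

if (requireNamespace("broom", quietly = TRUE)) {
  print(broom::tidy(fit))           # one row per coefficient
  print(broom::glance(fit))         # one row of model-level summaries
  print(head(broom::augment(fit)))  # data plus fitted values, residuals, etc.
} else {
  message("broom is not installed; try install.packages('broom')")
}
```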

31. ANOVA

32. ANOVA

33. ANOVA

34. some things to be aware of • LMs make several assumptions about your data; look them up. You want to be sure your data meet those assumptions reasonably well. – Homoscedasticity and normality of the residuals are the only assumptions we will discuss. • Look into “generalized linear models” (GLMs) and/or quantile regression for non-normally distributed data.

37. NOT HOMOSCEDASTIC!

38. testing for heteroscedasticity The ‘car’ package is your friend (Companion to Applied Regression). Use car::ncvTest() to check for heteroscedasticity using the Breusch-Pagan test. (ncv = Non-Constant Variance).
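To show what such a test looks at, here is a hand-rolled sketch of the original (non-studentized) Breusch-Pagan score test against the fitted values, which, as far as I understand, is what car::ncvTest() computes by default. The mtcars model and all variable names are assumptions for illustration; in practice just call the packaged function.

```r
fit <- lm(qsec ~ hp, data = mtcars)

# Squared residuals scaled by the ML variance estimate (RSS / n).
n <- nobs(fit)
sigma2_ml <- sum(resid(fit)^2) / n
u <- resid(fit)^2 / sigma2_ml

# Auxiliary regression of the scaled squared residuals on fitted values.
aux <- lm(u ~ fitted(fit))

# Score statistic: half the regression sum of squares, 1 degree of freedom.
reg_ss <- sum((fitted(aux) - mean(u))^2)
bp_stat <- reg_ss / 2
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

print(c(statistic = bp_stat, p.value = bp_p))

# The packaged version, guarded in case car is not installed.
if (requireNamespace("car", quietly = TRUE)) {
  print(car::ncvTest(fit))
}
```

A small p-value is evidence against constant variance.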

40. variance-stabilizing transformations • Variance-stabilizing transformations make the variance of Y (approximately) independent of its mean. • Take the Poisson distribution: its mean is equal to its variance. The square root is the variance-stabilizing transformation for a Poisson RV.
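A quick simulation sketch of the Poisson case: the raw variance tracks the mean, while the variance of the square root stays near 1/4 regardless of the mean. The chosen means, sample size, and seed are arbitrary.

```r
set.seed(1)
lambdas <- c(20, 50, 100)

# One large Poisson sample per mean.
draws <- lapply(lambdas, function(l) rpois(1e5, l))

raw_var  <- sapply(draws, var)                       # grows with the mean
sqrt_var <- sapply(draws, function(x) var(sqrt(x)))  # roughly constant

print(rbind(lambda = lambdas, raw_var = raw_var, sqrt_var = sqrt_var))
```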

43. the Box-Cox transformation • Helps alleviate non-normality and heteroscedasticity of residuals • Find a lambda that normalizes the data (maximum likelihood estimation)

$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda} & \text{if } \lambda \neq 0 \\
\log(y) & \text{if } \lambda = 0
\end{cases}
$$
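One way to find lambda in practice is MASS::boxcox(), which profiles the log-likelihood over a grid. The sketch below assumes an mpg ~ wt model on mtcars purely for illustration; box_cox() is a hypothetical helper that implements the transformation itself.

```r
# y = TRUE keeps the response in the fit, which boxcox() needs.
fit <- lm(mpg ~ wt, data = mtcars, y = TRUE)

# Profile log-likelihood over a grid of lambda values (no plot).
bc <- MASS::boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Hypothetical helper: the Box-Cox transform, with the log as the
# limiting case when lambda is (numerically) zero.
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Refit on the transformed response.
fit_bc <- lm(box_cox(mpg, lambda_hat) ~ wt, data = mtcars)

print(lambda_hat)
print(coef(fit_bc))
```

In practice people often round lambda to a convenient value (for example 0 or 0.5) when it lies inside the likelihood-based confidence interval.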

48. transformations for “curvy” data • You can often use linear models to fit “curvy” data; you just need to transform the predictors, the responses, or both.

50. transformations for “curvy” data • You can often use linear models to fit “curvy” data; you just need to transform the predictors, the responses, or both. • exponential model: $Y = e^{X\beta + \varepsilon}$, so $\log(Y) = X\beta + \varepsilon$
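A simulated sketch of the exponential case: data generated from an exponential curve with multiplicative noise become a straight-line problem after taking logs. The true coefficients (1 and 0.5), noise level, and seed are made up for the example.

```r
set.seed(42)
x <- seq(0, 5, length.out = 200)

# Exponential model with noise inside the exponent.
y <- exp(1 + 0.5 * x + rnorm(200, sd = 0.1))

# Log-transforming the response makes the model linear in the parameters.
fit <- lm(log(y) ~ x)
print(coef(fit))

# Predictions come back on the log scale; exponentiate to return to Y.
pred <- exp(predict(fit, newdata = data.frame(x = c(1, 2))))
print(pred)
```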

52. additional thoughts • Not everything can be transformed to be normal / homoscedastic, and not everything necessarily needs to be. – Consider nonparametric methods or GLMs. – ANOVA is somewhat robust to heteroscedasticity when n and/or effect size is relatively large. • Use QQ plots to assess normality – qqnorm(); also Shapiro-Wilk test – shapiro.test() • The poly() function in conjunction with lm() can be used to fit n-degree polynomials. – Generally want to use raw = FALSE with poly()
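A short sketch of those tools on the mtcars example (the model choice is an assumption). It also shows why raw = FALSE is usually fine: orthogonal and raw polynomials give the same fitted curve, but the orthogonal columns are uncorrelated, which tends to behave better numerically.

```r
fit <- lm(qsec ~ hp, data = mtcars)

# Normality checks on the residuals.
sw <- shapiro.test(resid(fit))
print(sw)
if (interactive()) {
  qqnorm(resid(fit))
  qqline(resid(fit))
}

# Quadratic fits: orthogonal (the default, raw = FALSE) versus raw powers.
fit_orth <- lm(qsec ~ poly(hp, 2), data = mtcars)
fit_raw  <- lm(qsec ~ poly(hp, 2, raw = TRUE), data = mtcars)

# Same fitted values, different coefficients.
print(all.equal(fitted(fit_orth), fitted(fit_raw)))
print(coef(fit_orth))
print(coef(fit_raw))
```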

54. multiple comparisons problem

55. multiple comparisons problem p-value = 0.04

56. handling multiple comparisons • The p.adjust() function is useful – method = “bonferroni” controls the “familywise error rate” (FWER) – method = “BH” controls the “false discovery rate” (FDR) • The multcomp package provides a general framework for simultaneous hypothesis testing – Simultaneous Inference in General Parametric Models, Hothorn et al., Biometrical Journal, 2008.
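A tiny worked sketch of the two adjustments on made-up p-values. Note that the method names are case-sensitive: it is "bonferroni" in lowercase but "BH" in uppercase.

```r
# Four hypothetical raw p-values.
p <- c(0.01, 0.02, 0.03, 0.04)

# Bonferroni: multiply each p-value by the number of tests (capped at 1).
p_bonf <- p.adjust(p, method = "bonferroni")

# Benjamini-Hochberg: step-up adjustment that controls the FDR.
p_bh <- p.adjust(p, method = "BH")

print(rbind(raw = p, bonferroni = p_bonf, BH = p_bh))
```

With these inputs only the smallest p-value survives Bonferroni at the 0.05 level, while all four survive BH, which illustrates how much less conservative FDR control can be.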

57. the multcomp package

58. the multcomp package p-value = 0.2

59. the multcomp package • Can specify contrasts with shortcuts, e.g., “Dunnett” and “Tukey” • Can specify contrasts as strings, e.g., “tx 7 - ctl = 0”
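A sketch of both ways of specifying contrasts with multcomp::glht() and multcomp::mcp(). The talk's own dataset is not recoverable from this transcript, so this uses the built-in PlantGrowth data (groups ctrl, trt1, trt2) as a stand-in; the calls are guarded in case multcomp is not installed.

```r
# Stand-in data: plant weight under a control and two treatments.
fit <- lm(weight ~ group, data = PlantGrowth)

if (requireNamespace("multcomp", quietly = TRUE)) {
  # Shortcut: compare every treatment against the first level (ctrl).
  dunnett <- multcomp::glht(fit, linfct = multcomp::mcp(group = "Dunnett"))
  print(summary(dunnett))

  # String form: spell out a single contrast by level names.
  one <- multcomp::glht(fit,
                        linfct = multcomp::mcp(group = "trt1 - ctrl = 0"))
  print(summary(one))
} else {
  message("multcomp is not installed; try install.packages('multcomp')")
}
```

The p-values reported by summary() are adjusted for the whole family of contrasts being tested.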

60. multcomp example: superadditivity • Are any of the drugs synergistic? Do any of them antagonize each other?

61. multcomp example: superadditivity

62. multcomp example: superadditivity

63. lots of glaring omissions • Experimental designs • Interaction terms • Model parameterization • Variable selection • Confidence intervals • ANCOVA models • Random effects vs fixed effects • Much more…

64. resources • MOOCs: Lots of good LM courses out there • Books: – Linear models with R – Julian Faraway – Extending the linear model with R – Julian Faraway – Mixed-Effects Models in S and S-PLUS – Jose Pinheiro & Doug Bates – Mixed-Effects models and Extensions in Ecology with R – Alain Zuur • http://bbolker.github.io/mixedmodels-misc/glmmFAQ.html – Ben Bolker’s GLMM FAQ (author of lme4)
