Why the EPV≥10 sample size rule is rubbish and what to use instead
Maarten van Smeden, PhD
2 November 2020
M.vanSmeden@umcutrecht.nl | Twitter: @MaartenvSmeden
• Statistician at Julius Center for Health Sciences and Primary Care
• Main interests (but not limited to):
• prognostic and diagnostic modeling
• measurement error
• missing data
Today’s topic:
The EPV≥10 sample size rule (aka the 1 in 10 rule) has been one of the leading
sample size rules in prognostic/diagnostic prediction modeling
Outline
• The EPV≥10 rule-of-thumb: where does it come from?
• Evidence the EPV≥10 rule has no rationale
• Evidence that sample size is important (even if you use the fancier methods)
• Actual sample size calculations for prediction models
Ever wondered if AD/BC gives the “best” estimate of the odds ratio?
What if I told you that AD/BC is biased?
Let’s say we have fitted a logistic regression model to a dataset, and obtain
$$\ln\frac{p_i}{1 - p_i} = \hat{\alpha} + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i} + \cdots + \hat{\beta}_k X_{ki}$$
I’m very sorry, but $\hat{\beta}_1$ is a biased estimator, and $\hat{\beta}_2$ too, …
… actually they are all finite-sample biased
Epidemiology text-books:
• Confounding bias
• Information bias
• Selection bias
… nothing about finite sample bias
Important: bias vs consistency
• Consistency ≈ as sample size increases, estimate converges to truth
• Bias ≈ with repeated samples, the average estimate converges to truth
Log(odds) is consistent but finite sample biased
Illustration by simulation
• Simulate 4 normal covariates with equal multivariable log-odds-ratios of 2
• 1,000 simulation samples of N = 50
• Consistency: create 1,000 meta-datasets of increasing size: meta-dataset
r consists of all created datasets up to r
• Bias: calculate the difference between the estimated exposure effect and the
true value for each of the created datasets up to r
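The illustration described above can be approximated with a short base-R sketch. It is not the code behind the slides: the seed, the intercept of 0, and a true log-odds-ratio of log(2) per covariate (milder than the value on the slide, chosen here so that separation stays rare at N = 50) are all assumptions.

```r
# Sketch: finite-sample bias vs. consistency for logistic regression (base R only).
set.seed(2020)                     # arbitrary seed (assumption)
n_studies <- 1000                  # number of simulated small studies
n_per     <- 50                    # sample size of each small study
beta_true <- log(2)                # assumed true log-odds-ratio of every covariate

simulate_study <- function(n) {
  X  <- matrix(rnorm(n * 4), ncol = 4, dimnames = list(NULL, paste0("X", 1:4)))
  lp <- drop(X %*% rep(beta_true, 4))            # intercept assumed to be 0
  data.frame(Y = rbinom(n, 1, plogis(lp)), X)
}

studies <- lapply(seq_len(n_studies), function(i) simulate_study(n_per))

# Bias: average the X1 estimate over many small studies
est_small <- vapply(studies, function(d) {
  fit <- suppressWarnings(glm(Y ~ X1 + X2 + X3 + X4, family = binomial, data = d))
  unname(coef(fit)["X1"])
}, numeric(1))
mean_small <- mean(est_small)

# Consistency: one estimate from all studies stacked into a single large dataset
pooled     <- do.call(rbind, studies)
fit_pooled <- glm(Y ~ X1 + X2 + X3 + X4, family = binomial, data = pooled)
est_pooled <- unname(coef(fit_pooled)["X1"])

cat("true value:                  ", round(beta_true, 3), "\n")
cat("mean over small studies:     ", round(mean_small, 3), "\n")
cat("single pooled study, N =", nrow(pooled), ":", round(est_pooled, 3), "\n")
```

Under these assumptions the average over the small studies should sit further from zero than the true value, while the single pooled estimate should land close to it, even though both use exactly the same observations.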
Average of 400 studies with N = 50 versus 1 study with N = 20,000
With decreasing sample size:
• How we usually think
• But actually with odds ratios (and other ratios)
The origin of the 1 in 10 rule
“For EPV values of 10 or greater, no major problems occurred. For EPV
values less than 10, however, the regression coefficients were biased in
both positive and negative directions”
More simulation studies
Citations based on Google Scholar, Oct 30 2020
citations: 5,736
“a minimum of 10 EPV […] may be too conservative”
“substantial problems even if the number of EPV exceeds 10”
“For EPV values of 10 or greater, no major problems”
citations: 2,438
citations: 216
!?!
• Examine the reasons for substantial differences between the earlier EPV
simulation studies (simulation technicality: handling of “separation”)
• Evaluate a possible solution to reduce the finite sample bias
• Firth’s “correction” aims to reduce finite sample bias in maximum
likelihood estimates, and is applicable to logistic regression
• It makes clever use of the “Jeffreys prior” (from Bayesian literature) to
penalize the log-likelihood, which shrinks the estimated coefficients
• It has a nice theoretical justification, but does it work well?
[Figure: standard estimation compared with Firth’s correction]
Averaged over 465 simulation conditions with 10,000 replications each
Firth’s correction
Difficult? No
Example R code:
> library(logistf)
> logistf(Y ~ X1 + X2 + X3 + X4, data = df, firth = TRUE)
Compared to default (maxlik) logistic regression, Firth’s correction generally gives:
• Narrower confidence intervals
• Lower MSE
• Better predictions*
*requires adjustment of the intercept using flic=TRUE option in logistf
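To make the idea of the penalty concrete, here is a minimal base-R sketch of a Firth-type fit. It maximises the log-likelihood plus half the log-determinant of the Fisher information (the Jeffreys-prior penalty) with a general-purpose optimiser. The function name `firth_sketch` and the toy data are hypothetical, and this is not how logistf is implemented; for real analyses the logistf call shown above is the sensible route.

```r
# Sketch: Firth-type penalised likelihood for logistic regression (base R only).
# Maximises loglik(beta) + 0.5 * log det(X' W X), with W = diag(p * (1 - p)).
firth_sketch <- function(X, y) {
  neg_pen_loglik <- function(beta) {
    eta    <- drop(X %*% beta)
    p      <- plogis(eta)
    loglik <- sum(y * eta - log1p(exp(eta)))
    info   <- crossprod(X * (p * (1 - p)), X)      # Fisher information X' W X
    logdet <- as.numeric(determinant(info, logarithm = TRUE)$modulus)
    -(loglik + 0.5 * logdet)                       # optim() minimises
  }
  fit <- optim(rep(0, ncol(X)), neg_pen_loglik, method = "BFGS")
  setNames(fit$par, colnames(X))
}

# Toy data with complete separation (hypothetical values):
# every x < 0 has y = 0 and every x > 0 has y = 1.
x <- c(-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2)
y <- c( 0,  0,    0,  0,   1,   1, 1,   1)
X <- cbind(Intercept = 1, x = x)

ml    <- suppressWarnings(coef(glm(y ~ x, family = binomial)))  # diverging ML fit
firth <- firth_sketch(X, y)

print(ml)      # slope is very large: the ML estimate does not exist here
print(firth)   # penalised slope stays finite
```

With separated data the standard maximum likelihood slope runs off towards infinity, whereas the penalised estimate remains finite, which is the behaviour the simulation studies hinge on.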
Sample size issue solved?
… not quite!
• Precision of regression coefficients
• Variable selection and functional form
• Ensure predictions are adequate
• Why would a one-size-fits-all rule-of-thumb be appropriate?
• Think of sample size for a randomized clinical trial
It would be odd to suggest that all trials should have 100 patients in each arm.
TRIPOD Item 8. Explain how the study size was arrived at
Moons et al. Ann Intern Med 2015 (TRIPOD Explanation & Elaboration)
“Although there is a consensus on the importance
of having an adequate sample size for model
development, how to determine what counts as
‘adequate’ is not clear …”
Why is sample size important?
• We want a sample size large enough to develop a model that
provides accurate risk predictions in new individuals from the target
population
• Many (most?) models do not perform well when checked in new data
• small sample sizes
• overfitting
• lack of (internal) validation
• …
Recent example
• Reviewed 232 prediction models
• “All models were rated at high or
unclear risk of bias”
• Sample size: median 338; IQR 134 to 707
• Number of events: median 69; IQR 37 to 160
Living review, doi: 10.1136/bmj.m1328 (these numbers from a soon to appear review update)
Recent example
• External validation of 22 COVID-19-related
prognostic models
• Performance: poor to very poor
• “Admission oxygen saturation on room air and patient age are strong
predictors of deterioration and mortality among hospitalised adults with
COVID-19, respectively. None of the prognostic models evaluated here
offered incremental value for patient stratification to these univariable
predictors.”
Small sample size and overfitting
• Spurious predictor-outcome associations
• Important predictors can be missed
• Unimportant predictors can be selected
• Regression coefficients too large and uncertain
• Model doesn’t predict well in new data
• Disappointing discrimination
• Often calibration slope < 1
https://twitter.com/LesGuessing/status/997146590442799105
With small N: calibration slope often < 1
Predictions too extreme
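A small simulation can show where the calibration slope comes from. The sketch below develops a logistic model on a small sample, applies it to a large independent sample, and regresses the observed outcome on the model's linear predictor; the coefficient of that regression is the calibration slope. The sample sizes, the number of predictors, and the coefficient values are hypothetical choices for illustration only.

```r
# Sketch: calibration slope of a model developed on a small sample (base R only).
set.seed(1)                              # arbitrary seed (assumption)
n_dev <- 100                             # small development sample
n_val <- 5000                            # large validation sample
n_rep <- 50                              # number of repetitions
beta  <- c(rep(0.3, 5), rep(0, 5))       # 5 weak predictors, 5 pure noise (assumed)

make_data <- function(n) {
  X <- matrix(rnorm(n * 10), ncol = 10, dimnames = list(NULL, paste0("X", 1:10)))
  data.frame(Y = rbinom(n, 1, plogis(drop(X %*% beta))), X)
}

slopes <- replicate(n_rep, {
  dev <- make_data(n_dev)
  val <- make_data(n_val)
  fit <- suppressWarnings(glm(Y ~ ., family = binomial, data = dev))
  val$lp <- predict(fit, newdata = val, type = "link")   # linear predictor in new data
  unname(coef(glm(Y ~ lp, family = binomial, data = val))["lp"])
})

mean_slope <- mean(slopes)
cat("mean calibration slope over", n_rep, "repetitions:", round(mean_slope, 2), "\n")
```

A slope below 1 means the linear predictor has to be pulled towards zero to fit the new data, in other words the developed model's predictions are too extreme.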
“Modern” methods aim to circumvent overfitting
• Penalised regression: e.g. lasso, ridge regression, elastic net
• Standard regression followed by uniform (global) shrinkage
• Target calibrated predicted risks in new data: shrinkage and penalty
terms estimated using bootstrapping or cross-validation
• Sample size problem solved?
“shrinkage works on the average but may fail in the particular unique
problem on which the statistician is working.”
• Required shrinkage is hard to estimate
• Often large uncertainty about the correct value to use, especially in small datasets (!)
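How uncertain an estimated shrinkage factor can be is easy to see by simulation. The sketch below uses the heuristic shrinkage factor, (model chi-square minus number of predictor parameters) divided by model chi-square, as one simple estimator, and compares its spread across repeated datasets of two sizes. The coefficient values, sample sizes, and number of repetitions are hypothetical.

```r
# Sketch: sampling variability of the heuristic shrinkage factor (base R only).
set.seed(42)                                  # arbitrary seed (assumption)
beta <- rep(0.3, 8)                           # hypothetical true coefficients

heuristic_shrinkage <- function(fit) {
  lr <- fit$null.deviance - fit$deviance      # model likelihood-ratio chi-square
  p  <- length(coef(fit)) - 1                 # predictor parameters (intercept excluded)
  (lr - p) / lr
}

shrinkage_for_one_dataset <- function(n) {
  X <- matrix(rnorm(n * length(beta)), nrow = n)
  y <- rbinom(n, 1, plogis(drop(X %*% beta)))
  heuristic_shrinkage(suppressWarnings(glm(y ~ X, family = binomial)))
}

s_small <- replicate(500, shrinkage_for_one_dataset(100))    # small datasets
s_large <- replicate(500, shrinkage_for_one_dataset(2000))   # large datasets

print(round(quantile(s_small, c(0.05, 0.5, 0.95)), 2))
print(round(quantile(s_large, c(0.05, 0.5, 0.95)), 2))
```

The spread of the estimates in the small datasets is the point: the same analysis plan can suggest very different amounts of shrinkage from one small sample to the next.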
“We conclude that, despite improved performance on average, shrinkage often
worked poorly in individual datasets, in particular when it was most needed.
The results imply that shrinkage methods do not solve problems associated
with small sample size or low number of events per variable.”
Our proposal
• Calculate sample size that is needed to
• minimise potential overfitting
• estimate probability (risk) precisely
• Sample size formulas for
• Continuous outcomes
• Time-to-event outcomes
• Binary outcomes (focus today)
Example
• COVID-19 prognosis in hospitalized
patients
• Composite outcome: “deterioration”
(in-hospital death, ventilator support,
ICU)
A priori expectations
• Event fraction at least 30%
• 40 candidate predictor parameters
• C-statistic of 0.71 (conservative estimate)
→ Cox-Snell R² of 0.24
MedRxiv Preprint (not peer reviewed): 10.1101/2020.10.09.20209957
Restricted cubic splines with 4 knots: 3 degrees of freedom
Note: the EPV rule also counts degrees of freedom of candidate predictors, not variables!
Calculate required sample size
Criterion 1. Shrinkage: expected heuristic shrinkage factor S ≥ 0.9
(calibration slope, target < 10% overfitting)
Criterion 2. Optimism: apparent Cox-Snell R² − validation Cox-Snell R² < 0.05
(overfitting)
Criterion 3. Precision: margin of absolute error in the overall risk estimate < 0.05
(precision of the estimated baseline risk)
(Criterion 4. A small margin of absolute error in the estimated risks)
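For a binary outcome, the first three criteria can be turned into closed-form calculations. The base-R sketch below follows my reading of the formulas published by Riley and colleagues and should be checked against the pmsampsize package before being relied on. The function name `min_sample_size` and its argument names are hypothetical, and criterion 2 is implemented by scaling the 0.05 optimism target by the maximum attainable Cox-Snell R², which is an assumption taken from the published method rather than from the slide.

```r
# Sketch: minimum sample size for a binary-outcome prediction model (base R only).
min_sample_size <- function(r2_cs, parameters, prevalence,
                            shrinkage = 0.9, delta_r2 = 0.05, margin = 0.05) {
  # n needed so that the expected shrinkage factor equals s
  n_for_shrinkage <- function(s) parameters / ((s - 1) * log(1 - r2_cs / s))

  # Criterion 1: expected shrinkage of at least `shrinkage`
  n1 <- n_for_shrinkage(shrinkage)

  # Criterion 2: small optimism in R-squared, via the maximum Cox-Snell R-squared
  ll_null <- prevalence * log(prevalence) + (1 - prevalence) * log(1 - prevalence)
  max_r2  <- 1 - exp(2 * ll_null)
  n2      <- n_for_shrinkage(r2_cs / (r2_cs + delta_r2 * max_r2))

  # Criterion 3: overall risk estimated to within +/- `margin`
  n3 <- (qnorm(0.975) / margin)^2 * prevalence * (1 - prevalence)

  n <- ceiling(max(n1, n2, n3))
  list(n = n,
       events = n * prevalence,
       epv = n * prevalence / parameters,
       per_criterion = ceiling(c(shrinkage = n1, optimism = n2, precision = n3)))
}

# The a priori expectations from the COVID-19 example
res <- min_sample_size(r2_cs = 0.24, parameters = 40, prevalence = 0.3)
print(res$per_criterion)
cat("minimum n:", res$n, " events per candidate parameter:", round(res$epv, 1), "\n")
```

With the inputs of the example, the shrinkage criterion is the binding one and the result corresponds to roughly 9.7 events per candidate predictor parameter; changing the anticipated R² or the outcome prevalence moves that number a lot.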
Calculation
R code:
> library(pmsampsize)
> pmsampsize(type = "b", rsquared = 0.24, parameters = 40, prevalence = 0.3)
A few alternative scenarios
• rsquared=0.24,parameters=40,prevalence=0.3 -> EPV≥9.7
• rsquared=0.12,parameters=40,prevalence=0.3 -> EPV≥21.0
• rsquared=0.12,parameters=40,prevalence=0.5 -> EPV≥35.0
• rsquared=0.36,parameters=40,prevalence=0.2 -> EPV≥5
The sample size that meets all criteria is the MINIMUM required
• Why minimum? Other criteria may be important
e.g. missing data, clustering, variable selection
• May raise required sample size further
• Simulation based approaches
Preprint (not peer reviewed) doi: 10.21203/rs.3.rs-87100/v1
Summary
• Default logistic regression produces finite sample biased estimates
• Finite sample bias can be substantial; easily solved using Firth’s correction
• “Modern” approaches (e.g. Firth, Lasso, Ridge) do not compensate for low N
• New sample size criteria to replace the one-size-fits-all EPV≥10 rule
https://www.prognosisresearch.com/
New website by Richard Riley and Kym Snell
Work in collaboration with:
• Carl Moons
• Hans Reitsma
• Richard Riley (Keele, materials for this presentation)
• Gary Collins (Oxford)
• Ben Van Calster (Leuven)
• Ewout Steyerberg (Leiden)
• Rishi Gupta (UCL)
• Many others
Contact: M.vanSmeden@umcutrecht.nl