SlideShare a Scribd company logo
1 of 29
Trends towards significance
Stephen Senn
stephen@senns.uk
Senn: Trends toward significance 1
http://www.senns.uk/Blogs.html
Senn: Trends toward significance 2
“Many of the test procedures for
examining model adequacy that are
provided in standard software are best
regarded as defined directly by the test
statistics used rather than by a family of
alternatives.” (p37)
“With large amounts of data, small
departures from H0, of no subject-matter
importance, may be highly significant.”
(p42)
David Cox (1924-2022)
Outline
3
Senn:
Trends
toward
significance
Statement of the problem
Dealing with a red herring
Some properties of P-values
Replication
The real problem?
Conclusions
Statement of the problem
Trends toward significance and replication
Senn: Trends toward significance 4
Trend toward Significance
Examples
The results showed a trend towards significance
P=0.07
The result was not significant due to the trial
being too small
As more trials become available the meta-
analysis will be even more significant
This trial was significant so the next one should
be also
Senn: Trends toward significance 5
When faced with a P value that has failed to
reach some specific threshold (generally
P<0.05), authors of scientific articles may
imply a “trend towards statistical
significance” or otherwise suggest that the
failure to achieve statistical significance was
due to insufficient data. This paper presents a
quantitative analysis to show that such
descriptions give a misleading impression and
undermine the principle of accurate reporting.
Wood J, Freemantle N, King M, Nazareth I.
Trap of trends to statistical significance:
likelihood of near significant P value
becoming more significant with extra
data. BMJ. Mar 31 2014;348:g2215.
doi:10.1136/bmj.g2215
Dealing with a red herring
Significance testing versus hypothesis testing etc. etc.
Senn: Trends toward significance 6
Red Herrings
Supposed differences between frequentists
Fisher
• Significance testing
• Stresses P-values
• Calculates tail area probabilities
• Inferences not decisions
• Has no way of addressing
sensitivity of experiments
• ???
Neyman-Pearson
• Hypothesis testing
• Uses fixed type I error rates
• Determines critical values
• Decisions not inferences
• Solves problem of sensitivity
using alternative hypothesis
• Power
Senn: Trends toward significance Stephen Senn 7
Are P-values really a difference between the
two schools?
Fisher introduces significance levels A disciple of Neyman’s advocates P-values
Senn: Trends toward significance 8
Fisher RA. Statistical Methods for Research Workers.
Oliver and Boyd; 1925.
Lehmann EL. Testing Statistical Hypotheses.
Second ed. Chapman and Hall; 1994.
The real differences
Issue Neyman Pearson Fisher
Power versus likelihood The justification of likelihood lies in
power
Likelihood is fundamental and
power is irrelevant
Alternative hypotheses and test
statistic
The alternative hypothesis dictates
choice of test statistic
H1  Test statistic
The tests statistic is chosen on the
basis of experience
Test statistic  hypotheses
Conditioning Forward-looking properties of the
test are what matters
Conditioning on relevant subsets (if
they exist) is crucial
Sensitivity of experiment Power for given alternative Precision (standard error etc)
Senn: Trends toward significance 9
Some properties of P-values
Power to the P-value?
Senn: Trends toward significance 10
Problems with P-Values as evidence?
• Other things being equal smaller
P-values indicate weaker support
for H0
• and stronger for “H1”
• But be careful! Other things may
not be equal.
• Even if the P-value is small it could
be more likely under H0 than
under H1
• But be doubly careful! Who
knows what “H1”is?
Senn: Trends toward significance 11
My view
• P-values are more useful than significance, even if you want significance
• They provide a useful but limited function as one form of a surprise index
• But there are others
• And although they don’t require much input they often don’t deliver much output
• If you can do better you should
• But can you?
• You should use other approaches also
• However a P-value is a (vaguely defined) position it is not a movement
• There is no such thing as a trend towards significance
• This is a fallacy but also made (implicitly) in some discussions of the ‘replication
crisis’
• The anticipated evidence fallacy: ‘This trial was significant and so the next one should be too’
Senn: Trends toward significance 12
Replication
Being wrong about being right
Senn: Trends toward significance 13
Replication probabilities
from a position of
(Bayesian) ignorance
• Q1. What is the probability that in a future
experiment, taking that experiment’s results
on its own, the results would favour 1 rather
than 2?
• Q2. What is the probability, having conducted
this experiment, and pooled its results with the
current one, the results would favour 1 rather
than 2?
• Q3. What is the probability that having
conducted a future experiment and then
calculated a Bayesian posterior using a
uninformative prior and the results of this
second experiment alone, the probability that
2 would be worse than 1 would be less than or
equal to 0.05?
Senn: Trends toward significance 14
Why if this is a problem it cannot uniquely be
a problem of P-values
• Imagine two similar experiments
considering the same question
• Choose any statistic you like to
express the results of experiments
• Calculate this statistic for
experiment one (independently of
experiment 2) and call it S1
• Calculate this statistic for
experiment two (independently of
experiment 1) and call it S2
• Assume that nothing else is known
(for example you have not yet seen
the results)
Senn: Trends toward significance 15
P(S1 > S2) = P(S2 > S1)
Senn: Trends toward significance 16
Sauce for the Goose and Sauce for the Gander
• This property is shared by Bayesian statements
• It follows from the Martingale property of Bayesian forecasts
• Hence, either
• The property is undesirable and hence is a criticism of Bayesian methods also
• Or it is desirable and is a point in favour of frequentist methods
• My own view is that it is much less important than many suppose
• We are interested in the truth and replication is only a means of
getting there
• A trial that is no more precise than the first one is only a very
imperfect judge
The Two
Trials Rule
Senn: Trends toward significance 18
Parameter H0 H1
Mean,  0 2.8
Variance of
true effect
0 1
Standard error
Of statistic
1 1
Probability of
significance
0.05 0.80
1000 trials. In half the cases H0 is true
Senn: Trends toward significance 19
As before but magnified to
concentrate more easily on the
‘significance’ boundary of 0.05
Senn: Trends toward significance 20
NB Bayes factor calculated at non-centrality of 2.8, which corresponds to the value for planning
Magnified plot only
Comparison of Two Two-Trial Strategies
Statistics Frequentist P<0.05 Bayesian BF<1/10
Number of trials run in total 1372 1299
Number of false positives 0 1
Number of false negatives 68 133
Number of false decisions 68 134
Number of correct decisions 932 866
Trials per correct decision 1.47 1.5
Senn: Trends toward significance 21
Simulation values are as previously
Second trial is only run if first is ‘positive’
Efficacy declared if both trials are ‘positive’
Figures are per 1000 trials for which H0 is true in 500 and H1 in 500
Lessons
• This does not show that significance is better than Bayes factors
• Different parameter settings could show the opposite
• Different thresholds could show the opposite
• It does show that reproducibility is not necessarily the issue
• The strategy with poorer reproducibility is (marginally) better
• NB That does not have to be the case
• But it shows it can be the case
• It raises the issue of calibration
• And it also raises the issue as to whether reproducibility is a useful
goal in itself
• And how it should be measured
Senn: Trends toward significance 22
The real problem
It’s in the analysis
Senn: Trends toward significance 23
Standard errors reported as being smaller
than they are
Block main effect problems
• Incorrect unit of experimental
inference (Pseudoreplication,
Hurlbert, 1984)
• For example allocation of treatments to
cages but counting rats as the ‘n’
• Inappropriate experimental analogue
for observational studies
• Historical data treated as if they were
concurrent
• Thus parallel group trial is used as
inappropriate model rather than cluster
randomised trial
• This may apply more widely to
epidemiological studies
Block interaction problems
• Mistaking causal inference for
predictive inference
• Causal: What happened to the patients in
this trial?
• Interaction does not contribute to the error
• Fixed effects meta-analysis is an example
• Predictive what will happen to patients in
future?
• Interaction should contribute to the error
• Random effects meta-analysis is an
example
Senn: Trends toward significance 24
An example
• Peters et al 2005, describe ways of
combining different
epidemiological (humans) and
toxicological studies (animals)
using Bayesian approaches
• Trihalomethanes and low
birthweight used as an illustration
• However, their analysis of the
toxicology data from rats treats the
pups as independent, which makes
very strong (implicit) and
implausible assumptions
• Pseudoreplication
Discussed in Grieve and Senn (2005)
(Unpublished)
For pseudoreplication See Hurlbert
SH. Pseudoreplication and the design
of ecological field experiments.
Ecological monographs.
1984;54(2):187-211.
Senn: Trends toward significance 25
Other Problems
• Publication bias
• Hacking
• Multiplicity
• In the sense of only concentrating on the good news
• Surrogate endpoints
• Naïve expectations
Senn: Trends toward significance 26
Conclusions
Nearly finished!
Senn: Trends toward significance 27
My ‘Conclusions’
• P-values per se are not the problem
• There may be a harmful culture of ‘significance’ however this is defined
• P-values have a limited use as rough and ready tools using little structure
• Where you have more structure you can often do better
• Likelihood, Bayes etc
• Point estimates and standard errors are extremely useful for future research
synthesizers and should be provided regularly
• We need to make sure that estimates are reported honestly and standard
errors calculate appropriately
• We should not confuse the evidence from a study with the conclusion that
should be made
Senn: Trends toward significance 28
Finally, a thought about laws (or hypotheses)
from The Laws of Thought
Senn: Trends toward significance 29
When the defect of data* is supplied by
hypothesis, the solutions will, in general, vary
with the nature of hypotheses assumed.
Boole (1854) An Investigation of the Laws of Thought: Macmillan.
. P375
Quoted by R. A. Fisher (1956) Statistical methods and scientific inference
In Statistical Methods, Experimental Design and Scientific Inference (ed
J. H. Bennet), Oxford: Oxford University. p23
*Fisher interprets defect of data is supplied by hypothesis as meaning
supplying by hypothesis what is lacking in the data.
In worrying about the hypotheses, let’s not overlook the
defects of data

More Related Content

What's hot

The Rothamsted school meets Lord's paradox
The Rothamsted school meets Lord's paradoxThe Rothamsted school meets Lord's paradox
The Rothamsted school meets Lord's paradoxStephen Senn
 
The challenge of small data
The challenge of small dataThe challenge of small data
The challenge of small dataStephen Senn
 
Minimally important differences
Minimally important differencesMinimally important differences
Minimally important differencesStephen Senn
 
First in man tokyo
First in man tokyoFirst in man tokyo
First in man tokyoStephen Senn
 
Minimally important differences v2
Minimally important differences v2Minimally important differences v2
Minimally important differences v2Stephen Senn
 
Thinking statistically v3
Thinking statistically v3Thinking statistically v3
Thinking statistically v3Stephen Senn
 
The revenge of RA Fisher
The revenge of RA Fisher The revenge of RA Fisher
The revenge of RA Fisher Stephen Senn
 
To infinity and beyond v2
To infinity and beyond v2To infinity and beyond v2
To infinity and beyond v2Stephen Senn
 
NNTs, responder analysis & overlap measures
NNTs, responder analysis & overlap measuresNNTs, responder analysis & overlap measures
NNTs, responder analysis & overlap measuresStephen Senn
 
Seventy years of RCTs
Seventy years of RCTsSeventy years of RCTs
Seventy years of RCTsStephen Senn
 
Understanding randomisation
Understanding randomisationUnderstanding randomisation
Understanding randomisationStephen Senn
 
The Seven Habits of Highly Effective Statisticians
The Seven Habits of Highly Effective StatisticiansThe Seven Habits of Highly Effective Statisticians
The Seven Habits of Highly Effective StatisticiansStephen Senn
 
Numbers needed to mislead
Numbers needed to misleadNumbers needed to mislead
Numbers needed to misleadStephen Senn
 
Real world modified
Real world modifiedReal world modified
Real world modifiedStephen Senn
 
To infinity and beyond
To infinity and beyond To infinity and beyond
To infinity and beyond Stephen Senn
 
Why I hate minimisation
Why I hate minimisationWhy I hate minimisation
Why I hate minimisationStephen Senn
 
The revenge of RA Fisher
The revenge of RA FisherThe revenge of RA Fisher
The revenge of RA FisherStephen Senn
 
In search of the lost loss function
In search of the lost loss function In search of the lost loss function
In search of the lost loss function Stephen Senn
 
De Finetti meets Popper
De Finetti meets PopperDe Finetti meets Popper
De Finetti meets PopperStephen Senn
 

What's hot (20)

The Rothamsted school meets Lord's paradox
The Rothamsted school meets Lord's paradoxThe Rothamsted school meets Lord's paradox
The Rothamsted school meets Lord's paradox
 
The challenge of small data
The challenge of small dataThe challenge of small data
The challenge of small data
 
Minimally important differences
Minimally important differencesMinimally important differences
Minimally important differences
 
First in man tokyo
First in man tokyoFirst in man tokyo
First in man tokyo
 
Minimally important differences v2
Minimally important differences v2Minimally important differences v2
Minimally important differences v2
 
Thinking statistically v3
Thinking statistically v3Thinking statistically v3
Thinking statistically v3
 
The revenge of RA Fisher
The revenge of RA Fisher The revenge of RA Fisher
The revenge of RA Fisher
 
On being Bayesian
On being BayesianOn being Bayesian
On being Bayesian
 
To infinity and beyond v2
To infinity and beyond v2To infinity and beyond v2
To infinity and beyond v2
 
NNTs, responder analysis & overlap measures
NNTs, responder analysis & overlap measuresNNTs, responder analysis & overlap measures
NNTs, responder analysis & overlap measures
 
Seventy years of RCTs
Seventy years of RCTsSeventy years of RCTs
Seventy years of RCTs
 
Understanding randomisation
Understanding randomisationUnderstanding randomisation
Understanding randomisation
 
The Seven Habits of Highly Effective Statisticians
The Seven Habits of Highly Effective StatisticiansThe Seven Habits of Highly Effective Statisticians
The Seven Habits of Highly Effective Statisticians
 
Numbers needed to mislead
Numbers needed to misleadNumbers needed to mislead
Numbers needed to mislead
 
Real world modified
Real world modifiedReal world modified
Real world modified
 
To infinity and beyond
To infinity and beyond To infinity and beyond
To infinity and beyond
 
Why I hate minimisation
Why I hate minimisationWhy I hate minimisation
Why I hate minimisation
 
The revenge of RA Fisher
The revenge of RA FisherThe revenge of RA Fisher
The revenge of RA Fisher
 
In search of the lost loss function
In search of the lost loss function In search of the lost loss function
In search of the lost loss function
 
De Finetti meets Popper
De Finetti meets PopperDe Finetti meets Popper
De Finetti meets Popper
 

Similar to Trends towards significance

Senn repligate
Senn repligateSenn repligate
Senn repligatejemille6
 
Lecture by Professor Imre Janszky about random error.
Lecture by Professor Imre Janszky about random error. Lecture by Professor Imre Janszky about random error.
Lecture by Professor Imre Janszky about random error. EPINOR
 
Some statistical concepts relevant to proteomics data analysis
Some statistical concepts relevant to proteomics data analysisSome statistical concepts relevant to proteomics data analysis
Some statistical concepts relevant to proteomics data analysisUC Davis
 
P-values the gold measure of statistical validity are not as reliable as many...
P-values the gold measure of statistical validity are not as reliable as many...P-values the gold measure of statistical validity are not as reliable as many...
P-values the gold measure of statistical validity are not as reliable as many...David Pratap
 
Statistical skepticism: How to use significance tests effectively
Statistical skepticism: How to use significance tests effectively Statistical skepticism: How to use significance tests effectively
Statistical skepticism: How to use significance tests effectively jemille6
 
Introduction to Hypothesis Testing
Introduction to Hypothesis TestingIntroduction to Hypothesis Testing
Introduction to Hypothesis Testingjasondroesch
 
Psbe2 08 research methods 2011-2012 - week 2
Psbe2 08 research methods 2011-2012 - week 2Psbe2 08 research methods 2011-2012 - week 2
Psbe2 08 research methods 2011-2012 - week 2Vlady Fckfb
 
Psbe2 08 research methods 2011-2012 - week 2
Psbe2 08 research methods 2011-2012 - week 2Psbe2 08 research methods 2011-2012 - week 2
Psbe2 08 research methods 2011-2012 - week 2Vlady Fckfb
 
Replication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden ControversiesReplication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden Controversiesjemille6
 
Bio-Statistics in Bio-Medical research
Bio-Statistics in Bio-Medical researchBio-Statistics in Bio-Medical research
Bio-Statistics in Bio-Medical researchShinjan Patra
 
Hypothesis Testing
Hypothesis TestingHypothesis Testing
Hypothesis TestingJeremy Lane
 
Test of significance in Statistics
Test of significance in StatisticsTest of significance in Statistics
Test of significance in StatisticsVikash Keshri
 
How to conduct health technology assessment using Gradepro
How to conduct health technology assessment using GradeproHow to conduct health technology assessment using Gradepro
How to conduct health technology assessment using GradeproArin Basu
 

Similar to Trends towards significance (20)

Senn repligate
Senn repligateSenn repligate
Senn repligate
 
Lecture by Professor Imre Janszky about random error.
Lecture by Professor Imre Janszky about random error. Lecture by Professor Imre Janszky about random error.
Lecture by Professor Imre Janszky about random error.
 
Some statistical concepts relevant to proteomics data analysis
Some statistical concepts relevant to proteomics data analysisSome statistical concepts relevant to proteomics data analysis
Some statistical concepts relevant to proteomics data analysis
 
To p or not to p
To p or not to pTo p or not to p
To p or not to p
 
Inferential Statistics
Inferential StatisticsInferential Statistics
Inferential Statistics
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
 
P-values the gold measure of statistical validity are not as reliable as many...
P-values the gold measure of statistical validity are not as reliable as many...P-values the gold measure of statistical validity are not as reliable as many...
P-values the gold measure of statistical validity are not as reliable as many...
 
Nonparametric Statistics
Nonparametric StatisticsNonparametric Statistics
Nonparametric Statistics
 
Statistical skepticism: How to use significance tests effectively
Statistical skepticism: How to use significance tests effectively Statistical skepticism: How to use significance tests effectively
Statistical skepticism: How to use significance tests effectively
 
Bayes rpp bristol
Bayes rpp bristolBayes rpp bristol
Bayes rpp bristol
 
Introduction to Hypothesis Testing
Introduction to Hypothesis TestingIntroduction to Hypothesis Testing
Introduction to Hypothesis Testing
 
Psbe2 08 research methods 2011-2012 - week 2
Psbe2 08 research methods 2011-2012 - week 2Psbe2 08 research methods 2011-2012 - week 2
Psbe2 08 research methods 2011-2012 - week 2
 
Psbe2 08 research methods 2011-2012 - week 2
Psbe2 08 research methods 2011-2012 - week 2Psbe2 08 research methods 2011-2012 - week 2
Psbe2 08 research methods 2011-2012 - week 2
 
Replication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden ControversiesReplication Crises and the Statistics Wars: Hidden Controversies
Replication Crises and the Statistics Wars: Hidden Controversies
 
Bio-Statistics in Bio-Medical research
Bio-Statistics in Bio-Medical researchBio-Statistics in Bio-Medical research
Bio-Statistics in Bio-Medical research
 
Hypothesis Testing
Hypothesis TestingHypothesis Testing
Hypothesis Testing
 
Hypothesis Testing.pptx
Hypothesis Testing.pptxHypothesis Testing.pptx
Hypothesis Testing.pptx
 
Inferential Statistics
Inferential StatisticsInferential Statistics
Inferential Statistics
 
Test of significance in Statistics
Test of significance in StatisticsTest of significance in Statistics
Test of significance in Statistics
 
How to conduct health technology assessment using Gradepro
How to conduct health technology assessment using GradeproHow to conduct health technology assessment using Gradepro
How to conduct health technology assessment using Gradepro
 

Recently uploaded

代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxTanveerAhmed817946
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 

Recently uploaded (20)

代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Digi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptxDigi Khata Problem along complete plan.pptx
Digi Khata Problem along complete plan.pptx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 

Trends towards significance

  • 1. Trends towards significance Stephen Senn stephen@senns.uk Senn: Trends toward significance 1 http://www.senns.uk/Blogs.html
  • 2. Senn: Trends toward significance 2 “Many of the test procedures for examining model adequacy that are provided in standard software are best regarded as defined directly by the test statistics used rather than by a family of alternatives.” (p37) “With large amounts of data, small departures from H0, of no subject-matter importance, may be highly significant.” (p42) David Cox (1924-2022)
  • 3. Outline 3 Senn: Trends toward significance Statement of the problem Dealing with a red herring Some properties of P-values Replication The real problem? Conclusions
  • 4. Statement of the problem Trends toward significance and replication Senn: Trends toward significance 4
  • 5. Trend toward Significance Examples The results showed a trend towards significance P=0.07 The result was not significant due to the trial being too small As more trials become available the meta- analysis will be even more significant This trial was significant so the next one should be also Senn: Trends toward significance 5 When faced with a P value that has failed to reach some specific threshold (generally P<0.05), authors of scientific articles may imply a “trend towards statistical significance” or otherwise suggest that the failure to achieve statistical significance was due to insufficient data. This paper presents a quantitative analysis to show that such descriptions give a misleading impression and undermine the principle of accurate reporting. Wood J, Freemantle N, King M, Nazareth I. Trap of trends to statistical significance: likelihood of near significant P value becoming more significant with extra data. BMJ. Mar 31 2014;348:g2215. doi:10.1136/bmj.g2215
  • 6. Dealing with a red herring Significance testing versus hypothesis testing etc. etc. Senn: Trends toward significance 6
  • 7. Red Herrings Supposed differences between frequentists Fisher • Significance testing • Stresses P-values • Calculates tail area probabilities • Inferences not decisions • Has no way of addressing sensitivity of experiments • ??? Neyman-Pearson • Hypothesis testing • Uses fixed type I error rates • Determines critical values • Decisions not inferences • Solves problem of sensitivity using alternative hypothesis • Power Senn: Trends toward significance Stephen Senn 7
  • 8. Are P-values really a difference between the two schools? Fisher introduces significance levels A disciple of Neyman’s advocates P-values Senn: Trends toward significance 8 Fisher RA. Statistical Methods for Research Workers. Oliver and Boyd; 1925. Lehmann EL. Testing Statistical Hypotheses. Second ed. Chapman and Hall; 1994.
  • 9. The real differences Issue Neyman Pearson Fisher Power versus likelihood The justification of likelihood lies in power Likelihood is fundamental and power is irrelevant Alternative hypotheses and test statistic The alternative hypothesis dictates choice of test statistic H1  Test statistic The tests statistic is chosen on the basis of experience Test statistic  hypotheses Conditioning Forward-looking properties of the test are what matters Conditioning on relevant subsets (if they exist) is crucial Sensitivity of experiment Power for given alternative Precision (standard error etc) Senn: Trends toward significance 9
  • 10. Some properties of P-values Power to the P-value? Senn: Trends toward significance 10
  • 11. Problems with P-Values as evidence? • Other things being equal smaller P-values indicate weaker support for H0 • and stronger for “H1” • But be careful! Other things may not be equal. • Even if the P-value is small it could be more likely under H0 than under H1 • But be doubly careful! Who knows what “H1”is? Senn: Trends toward significance 11
  • 12. My view • P-values are more useful than significance, even if you want significance • They provide a useful but limited function as one form of a surprise index • But there are others • And although they don’t require much input they often don’t deliver much output • If you can do better you should • But can you? • You should use other approaches also • However a P-value is a (vaguely defined) position it is not a movement • There is no such thing as a trend towards significance • This is a fallacy but also made (implicitly) in some discussions of the ‘replication crisis’ • The anticipated evidence fallacy: ‘This trial was significant and so the next one should be too’ Senn: Trends toward significance 12
  • 13. Replication Being wrong about being right Senn: Trends toward significance 13
  • 14. Replication probabilities from a position of (Bayesian) ignorance • Q1. What is the probability that in a future experiment, taking that experiment’s results on its own, the results would favour 1 rather than 2? • Q2. What is the probability, having conducted this experiment, and pooled its results with the current one, the results would favour 1 rather than 2? • Q3. What is the probability that having conducted a future experiment and then calculated a Bayesian posterior using a uninformative prior and the results of this second experiment alone, the probability that 2 would be worse than 1 would be less than or equal to 0.05? Senn: Trends toward significance 14
  • 15. Why if this is a problem it cannot uniquely be a problem of P-values • Imagine two similar experiments considering the same question • Choose any statistic you like to express the results of experiments • Calculate this statistic for experiment one (independently of experiment 2) and call it S1 • Calculate this statistic for experiment two (independently of experiment 1) and call it S2 • Assume that nothing else is known (for example you have not yet seen the results) Senn: Trends toward significance 15 P(S1 > S2) = P(S2 > S1)
  • 16. Senn: Trends toward significance 16 Sauce for the Goose and Sauce for the Gander • This property is shared by Bayesian statements • It follows from the Martingale property of Bayesian forecasts • Hence, either • The property is undesirable and hence is a criticism of Bayesian methods also • Or it is desirable and is a point in favour of frequentist methods • My own view is that it is much less important than many suppose • We are interested in the truth and replication is only a means of getting there • A trial that is no more precise than the first one is only a very imperfect judge
  • 18. Senn: Trends toward significance 18 Parameter H0 H1 Mean,  0 2.8 Variance of true effect 0 1 Standard error Of statistic 1 1 Probability of significance 0.05 0.80 1000 trials. In half the cases H0 is true
  • 19. Senn: Trends toward significance 19 As before but magnified to concentrate more easily on the ‘significance’ boundary of 0.05
  • 20. Senn: Trends toward significance 20 NB Bayes factor calculated at non-centrality of 2.8, which corresponds to the value for planning Magnified plot only
  • 21. Comparison of Two Two-Trial Strategies Statistics Frequentist P<0.05 Bayesian BF<1/10 Number of trials run in total 1372 1299 Number of false positives 0 1 Number of false negatives 68 133 Number of false decisions 68 134 Number of correct decisions 932 866 Trials per correct decision 1.47 1.5 Senn: Trends toward significance 21 Simulation values are as previously Second trial is only run if first is ‘positive’ Efficacy declared if both trials are ‘positive’ Figures are per 1000 trials for which H0 is true in 500 and H1 in 500
  • 22. Lessons • This does not show that significance is better than Bayes factors • Different parameter settings could show the opposite • Different thresholds could show the opposite • It does show that reproducibility is not necessarily the issue • The strategy with poorer reproducibility is (marginally) better • NB That does not have to be the case • But it shows it can be the case • It raises the issue of calibration • And it also raises the issue as to whether reproducibility is a useful goal in itself • And how it should be measured Senn: Trends toward significance 22
  • 23. The real problem It’s in the analysis Senn: Trends toward significance 23
  • 24. Standard errors reported as being smaller than they are Block main effect problems • Incorrect unit of experimental inference (Pseudoreplication, Hurlbert, 1984) • For example allocation of treatments to cages but counting rats as the ‘n’ • Inappropriate experimental analogue for observational studies • Historical data treated as if they were concurrent • Thus parallel group trial is used as inappropriate model rather than cluster randomised trial • This may apply more widely to epidemiological studies Block interaction problems • Mistaking causal inference for predictive inference • Causal: What happened to the patients in this trial? • Interaction does not contribute to the error • Fixed effects meta-analysis is an example • Predictive what will happen to patients in future? • Interaction should contribute to the error • Random effects meta-analysis is an example Senn: Trends toward significance 24
  • 25. An example • Peters et al 2005, describe ways of combining different epidemiological (humans) and toxicological studies (animals) using Bayesian approaches • Trihalomethanes and low birthweight used as an illustration • However, their analysis of the toxicology data from rats treats the pups as independent, which makes very strong (implicit) and implausible assumptions • Pseudoreplication Discussed in Grieve and Senn (2005) (Unpublished) For pseudoreplication See Hurlbert SH. Pseudoreplication and the design of ecological field experiments. Ecological monographs. 1984;54(2):187-211. Senn: Trends toward significance 25
  • 26. Other Problems • Publication bias • Hacking • Multiplicity • In the sense of only concentrating on the good news • Surrogate endpoints • Naïve expectations Senn: Trends toward significance 26
  • 28. My ‘Conclusions’ • P-values per se are not the problem • There may be a harmful culture of ‘significance’ however this is defined • P-values have a limited use as rough and ready tools using little structure • Where you have more structure you can often do better • Likelihood, Bayes etc • Point estimates and standard errors are extremely useful for future research synthesizers and should be provided regularly • We need to make sure that estimates are reported honestly and standard errors calculate appropriately • We should not confuse the evidence from a study with the conclusion that should be made Senn: Trends toward significance 28
  • 29. Finally, a thought about laws (or hypotheses) from The Laws of Thought Senn: Trends toward significance 29 When the defect of data* is supplied by hypothesis, the solutions will, in general, vary with the nature of hypotheses assumed. Boole (1854) An Investigation of the Laws of Thought: Macmillan. . P375 Quoted by R. A. Fisher (1956) Statistical methods and scientific inference In Statistical Methods, Experimental Design and Scientific Inference (ed J. H. Bennet), Oxford: Oxford University. p23 *Fisher interprets defect of data is supplied by hypothesis as meaning supplying by hypothesis what is lacking in the data. In worrying about the hypotheses, let’s not overlook the defects of data