3. Introduction
Any study is vulnerable to two types of error
• Random error
Due to chance
Increasing the sample size can reduce it
• Systematic error (also called bias)
Consistent, repeatable error arising from flawed design
Attributable to an identifiable cause, not to chance
Often cannot be corrected by statistical analysis
4. Example
• Checking the BP of 10,000 people against a known population mean (e.g., 130 mmHg)
(Figure: distribution of readings showing the population mean, random error as scatter around it, and systematic error as a shift away from it)
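The distinction can be illustrated with a short simulation (a sketch, not from the slides; the 8 mmHg cuff offset is an invented illustration): increasing the sample size shrinks random error around the 130 mmHg population mean, but leaves a systematic error untouched.

```python
import random

random.seed(0)
TRUE_MEAN = 130.0  # known population mean BP (mmHg), from the example

def mean_bp(n, bias=0.0, noise_sd=10.0):
    """Simulate n BP readings: true mean + random noise + systematic bias."""
    readings = [TRUE_MEAN + bias + random.gauss(0, noise_sd) for _ in range(n)]
    return sum(readings) / n

# Random error shrinks as n grows ...
small = mean_bp(100)
large = mean_bp(10_000)
# ... but a systematic error (e.g., a cuff reading 8 mmHg high) does not.
biased = mean_bp(10_000, bias=8.0)
print(round(small, 1), round(large, 1), round(biased, 1))
```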
5. What is Bias?
• Any systematic error in the design, conduct or
analysis of a study that results in a mistaken
estimate of the outcome variable
• Any trend in the collection, analysis, interpretation, publication or
review of data, that can lead to conclusions that are systematically
different from the truth (John M Last 2011)
7. SELECTION BIAS
• Error introduced when the study population
does not represent the target population
• Can be introduced during
Design, due to
inappropriate definition of the eligible population
an inaccurate sampling frame
uneven diagnostic procedures
Implementation
8. Selection bias due to inappropriate definition of
eligible population
• Healthcare access bias
• Neyman bias
• Spectrum bias
• Healthy worker effect
• Berkson’s bias
• Exclusion bias
9. Healthcare access bias
Patients admitted to an institution do not represent the cases
originating in the community
Popularity bias
Centripetal bias
Referral filter bias
Diagnostic/treatment access bias
10. Neyman bias
• Also called prevalence-incidence bias or selective survival bias
• Occurs in both cross-sectional and case-control studies
• A gap in time occurs between exposure & selection of participants
• In studies of diseases that are quickly fatal, transient or
subclinical
• Introduced as a result of selective survival among prevalent
cases
11. Example:
• A case-control study investigating pneumonia that only
enrolls cases and controls admitted to a hospital
• Those with pneumonia who died prior to admission will not be
included in the sample
• The selected sample will, therefore, include moderately severe
cases, but not fatal cases
12. Spectrum bias
• In the assessment of validity of a diagnostic test
• Bias is produced when researchers include only “clear” or
“definite” cases
• E.g., In a study investigating the ability of MR imaging to detect
cirrhosis, if only advanced clinical cases are included the
sensitivity will be overestimated
13. Healthy worker effect
• Lower mortality observed in the employed population when
compared with the general population
• Any excess risk associated with an occupation will tend to be
underestimated by a comparison with the general population
14. Berkson’s bias
• Arises when the study population is selected from a specific
subpopulation, such as hospitalized patients
• Individuals in the hospital population are more likely to have both
the exposure & the disease
• Can lead to spurious associations between exposure and
disease
15. • Sackett, 1979: analysed data from 257 hospitalized individuals
• Detected association between locomotor & respiratory disease
(OR 4.06)
• Repeated analysis in 2783 individuals from general population,
no association (OR 1.06)
• Original analysis of hospitalized individuals was biased because
both diseases caused individuals to be hospitalized
• By looking only within the stratum of hospitalized individuals,
observed distorted association
16. Exclusion bias
• Controls with conditions related to the exposure are excluded,
whereas cases with these diseases as comorbidities are kept
• E.g., Reserpine and breast cancer: controls with cardiovascular
disease were excluded but this criterion was not applied to cases
• This yielded a spurious association between reserpine and
breast cancer
17. Selection bias due to lack of accuracy of sampling
frame
Non-random sampling bias
This selection procedure can yield a nonrepresentative sample
in which a parameter estimate differs from that existing in the
target population
18. Selection bias due to uneven diagnostic
procedures in the target population
Diagnostic suspicion bias
Unmasking (detection signal) bias
Mimicry bias
19. Diagnostic suspicion bias
• Suspicion of a condition can influence how quickly people are
investigated, which can affect rates of diagnosis
• Diagnostic test accuracy studies that include selected patients
because they are more likely to have the condition based on
clinical suspicion typically overestimate the accuracy of the test
20. Unmasking (detection signal) bias
• Some exposures cause people to be given a diagnosis earlier,
and these might not be causes of the disease
• If a medication can cause vaginal bleeding:
people with this symptom go to the doctor sooner
they receive earlier or more intensive examination
investigations diagnose a cancer that was already present
it may appear that the medication caused the cancer
21. Mimicry bias
• When a condition mimics the disease of interest, its presence can
lead to false conclusions about the causes of the disease
• E.g., Sackett 1979 – oral contraceptive & hepatitis
22. Selection bias during study implementation
Losses/withdrawals to follow up
Non-response bias
Healthy volunteer effect
23. Withdrawal/Lost to follow-up (Attrition bias)
• Losses/withdrawals are uneven across the exposure and
outcome categories
• E.g., a trial to evaluate the effectiveness of a new medication
100 each in treatment & control group
30 drop out in the treatment group, 10 in the control group
If dropouts in the treatment group leave because of more severe
side effects, the true adverse effects will be underestimated
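The dropout scenario above can be put into numbers (a sketch; the adverse-event counts below are invented for illustration):

```python
# Trial numbers from the slide; adverse-event counts are hypothetical.
n_treat = 100
dropouts = 30               # dropouts in the treatment group

adverse_in_dropouts = 30    # assume all 30 left because of side effects
adverse_in_completers = 10  # events seen among the 70 who stayed

true_rate = (adverse_in_dropouts + adverse_in_completers) / n_treat
observed_rate = adverse_in_completers / (n_treat - dropouts)

print(true_rate)                 # 0.4
print(round(observed_rate, 2))   # 0.14: the adverse effects look rarer
```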
24. Non-response bias
• Non-responders from a sample differ in a meaningful way to
responders
• E.g., those with poorer health tend to avoid taking part in health
surveys and those who do take part report better health status
and behaviours (healthy volunteer effect)
25. INFORMATION BIAS
• Occurs during data collection
• Flaw in measuring exposure or outcome
variable that results in different quality
(accuracy) of information
• Three main types
Misclassification bias
Ecological fallacy
Regression to the mean
26. Misclassification bias
• Individuals are assigned to a different category than the one
they should be in
• Can lead to incorrect associations between assigned categories
and outcomes of interest
• Two types:
1. Differential or non-random
2. Non-differential or random
27. Differential / Non-random misclassification bias
Recall bias
• Person with disease/outcome tend to
recall exposure better
• Differential memory for the exposure
in the cases relative to the controls
• More likely to misclassify the
exposure in the controls than in the
cases
(Diagram: case-control study of a birth defect — exposure is
ascertained retrospectively in cases and in controls)
28. Surveillance bias
• More testing in the exposed group leads to more detection
• The non-exposed group may be misclassified as having less disease
• Also called detection bias
(Diagram: cohort study — smokers vs non-smokers followed for
emphysema; the exposed are examined more often)
29. Non-differential / random misclassification bias
• Exposure and disease equally misclassified
• Impact: dilution of effect, estimates become closer to null
(Diagrams: case-control — exposure equally misclassified in cases
and controls; cohort — emphysema equally misclassified in exposed
and non-exposed)
30. Effect of non-differential misclassification bias
Correct classification:
                      Heart attack
                      Yes     No     Total
High fat diet   Yes   250     100     350
                No    450     900    1350
RR = (250/350) / (450/1350) = 0.71 / 0.33 ≈ 2.14

Suppose 20% of the unexposed (No) row is non-differentially
misclassified as exposed (No → Yes):
                      Heart attack
                      Yes     No     Total
High fat diet   Yes   340     280     620
                No    360     720    1080
RR = (340/620) / (360/1080) = 0.55 / 0.33 ≈ 1.65
The estimate is diluted toward the null (2.14 → 1.65)
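The dilution can be checked with a few lines of Python (a sketch; the 2×2 helper below is ours, not from the slides):

```python
def rr(a, b, c, d):
    """Risk ratio from a 2x2 table: exposed risk a/(a+b) over unexposed risk c/(c+d)."""
    return (a / (a + b)) / (c / (c + d))

# Correctly classified table
rr_true = rr(250, 100, 450, 900)

# Move 20% of the unexposed row (both disease columns) into the exposed row
a = 250 + 0.2 * 450   # 340
b = 100 + 0.2 * 900   # 280
c = 0.8 * 450         # 360
d = 0.8 * 900         # 720
rr_obs = rr(a, b, c, d)

print(round(rr_true, 2))  # 2.14
print(round(rr_obs, 2))   # 1.65: pulled toward the null
```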
31. Other biases producing misclassification
• Observer/Interviewer bias
Systematic difference between a true value and the value
observed due to observer variation
• Reporting bias
Social desirability bias
32. Ecological fallacy
• Results obtained from an ecological (group-level) analysis are
used to make inferences at the individual level
• E.g., higher prevalence of disease does not necessarily imply
that individuals have higher risk
• E.g., Boys score better in maths than girls is a group
generalisation
33. Regression to mean
• Variables that are initially extreme tend to move closer to the
average on subsequent measurements
• E.g., evaluating the effectiveness of a new BP medication
Initial readings: high BP (the selection criterion)
Subsequent measurements: lower BP even without treatment
The drug's effectiveness is overestimated if regression to the
mean is not considered
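A small simulation makes the point (a sketch; the true BP of 140 mmHg, the 150 mmHg enrolment cut-off and the 8 mmHg measurement noise are invented): people are enrolled on a high first reading and re-measured with no treatment at all.

```python
import random

random.seed(1)

TRUE_BP = 140.0   # everyone's stable true BP; no drug is ever given

def reading(sd=8.0):
    """One clinic measurement = true BP + random fluctuation."""
    return TRUE_BP + random.gauss(0, sd)

first = [reading() for _ in range(1000)]

# Enrol only those whose first reading looked "high" ...
high_first = [r for r in first if r >= 150]

# ... and measure the enrolled group a second time.
second = [reading() for _ in high_first]

mean_first = sum(high_first) / len(high_first)
mean_second = sum(second) / len(second)
print(round(mean_first, 1), round(mean_second, 1))  # second mean falls back toward 140
```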
34. Other information biases
Hawthorne effect
Lead-time bias
Protopathic bias
Temporal ambiguity
Will Rogers phenomenon
Verification bias
35. Hawthorne effect
• People behave differently because they know they are being
watched
• E.g., A survey of smoking by watching people during work
breaks might lead to observing much lower smoking rates than is
genuinely representative of the population under study
36. Lead time bias
• Survival time appears longer in screen-detected people because
diagnosis is moved earlier, even if death is not delayed
38. Protopathic bias
• Occurs when a treatment given for early manifestations of a
not-yet-diagnosed disease appears to cause the disease
• E.g., patients may take NSAIDs to relieve pain prior to the date
of diagnosis of the condition
• This may bias results: reverse causality (the disease prompting
the drug) is misinterpreted as the drug causing the disease
39. Will Rogers phenomenon
• Improvement in diagnostic tests refines disease staging in
diseases such as cancer
• This produces a stage migration from early to more advanced
stages and an apparent higher survival
• This bias is relevant when comparing cancer survival rates
across time or even among centres with different diagnostic
capabilities
40. Verification bias
• Occurs when there is a difference in testing strategy between
groups of individuals
• E.g., D-dimer testing for diagnosing pulmonary embolism
• positive D-dimer: ventilation–perfusion scans
• negative D-dimer: routine clinical follow up
• Patients with asymptomatic pulmonary embolism but negative
D-dimer results may not have been diagnosed by routine follow up
42. CONFOUNDING: from the Latin confundere, to mix together
“Confounding is confusion, or mixing, of effects;
the effect of the exposure is mixed together with
the effect of another variable, leading to bias”
(Rothman, 2002)
44. CRITERIA
• It must be associated with both the exposure and
the outcome
• It must be independently capable of causing the outcome
• It must not lie in the causal pathway
• It must be distributed unequally among
the groups being compared
45. EFFECTS OF CONFOUNDING
• An apparent association despite no real association
• An apparent absence of association despite a real existing
association
• May cause an overestimate of the true association (positive
confounding) or an underestimate of the association (negative
confounding)
46. IDENTIFYING CONFOUNDING
• Compare the estimated measure of association before and after
adjusting for confounding
• Determine whether a potential confounding variable is
associated with the exposure and also with the outcome
• Perform formal hypothesis tests (e.g., a chi-square test of the
variable's association with exposure and outcome)
47. RESIDUAL CONFOUNDING
• Distortion that remains after
controlling for confounding in the
design and/or analysis of a
study
(Diagram: coffee drinking → heart health, adjusted for age, gender
and smoking; physical activity remains an uncontrolled confounder)
48. • Unknown confounders or data on
these factors were not collected
• Control for confounding was not tight or
narrow enough (e.g., age categories too broad)
• Many errors in the classification of
subjects with respect to confounding
variables
49. CONFOUNDING BY INDICATION
• Distortion of the association between exposure and outcome,
caused by the presence of an indication for the exposure
(Diagram: antidepressant drug → infertility, confounded by
depression, the indication for the drug)
53. RESTRICTION
• Include only study participants from a
single category of the confounder, thereby
eliminating its confounding effect
• Limitations
Reduces sample size
Residual confounding
Limits generalizability
54. MATCHING
• Pair each exposed subject with an
unexposed subject that shares the same
characteristic regarding the variable we
want to control for
• Limitations
Time consuming
Limits sample size
56. STRATIFICATION
Stratify by age (the confounder): <50 years vs ≥50 years
Estimate and compare the relationship between exposure
and outcome in both strata and also with the crude estimate
57. Physical activity & CVD, stratified by age
All ages:
             CVD   No CVD   Total
Active        48      800     848
Not active    69      625     694
Crude RR = (48/848) / (69/694) = 0.57

<50 yrs:
             CVD   No CVD   Total
Active        25      600     625
Not active    11      225     236
RR(<50 yr) = 0.86

≥50 yrs:
             CVD   No CVD   Total
Active        23      200     223
Not active    58      400     458
RR(≥50 yr) = 0.81
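The stratified arithmetic can be reproduced directly (a sketch; the helper name is ours):

```python
def rr(events_exp, n_exp, events_unexp, n_unexp):
    """Risk ratio: risk in the exposed over risk in the unexposed."""
    return (events_exp / n_exp) / (events_unexp / n_unexp)

crude = rr(48, 848, 69, 694)    # all ages
young = rr(25, 625, 11, 236)    # <50 yrs
old = rr(23, 223, 58, 458)      # >=50 yrs

print(round(crude, 2))  # 0.57
print(round(young, 2))  # 0.86
print(round(old, 2))    # 0.81
# The stratum-specific RRs agree with each other but differ from the
# crude RR, so the crude estimate is confounded by age.
```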
60. INTERACTION (EFFECT MODIFICATION)
• "When the incidence rate of disease in the presence of two
or more risk factors differs from the incidence rate expected
to result from their individual effects" (MacMahon)
• The association between exposure and outcome is different
at different levels of a 3rd variable (the effect modifier)
62. • Effect can be
Synergism
Antagonism
• To detect it, stratified analysis is used
The stratum-specific estimates differ from each other
63. Confounding vs Interaction
• Confounding: distortion of the association between an exposure
and outcome by a 3rd variable; the variables are not dependent
on each other; the effect needs to be removed
• Interaction: the effect of one explanatory variable on the
outcome depends on the level of another variable; the variables
are dependent on each other; the effect needs to be reported
65. MEDIATION
• A mediator shows the connection between two
variables; it explains the process by which
the two variables relate
• Conditions
The independent variable must cause
or predict the mediator
The mediator must influence the
dependent variable
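The two conditions can be illustrated with a simulated mediation chain (a sketch with made-up coefficients, not an example from the slides):

```python
import random

random.seed(2)

def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = sum((xi - mx) ** 2 for xi in x)
    return num / den

# X -> M -> Y with invented coefficients 2.0 and 1.5
x = [random.gauss(0, 1) for _ in range(5000)]
m = [2.0 * xi + random.gauss(0, 1) for xi in x]   # X causes/predicts the mediator
y = [1.5 * mi + random.gauss(0, 1) for mi in m]   # the mediator influences Y

print(round(slope(x, m), 1))  # ~2.0: condition 1, X predicts M
print(round(slope(m, y), 1))  # ~1.5: condition 2, M influences Y
print(round(slope(x, y), 1))  # ~3.0: total effect transmitted through M
```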