This document discusses statistical analysis methods for measuring educational outcomes in continuing medical education (CME). It addresses common statistical questions around determining if there was an educational effect from a CME activity, quantifying the size of any effect, and comparing effects across activities. Specific statistical tests are outlined for analyzing categorical and ordinal data from pre-/post-activity assessments, including knowledge questions, case studies, and ratings of clinical practice strategies. Effect size is presented as a standardized measure for quantifying and comparing the magnitude of educational effects both within and across CME activities. Examples are provided demonstrating how to calculate effect sizes using online statistical calculators and Excel.
Statistical Analysis for Educational Outcomes Measurement in CME
1. Statistical Analysis for Educational
Outcomes Measurement in CME
Jason J. Olivieri, MPH
Director, Outcomes Services
Imedex, LLC
(404) 319-9782
j.olivieri@imedex.com
www.assessCME.wordpress.com
2. Statistical Questions in CME
1. Was there an educational effect?
2. If so, how big was the effect?
3. How does this effect compare with other CME
activities?
3. Was there an educational effect?
Statistical tests of significance
• Assess whether an observed difference (e.g., pre- vs. post-activity) is
larger than random chance alone would be expected to produce.
• Differences that are unlikely to arise by chance are considered
“statistically significant” and suggest a true difference between groups
that could be associated with participation in the educational activity.
• Statistical significance is reported as a “P value”. A P value ≤ .05 is
conventionally considered statistically significant. Generally speaking, a
P value ≤ .05 means that, if there were no true difference, a difference at
least this large would be observed 5% of the time or less.
4. Statistical tests of significance:
Choosing the right test
• Define variable type (typically either categorical or
ordinal)
• Is the comparison group data (e.g., pre vs. post) paired
or unpaired?
• What is the sample size?
5. Statistical tests of significance
• Categorical variables (e.g., knowledge “test” question)
– Unpaired data (comparison groups not matched)
• Chi square (all expected cell counts ≥ 5)
• Fisher’s exact test (any expected cell count < 5)
– Paired data (matched comparison groups)
• McNemar’s test
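For paired data, McNemar's test looks only at respondents whose answer changed between the two assessments. The sketch below is a minimal exact (binomial) version using only the Python standard library. The function name `mcnemar_exact` and the counts in the example are hypothetical and are not taken from the slides.

```python
from math import comb

def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar P value from the two discordant counts.

    b = respondents correct pre-activity but incorrect post-activity
    c = respondents incorrect pre-activity but correct post-activity
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of change
    k = min(b, c)
    # Under the null hypothesis each discordant pair is a fair coin flip,
    # so the smaller count follows a Binomial(n, 0.5) distribution.
    one_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)

# Hypothetical matched data: 3 respondents got worse, 15 improved.
p_value = mcnemar_exact(3, 15)
print(f"McNemar exact P = {p_value:.4f}")
```

Respondents who were correct both times, or incorrect both times, do not enter the calculation.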
6. Knowledge Change
Which of the following is NOT a complication associated with bisphosphonate therapy?
[Bar chart: percentage of respondents selecting each answer]
Answer choice                      Pre-activity (n = 111)   Post-activity (n = 157)
Arm pain                           19%                      8%
Renal dysfunction                  10%                      4%
Osteonecrosis of the jaw           6%                       2%
Atypical fx of the femur           17%                      1%
Peripheral neuropathy (correct)    48%                      85%
37% increase
7. Competency Change (case vignette)
Frontline therapy for a former smoker with symptomatic advanced stage adenocarcinoma of the lung
(EGFR+)
[Bar chart: percentage of respondents selecting each regimen]
Regimen                                                        Control (n = 65)   Participants (n = 65)
Erlotinib alone (evidence-based, preferred)                    51%                75%
Erlotinib + bevacizumab                                        12%                6%
Carboplatin-paclitaxel & bevacizumab                           19%                3%
Carboplatin-pemetrexed & bevacizumab                           12%                9%
Full house: erlotinib + carboplatin-paclitaxel & bevacizumab   6%                 6%
24% increase
8. Statistical tests of significance
• Ordinal variables (e.g., rating scale)
– Unpaired data (comparison groups not matched)
• Mann-Whitney U
– Paired data (matched comparison groups)
• Wilcoxon signed-rank test
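The test-selection rules for categorical and ordinal variables can be sketched as a small lookup. This is only an illustration: the function name `choose_test` and its parameters are hypothetical, and treating the "5" threshold as the smallest expected cell count is an assumption about what the slides intend.

```python
from typing import Optional

def choose_test(variable_type: str, paired: bool,
                smallest_expected_count: Optional[float] = None) -> str:
    """Return the name of the significance test suggested by the slides."""
    if variable_type == "categorical":
        if paired:
            return "McNemar's test"
        # Assumption: the threshold of 5 refers to expected cell counts.
        if smallest_expected_count is not None and smallest_expected_count < 5:
            return "Fisher's exact test"
        return "Chi-square test"
    if variable_type == "ordinal":
        return "Wilcoxon signed-rank test" if paired else "Mann-Whitney U test"
    raise ValueError("variable_type must be 'categorical' or 'ordinal'")

# Anonymous (unmatched) pre/post knowledge question with ample responses.
print(choose_test("categorical", paired=False, smallest_expected_count=34))
# Anonymous (unmatched) pre/post rating scale.
print(choose_test("ordinal", paired=False))
```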
9. Knowledge (self-efficacy) Change
Confidence in performing seven clinical tasks in adenotonsillectomy
(1 = not at all confident, 7 = extremely confident)
[Bar chart: mean confidence rating per task, Pre (n = 57) vs. Post (n = 49); axis 0 to 7]
Tasks rated:
• Achieving hemostasis with Coblation for adenotonsillectomy
• Using Coblation for adenoidectomy
• Using Coblation for complete tonsillectomy
• Using Coblation for partial tonsillectomy
• Using a microdebrider for adenoidectomy
• Using a microdebrider for intracapsular tonsillectomy
• Selecting an appropriate surgical technique for adenotonsillectomy for specific indications in patients
Plotted values, in the order extracted (two series of seven):
5.9, 6, 6.5, 5.8, 5.6, 4.8, 6.3
4.4, 4.3, 4.7, 3.2, 3.8, 2.9, 5.5
10. Competency Change
Using bevacizumab-based combo therapy for non-squamous NSCLC
[Bar chart: Current vs. planned use of bevacizumab-based combo therapy in nonsquamous NSCLC]
Rating            Current (mean = 2.5, n = 44)   Planned (mean = 3.1, n = 39)
1 = Never         32%                            14%
Not very often    9%                             18%
Sometimes         41%                            23%
Often             14%                            32%
5 = Always        5%                             14%
Overall 26% increase in planned vs current use
11. Once you know which statistical test to use,
where do you go?
www.VassarStats.net
12. Statistical test of significance:
Categorical variable example
•Participants in a CME activity were administered a paper-based pre-
and post-activity survey which contained five knowledge “test” questions
based on the CME content
•Survey participant responses were anonymous (i.e., pre/post not
matched)
•You want to determine if the proportion of respondents answering the
question correctly pre- vs. post-activity is significantly different
13. Knowledge Change
Which of the following is NOT a complication associated with bisphosphonate therapy?
[Chart repeated from slide 6: correct answer (peripheral neuropathy) selected by 48% pre-activity (n = 111) vs. 85% post-activity (n = 157); 37% increase]
14. Statistical tests of significance
• Categorical variables (e.g., knowledge “test” question)
– Unpaired data (comparison groups not matched)
• Chi square (all expected cell counts ≥ 5)
• Fisher’s exact test (any expected cell count < 5)
– Paired data (matched comparison groups)
• McNemar’s test
15. Calculating significance for a categorical variable
• Determine # of correct / incorrect answers for each
group (e.g., pre- and post-activity)
– Pre-activity: .48*111 = 53 correct / 58 incorrect
– Post-activity: .85*157 = 133 correct / 24 incorrect
• Plug these values into online calculator
(www.vassarstats.net)
– Click “frequency data”
– Click “Chi-Square, Cramer’s V and Lambda”
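The same calculation can be reproduced offline. The sketch below is a minimal, standard-library version of an uncorrected Pearson chi-square for a 2x2 table, with incorrect counts taken as group size minus correct answers (111 − 53 = 58 and 157 − 133 = 24). The function name `chi_square_2x2` is hypothetical, and an online calculator may additionally report a continuity-corrected value that differs slightly.

```python
from math import erfc, sqrt

def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi-square, two-sided P value, Cramer's V).
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    p_value = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    cramers_v = sqrt(chi2 / n)      # for a 2x2 table, V equals |phi|
    return chi2, p_value, cramers_v

# Rows: pre-activity, post-activity. Columns: correct, incorrect.
chi2, p_value, cramers_v = chi_square_2x2(53, 111 - 53, 133, 157 - 133)
print(f"chi-square = {chi2:.2f}, P = {p_value:.2g}, Cramer's V = {cramers_v:.3f}")
```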
18. Knowledge Change
Which of the following is NOT a complication associated with bisphosphonate therapy?
[Chart repeated from slide 6: correct answer (peripheral neuropathy) selected by 48% pre-activity (n = 111) vs. 85% post-activity (n = 157); 37% increase]
The difference in the proportion of respondents
answering this question correctly
pre- vs. post-activity was not likely
due to random chance, P < .0001
19. Statistical test of significance:
Ordinal variable example
•Participants in a CME activity were asked via ARS to rate their pre- and
post-activity use of five clinical practice strategies tied to the CME
content
•Survey participant responses were anonymous (i.e., pre/post not
matched)
•You want to determine whether the difference in rating pre- vs. post-
activity is significantly different
20. Competency Change
Using bevacizumab-based combo therapy for non-squamous NSCLC
[Chart repeated from slide 10: current (mean = 2.5, n = 44) vs. planned (mean = 3.1, n = 39) use of bevacizumab-based combo therapy in nonsquamous NSCLC; overall 26% increase in planned vs current use]
21. Statistical tests of significance
• Ordinal variables (e.g., rating scale)
– Unpaired data (comparison groups not matched)
• Mann-Whitney U
– Paired data (matched comparison groups)
• Wilcoxon signed-rank test
22. Calculating significance for an ordinal variable
• Go to www.vassarstats.net
– Click “ordinal data”
– Click “Mann-Whitney U test”
• Enter in the total number of pre-activity (“sample A”) and post-
activity (“sample B”) respondents
• Copy and paste pre- and post-activity responses into “sample A”
and “sample B” boxes
• Click “Import data to data cells”
• Click “Calculate from Raw Data”
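For readers who prefer to check the result offline, here is a minimal standard-library sketch of the Mann-Whitney U test using midranks and a normal approximation with a tie correction. The function name `mann_whitney_u` and the ratings below are hypothetical, not the slide data, and an online calculator may use a slightly different approximation (for example, a continuity correction).

```python
from collections import Counter
from math import erfc, sqrt

def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b, z, two-sided P) using midranks and a normal approximation."""
    a, b = list(sample_a), list(sample_b)
    n_a, n_b = len(a), len(b)
    n = n_a + n_b
    counts = Counter(a + b)

    # Give each distinct rating the average of the ranks it occupies (midrank).
    midrank = {}
    next_rank = 1
    for value in sorted(counts):
        ties = counts[value]
        midrank[value] = next_rank + (ties - 1) / 2
        next_rank += ties

    rank_sum_a = sum(midrank[x] for x in a)
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a

    # Normal approximation with the usual correction for tied ratings.
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n_a * n_b / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_a - n_a * n_b / 2) / sqrt(variance)
    p_value = erfc(abs(z) / sqrt(2))
    return u_a, u_b, z, p_value

# Hypothetical 1-5 ratings from unmatched pre- and post-activity respondents.
pre_ratings = [1, 1, 2, 3, 3, 3, 4, 2]
post_ratings = [3, 4, 4, 5, 3, 5, 4, 2]
u_pre, u_post, z, p_value = mann_whitney_u(pre_ratings, post_ratings)
print(f"U(pre) = {u_pre}, U(post) = {u_post}, z = {z:.2f}, P = {p_value:.3f}")
```

Rating scales produce many tied values, which is why the tie correction is included; with very small samples an exact test is preferable to the normal approximation.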
27. Competency Change
Using bevacizumab-based combo therapy for non-squamous NSCLC
[Chart repeated from slide 10: current (mean = 2.5, n = 44) vs. planned (mean = 3.1, n = 39) use of bevacizumab-based combo therapy in nonsquamous NSCLC; overall 26% increase in planned vs current use]
The difference between pre- and
post-activity rating was statistically
significant, P = .02
28. That’s it! You’re now able to
answer whether there was an
educational effect.
Now onto quantifying the
magnitude of such effect…
29. CME activity: Didactic presentation followed by case-based discussion
EOM plan: Pre- vs. post-activity assessment via ARS or paper survey
Data: 8 case vignette or clinical practice strategy questions
30. Example paper survey question (clinical practice strategy)
Using bevacizumab-based combo therapy for non-squamous NSCLC
31. CME activity: Didactic presentation followed by case-based discussion
EOM plan: Pre- vs. post-activity assessment via ARS or paper survey
Data: 8 case vignette questions or clinical practice strategies
How do we summarize this data?
And how do we then compare this result to results of other activities?
32. What is effect size?
•Quantifies the magnitude of effect (maximum expected range: -3 to +3)
•Difference in means (e.g., pre-test and post-test) divided by the pooled
standard deviation, i.e., the square root of the pooled-group variance (Cohen’s d)
•Enables the comparison of CME effects across activities on a common
dimensionless scale
•Calculated from comparison data (e.g., pre/post, post/control) linked
directly to CME content
−Knowledge questions
−Case vignettes
−Self-reported frequency of use of key clinical practice strategies
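Written out, the definition of Cohen's d for a pre/post comparison looks as follows. The subscripts and the equal-weight pooling of the two variances are assumptions made for illustration; a sample-size-weighted pooled variance is a common alternative when group sizes differ.

```latex
d = \frac{\bar{x}_{\text{post}} - \bar{x}_{\text{pre}}}{s_{\text{pooled}}},
\qquad
s_{\text{pooled}} = \sqrt{\frac{s_{\text{pre}}^{2} + s_{\text{post}}^{2}}{2}}
```

Because the numerator and denominator share the same units, d is dimensionless, which is what allows comparison across activities.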
34. Calculating effect size
• Can be done using only MS Excel® and
free, online resources
• Approach dependent upon variable type:
– ordinal (e.g., clinical practice strategy)
– categorical (e.g., case vignette)
35. Example paper survey question (clinical practice strategy)
Using bevacizumab-based combo therapy for non-squamous NSCLC
36. Calculating effect size for an ordinal variable
(e.g., clinical practice strategy)
• Calculate average and standard deviation for each group
(e.g., pre- and post-activity)
– Pre-activity: mean (SD) = 2.5 (1.3)
– Post-activity: mean (SD) = 3.1 (1.2)
• Plug these values into an online calculator
(http://www.uccs.edu/~lbecker/)
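The arithmetic behind this step is short enough to check by hand or in a few lines of code. The sketch below is a minimal version; the function name `cohens_d` is hypothetical, and the choice between equal-weight and sample-size-weighted pooling is an assumption, since the slides do not say which one the online calculator applies.

```python
from math import sqrt
from typing import Optional

def cohens_d(mean_pre: float, sd_pre: float, mean_post: float, sd_post: float,
             n_pre: Optional[int] = None, n_post: Optional[int] = None) -> float:
    """Cohen's d from group means and standard deviations.

    Without sample sizes the two variances are averaged equally; with
    sample sizes they are weighted by degrees of freedom.
    """
    if n_pre is None or n_post is None:
        pooled_var = (sd_pre ** 2 + sd_post ** 2) / 2
    else:
        pooled_var = ((n_pre - 1) * sd_pre ** 2 + (n_post - 1) * sd_post ** 2) / (
            n_pre + n_post - 2
        )
    return (mean_post - mean_pre) / sqrt(pooled_var)

# Values from the slide: pre 2.5 (1.3), post 3.1 (1.2).
d_equal = cohens_d(2.5, 1.3, 3.1, 1.2)
# Same values, weighting by the group sizes shown on the chart (n = 44, n = 39).
d_weighted = cohens_d(2.5, 1.3, 3.1, 1.2, n_pre=44, n_post=39)
print(f"d (equal weights) = {d_equal:.2f}, d (weighted) = {d_weighted:.2f}")
```

Both pooling choices round to the same two-decimal effect size here because the two groups are similar in size and spread.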
38. How is effect size interpreted? Ordinal variable example
Clinical practice strategy (CPS) use rating (1 = never, 5 = always)
          Pre-test                      Post-test                     Effect size
          Mean   Standard deviation     Mean   Standard deviation     Cohen’s d
CPS #1    2.5    1.3                    3.1    1.2                    .48
*d = .2 (small effect), d = .5 (medium effect), d = .8 (large effect)
How big was the educational effect?
Expressed in standard deviation units: The average
score of a post-test respondent was .48 standard
deviations above the average score of a pre-test
respondent
Effect sizes are proportional (.48 is twice as much
effect as .24)
How does the effect compare to other activities?
Cohen (1988): .2 = small, .5 = medium, .8 = large
Wolf (1986): .25 = educationally significant, .50 = clinically significant
39. Example ARS question (case vignette)
Frontline therapy for a former smoker with symptomatic advanced stage adenocarcinoma of the lung
(EGFR+)
40. Calculating effect size for a categorical variable
(e.g., case vignette)
• Determine # of correct / incorrect answers for each
group (e.g., pre- and post-activity)
– Pre-activity: .51*65 = 33 correct / 32 incorrect
– Post-activity: .75*65 = 49 correct / 16 incorrect
• Plug these values into online calculator
(www.vassarstats.net)
– Click “frequency data”
– Click “Chi-Square, Cramer’s V and Lambda”
42. Calculating effect size for a categorical
variable, continued
• Visit www.lyonsmorris.com/ma1/index.cfm
• Select “Correlation coefficient (r) to Effect Size”
• Enter Cramer’s V (.2550 for 33/32 vs. 49/16 correct/incorrect)
• Enter total number of pre-activity and post-activity
respondents
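The two calculator steps for a categorical variable can be sketched together. Here Cramer's V is recomputed from the case vignette counts (33 of 65 and 49 of 65 correct) rather than typed in, and then converted with d = 2r / sqrt(1 − r²). That formula is the standard correlation-to-d conversion; whether the linked calculator uses exactly this form is an assumption, and both function names are hypothetical.

```python
from math import sqrt

def cramers_v_2x2(correct_a: int, n_a: int, correct_b: int, n_b: int) -> float:
    """Cramer's V for a 2x2 correct/incorrect table (equal to |phi|)."""
    a, b = correct_a, n_a - correct_a  # group A: correct, incorrect
    c, d = correct_b, n_b - correct_b  # group B: correct, incorrect
    n = n_a + n_b
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return sqrt(chi2 / n)

def r_to_d(r: float) -> float:
    """Convert a correlation-type coefficient to Cohen's d."""
    return 2 * r / sqrt(1 - r ** 2)

v = cramers_v_2x2(33, 65, 49, 65)
d = r_to_d(v)
print(f"Cramer's V = {v:.4f}, effect size d = {d:.2f}")
```

Deriving the incorrect counts as group size minus correct answers keeps each row total equal to the number of respondents, which the V calculation depends on.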
44. How effective was your live CME in 2013?
10 live CME activities → 8 ARS questions per activity → 80 slides →
10 effect sizes → one summary effect size for live CME
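The slides do not state how the ten activity-level effect sizes are combined into one summary figure. One simple possibility, shown here strictly as an assumption, is an unweighted mean, with an optional respondent-weighted mean as an alternative. The function name `summary_effect_size` and all values below are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence

def summary_effect_size(effect_sizes: Sequence[float],
                        sample_sizes: Optional[Sequence[int]] = None) -> float:
    """Combine per-activity effect sizes into one summary value.

    With no sample sizes, every activity counts equally; otherwise each
    activity is weighted by its number of respondents.
    """
    if sample_sizes is None:
        return mean(effect_sizes)
    if len(sample_sizes) != len(effect_sizes):
        raise ValueError("effect_sizes and sample_sizes must have the same length")
    total = sum(sample_sizes)
    return sum(d * n for d, n in zip(effect_sizes, sample_sizes)) / total

# Hypothetical effect sizes for 10 live activities (not data from the slides).
activity_effect_sizes = [0.48, 0.53, 0.31, 0.72, 0.25, 0.60, 0.44, 0.39, 0.81, 0.47]
overall = summary_effect_size(activity_effect_sizes)
print(f"Summary effect size for live CME = {overall:.2f}")
```

Weighting by respondents gives larger activities more influence; which choice is appropriate depends on whether the question is about the typical activity or the typical participant.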
46. Overall EIS (by format)
June 2010 – Sept 2013
n=9
1. Cohen J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
2. Mansouri & Lockyer. J Contin Educ Health Prof 2007;27:6-15.
3. Drexel et al. Int J Chron Obstruct Pulmon Dis 2011;6:297-307.
4. Casebeer et al. BMC Med Educ 2010;10:42.
[4] Casebeer et al 2010. Knowledge effect size (eLearning) = .82
[3] Drexel et al 2011. Competence effect size = .85
[2] Mansouri & Lockyer 2007. Knowledge effect size = .6
[1] Cohen J 1988. Small effect = .2, Medium effect = .5, Large effect = .8