How (Not) To Lie With Statistics.pptx
1. How (Not) To Lie With Statistics
Crina Boroş
European Investigative Reporting & Data Harvest Conference 2020
2. Data Integrity
Checks
Are these data complete? If not:
• Fill in the blanks. FOI for it if necessary
• Focus on a narrower story while you
fill in the blanks
• Understand the meaning and size of
the missing data
• Understand how the data are collected,
e.g. on forms – how often, and how
exposed the process is to human error
• If all else fails, report the lack of
accountability over the missing data.
E.g. health services that privatise a
national health system
• Don’t settle for summarised data: you
want it in its smallest detail and in full
• You cannot have a real average
without a complete dataset
• Always start your research with a set
of questions in mind
3. Who or what is
AVERAGE?
Mean, Median, Mode | Frequency – which one is
“typical”?
Average = Mean – the most commonly reported one = the
sum of elements divided by the count of the elements
Median – Shows the middle value of a group of values.
Splits the data in two – half of the values are above and
half below the median
Mode – The most frequent identical value – problematic
when several values share the same highest count
Frequency – buckets of value ranges
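To make the four definitions above concrete, here is a minimal sketch using Python's standard library. The salary figures are invented for illustration and are not taken from the talk; the bucket width of 1000 is an arbitrary choice.

```python
from collections import Counter
from statistics import mean, median, multimode

# Hypothetical monthly salaries, invented for illustration only.
salaries = [1200, 1200, 1500, 1800, 2100, 2400, 9800]

average = mean(salaries)     # sum of the values divided by their count
middle = median(salaries)    # half of the values lie above, half below
modes = multimode(salaries)  # every value tied for most frequent

# Frequency: count how many values fall into each bucket of width 1000.
bucket_width = 1000
frequency = Counter((s // bucket_width) * bucket_width for s in salaries)

print(f"mean={average:.2f} median={middle} modes={modes}")
for start in sorted(frequency):
    print(f"{start}-{start + bucket_width - 1}: {frequency[start]} value(s)")
```

In this made-up list a single high salary pulls the mean to about 2857, while the median stays at 1800 and the mode at 1200, so "typical" depends on which measure is reported.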
4. Money Matters
• Let’s look at some salary data: (link)
• Median requires a list of all values
• Trimmed Mean – gets rid of problematic outliers
• Interpreting outliers
• Look out for typos in numbers!
• Averages may hide important information – e.g. fire alarm response times
• Salaries // house prices/values/taxes // money matters in general
• When the MODE is the story -
https://theblacksea.eu/stories/how-romania-sold-out-its-workers-to-foreign-investors-for-imf-and-eu-cash/
• In a perfect Bell curve, Mean = Median (the reverse is not guaranteed: Mean = Median only suggests a symmetric spread)
• Standard Deviation (below the MEAN in normal distribution)
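The trimmed mean and the outlier check mentioned in this list can be sketched as follows. The `trimmed_mean` helper and the salary list are hypothetical, written for illustration; the 10% trim is an assumption, not a rule from the talk.

```python
from statistics import mean, median

def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values dropped per end
    return mean(ordered[k:len(ordered) - k])

# Hypothetical salaries: 250000 could be a real executive or a typo for 25000.
salaries = [21000, 22000, 23000, 24000, 25000,
            26000, 27000, 28000, 29000, 250000]

print("mean:        ", mean(salaries))
print("median:      ", median(salaries))
print("trimmed mean:", trimmed_mean(salaries, 0.1))
```

Here the plain mean is 47500, while the median and the 10% trimmed mean both come out at 25500. A gap that large is a prompt to go back to the source and ask whether the outlier is a typo or the story.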
6. Standard Deviation
• Standard Deviation (SD) - whether values cluster around
the mean, or whether there’s a lot of variation from the
mean
• The higher the SD, the higher the variation. This could be
an interesting source of story ideas for reporting
• An example: Women’s Rights in the Arab League countries
https://news.trust.org/item/20131108170910-qacvu/
SD can help see themes coming out. Some of the strongest
and most uniform responses here were for:
• Women don't feel comfortable reporting crimes to the
police
• A lack of social protection for women exposed to violence
or rape
• Impunity for perpetrators
It suggests criminal justice institutions play a significant
role. This should be followed with dedicated reporting.
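As a rough illustration of how SD separates uniform responses from divided ones, the sketch below compares two survey questions. The question labels and scores are invented and have no connection to the poll linked above; `pstdev` (population SD) is used on the assumption that the list holds every response collected, and `stdev` would be the choice for a sample.

```python
from statistics import mean, pstdev

# Hypothetical expert scores (1 = strongly disagree, 5 = strongly agree).
responses = {
    "Question A": [5, 5, 4, 5, 5, 4],  # answers cluster near the mean
    "Question B": [1, 5, 2, 4, 3, 5],  # answers vary widely
}

spread = {question: pstdev(scores) for question, scores in responses.items()}

for question, scores in responses.items():
    print(f"{question}: mean={mean(scores):.2f} sd={spread[question]:.2f}")
```

Question A has an SD of about 0.47 and Question B about 1.49: the first is a strong, uniform response, the second signals disagreement that may deserve its own reporting.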
7. POLLS – a weapon of mass distortion?
• METHODOLOGY - the mother and father of
polls/surveys
• RANDOMISED INTERVIEWEE SELECTION –
everyone has to have the same chance to be
selected
• Scientific respondents pool selection – drawing
names from a hat
• REPRESENTATIVE – all: age groups, sexes,
education level, professional and societal classes,
earning levels, relevant geography, ethnicities…
• This is tied with accessibility: a combination of
field, phone and internet is ideal
• Beware of experts – a poll could mean they’re in
blatant conflict of interest
• Methodology fully transparent. Report caveats.
Protect vulnerable sources.
8. POLLS
• Sample size calculators
RAOSOFT -
http://www.raosoft.com/samplesize.html
THE SURVEY SYSTEM -
https://www.surveysystem.com/sscalc.htm
SURVEY MONKEY -
https://www.surveymonkey.co.uk/mp/sample-size-calculator/
• 95% confidence level = the result falls within
the margin of error 19 times out of 20
• Margin of error – nothing is pure
• Population size – surprising results e.g.
London vs. Bucharest
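The calculators listed above are commonly described as using a formula of the following kind (Cochran's formula with a finite population correction). This is a sketch under that assumption, not the calculators' actual code. The population figures are rough round numbers chosen for illustration, not official statistics.

```python
import math

def sample_size(population, margin_of_error=0.05, z=1.96, proportion=0.5):
    """Required sample size; z=1.96 corresponds to a 95% confidence level."""
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    # Finite population correction: shrinks n0 for small populations.
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# Rough, illustrative population sizes (assumptions, not official data).
cities = {"London": 9_000_000, "Bucharest": 1_800_000, "Small town": 1_000}

for name, population in cities.items():
    print(f"{name}: {sample_size(population)} respondents")
```

Under these assumptions, a city of nine million and a city of under two million both need 385 respondents for a 5% margin of error, while a town of 1,000 still needs 278. Population size barely matters once it is large, which is the surprising result.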
9. Medical stats – scratch all of
the above. Almost.
• Confidence levels are usually higher than 95%
• Certain populations are more susceptible to some conditions than others (genes,
vaccination, lifestyle)
• Some viruses mutate – which strain are you looking at? E.g. COVID-19
• Surgeons’ patients mortality rates – a hoax
• Surgeons’ patients morbidity rates – it’s actually group work
• Patients’ complaints survey
• WHO vs. NHS on Group B Streptococcus (GBS): 53/year vs. 50/50
• The case of Beethoven – every case is unique
11. POLLS
• Do questions include an “I don’t know” option?
• Is the methodology published in a clear, transparent and
sufficiently detailed manner?
• Is the polling company happy to engage with you?
• Are there any biases in how the survey was conceived?
And if so, are these biases reported anywhere?
• Are you being bullied into revealing your sources?
• Are sources safe to take your poll?
• Who has financed the survey and to what purpose? The
case of Cambridge Analytica
• Are you lying through omission?
12. Apples-to-apples vs. apples-to-pears
• Rates – crime rates, infection rates
Making sense: per capita? Per 100,000? Per
10,000?
Narrow down per social groupings – does
this reveal anything?
• Ratios – loan customers’ ethnicity,
incarceration by sex or ethnicity, asylum
seeking via trafficking routes (male vs.
female), exam failures
This can reveal systemic bias.
Minority in minority – but is it a fair
number? Report the number as a % of that
ethnicity’s population in a place for a
more accurate measure.
• Percentage changes – changes over time;
Property value vs. market value, but does it
explain how expensive it is to live in a place?
Number of pupils graduating with high
literacy scores – but have the exams
changed in a substantial way?
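The rate and percentage-change comparisons above come down to simple arithmetic, sketched here. The helper names `rate_per` and `percentage_change`, the town names and all the figures are hypothetical.

```python
def rate_per(count, population, per=100_000):
    """Incidents per `per` inhabitants, so places of different size compare."""
    return count * per / population

def percentage_change(old, new):
    """Change from `old` to `new`, as a percentage of `old`."""
    return (new - old) / old * 100

# Hypothetical data: (recorded crimes, population).
towns = {"Town A": (450, 900_000), "Town B": (90, 60_000)}

for name, (crimes, population) in towns.items():
    print(f"{name}: {crimes} crimes, "
          f"{rate_per(crimes, population):.1f} per 100,000")

# Hypothetical change over time: 120 cases last year, 150 this year.
print(f"Change: {percentage_change(120, 150):+.1f}%")
```

Town A records five times as many crimes as Town B, yet its rate is 50 per 100,000 against Town B's 150. Raw counts compare apples to pears; rates compare apples to apples.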
13. Recommendation
• Numbers in the Newsroom, by Sarah
Cohen
• Computer Assisted Reporting, by Brant
Houston
• Statistics Essentials for Dummies, by
Deborah Rumsey