Presentation at the Truth and Trust Online Conference of the paper http://oro.open.ac.uk/62771/
Abstract:
With misinformation being one of the biggest issues of current times, many organisations are emerging to offer verifications of information and assessments of news sources. However, it remains unclear how they relate in terms of coverage, overlap and agreement. In this paper we introduce a comparison of the assessments produced by different organisations, in order to measure their overlap and agreement on news sources. Relying on the general notion of credibility, we map each of the different assessments to a unified scale. Then we compare two different levels of credibility assessments (source level and document level) using the data published by various organisations, including fact-checkers, to see which sources they assess more than others, how much overlap there is between them, and how much agreement there is between their verdicts. Our results show that the overlap between the different origins is generally quite low, meaning that different experts and tools provide evaluations for rather disjoint sets of sources, even when considering fact-checking. For agreement, instead, we find that some origins agree with each other on verdicts more than others.
2. Introduction
Assessments from journalists, fact-checkers, communities, and tools: how do they relate?
1. Overlap: do they evaluate the same sources of information?
2. Agreement: do they have similar verdicts?
3. Granularity level: how does the verification of claims compare to the credibility of the sources where they appear*?
[Diagram: information being assessed by both citizens and experts]
* https://schema.org/appearance
5. Credibility definition
What is credibility?
- Trustworthiness (Hovland and Weiss 1951; Web of Trust) – more a perception, a gut feeling
- Expertise (Hovland and Weiss 1951) – related to the public image and history of the news outlet
- Believability (Meyer, 1988) – the ability to be believed, which together with reputation leads to credibility
- Community affiliation (Meyer, 1988) – the context of the source: point of view / bias / opinion
- Factuality (fact-checkers, NewsGuard) – not publishing false information
- Safety (Web Of Trust) – safe content, free of viruses and scams
- Popularity (PageRank) – how “well-known” the source is
- Transparency (NewsGuard, Newsroom Transparency Tracker) – adhering to a set of standards; accountability
- W3C discussion group “Credibility Signals”: https://credweb.org/signals/
6. Approach (1/2): credibility formulation
1. The specific factors of credibility we consider (credibility definition):
• Factuality
• Believability
• Trustworthiness
• Transparency
2. The dimensions used:
• Value: how positive or negative the assessment is, ∈ [−1, +1]
• Confidence: how certain the value is, ∈ [0, +1]
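To make the formulation concrete, the (value, confidence) pair can be sketched as a small data structure. This is our own illustration: the class and field names are not from the paper, only the bounds above are.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assessment:
    """A credibility assessment as formulated above.

    value:      how positive or negative the assessment is, in [-1, +1]
    confidence: how certain the value is, in [0, +1]
    """
    value: float
    confidence: float

    def __post_init__(self):
        # Reject values outside the two dimensions' ranges
        if not -1.0 <= self.value <= 1.0:
            raise ValueError("value must lie in [-1, +1]")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, +1]")
```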
7. Approach (2/2): mapping the assessments

- Web Of Trust
  Credibility value: trust.score [0; 100] → cred [−1; 1]
  Confidence: trust.conf [0; 100] → conf [0; 1]
- NewsGuard
  Credibility value: linear, score [0; 100] → cred [−1; 1]; exception: Platform, Satire → cred = 0
  Confidence: conf = 1; exception: Platform, Satire → conf = 0
- Media Bias/Fact Check
  Credibility value: factuality {LOW, MIXED, HIGH} → cred {−1, 0, 1}
  Confidence: conf = 1 if factuality is present, otherwise conf = 0
- OpenSources
  Credibility value: fake → −1, reliable → 1, conspiracy / junksci → −0.8, clickbait / bias → −0.5, rumor / hate → −0.3, all other tags → 0
  Confidence: conf = 1 when credibility is not null, otherwise conf = 0
- International Fact-Checking Network
  Credibility value: starting from cred = 1, apply penalties for partially (0.05) and none (0.1) compliant criteria, with lower bound cred = 0
  Confidence: conf = 0.5 if expired signatory, otherwise conf = 1
- Newsroom Transparency Tracker
  Credibility value: proportional to the number of indicators satisfied; partial compliance counts half
  Confidence: conf = 1
- ClaimReview
  Credibility value: cred = (ratingValue − worstRating) / (bestRating − worstRating) × 2 − 1; otherwise try mapping the alternateName
  Confidence: conf = 1 if the mapping is successful, otherwise conf = 0
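A few of the mappings above can be sketched in code. The function names are ours; only the arithmetic follows the table (the linear rescaling from [0, 100] to [−1, 1] and the ClaimReview normalisation).

```python
def map_web_of_trust(score, conf):
    """Web Of Trust: score [0, 100] -> cred [-1, 1]; conf [0, 100] -> conf [0, 1]."""
    return score / 50.0 - 1.0, conf / 100.0

def map_mbfc(factuality):
    """Media Bias/Fact Check: factuality {LOW, MIXED, HIGH} -> cred {-1, 0, 1};
    conf = 1 if a factuality label is present, otherwise conf = 0."""
    levels = {"LOW": -1.0, "MIXED": 0.0, "HIGH": 1.0}
    if factuality in levels:
        return levels[factuality], 1.0
    return 0.0, 0.0

def map_claimreview(rating_value, worst_rating, best_rating):
    """ClaimReview: cred = (ratingValue - worstRating) / (bestRating - worstRating) * 2 - 1."""
    cred = (rating_value - worst_rating) / (best_rating - worst_rating) * 2.0 - 1.0
    return cred, 1.0
```

For example, a Web Of Trust score of 100 with confidence 100 maps to (1.0, 1.0), and a ClaimReview verdict at the worst end of its rating scale maps to cred = −1.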
8. Analysis: data statistics

Assessor | Sources rated | Average credibility
Web Of Trust | 308,155 | 0.4264
NewsGuard | 2,795 | 0.5433
Media Bias/Fact Check | 2,404 | 0.3874
OpenSources | 811 | −0.6618
International Fact-Checking Network | 86 | 0.8786
Newsroom Transparency Tracker | 52 | 0.4256
ClaimReview | 379* | −0.3349

* ClaimReview assessments are at the claim level; this number reflects the source-level aggregation.
9. Recap: research questions
Comparing the assessments:
1. Overlap: do they evaluate the same sources of information?
2. Agreement: do they have similar verdicts?
3. Granularity level: how does the verification of claims compare to the sources where they appear?
10. Analysis: overlap (definition)
RQ1: do they evaluate the same sources of information?
Overlap definitions:
- Symmetrical: Jaccard index J(A, B) = |A ∩ B| / |A ∪ B|
- Asymmetrical: overlap(A → B) = |A ∩ B| / |B|
overlap ∈ [0, +1]

Example:
|A| = 100,000; |B| = 3,000; |A ∩ B| = 2,000
J(A, B) = 2,000 / 101,000 = 1.98%
overlap(A → B) = 2,000 / 3,000 = 66.67%
overlap(B → A) = 2,000 / 100,000 = 2.00%
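The two overlap measures can be written directly from the definitions; a minimal sketch reproducing the example:

```python
def jaccard(a, b):
    """Symmetrical overlap: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if a or b else 0.0

def overlap(a, b):
    """Asymmetrical overlap(A -> B): |A ∩ B| / |B|."""
    return len(a & b) / len(b) if b else 0.0

# Sets sized as in the example: |A| = 100,000, |B| = 3,000, |A ∩ B| = 2,000
A = set(range(100_000))
B = set(range(98_000, 101_000))
```

With these sets, jaccard(A, B) ≈ 1.98%, overlap(A, B) ≈ 66.67%, and overlap(B, A) = 2.00%, matching the figures above.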
11. Analysis: overlap (results)
1. How to read the figure: the value of each cell is the percentage of sources evaluated by the assessor on the row that have also been evaluated by the assessor on the column.
2. Notable examples:
• Several assessors don’t provide ratings for platforms (facebook.com, twitter.com, youtube.com, …)
3. The main problems:
• Most of the assessors rate just a few sources
• Most of the assessors rate disjoint sets of sources
12. Analysis: agreement (definition and results)
RQ2: do they have similar verdicts?
Agreement definition: pairwise cosine similarity, evaluated on the sources rated by both assessors; agreement ∈ [−1, +1]

Example:
Assessor | Source_1 | Source_2 | Source_3 | Source_4
A | +1.0 | +0.2 | −0.8 | −0.9
B | +0.9 | +0.5 | +0.5 | −1.0
agreement(A, B) = cos α = 0.375
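A plain cosine similarity restricted to the co-rated sources can be sketched as below. Note that a plain cosine over the four values in this table comes out near 0.63, so the 0.375 reported for the example presumably reflects a variant (e.g. confidence weighting) not detailed on the slide; the sketch below is the unweighted version only.

```python
from math import sqrt

def agreement(ratings_a, ratings_b):
    """Cosine similarity of two assessors' credibility values,
    computed only over the sources rated by both."""
    common = ratings_a.keys() & ratings_b.keys()
    dot = sum(ratings_a[s] * ratings_b[s] for s in common)
    na = sqrt(sum(ratings_a[s] ** 2 for s in common))
    nb = sqrt(sum(ratings_b[s] ** 2 for s in common))
    return dot / (na * nb) if na and nb else 0.0

# The four co-rated sources from the example table
A = {"source_1": 1.0, "source_2": 0.2, "source_3": -0.8, "source_4": -0.9}
B = {"source_1": 0.9, "source_2": 0.5, "source_3": 0.5, "source_4": -1.0}
```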
13. Analysis: agreement (problems)
Disagreement examples:
• weeklystandard.com
  • IFCN: expired signatory
  • NewsGuard: “generally maintains basic standards of credibility and transparency”
  • NewsGuard: “does not handle the difference between news and opinion”
  • OpenSources: political, bias
  • Media Bias/Fact Check: factual reporting HIGH
• zerohedge.com
  • NewsGuard: “severely violates basic standards of credibility and transparency”
  • Media Bias/Fact Check: factual reporting MIXED
  • Web of Trust: reputation 4.5/5
• breitbart.com
  • NewsGuard: “generally maintains basic standards of credibility and transparency, with some significant exceptions”
  • Web of Trust: reputation 4/5
  • OpenSources: political, unreliable, bias
Why? The assessors evaluate different criteria / features.
Problems:
• How can we validate or modify our mappings to a single scale? (intuitiveness for the users)
• How can we prioritise disagreeing assessors?
14. Analysis: different granularities
RQ3: do fact-checker verdicts match source credibility?
Compare the extracted assessments with native source-level assessments:
- Credibility: the mean value of the fact-checking verdicts for the source considered
- Does having a fact-checked article compensate for having a false one?
- Is there selection bias from the fact-checkers?
[Diagram: a fact-checked claim has an appearance on a source (breitbart.com) and a review on the fact-checker’s site (politifact.com), yielding an extracted source-level assessment]
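The source-level aggregation described above (the mean of the fact-checking verdicts per appearance source) can be sketched as follows; the verdict data below is purely hypothetical.

```python
from collections import defaultdict

def source_credibility(claim_verdicts):
    """Aggregate claim-level verdicts into a source-level credibility:
    the mean verdict value, grouped by the source the claim appeared on."""
    by_source = defaultdict(list)
    for source, cred in claim_verdicts:
        by_source[source].append(cred)
    return {source: sum(creds) / len(creds) for source, creds in by_source.items()}

# Hypothetical claim-level verdicts: (source of appearance, cred in [-1, 1])
verdicts = [("example.com", -1.0), ("example.com", 0.0), ("other.org", 1.0)]
```

Here source_credibility(verdicts) gives example.com a mean of −0.5 and other.org a mean of 1.0, which also illustrates the compensation question above: one neutral article pulls the average up despite a false one.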
15. Analysis: different granularities (examples and problems)
1. Examples:
• breitbart.com
  • Positive: NewsGuard, Web Of Trust
  • Negative fact-checks: Politifact, Les Décodeurs
• bbc.co.uk
2. Problems:
• A claim may appear on a news outlet that is merely reporting it
  • Example: James Cleverly on BBC “Today”: “The EU has stopped the UK from having free ports.” → Incorrect (FullFact) https://fullfact.org/europe/free-ports/
• Claim appearance annotations in ClaimReview are few
• Platforms with user content: define more granularity levels
  • Accounts / pages
  • Subdomains
16. Open challenges
1. How to handle disagreement? Which assessments should be trusted more?
• Credibility Propagation Graph:
  • Using the credibility of the assessor itself (recursively)
  • Confidence of the origin of the assessment
  • Granularity
  • Default and customizable credibility of starting nodes
2. How to present credibility to the public:
• Intuitiveness
• Avoid backfire effects
• Stimulate interest
• Level of detail
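One possible reading of the Credibility Propagation Graph idea is sketched below. This is our own speculative illustration, not the paper’s design: each node’s credibility is the confidence-weighted mean of the assessments it receives, with each assessment also weighted by the current credibility of its assessor, starting from customizable seed nodes.

```python
def propagate(assessments, seeds, iterations=10):
    """Iteratively propagate credibility through a graph of assessments.

    assessments: list of (assessor, target, value, confidence) tuples
    seeds: default credibility of starting nodes, e.g. {"IFCN": 1.0}
    """
    cred = dict(seeds)
    for _ in range(iterations):
        updated = dict(seeds)  # start each round from the seed defaults
        for target in {t for _, t, _, _ in assessments}:
            num = den = 0.0
            for assessor, t, value, conf in assessments:
                if t != target:
                    continue
                # Weight each assessment by its confidence and by the
                # (non-negative) credibility of the assessor itself
                weight = conf * max(cred.get(assessor, 0.0), 0.0)
                num += weight * value
                den += weight
            if den:
                updated[target] = num / den
        cred = updated
    return cred
```

With a hypothetical chain where a seed node rates a fact-checker, which in turn rates a source, credibility flows two hops down after a couple of iterations; assessors with zero or unknown credibility contribute nothing.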