Web Archives for Verifying Attribution in
Twitter Screenshots
Presented By:
Tarannum Zaki, PhD Student
Advisors: Dr. Michael L. Nelson & Dr. Michele C. Weigle
Department of Computer Science
Old Dominion University, Norfolk, Virginia
April 26, 2024
@tarannum_zaki @WebSciDL
2024 Web Science and Digital Libraries Research Group Expo
Screenshots are commonly used to annotate the social media of others
Tarannum Zaki WSDL Research Group Expo 2024 Web Archives for Verifying Attribution in Twitter Screenshots
https://twitter.com/BetteMidler/status/1541472225341198338
https://twitter.com/MahyarTousi/status/1534307163073658881
https://twitter.com/urbanachievr/status/1505944201208516612
Why screenshots?
To use as evidence of deleted posts
https://web.archive.org/web/20220525125749/https://twitter.com/DanielDefense/status/1526237750277681154
Controversial posts may be deleted.
https://twitter.com/ashtonpittman/status/1530243294868930560
https://twitter.com/DanielDefense/status/1526237750277681154
Other reasons: to deny cross-platform engagement, to aggregate, to mark up, etc.
Did they really post that?
Screenshots can also be used for humor, satire, and disinformation
https://twitter.com/Shayan86/status/1515753937139388418
https://twitter.com/paulthacker11/status/1495436489492090881
https://twitter.com/elonmusk/status/1544051155562598401
Creating fake tweets using Tweetgen
https://www.tweetgen.com/
https://www.tweetgen.com/create/tweet.html
Using the live web and web archives to validate attribution of screenshots
https://www.google.com/search
https://archive.org/web/
https://www.reuters.com/
https://www.snopes.com/
Motivation
➢ Fake tweets can spread misinformation and disinformation.
➢ Fake tweets are easy to create using online tools.
➢ No tools are currently available to evaluate the authenticity of a screenshot's attribution.
Aim
To develop a tool that automatically estimates the probability that a screenshot of a social media post shows something actually posted by the alleged author, using live web and web archive services.
To search for a tweet in the Wayback Machine, you must first know its URL
https://web.archive.org/web/20220323185843/https://twitter.com/annaturley/status/1506706947239817224
URL of the tweet:
https://twitter.com/annaturley/status/1506706947239817224
https://web.archive.org/
But the URL of a tweet is not present in most screenshots
https://twitter.com/AaronBastani/status/1507391218854117377
@annaturley
March 23, 2022
March 25, 2022
https://twitter.com/TWITTER_HANDLE/status/TWEET_ID
https://web.archive.org/web/20220323185843/https://twitter.com/annaturley/status/1506706947239817224
The tweet ID encodes the timestamp of when the tweet was created.
Construction of a tweet URL
- Use the Twitter handle and approximate a time window based on the timestamp.
- Construct the URL for the tweet.
- Search for the tweet in the Wayback Machine using the URL.
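The two URL patterns on this slide can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function names `tweet_url` and `wayback_url` are hypothetical, and the 14-digit value is the Wayback Machine capture timestamp taken from the example URL above.

```python
def tweet_url(handle: str, tweet_id: int) -> str:
    """Fill the https://twitter.com/TWITTER_HANDLE/status/TWEET_ID pattern."""
    return f"https://twitter.com/{handle.lstrip('@')}/status/{tweet_id}"


def wayback_url(capture_timestamp: str, original_url: str) -> str:
    """Prefix a URL with a Wayback Machine capture timestamp (YYYYMMDDhhmmss)."""
    return f"https://web.archive.org/web/{capture_timestamp}/{original_url}"


# Handle, tweet ID, and capture timestamp from the slide's example.
url = tweet_url("@annaturley", 1506706947239817224)
print(url)
print(wayback_url("20220323185843", url))
```

In practice the tweet ID is the unknown part, which is why the following slides search by handle and time window instead of by exact URL.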
Verifying whether the screenshot's tweet exists in the Wayback Machine
Creating a dataset of screenshots collected from Twitter
Fields
- Shared post’s URL
- Original post’s URL
- Category
- Reason
- Content category
- Structural features
- Post type
- Social media
- Search strategy
- Annotated images
- Screenshot
- Remarks
- Screenshot images shared on Twitter.
- 200 examples
- Examples include both real and fake screenshots
https://ws-dl.blogspot.com/2022/12/2022-12-12-disinformation-spread-on.html
https://twitter.com/rvawonk/status/1503227687917305863
https://twitter.com/RealCandaceO/status/1501576352587292673
Category: Real
Reason: Found in the live web
Content category: Politics
Post type: Tweet
Structural features: Single author, single post
Search strategy: Searched on Twitter interface
Social media: Twitter
Original post’s URL
Shared post’s URL
Screenshot
OCRing screenshots: Single tweet images
Optical Character Recognition (OCR) extracts text from a digital image.
[Figure: example screenshot image and its OCR-extracted output, with the Twitter handle, timestamp, and tweet text labeled]
Zaki, T., Nelson, M.L., and Weigle, M.C. (2023, Jun 14). Extracting Information from Twitter Screenshots. Tech Report arXiv:2306.08236. https://doi.org/10.48550/arXiv.2306.08236
Computing a time window based on the screenshot timestamp
The maximum difference between two time zones on Earth is 26 hours.
[Figure: example screenshot image, OCR-extracted output, and the Twitter handle with computed timestamps]
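A minimal sketch of the time-window computation, assuming the window extends 26 hours on each side of the screenshot timestamp (which would give the 52-hour window mentioned later in the slides). The function names and the sample timestamp are hypothetical; the slides do not state the timestamp shown in the example screenshot.

```python
from datetime import datetime, timedelta

# 26 hours is the maximum time zone difference stated on the slide.
MAX_TZ_DIFFERENCE = timedelta(hours=26)


def time_window(screenshot_time: datetime) -> tuple[datetime, datetime]:
    """Return (left, right) boundaries 26 hours either side of the timestamp."""
    return screenshot_time - MAX_TZ_DIFFERENCE, screenshot_time + MAX_TZ_DIFFERENCE


def to_wayback_format(moment: datetime) -> str:
    """Format a datetime as a 14-digit YYYYMMDDhhmmss string."""
    return moment.strftime("%Y%m%d%H%M%S")


# Hypothetical OCR result: "5:41 PM, Feb 19, 2022" with no time zone information.
left, right = time_window(datetime(2022, 2, 19, 17, 41))
print(to_wayback_format(left), to_wayback_format(right))
```

Because a screenshot shows the viewer's local time without a zone, widening the window in both directions avoids having to guess which zone the screenshot was taken in.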
Using the CDX API to retrieve archived tweets with a left-hand boundary
# Wayback Machine CDX API prefix search (handle and left-hand boundary from the example)
urir = "https://twitter.com/" + "randyhillier" + "/status"
params = "&matchType=prefix&from=" + "20220218154100"
request = "http://web.archive.org/cdx/search/cdx?url=" + urir + params
CDX API prefix search process
Twitter handle and computed timestamps
Output: Retrieved archived tweets with the left-hand boundary (cropped).
https://archive.org/help/wayback_api.php
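The request on this slide could be built and its response parsed roughly as below. This is a sketch under assumptions: `cdx_prefix_query` and `parse_cdx_line` are hypothetical helper names, the field list is the CDX server's default space-separated output, and no network request is made here. Note that `from` limits results by capture time, not by tweet creation time.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_prefix_query(handle: str, left_boundary: str) -> str:
    """Build a CDX prefix-search URL for captures under a handle's /status path."""
    params = {
        "url": f"https://twitter.com/{handle}/status",
        "matchType": "prefix",
        "from": left_boundary,  # 14-digit YYYYMMDDhhmmss
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_line(line: str) -> dict:
    """Split one line of the CDX server's default space-separated output."""
    fields = ["urlkey", "timestamp", "original", "mimetype",
              "statuscode", "digest", "length"]
    return dict(zip(fields, line.split()))


print(cdx_prefix_query("randyhillier", "20220218154100"))
```

Using `urlencode` percent-encodes the tweet URL inside the query string, which avoids ambiguity if the prefix ever contains characters such as `&` or `?`.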
Extracting tweet IDs and determining tweet creation timestamps using TweetedAt
https://web.archive.org/web/20220222163926/https://twitter.com/randyhillier/status/1006984708109099008
https://ws-dl.blogspot.com/2019/08/2019-08-03-tweetedat-finding-tweet.html
Each tweet ID encodes its creation timestamp
An archived tweet’s URL
https://oduwsdl.github.io/tweetedat/#1006984708109099008
Mapping between all the tweet IDs and tweet creation timestamps:
Tweet ID               Tweet Creation Date
1006984708109099008    20180613194037
…………                   …………..
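The decoding step can be sketched with the Snowflake ID layout, in which the bits above the lowest 22 hold milliseconds since Twitter's epoch. This is an illustration, not the TweetedAt implementation: the function names are hypothetical, and the sketch covers only Snowflake IDs (tweets from November 2010 onward), whereas TweetedAt also estimates timestamps for older IDs.

```python
import re
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # Snowflake epoch (4 Nov 2010), in milliseconds


def tweet_id_from_url(url: str) -> int:
    """Pull the numeric tweet ID out of a (possibly archived) tweet URL."""
    match = re.search(r"/status/(\d+)", url)
    if match is None:
        raise ValueError(f"no tweet ID in {url!r}")
    return int(match.group(1))


def tweet_created_at(tweet_id: int) -> datetime:
    """Decode the creation time (UTC) from a Snowflake tweet ID."""
    millis = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(millis // 1000, tz=timezone.utc)


memento = ("https://web.archive.org/web/20220222163926/"
           "https://twitter.com/randyhillier/status/1006984708109099008")
created = tweet_created_at(tweet_id_from_url(memento))
print(created.strftime("%Y%m%d%H%M%S"))  # 20180613194037, as in the table above
```

The example also shows why the capture timestamp in the archived URL (2022) cannot stand in for the creation time (2018): an old tweet can be captured at any later date.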
Determining the final set of archived tweets by filtering the tweet creation timestamps within the time window
Output: 917 archived tweets with the left-hand boundary (cropped)
Mapping between tweet ID and tweet creation timestamp
Output: 29 archived tweets within the 52-hour time window (cropped)
Tweets whose creation timestamps do not fall within the 52-hour time window are filtered out.
449 archived tweets
Multiple mementos are filtered out.
29 archived tweets
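The two filters on this slide (dropping repeated mementos of the same tweet and dropping tweets created outside the time window) might look like the sketch below. The function names and the window boundaries are hypothetical, and the order in which the filters are applied here is an assumption; the final set is the same either way.

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # Snowflake epoch, in milliseconds


def tweet_created_at(tweet_id: int) -> datetime:
    """Decode the creation time (UTC) from a Snowflake tweet ID."""
    millis = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(millis // 1000, tz=timezone.utc)


def candidate_tweets(captured_ids, window_start, window_end):
    """Deduplicate tweet IDs and keep those created inside the window."""
    unique_ids = set(captured_ids)  # several mementos can share one tweet ID
    return sorted(
        tweet_id for tweet_id in unique_ids
        if window_start <= tweet_created_at(tweet_id) <= window_end
    )


# Tweet IDs taken from archived URLs in the slides; the window is illustrative.
captured = [1006984708109099008, 1495226962058649603, 1495226962058649603]
start = datetime(2022, 2, 18, 15, 41, tzinfo=timezone.utc)
end = datetime(2022, 2, 20, 19, 41, tzinfo=timezone.utc)
print(candidate_tweets(captured, start, end))
```

Here the 2018 tweet is discarded by the window check and the repeated 2022 ID collapses to a single candidate.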
Extracting tweet text from archived tweets using BeautifulSoup and Selenium
https://web.archive.org/web/20220220024223/https://twitter.com/randyhillier/status/1495226962058649603
TweetTextSize TweetTextSize--jumbo js-tweet-text tweet-text
An archived tweet’s URL
Extracted text from archived tweet
HTML tag containing the tweet text
https://www.selenium.dev/
https://pypi.org/project/beautifulsoup4/
Selenium automates web scraping and BeautifulSoup parses text from HTML.
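To illustrate the extraction step without third-party packages, the sketch below swaps BeautifulSoup for Python's standard-library `html.parser` and works on an HTML string that is assumed to be already loaded (no Selenium). It looks for the `tweet-text` class shown on the slide; the class and function names in the code and the sample HTML are hypothetical. With BeautifulSoup, a comparable lookup would be a class search such as `soup.find(class_="tweet-text")`.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}


class TweetTextParser(HTMLParser):
    """Collect the text inside the first element whose classes include 'tweet-text'."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting depth inside the target element (0 = outside)
        self.done = False   # True once the target element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif "tweet-text" in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            self.done = self.depth == 0

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_tweet_text(html: str) -> str:
    """Return the tweet text with whitespace collapsed, or '' if none is found."""
    parser = TweetTextParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


sample = ('<div><p class="TweetTextSize TweetTextSize--jumbo js-tweet-text tweet-text">'
          'Hello <a href="/hashtag/example">#example</a> world</p><p>footer</p></div>')
print(extract_tweet_text(sample))
```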
Computing the text similarity score between the tweet text from the screenshot and archived tweets using Python’s difflib library
https://docs.python.org/3/library/difflib.html
[Figure: example screenshot image, extracted text from the archived tweet, and extracted tweet text from the screenshot]
match_score(Archived_Tweet_Text, Screenshot_Tweet_Text) = 81.40%
The text similarity score is computed based on longest contiguous matching subsequences.
Archived tweet text     Screenshot tweet text    match_score
Archived_Tweet_Text1    Screenshot_Tweet_Text    81.40%
Archived_Tweet_Text2    Screenshot_Tweet_Text    30.78%
Archived_Tweet_Text3    Screenshot_Tweet_Text    5.67%
……………..
A match score of 81.40% supports the claim that the tweet in the screenshot was posted by the alleged author.
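A score of this kind can be sketched with `difflib.SequenceMatcher`, whose `ratio()` returns a value between 0 and 1. The slides name the difflib library but not the specific function, so the use of `ratio()`, the whitespace and case normalization, and the helper names below are all assumptions.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace (an assumed preprocessing step)."""
    return " ".join(text.lower().split())


def match_score(archived_text: str, screenshot_text: str) -> float:
    """Return a similarity percentage in [0, 100] using SequenceMatcher.ratio()."""
    matcher = SequenceMatcher(None, normalize(archived_text), normalize(screenshot_text))
    return 100.0 * matcher.ratio()


def best_match(archived_texts, screenshot_text):
    """Return (score, text) for the archived tweet most similar to the screenshot."""
    return max((match_score(text, screenshot_text), text) for text in archived_texts)


print(match_score("abcd", "bcde"))  # 75.0: 2 * 3 matching characters / 8 total
```

Scoring every candidate and keeping the maximum mirrors the slide's list of scores, where only the top-scoring archived tweet is compared against the threshold.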
A threshold of 60% produced the highest F1 (0.69)
Threshold Value   Precision   Recall   F1 Score
90%               1.00        0.42     0.59
80%               1.00        0.49     0.66
70%               1.00        0.51     0.67
60%               1.00        0.53     0.69
Evaluated on 108 single-tweet images from the collected dataset.
Performance of the overlap between the tweet text from the screenshot and the archived tweets.
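For reference, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it for two rows of the table; the function name is hypothetical, and the inputs are the rounded values as reported.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported (rounded) for the 90% and 60% thresholds.
for threshold, precision, recall in [(90, 1.00, 0.42), (60, 1.00, 0.53)]:
    print(f"{threshold}%: F1 = {f1_score(precision, recall):.2f}")
```

Because the table's precision and recall are themselves rounded, recomputing F1 from them can differ from the reported value in the last digit, as happens for the 70% row.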
Limitations & Future Work
OCR: for complex screenshot images, the extracted output is mostly garbage.
Summary
➢ Screenshots are an easy way to share content on social media.
➢ Since screenshots can be easily faked, detecting a fabricated post is a critical task.
➢ Web archive services can help verify the attribution of a screenshot by finding an archived version of the screenshot content.
➢ Our research aims to help mitigate the spread of misinformation and disinformation on social media.