Users rarely verify screenshots of social media posts before sharing them, which contributes to the spread of misinformation and disinformation. We are developing an automated tool to estimate the probability that a screenshot of a social media post is fake. In many cases, web archives can be used to validate the attribution of such screenshots.
Web Archives for Verifying Attribution in Twitter Screenshots
1. Modeling, Simulation & Visualization Student Capstone Conference 2024
Web Archives for Verifying Attribution in Twitter Screenshots
Track: AI and Autonomous Systems
Authors: Tarannum Zaki, Michael L. Nelson, and Michele C. Weigle
Presented by Tarannum Zaki
Department of Computer Science
Old Dominion University, Norfolk, Virginia
April 11, 2024
2. Screenshots are commonly used to annotate the social media of others
Tarannum Zaki MSVSCC 2024 Web Archives for Verifying Attribution in Twitter Screenshots @tarannum_zaki @WebSciDL 2
https://twitter.com/BetteMidler/status/1541472225341198338
https://twitter.com/MahyarTousi/status/1534307163073658881
https://twitter.com/urbanachievr/status/1505944201208516612
3. Why screenshots?
To use as evidence of deleted posts
https://web.archive.org/web/20220525125749/https://twitter.com/DanielDefense/status/1526237750277681154
Controversial posts may be deleted.
https://twitter.com/ashtonpittman/status/1530243294868930560
https://twitter.com/DanielDefense/status/1526237750277681154
Other reasons: to deny cross-platform engagement, to aggregate, to mark up, etc.
4. Did they really post that?
Screenshots can also be used for humor, satire, and disinformation
https://twitter.com/Shayan86/status/1515753937139388418
https://twitter.com/paulthacker11/status/1495436489492090881
https://twitter.com/elonmusk/status/1544051155562598401
5. Creating fake tweets using Tweetgen
https://www.tweetgen.com/
https://www.tweetgen.com/create/tweet.html
6. Motivation
➢ Fake tweets can be responsible for misinformation/disinformation spread.
➢ Fake tweets are easy to create using online tools.
➢ There are no tools currently available to evaluate the authenticity of attribution of screenshots.
7. Aim
To develop a tool that automatically estimates the probability that a screenshot of a social media post is fake, using the services of web archives.
8. To search for a tweet in the Wayback Machine, you must first know its URL
https://web.archive.org/web/20220323185843/https://twitter.com/annaturley/status/1506706947239817224
URL of the tweet:
https://twitter.com/annaturley/status/1506706947239817224
https://web.archive.org/
9. But the URL of a tweet is not present in most screenshots
https://twitter.com/AaronBastani/status/1507391218854117377
@annaturley
March 23, 2022
March 25, 2022
https://twitter.com/TWITTER_HANDLE/status/TWEET_ID
https://web.archive.org/web/20220323185843/https://twitter.com/annaturley/status/1506706947239817224
The tweet ID encodes the timestamp of when the tweet was created.
Construction of a tweet URL
- Use the Twitter handle and approximate a time window based on the timestamp shown in the screenshot.
- Construct the URL for the tweet.
- Search for the tweet in the Wayback Machine using the URL.
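The URL patterns shown on slides 8 and 9 can be sketched as two small helpers. This is only an illustration: the function names are hypothetical, and the patterns are taken directly from the example URLs on the slides.

```python
def build_tweet_url(handle: str, tweet_id: int) -> str:
    """Fill in the https://twitter.com/TWITTER_HANDLE/status/TWEET_ID pattern."""
    # Accept handles written with or without the leading "@".
    return f"https://twitter.com/{handle.lstrip('@')}/status/{tweet_id}"


def build_wayback_url(timestamp: str, original_url: str) -> str:
    """Build a Wayback Machine replay URL from a 14-digit timestamp and a URL."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


tweet_url = build_tweet_url("@annaturley", 1506706947239817224)
print(build_wayback_url("20220323185843", tweet_url))
```

The handle is visible in a screenshot but the tweet ID is not, which is why the later slides search by URL prefix within a time window instead of building the full URL directly.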
10. Process to verify whether the content of a screenshot exists in the Wayback Machine
11. Creating a dataset of screenshots collected from Twitter
Fields: Shared post’s URL, Original post’s URL, Category, Reason, Content category, Structural features, Post type, Social media, Search strategy, Annotated images, Screenshot, Remarks
- Screenshot images shared on Twitter.
- 200 examples
- Examples include both real and fake screenshots
https://ws-dl.blogspot.com/2022/12/2022-12-12-disinformation-spread-on.html
https://twitter.com/rvawonk/status/1503227687917305863
https://twitter.com/RealCandaceO/status/1501576352587292673
Category: Real
Reason: Found on the live web
Content category: Politics
Post type: Tweet
Structural features: Single author, single post
Search strategy: Searched on Twitter interface
Social media: Twitter
Original post’s URL
Shared post’s URL
Screenshot
12. OCRing screenshots: Single tweet images
Optical Character Recognition (OCR) extracts text from a digital image.
Figure panels: example screenshot image; OCR-extracted output (Twitter handle, timestamp, tweet text)
Zaki, T., Nelson, M.L., and Weigle, M.C. (2023, Jun 14). Extracting Information from Twitter Screenshots. Tech Report arXiv:2306.08236. https://doi.org/10.48550/arXiv.2306.08236
13. Computing a time window based on the screenshot timestamp
The maximum difference between two time zones on Earth is 26 hours.
Figure panels: example screenshot image; OCR-extracted output; Twitter handle and computed timestamps
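A minimal sketch of turning a screenshot timestamp into a search window follows. The slide does not say exactly how the window is derived from the 26-hour figure, so padding the timestamp symmetrically by `pad_hours` on each side is an assumption made here for illustration, as are the function and constant names.

```python
from datetime import datetime, timedelta

# 14-digit timestamp format used by the Wayback Machine (YYYYMMDDhhmmss).
WAYBACK_FORMAT = "%Y%m%d%H%M%S"


def compute_time_window(screenshot_time: datetime, pad_hours: int = 26):
    """Return (start, end) as 14-digit strings around the screenshot timestamp.

    Assumption: the window is symmetric, padded by pad_hours on each side to
    absorb the unknown time zone in which the screenshot was taken.
    """
    pad = timedelta(hours=pad_hours)
    start = (screenshot_time - pad).strftime(WAYBACK_FORMAT)
    end = (screenshot_time + pad).strftime(WAYBACK_FORMAT)
    return start, end


print(compute_time_window(datetime(2022, 2, 19, 16, 41)))
```

A wider window returns more candidate captures but never excludes the real tweet, so erring on the generous side is the safer choice at this stage.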
14. Using CDX API to retrieve archived tweets within the time window
urir = "https://twitter.com/" + "randyhillier" + "/status"
params = "&matchType=prefix&from=" + "20220218154100" + "&to=" + "20220220174100"
request = "http://web.archive.org/cdx/search/cdx?url=" + urir + params
CDX API prefix search process
Twitter handle and computed timestamps
Output: Retrieved archived tweets within the timeframe (cropped).
https://archive.org/help/wayback_api.php
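The prefix search can be sketched as follows. The endpoint and the `url`, `matchType`, `from`, and `to` parameters come from the slide; the function names are hypothetical, and the parser assumes the CDX server's default plain-text output, in which each line is space-separated and begins with the URL key, the capture timestamp, and the original URL. No network request is made here.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_request(handle: str, start: str, end: str) -> str:
    """Build a CDX prefix query for all captured tweets of one account."""
    params = {
        "url": f"https://twitter.com/{handle}/status",
        "matchType": "prefix",  # match every URL that starts with the prefix
        "from": start,          # window start, 14-digit timestamp
        "to": end,              # window end, 14-digit timestamp
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_line(line: str) -> dict:
    """Pick the capture timestamp and original URL out of one CDX line."""
    fields = line.split()
    return {"timestamp": fields[1], "original": fields[2]}


print(build_cdx_request("randyhillier", "20220218154100", "20220220174100"))
```

Using `urlencode` percent-encodes the tweet URL prefix, which avoids ambiguity if the prefix ever contains characters such as `&` or `?`.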
15. Extracting tweet IDs and determining tweet creation timestamp using TweetedAt
https://web.archive.org/web/20220218163926/https://twitter.com/randyhillier/status/1006984708109099008
https://ws-dl.blogspot.com/2019/08/2019-08-03-tweetedat-finding-tweet.html
Each tweet ID encodes its creation timestamp
An archived tweet’s URL
https://oduwsdl.github.io/tweetedat/#1006984708109099008
Mapping between all the tweet IDs and tweet creation timestamps:
Tweet ID               Tweet creation date
1006984708109099008    20180613194037
...                    ...
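The decoding itself can be sketched in a few lines. This is not the TweetedAt code; it is a minimal illustration of the Snowflake ID layout, in which the bits above the lowest 22 hold milliseconds since the Twitter epoch (1288834974657 ms after the Unix epoch). It applies only to Snowflake IDs, that is, tweets created after Twitter adopted the scheme in November 2010. The function name is hypothetical.

```python
from datetime import datetime, timezone

# Snowflake epoch in milliseconds since the Unix epoch (2010-11-04 UTC).
TWITTER_EPOCH_MS = 1288834974657


def tweet_id_to_timestamp(tweet_id: int) -> str:
    """Decode a Snowflake tweet ID into a 14-digit UTC creation timestamp."""
    # Drop the 22 low-order bits (worker and sequence numbers),
    # leaving milliseconds elapsed since the Twitter epoch.
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    created = datetime.fromtimestamp(ms // 1000, tz=timezone.utc)
    return created.strftime("%Y%m%d%H%M%S")


print(tweet_id_to_timestamp(1006984708109099008))  # 20180613194037
```

The printed value matches the creation date listed for this tweet ID in the mapping above.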
16. Determining the final set of archived tweets by filtering the tweet creation timestamps within the time window
https://web.archive.org/web/20220218163926/https://twitter.com/randyhillier/status/1006984708109099008
An archived tweet’s URL
Timestamp when the tweet was archived: 20220218163926
Tweet creation timestamp encoded in the tweet ID: 20180613194037
The archived timestamp of the tweet falls within the time window, but the tweet creation timestamp does not. Such archived tweets can therefore be filtered out.
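The filtering step can be sketched as below, assuming Snowflake tweet IDs and 14-digit window bounds. The helper names are hypothetical. Because all timestamps share the same fixed-width format, plain string comparison orders them chronologically.

```python
import re
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # Snowflake epoch, ms since the Unix epoch
STATUS_ID = re.compile(r"/status/(\d+)")


def creation_timestamp(url: str):
    """Return the 14-digit creation timestamp encoded in a tweet URL, or None."""
    match = STATUS_ID.search(url)
    if match is None:
        return None
    ms = (int(match.group(1)) >> 22) + TWITTER_EPOCH_MS
    created = datetime.fromtimestamp(ms // 1000, tz=timezone.utc)
    return created.strftime("%Y%m%d%H%M%S")


def filter_by_creation_time(urls, start: str, end: str):
    """Keep only URLs whose tweet was created inside [start, end]."""
    kept = []
    for url in urls:
        created = creation_timestamp(url)
        if created is not None and start <= created <= end:
            kept.append(url)
    return kept


candidates = [
    "https://web.archive.org/web/20220218163926/https://twitter.com/randyhillier/status/1006984708109099008",
    "https://web.archive.org/web/20220220024223/https://twitter.com/randyhillier/status/1495226962058649603",
]
print(filter_by_creation_time(candidates, "20220218154100", "20220220174100"))
```

In this example the first capture is dropped because its tweet was created in 2018, even though it was archived inside the window.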
17. Extracting tweet text from archived tweets using BeautifulSoup and Selenium
https://web.archive.org/web/20220220024223/https://twitter.com/randyhillier/status/1495226962058649603
TweetTextSize TweetTextSize--jumbo js-tweet-text tweet-text
An archived tweet’s URL
Extracted text from archived tweet
HTML tag containing the tweet text
https://www.selenium.dev/
https://pypi.org/project/beautifulsoup4/
Selenium automates a web browser to load the archived page, and BeautifulSoup parses the tweet text out of the resulting HTML.
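The extraction idea can be illustrated without third-party packages. The sketch below deliberately swaps in Python's standard-library `html.parser` for BeautifulSoup and skips the Selenium page-loading step, so it is a dependency-free stand-in rather than the method used in this work. It assumes the page HTML is already available as a string and that the tweet text sits in an element whose class list includes `tweet-text`, as in the class names shown above.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TweetTextParser(HTMLParser):
    """Collect the text inside the first element carrying the target class."""

    def __init__(self, target_class="tweet-text"):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # nesting depth inside the target element
        self.chunks = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif not self.done:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True  # stop after the first matching element

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_tweet_text(html: str) -> str:
    parser = TweetTextParser()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left over from the markup.
    return " ".join("".join(parser.chunks).split())


sample = ('<div><p class="TweetTextSize TweetTextSize--jumbo js-tweet-text '
          'tweet-text">Hello <a href="#">#world</a>\n again</p><p>other</p></div>')
print(extract_tweet_text(sample))
```

With BeautifulSoup installed, the same lookup is usually a one-line class search; the longer parser here only exists to keep the example self-contained.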
18. Computing text similarity score between tweet text from screenshot and archived tweets using Python’s difflib library
https://docs.python.org/3/library/difflib.html
Example screenshot image Extracted text from archived tweet Extracted tweet text from screenshot
match_score(Archived_Tweet_Text, Screenshot_Tweet_Text)= 81.40%
Text similarity score is computed from the longest contiguous matching subsequences shared by the two texts.
Archived_Tweet_Text1 vs. Screenshot_Tweet_Text: match_score = 81.40%
Archived_Tweet_Text2 vs. Screenshot_Tweet_Text: match_score = 30.78%
Archived_Tweet_Text3 vs. Screenshot_Tweet_Text: match_score = 5.67%
...
A match score of 81.40% is evidence that the tweet in the screenshot was posted by the alleged author.
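A minimal sketch of such a score using `difflib.SequenceMatcher` is shown below. The slide names the difflib library but not the exact call, so the use of `SequenceMatcher.ratio()` and the absence of any text normalization are assumptions; `best_match` is a hypothetical helper. The ratio is 2*M/T, where M is the number of matched characters and T is the combined length of both strings.

```python
from difflib import SequenceMatcher


def match_score(archived_text: str, screenshot_text: str) -> float:
    """Similarity between two texts as a percentage (0 to 100)."""
    return 100.0 * SequenceMatcher(None, archived_text, screenshot_text).ratio()


def best_match(archived_texts, screenshot_text: str):
    """Return (score, text) for the archived tweet closest to the screenshot."""
    scored = ((match_score(text, screenshot_text), text) for text in archived_texts)
    return max(scored, default=(0.0, None))


print(match_score("abcd", "bcde"))  # 75.0: "bcd" matches, 2*3/8
```

OCR output often differs from the archived text in whitespace, emoji, and line breaks, so a score well below 100% can still indicate the same tweet, which is why a threshold is used instead of exact equality.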
19. A threshold of 60% produced the highest F1 (0.69)
Threshold   Precision   Recall   F1 score
90%         1.00        0.42     0.59
80%         1.00        0.49     0.66
70%         1.00        0.51     0.67
60%         1.00        0.53     0.69
Experimented on 108 single tweet images from the collected dataset.
Performance of the overlap between the tweet text from the screenshot and the archived tweets.
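For reference, F1 is the harmonic mean of precision and recall. The short sketch below recomputes it from the rounded values in the table; the function name is hypothetical. Recomputing from rounded inputs can differ from the reported figure in the last digit (the 70% row gives 0.68 this way), presumably because the reported scores were computed from unrounded precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (threshold, precision, recall) rows from the table above
rows = [("90%", 1.00, 0.42), ("80%", 1.00, 0.49), ("70%", 1.00, 0.51), ("60%", 1.00, 0.53)]
for threshold, precision, recall in rows:
    print(threshold, round(f1_score(precision, recall), 2))
```

With precision fixed at 1.00 at every threshold, F1 depends only on recall, so lowering the threshold to 60% improves F1 by recovering more true matches.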
20. Summary
➢ Screenshots are an easy way to share content on social media.
➢ Since screenshots can be easily faked, detecting fabricated posts is a critical task.
➢ Services of web archives could be useful to verify the attribution of a screenshot by finding an archived version of the screenshot content.
➢ Our research aims to help mitigate the spread of misinformation and disinformation on social media.