Machine Classification and Analysis of Suicide-Related
Communication on Twitter
Presentation @ ACM Hypertext 2015
Pete Burnap, Gualtiero (Walter) Colombo & Jonathan
Scourfield
Social Data Science Lab
School of Computer Science and Informatics & School of Social
Sciences
Cardiff University
@pbFeed @socdatalab
Social Data Science Lab - @socdatalab
•  Formed in 2015 out of the Collaborative Online Social Media
Observatory (COSMOS) programme of work (cosmosproject.net)
•  Mission is to continue the work of COSMOS in democratising access
to big social data (e.g. Twitter, Foursquare, Instagram) amongst the
academic, private, public and third sectors.
•  A significant proportion of research funds have been awarded to
collect and analyse social media data in the contexts of Societal
Safety and Security e.g. social tension, hate speech, crime
reporting and fear of crime, suicidal ideation
•  Working with Metropolitan Police, Department of Health,
Food Standards Agency
The Problem
•  Our previous research has studied online social
networks as “social machines” that enable spread of
malicious or potentially dangerous information (e.g.
rumour, hate speech, malware)
•  Concern about suicide and the Internet has moved from
dedicated suicide websites to general social media
platforms
•  Previous research has shown spikes in recorded suicide
rates due to increased risk factors (e.g. celebrity suicide)
The Problem
•  Normalisation of suicidal language (Daine et al., 2013)
•  To date, research has tended to rely either on human coding of
online content, which is difficult to scale to ‘volume’, or on suicide
notes (written in a different state of mind?)
•  Social media analysis has yet to distinguish between
different types of suicidal communication
Research Aims
•  To explore the potential of natural language processing and
machine learning for automated identification and
differentiation of suicide-related communication in very large
social media data sets
•  This would enable those responsible for supporting safety and
wellbeing (e.g. Samaritans) to establish a more realistic idea of
the volume of suicidal information online and possibly identify
emerging ‘clusters’
•  While computation is essential, the work was driven from the
start by a strong understanding of suicidal
communication/language, developed with established suicide
researchers
Developing a classifier for suicide-related social media
content
•  Anonymised data from suicide discussion fora
•  Human annotated – ‘is this person suicidal?’
•  Identify (TF.IDF) terms & phrases from ‘suicidal texts’
•  Automated collection of data from Twitter & Tumblr using TF.IDF
terms
•  Human annotated sample (n=2000: 1k Twitter + 1k Tumblr) using the
coding frame:
•  c1: Evidence of possible suicidal intent
•  c2: Campaigning (i.e. petitions etc.)
•  c3: Flippant reference to suicide
•  c4: Information or support
•  c5: Memorial or condolence
•  c6: Reporting news of someone’s suicide (not bombing)
•  c7: None of the above
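The term-identification step in the pipeline above can be illustrated with a minimal TF-IDF sketch. The slides do not state the exact weighting scheme or tokenisation, so the formula used here (relative term frequency multiplied by log(N / document frequency)), the whitespace tokeniser and the sample texts are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_scores(documents):
    """Return one {term: tf-idf} dict per document.

    Assumed weighting: tf = count / document length, idf = log(N / df).
    """
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenised:
        df.update(set(tokens))
    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(documents, k=3):
    """Rank terms by their highest tf-idf score in any document."""
    best = {}
    for doc_scores in tfidf_scores(documents):
        for term, score in doc_scores.items():
            best[term] = max(best.get(term, 0.0), score)
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(best, key=lambda t: (-best[t], t))[:k]


# Hypothetical placeholder texts, not data from the study.
SAMPLE_DOCS = [
    "the weather is nice today",
    "the match was great today",
    "the exam felt endless",
]
```

Terms that occur in every document (such as "the" in the sample) receive a score of zero under this weighting, which is why TF-IDF tends to surface vocabulary distinctive of one set of texts.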
Features
(Set 1) Lexical characteristics of the sentences used, such as Parts of
Speech (POS), and other structural language features, such as
the most frequently used words and phrases. References to self
and others are also captured with POS; these terms have been
identified in previous research as being evident within suicidal
communication
(Set 2) Sentiment, affective and emotional features and levels of the
terms used within the text. Emotions such as fear, anger and
general aggressiveness are particularly prominent in suicidal
communication (WordNet Affect)
(Set 3) Language expressed in short, informal text such as social media
posts within a limited number of characters. These were
extracted from annotated Tumblr posts
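As a rough illustration of the three feature sets, the sketch below turns a post into a few counts and flags: first-person references (Set 1), affect words (Set 2) and a phrase pattern (Set 3). The word lists are small illustrative stand-ins; the study used POS tagging and WordNet Affect, neither of which is reproduced here. The regular expression is adapted from a pattern listed in Table 5 of the slides, and the function and feature names are hypothetical.

```python
import re

# Illustrative stand-in lists; the study derived its lists from annotated data.
FIRST_PERSON = {"i", "me", "my", "myself", "im"}
AFFECT_WORDS = {"fear", "anger", "misery", "alarm"}

# Adapted from a regex feature shown in Table 5 of the slides.
THOUGHTS_PATTERN = re.compile(
    r".+((cutting|depres|sui)|these|bad|sad).+(thoughts|feel).+"
)


def extract_features(text):
    """Map one post to a small dictionary of numeric features."""
    lowered = text.lower()
    tokens = re.findall(r"[a-z']+", lowered)
    return {
        "n_tokens": len(tokens),
        "n_first_person": sum(t in FIRST_PERSON for t in tokens),
        "n_affect_words": sum(t in AFFECT_WORDS for t in tokens),
        # 1 if the pattern matches anywhere in the post, else 0.
        "regex_thoughts": int(bool(THOUGHTS_PATTERN.search(lowered))),
    }
```

Each post then becomes one row of a feature matrix, with one column per feature, which is the form a classifier expects.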
Machine Classification
•  Key question here is: what are the features of suicidal
ideation, and what are the features of the other classes?
•  Accuracy important but explanatory value also crucial
•  Methods used for the classifier
• Probabilistic (Naïve Bayes), non-probabilistic linear (linear
SVM) and rule-based (Decision Tree) machine classifiers
• Principal Components Analysis (1444 to 255 features)
• Improvement with ‘ensemble’ classifier designed to
incorporate diverse principal components (Rotation Forest)
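To make the dimensionality-reduction step more concrete, here is a minimal pure-Python sketch that finds the first principal component of a small feature matrix by power iteration and scores a row along it. The slides do not say which PCA implementation was used, so this is only an illustration of the idea; a real pipeline would use a numerical library and keep many components (255 in the study). The function names are hypothetical.

```python
import math


def column_means(rows):
    """Mean of each feature column."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def first_principal_component(rows, iterations=200):
    """Return a unit-length estimate of the first principal component.

    Sketch only: power iteration fails if the start vector happens to be
    orthogonal to the component being sought.
    """
    n, d = len(rows), len(rows[0])
    means = column_means(rows)
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    vec = [1.0 / math.sqrt(d)] * d  # fixed start keeps the result deterministic
    for _ in range(iterations):
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # no variance along the current direction
            break
        vec = [x / norm for x in nxt]
    return vec


def project(row, means, component):
    """Score one row along a component: a weighted sum of centred features."""
    return sum((x - m) * w for x, m, w in zip(row, means, component))
```

The weights in the returned vector play the same role as the signed loadings listed per component in the slides' table of principal components: each component is a weighted combination of the original features.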
Results (all)
Results (suicidal ideation)
Classifier accuracy
PCA (combined)
      P    0.321   0.345   0.762   0.507
      R    0.641   0.385   0.205   0.436
      F    0.427   0.364   0.323   0.469
Table 3: Confusion matrix for the best performing
classification model
                  classified as
        c1    c2    c3    c4    c5    c6    c7
c1      57     0    16     0     0     0     5
c2       0    19     2     4     0     3     0
c3      13     1   142     0     0     5    16
c4       0     4     5    20     0     3     3
c5       1     1     1     0    31     1     1
c6       0     6     7     6     2    80     3
c7      18     0    20     1     2     4    98
6. DISCUSSION
In this section we analyse the main feature components produced
by running the PCA procedure on the combined set
that resulted in the best set of results, as shown in Tables 1 ...
F-measure: c1 = 0.690, all classes: 0.728
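Per-class precision, recall and F-measure can be derived from a confusion matrix such as Table 3. The sketch below uses the counts from that table and assumes that rows are the annotated classes and columns are the predicted classes; the slides only show a "classified as" header fragment, so this orientation is an assumption. The function name is hypothetical.

```python
LABELS = ["c1", "c2", "c3", "c4", "c5", "c6", "c7"]

# Counts from Table 3; assumed orientation: rows = annotated, columns = predicted.
CONFUSION = [
    [57, 0, 16, 0, 0, 0, 5],
    [0, 19, 2, 4, 0, 3, 0],
    [13, 1, 142, 0, 0, 5, 16],
    [0, 4, 5, 20, 0, 3, 3],
    [1, 1, 1, 0, 31, 1, 1],
    [0, 6, 7, 6, 2, 80, 3],
    [18, 0, 20, 1, 2, 4, 98],
]


def per_class_metrics(matrix, labels):
    """Return {label: (precision, recall, f_measure)} for each class."""
    metrics = {}
    for i, label in enumerate(labels):
        true_positive = matrix[i][i]
        actual = sum(matrix[i])                    # row total
        predicted = sum(row[i] for row in matrix)  # column total
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / actual if actual else 0.0
        denom = precision + recall
        f_measure = 2 * precision * recall / denom if denom else 0.0
        metrics[label] = (precision, recall, f_measure)
    return metrics
```

Computed this way, the c1 F-measure comes out at about 0.683, close to but not identical with the 0.690 reported on the slide; the slides do not say how the reported figure was averaged. The F-measure itself does not depend on the orientation assumption, since swapping rows and columns only swaps precision and recall.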
Predictive Features
Table 5: Principal components per class
c1 - Evidence of possible suicidal intent
  0.185 word list1 end it all 521 + 0.185 end it all + 0.179 it all now
    + 0.179 all now + 0.175 it all
  0.149 word list1 want to be dead 554 - 0.133 - 0.129 i think
    + 0.125 word list1 to commit suicide 547 + 0.114 really
  0.149 word list1 want to be dead 554 + 0.145 wn affect11 alarm 496
    - 0.123 number of adverb superlative 211 - 0.121 word list7
    relationship 780 + 0.118 regEx class6 +.+report.+ 701
  0.153 thinking about killing + 0.153 about killing myself
    + 0.153 about killing + 0.147 so im + 0.147 wn affect11 misery 314
  0.119 number of predeterminers 206 + 0.117 regEx class1
    +.+((cutting|depres|sui)|these|bad|sad).+(thoughts|feel).+ 667
    + 0.115 wn domain astrology 160 - 0.106 bombing
  0.231 regEx class1 +.+(bdie).+(bmy).+bsleep.+0.177 word
    list want to be dead 554 - 0.155 wn domain dentistry 113
    - 0.146 wn affect11 security 277 - 0.129 wn affect11 admiration
c2 - Campaigning (i.e. petitions etc.)
  0.25 word list2 support 746 - 0.134 wn domain racing 84
Explanatory features
•  Word-lists and regular expressions (regex) extracted from online
suicide-related discussion forums and other microblogging Web
sites provide ‘clues’ effective for the suicidal ideation class
•  Lexical and grammatical features such as POS appear mostly
ineffective
•  ‘Affective’ language is very relevant (such as that represented by the
WordNet library of ‘cognitive synonyms’) and represents well
the affective and emotional states associated with this particular type
of language
•  Sentiment scores generated by software tools for sentiment
analysis also appear ineffective, and are either scarcely or not at all
included within the principal components predictive of each
class
Networks of Suicidal Ideation
“…shortest path of retweets of suicidal ideation
was higher than previous studies that reported
on general retweet path length. Our results
found an average of 5, while other research
reported metrics between 2 and 4.8.”
Colombo, G., Burnap, P., Hodorog, A. and Scourfield, J. (2015) ‘Analysing the connectivity and
communication of suicidal users on Twitter’, Computer Communications - available open
access http://tinyurl.com/suicidenetworks
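The path-length figure quoted above is an average over shortest paths in a retweet graph. As a minimal sketch of how such an average can be computed, the code below runs a breadth-first search from every node of a small adjacency-list graph. It assumes an unweighted graph and averages over all connected ordered pairs; the cited paper's exact definition is not given in the slides, and the example graph is hypothetical.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop counts from `source` to reachable nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_shortest_path(graph):
    """Mean hop count over all ordered pairs of distinct, connected nodes."""
    total = 0
    pairs = 0
    for source in graph:
        for target, hops in shortest_path_lengths(graph, source).items():
            if target != source:
                total += hops
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical undirected retweet chain a - b - c - d (not study data).
CHAIN = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
```

For the four-node chain the pairwise distances are 1, 2, 3, 1, 2 and 1, so `average_shortest_path(CHAIN)` gives 10/6, roughly 1.67.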
Thanks
Questions?
@pbFeed
