DART 2014 
8th International Workshop on 
Information Filtering and Retrieval 
Pisa (Italy) 
December 10, 2014 
A comparison of lexicon-based 
approaches for Sentiment Analysis 
of microblog posts 
Cataldo Musto, Giovanni Semeraro, Marco Polignano 
(Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group)
Outline 
• Background 
• Sentiment Analysis 
• Lexicon-based approaches 
• Methodology 
• State-of-the-art lexicons 
• Experiments 
• Conclusions 
Cataldo Musto, Giovanni Semeraro, Marco Polignano 2 
A comparison of lexicon-based approaches for sentiment analysis of microblog posts. DART 2014 Workshop, Pisa(Italy) 10.12.2014
Background 
One minute on the Web 
Information 
Overload 
Background 
Information Overload 
Obstacle or Opportunity? 
Opportunities 
(Social) Content Analytics 
Insight: to aggregate rough human-generated data to get 
valuable people-based findings 
Social Content Analytics 
Applications 
- Real-time polls 
- Social CRM 
- Online brand monitoring 
All these applications share a common denominator 
They all need a methodology to automatically associate 
an opinion and/or a polarity to each piece of content 
Solution: Sentiment Analysis 
Sentiment Analysis 
Definition 
“It is the field of study that analyzes people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes” (*) 
(*) Pang, Bo, and Lillian Lee. "Opinion mining and sentiment analysis." Foundations and Trends in Information Retrieval, 2008. 
We will focus on the polarity detection task 
Sentiment Analysis 
State of the art 
Supervised Approaches (Machine Learning-based) 
Unsupervised Approaches (Lexicon-based) 
Sentiment Analysis 
Supervised approaches 
Learn a classification model 
relying on labeled examples 
Sentiment Analysis 
Unsupervised approaches 
Rely on external lexical resources 
that associate a polarity score to each term 
(e.g. frustration: - -, joy: +++). 
Sentiment of the content depends on 
the sentiment of the terms which compose it. 
Sentiment Analysis 
Supervised vs Unsupervised 
               Pros                        Cons 
Supervised     Higher accuracy (*) (**)    Pre-labeled examples 
Unsupervised   No training                 Accuracy depends on lexical resources 
Several lexical resources available 
(*) Nakov, Preslav, et al. "SemEval-2013 Task 2: Sentiment Analysis in Twitter." Proceedings of SemEval 2013. 
(**) Rosenthal, Sara, et al. "SemEval-2014 Task 9: Sentiment Analysis in Twitter." Proceedings of SemEval 2014. 
We focus on 
lexicon-based approaches 
Contributions 
1. We propose a novel unsupervised lexicon-based approach for sentiment analysis 
2. We provide a comparison of lexical resources for sentiment analysis of microblog posts 
Methodology 
Lexicon-based approach 
Insight: 
The polarity of a textual content (e.g. a 
microblog post) depends on the polarity 
of the microphrases which compose it. 
A microphrase is built 
whenever a splitting cue 
is found in the text 
Conjunctions, adverbs and punctuation marks are used as splitting cues 
example: “I don’t like this food, it’s terrible” 
m1 = “I don’t like this food” 
splitting cue = “,” 
m2 = “it’s terrible” 
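The splitting step on the example sentence could be sketched as below. This is only a minimal illustration, not the authors' implementation: the slides say that conjunctions, adverbs and punctuation act as splitting cues but do not enumerate them, so `PUNCTUATION_CUES`, `WORD_CUES` and `split_microphrases` are hypothetical names with an assumed cue list.

```python
import re

# Assumed cue list, for illustration only: the slides do not list the
# actual conjunctions, adverbs and punctuation marks that were used.
PUNCTUATION_CUES = ",.;:!?"
WORD_CUES = ("but", "however", "although")

def split_microphrases(text):
    """Split a post into microphrases at every splitting cue."""
    words = "|".join(re.escape(w) for w in WORD_CUES)
    # Either a punctuation cue or a whole-word conjunction/adverb cue.
    pattern = rf"[{re.escape(PUNCTUATION_CUES)}]|\b(?:{words})\b"
    parts = re.split(pattern, text, flags=re.IGNORECASE)
    # Drop empty fragments left by adjacent cues or trailing punctuation.
    return [p.strip() for p in parts if p.strip()]

print(split_microphrases("I don’t like this food, it’s terrible"))
```

The word boundary `\b` keeps a cue such as "but" from splitting inside a longer word like "butter".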
A Tweet is a set of microphrases, T = {m_1 … m_k}: 
\[ pol(T) = \sum_{i=1}^{k} pol(m_i) \] 
Insight: 
The polarity of a microphrase depends on 
the polarity of the terms which compose it. 
A microphrase is a set of terms, m_i = {t_1 … t_n}: 
\[ pol(m_i) = \sum_{j=1}^{n} score(t_j) \] 
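The two sums can be sketched in a few lines. This is a minimal sketch under stated assumptions: `TOY_LEXICON` holds made-up scores (they come from none of the lexicons in the talk), terms missing from the lexicon are assumed to score 0, and the microphrases are given as already tokenized lists.

```python
# Toy lexicon with made-up scores, for illustration only.
TOY_LEXICON = {"good": 0.7, "terrible": -0.8}

def score(term, lexicon):
    """score(t_j): lexicon lookup; unknown terms are assumed neutral (0.0)."""
    return lexicon.get(term.lower(), 0.0)

def pol_microphrase(terms, lexicon):
    """pol(m_i): sum of the scores of the terms of the microphrase."""
    return sum(score(t, lexicon) for t in terms)

def pol_tweet(microphrases, lexicon):
    """pol(T): sum of the polarities of the microphrases of the Tweet."""
    return sum(pol_microphrase(m, lexicon) for m in microphrases)

tweet = [["this", "food", "is", "good"], ["the", "service", "is", "terrible"]]
print(pol_tweet(tweet, TOY_LEXICON))  # slightly negative overall
```

The sketch performs no negation handling, so a microphrase such as "I don’t like" would simply receive the score of "like".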
Methodology 
Four variants proposed 
Basic 
\[ pol(T) = \sum_{i=1}^{k} pol(m_i), \qquad pol(m_i) = \sum_{j=1}^{n} score(t_j) \] 
Normalized 
\[ pol(T) = \sum_{i=1}^{k} pol(m_i), \qquad pol(m_i) = \sum_{j=1}^{n} \frac{score(t_j)}{|m_i|} \] 
Score of each microphrase is normalized according to its length 
Emphasized 
\[ pol(T) = \sum_{i=1}^{k} pol(m_i), \qquad pol(m_i) = \sum_{j=1}^{n} score(t_j) \cdot w(t_j) \] 
Specific categories are provided with a higher weight 
categories = adverbs, verbs, adjectives && valence shifters (intensifiers & downtoners) 
Several weights have been evaluated 
Normalized-Emphasized 
\[ pol(T) = \sum_{i=1}^{k} pol(m_i), \qquad pol(m_i) = \sum_{j=1}^{n} \frac{score(t_j) \cdot w(t_j)}{|m_i|} \] 
Combination of the Normalized and Emphasized variants 
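The four variants differ only in whether a weight w(t_j) is applied and whether the sum is divided by |m_i|, so one function with two switches can cover them. The sketch below is an assumption-laden illustration: |m_i| is taken to be the number of terms in the microphrase, and the scores, the weight 1.5 and the set of emphasized terms are hypothetical (the slides only say that several weights were evaluated).

```python
def pol_microphrase(terms, score, weight=None, normalize=False):
    """pol(m_i) for the four variants.

    score:     function term -> polarity score
    weight:    optional function term -> w(t_j); None disables emphasis
    normalize: if True, divide by the microphrase length |m_i|
    """
    if not terms:
        return 0.0
    total = sum(score(t) * (weight(t) if weight else 1.0) for t in terms)
    return total / len(terms) if normalize else total

def pol_tweet(microphrases, score, weight=None, normalize=False):
    """pol(T): sum of the microphrase polarities."""
    return sum(pol_microphrase(m, score, weight, normalize) for m in microphrases)

# Hypothetical scores and emphasized terms (stand-ins for adjectives etc.).
SCORES = {"really": 0.0, "good": 0.5, "bad": -0.5}
EMPHASIZED = {"good", "bad"}

def score(term):
    return SCORES.get(term, 0.0)

def weight(term):
    return 1.5 if term in EMPHASIZED else 1.0  # 1.5 is an arbitrary choice

tweet = [["really", "good"], ["bad"]]
basic = pol_tweet(tweet, score)                               # 0.5 - 0.5
normalized = pol_tweet(tweet, score, normalize=True)          # 0.25 - 0.5
emphasized = pol_tweet(tweet, score, weight)                  # 0.75 - 0.75
norm_emph = pol_tweet(tweet, score, weight, normalize=True)   # 0.375 - 0.75
print(basic, normalized, emphasized, norm_emph)
```

On this toy Tweet normalization changes the sign of the result, because the positive term sits in the longer microphrase and is therefore damped more.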
We have a problem 
How to calculate score(tj)? 
Solution 
Lexical Resources 
State of the art 
We evaluated four state-of-the-art 
resources for sentiment analysis 
SentiWordNet 
http://sentiwordnet.isti.cnr.it 
WordNet Affect 
http://wndomains.fbk.eu/wnaffect.html 
SenticNet 
http://sentic.net 
MPQA 
http://mpqa.cs.pitt.edu 
Lexical Resources SentiWordNet(*) 
Each WordNet synset is provided with three different 
sentiment scores (positivity, negativity, objectivity) 
(*) Baccianella, Stefano, Andrea Esuli, and Fabrizio 
Sebastiani. "SentiWordNet 3.0: An Enhanced Lexical 
Resource for Sentiment Analysis and Opinion Mining." 
LREC. Vol. 10. 2010. 
Lexical Resources WordNet Affect(*) 
WordNet extension 
Affective-related synsets 
are mapped with an A-Label 
e.g. euphoria —> positive-emotion 
illness —> physical state 
(*) Strapparava, Carlo, and Alessandro Valitutti. "WordNet 
Affect: an Affective Extension of WordNet." LREC. Vol. 4. 
2004. 
Lexical Resources SenticNet(*) 
Inspired by the Hourglass of 
Emotions model 
Each term is represented on the basis of 
the intensity of four basic 
emotional dimensions (sensitivity, 
aptitude, attention, pleasantness) 
The activation level of each dimension 
defines 16 basic emotions 
(*) Cambria, Erik, Daniel Olsher, and Dheeraj Rajagopal. 
"SenticNet 3: a common and common-sense knowledge 
base for cognition-driven sentiment analysis." Twenty-eighth 
AAAI conference on artificial intelligence. 2014. 
According to the triggered emotions, each term 
is provided with an aggregated polarity score 
SenticNet models a sentiment score 
for some bigrams and trigrams as well! 
Lexical Resources MPQA(*) 
Each term is (manually) provided with a discrete sentiment score 
+1 positive 
0 neutral 
-1 negative 
(*) Wilson, Theresa, Janyce Wiebe, and Paul Hoffmann. 
"Recognizing contextual polarity in phrase-level 
sentiment analysis." Proceedings of the conference on 
human language technology and empirical methods in 
natural language processing. Association for Computational 
Linguistics, 2005. 
Lexical Resources Comparison 
Resource Coverage (terms) 
SentiWordNet 117,659 
WordNet Affect 200 
SenticNet 14,000 
MPQA 8,222 
Lexical Resources 
Score calculation 
SentiWordNet 
Given a term tj, score(tj) is the mean of the 
sentiment scores of all the possible synsets of tj 
score(good) = (0.75 + 0 + 1 + 1) / 4 ≈ 0.687 
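The averaging step can be checked with a short sketch. It assumes the per-synset sentiment scores are already available as plain numbers (how each synset's single score is derived from SentiWordNet's positivity and negativity values is not covered here), and `term_score` is a hypothetical helper name.

```python
from statistics import mean

def term_score(synset_scores):
    """score(t_j): mean of the sentiment scores of all synsets of the term.

    A term with no synsets is assumed to be neutral (0.0).
    """
    return mean(synset_scores) if synset_scores else 0.0

# The four synset scores shown on the slide for "good".
print(term_score([0.75, 0, 1, 1]))  # 0.6875, reported on the slide as 0.687
```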
Lexical Resources 
Score calculation 
WordNet Affect 
Given a term tj, the WordNet Affect hierarchy is 
climbed until an A-Label which 
occurs in SentiWordNet is found. 
tj inherits the sentiment 
score of the A-Label 
score(good) = score(benevolence) = 0.339 
Lexical Resources 
Score calculation 
SenticNet 
Given a term tj, the SenticNet 
APIs are queried and the sentiment 
score score(tj) is extracted 
score(good) = 0.883 
Lexical Resources 
Score calculation 
MPQA 
Given a term tj, the MPQA 
Lexicon is queried and the 
sentiment score score(tj) is extracted 
score(good) = 1 
Methodology 
Experimental Evaluation 
Research Hypothesis 
1. How do the different versions of the algorithm perform with respect to state-of-the-art datasets? 
2. What is the best lexical resource to detect the polarity of microblog posts? 
Experimental Evaluation 
Description of the datasets 
• SemEval-2013 
  • 14,435 Tweets 
  • 8,180 training 
  • 3,255 test 
  • Positive, Negative, Neutral 
• STS Dataset 
  • 1,600,000 Tweets 
  • only 359 test 
  • Positive, Negative 
Experimental Evaluation 
Statistics about Coverage 
Lexicon           SemEval-2013-Test   STS-Test 
Vocabulary Size   18,309              6,711 
SentiWordNet      4,314               883 
WordNet-Affect    149                 48 
MPQA              897                 224 
SenticNet         1,497               326 
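To make the table easier to compare across datasets, the counts can be turned into coverage ratios. The sketch assumes that each lexicon row counts the vocabulary terms of the test set that are found in that lexicon, which is how the slide title suggests the table should be read; `coverage` is a hypothetical helper.

```python
# Counts copied from the coverage table above.
VOCABULARY = {"SemEval-2013-Test": 18309, "STS-Test": 6711}
COVERED = {
    "SentiWordNet": {"SemEval-2013-Test": 4314, "STS-Test": 883},
    "WordNet-Affect": {"SemEval-2013-Test": 149, "STS-Test": 48},
    "MPQA": {"SemEval-2013-Test": 897, "STS-Test": 224},
    "SenticNet": {"SemEval-2013-Test": 1497, "STS-Test": 326},
}

def coverage(lexicon, dataset):
    """Fraction of the dataset vocabulary found in the lexicon (assumed reading)."""
    return COVERED[lexicon][dataset] / VOCABULARY[dataset]

for lexicon in COVERED:
    ratios = ", ".join(f"{d}: {coverage(lexicon, d):.1%}" for d in VOCABULARY)
    print(f"{lexicon:15s} {ratios}")
```

Under this reading even the largest lexicon covers less than a quarter of the SemEval-2013 test vocabulary.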
Experiment 1 
Intra-Lexicons evaluation 
SemEval :: SentiWordNet (accuracy, %) 
Basic: 57.67 
Normalized: 58.1 
Emphasized: 58.65 
Norm-Emph: 58.99 
norm vs norm+emph: significant (p < 0.0001) 
Emphasis and Normalization improve the accuracy 
SemEval :: WordNet Affect (accuracy, %) 
Basic: 53.92 
Normalized: 55.05 
Emphasized: 53.95 
Norm-Emph: 55.08 
not significant 
Emphasis and Normalization improve the accuracy 
SemEval :: MPQA (accuracy, %) 
Basic: 58.03 
Normalized: 57.97 
Emphasized: 58.25 
Norm-Emph: 58.1 
not significant 
Emphasis improves the accuracy. Normalization doesn’t. 
SemEval :: SenticNet (accuracy, %) 
Basic: 48.69 
Normalized: 47.25 
Emphasized: 48.29 
Norm-Emph: 48.08 
norm vs norm+emph: significant (p < 0.0001) 
No improvement 
General Outcomes 
Lexicons: SentiWordNet, WordNet Affect, MPQA, SenticNet 
1. Emphasis leads to improvements (7 out of 8 comparisons). 
2. Normalization doesn’t (1 out of 4 comparisons). 
STS :: SentiWordNet (accuracy, %) 
Basic: 71.87 
Normalized: 72.42 
Emphasized: 71.31 
Norm-Emph: 71.59 
not significant gaps 
Normalization improves the accuracy. Emphasis doesn’t. 
STS :: WordNet Affect (accuracy, %) 
Basic: 62.95 
Normalized: 62.67 
Emphasized: 62.96 
Norm-Emph: 62.95 
not significant gaps 
Emphasis improves the accuracy. Normalization doesn’t. 
STS :: MPQA (accuracy, %) 
Basic: 69.54 
Normalized: 70.75 
Emphasized: 69.92 
Norm-Emph: 70.76 
not significant gaps 
Both Emphasis and Normalization improve the accuracy. 
STS :: SenticNet (accuracy, %) 
Basic: 74.37 
Normalized: 74.65 
Emphasized: 74.65 
Norm-Emph: 73.82 
not significant 
Normalization improves the accuracy. Emphasis doesn’t. 
General Outcomes 
Lexicons: SentiWordNet, WordNet Affect, MPQA, SenticNet 
1. Controversial behavior (normalization typically improves, emphasis doesn’t) 
2. Little statistical significance (small dataset) 
Experiment 2 
Inter-Lexicons evaluation 
Comparison between lexicons 
Accuracy (%) 
Lexicon           SemEval-2013   STS 
SentiWordNet      58.99          72.42 
SenticNet         48.69          74.65 
WordNet-Affect    55.08          62.96 
MPQA              58.25          70.76 
- SentiWordNet is the best-performing configuration on SemEval data 
- MPQA performs well on SemEval data 
- SenticNet has a controversial behavior: worst on SemEval, best on STS 
- Reason: SenticNet can hardly classify neutral Tweets (threshold learning?) 
- SentiWordNet and MPQA confirm their performance on STS 
- Poor coverage negatively influences WordNet-Affect performance 
Statistical Analysis 
Bar annotations, in slide order: best, p < 0.0001, p < 0.001, p < 0.50, p < 0.42, best, p < 0.0001, p < 0.11 
Conclusions 
Lessons Learned 
Investigation about the effectiveness of lexical resources in 
polarity classification of microblog posts 
1. Comparison of 4 state-of-the-art resources: 
   SentiWordNet - SenticNet - MPQA - WordNet Affect 
2. Evaluation. 
   Research Question: What is the impact of each lexical resource in 
   the task of polarity classification? 
   MPQA and SentiWordNet typically outperform the other resources 
   (interesting result, given the smaller coverage of MPQA) 
   SenticNet behavior deserves deeper investigation 
Future Research 
• Evaluation against different datasets and with more lexical resources 
• Better tuning of parameters (classification threshold), integration of more complex syntactic structures, merging of lexical resources 
• Integration of the algorithm in a recommendation framework to exploit sentiment-based information to model user interests 
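The classification threshold mentioned above could be tuned along the lines of the sketch below. The slides do not say how the threshold is defined, so everything here is an assumption: a single symmetric threshold around zero separates neutral from positive and negative, the candidate values are arbitrary, and the labeled scores in `dev` are made up.

```python
def classify(polarity, threshold=0.0):
    """Map pol(T) to a label; scores within +/- threshold are neutral (assumed rule)."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

def tune_threshold(scored_examples, candidates):
    """Return the candidate threshold with the highest accuracy.

    scored_examples: list of (polarity score, gold label) pairs.
    Ties are resolved in favor of the first candidate.
    """
    def accuracy(threshold):
        hits = sum(classify(s, threshold) == label for s, label in scored_examples)
        return hits / len(scored_examples)
    return max(candidates, key=accuracy)

# Made-up development set of (pol(T), gold label) pairs.
dev = [(0.9, "positive"), (0.05, "neutral"), (-0.02, "neutral"), (-0.7, "negative")]
best = tune_threshold(dev, [0.0, 0.1, 0.5])
print(best, [classify(s, best) for s, _ in dev])
```

With a threshold of 0.0 every non-zero score is forced to be positive or negative, which is one way a lexicon could end up struggling with neutral Tweets.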
questions? 
Cataldo Musto, Ph.D 
cataldo.musto@uniba.it

More Related Content

What's hot

Glove global vectors for word representation
Glove global vectors for word representationGlove global vectors for word representation
Glove global vectors for word representationhyunyoung Lee
 
Rules of data mining
Rules of data miningRules of data mining
Rules of data miningSulman Ahmed
 
Performance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsPerformance Metrics for Machine Learning Algorithms
Performance Metrics for Machine Learning AlgorithmsKush Kulshrestha
 
Stressen's matrix multiplication
Stressen's matrix multiplicationStressen's matrix multiplication
Stressen's matrix multiplicationKumar
 
Unit4: Knowledge Representation
Unit4: Knowledge RepresentationUnit4: Knowledge Representation
Unit4: Knowledge RepresentationTekendra Nath Yogi
 
NP Complete Problems
NP Complete ProblemsNP Complete Problems
NP Complete ProblemsNikhil Joshi
 
Introduction to Natural Language Processing
Introduction to Natural Language ProcessingIntroduction to Natural Language Processing
Introduction to Natural Language ProcessingPranav Gupta
 
Divide and Conquer - Part 1
Divide and Conquer - Part 1Divide and Conquer - Part 1
Divide and Conquer - Part 1Amrinder Arora
 
Randomized Algorithms
Randomized AlgorithmsRandomized Algorithms
Randomized AlgorithmsKetan Kamra
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyMarina Santini
 
Knowledge representation In Artificial Intelligence
Knowledge representation In Artificial IntelligenceKnowledge representation In Artificial Intelligence
Knowledge representation In Artificial IntelligenceRamla Sheikh
 
Top Down Parsing, Predictive Parsing
Top Down Parsing, Predictive ParsingTop Down Parsing, Predictive Parsing
Top Down Parsing, Predictive ParsingTanzeela_Hussain
 
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...Simplilearn
 
Propositional logic
Propositional logicPropositional logic
Propositional logicRushdi Shams
 
Methods of Optimization in Machine Learning
Methods of Optimization in Machine LearningMethods of Optimization in Machine Learning
Methods of Optimization in Machine LearningKnoldus Inc.
 

A comparison of Lexicon-based approaches for Sentiment Analysis of microblog posts

  • 1. DART 2014: 8th International Workshop on Information Filtering and Retrieval, Pisa (Italy), December 10, 2014. A comparison of lexicon-based approaches for Sentiment Analysis of microblog posts. Cataldo Musto, Giovanni Semeraro, Marco Polignano (Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group)
  • 2. Outline • Background • Sentiment Analysis • Lexicon-based approaches • Methodology • State-of-the-art lexicons • Experiments • Conclusions. Cataldo Musto, Giovanni Semeraro, Marco Polignano. A comparison of lexicon-based approaches for sentiment analysis of microblog posts. DART 2014 Workshop, Pisa (Italy), 10.12.2014
  • 3. Background: One minute on the Web
  • 4. Background: One minute on the Web. Information Overload
  • 5. Background: Information Overload. Obstacle or Opportunity?
  • 6. Opportunities: (Social) Content Analytics. Insight: aggregate raw human-generated data to obtain valuable people-based findings
  • 7. Social Content Analytics: Applications. Real-time polls; Social CRM; Online brand monitoring. All these applications share a common denominator
  • 8. Social Content Analytics: Applications. Real-time polls; Social CRM; Online brand monitoring. All these applications share a common denominator: they all need a methodology to automatically associate an opinion and/or a polarity with each piece of content
  • 9. Social Content Analytics: Applications. Real-time polls; Social CRM; Online brand monitoring. All these applications share a common denominator: they all need a methodology to automatically associate an opinion and/or a polarity with each piece of content. Solution: Sentiment Analysis
  • 10. Sentiment Analysis: Definition. “It is the field of study that analyzes people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes” (*). (*) Pang, Bo, and Lillian Lee. "Opinion mining and sentiment analysis." Foundations and Trends in Information Retrieval, 2008.
  • 11. Sentiment Analysis: Definition (as on the previous slide). We will focus on the polarity detection task
  • 12. Sentiment Analysis: State of the art. Supervised approaches (Machine Learning-based); Unsupervised approaches (Lexicon-based)
  • 13. Sentiment Analysis: Supervised approaches. Learn a classification model relying on labeled examples (figure labels: Man, Dog, ?)
  • 14. Sentiment Analysis: Unsupervised approaches. Rely on external lexical resources that associate a polarity score with each term (e.g. frustration: - -, joy: +++). The sentiment of the content depends on the sentiment of the terms that compose it.
  • 15. Sentiment Analysis: Supervised vs Unsupervised. Supervised: pros: higher accuracy (*) (**); cons: requires pre-labeled examples. Unsupervised: pros: no training, several lexical resources available; cons: accuracy depends on the lexical resources. (*) Nakov, Preslav, et al. "SemEval-2013 Task 2: Sentiment Analysis in Twitter." Proceedings of SemEval 2013. (**) Rosenthal, Sara, et al. "SemEval-2014 Task 9: Sentiment Analysis in Twitter." Proceedings of SemEval 2014.
  • 16. Sentiment Analysis: Supervised vs Unsupervised (same comparison). We focus on lexicon-based approaches
  • 17. Contributions. 1. We propose a novel unsupervised lexicon-based approach for sentiment analysis. 2. We provide a comparison of lexical resources for sentiment analysis of microblog posts.
  • 18. Methodology: Lexicon-based approach. Insight: the polarity of a textual content (e.g. a microblog post) depends on the polarity of the microphrases that compose it.
  • 19. Methodology: Lexicon-based approach. A microphrase is built whenever a splitting cue is found in the text.
  • 20. Methodology: Lexicon-based approach. Conjunctions, adverbs and punctuation marks are used as splitting cues.
  • 21. Methodology: Lexicon-based approach. Example: “I don’t like this food, it’s terrible”
  • 22. Methodology: Lexicon-based approach. Example: “I don’t like this food, it’s terrible” is split into m1 = “I don’t like this food” and m2 = “it’s terrible”, with the comma acting as the splitting cue.
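The splitting step described on slides 19 to 22 can be sketched as follows. This is only an illustration: the slides say that conjunctions, adverbs and punctuation act as splitting cues but do not enumerate them, so the cue lists `CUE_WORDS` and `CUE_PUNCT` and the function name `split_microphrases` are assumptions.

```python
import re

# Hypothetical cue lists: the slides name the cue categories but not the members.
CUE_WORDS = {"and", "but", "or", "however", "although"}
CUE_PUNCT = set(",.;:!?")


def split_microphrases(text):
    """Split a text into microphrases, cutting at every splitting cue."""
    # Words (apostrophes kept inside the word) or single punctuation marks.
    tokens = re.findall(r"[\w'’]+|[^\w\s]", text)
    phrases, current = [], []
    for tok in tokens:
        if tok.lower() in CUE_WORDS or tok in CUE_PUNCT:
            # A cue closes the current microphrase; the cue itself is dropped.
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(tok)
    if current:
        phrases.append(" ".join(current))
    return phrases


microphrases = split_microphrases("I don't like this food, it's terrible")
print(microphrases)
```

On the slide's example the comma is the only cue, so the sketch yields the two microphrases m1 and m2 shown above.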
  • 23. Methodology: Lexicon-based approach. Insight: the polarity of a textual content (e.g. a microblog post) depends on the polarity of the microphrases that compose it. For a Tweet $T = \{m_1 \dots m_k\}$ made of $k$ microphrases: $pol(T) = \sum_{i=1}^{k} pol(m_i)$
  • 24. Methodology: Lexicon-based approach. Insight: the polarity of a microphrase depends on the polarity of the terms that compose it. For a microphrase $m_i = \{t_1 \dots t_n\}$ made of $n$ terms: $pol(m_i) = \sum_{j=1}^{n} score(t_j)$, with $pol(T) = \sum_{i=1}^{k} pol(m_i)$ as before
  • 25. Methodology: Four variants proposed. Basic: $pol(T) = \sum_{i=1}^{k} pol(m_i)$, $pol(m_i) = \sum_{j=1}^{n} score(t_j)$
  • 26. Methodology: Four variants proposed. Normalized: $pol(T) = \sum_{i=1}^{k} pol(m_i)$, $pol(m_i) = \frac{\sum_{j=1}^{n} score(t_j)}{|m_i|}$. The score of each microphrase is normalized according to its length.
  • 27. Methodology: Four variants proposed. Emphasized: $pol(T) = \sum_{i=1}^{k} pol(m_i)$, $pol(m_i) = \sum_{j=1}^{n} score(t_j) \cdot w(t_j)$. Specific categories are given a higher weight: adverbs, verbs, adjectives and valence shifters (intensifiers and downtoners). Several weights have been evaluated.
  • 28. Methodology: Four variants proposed. Normalized-Emphasized (combination): $pol(T) = \sum_{i=1}^{k} pol(m_i)$, $pol(m_i) = \frac{\sum_{j=1}^{n} score(t_j) \cdot w(t_j)}{|m_i|}$
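The four variants differ only in whether a term weight is applied and whether the microphrase sum is divided by its length, so they can be sketched with one function and two switches. This is a minimal sketch under stated assumptions: `|m_i|` is taken to be the number of terms in the microphrase, and the toy lexicon, the toy weight of 1.5 and all function names are hypothetical, not values from the slides.

```python
def pol_microphrase(terms, score, weight=None, normalize=False):
    """Polarity of one microphrase (a list of terms).

    score:     callable term -> float, supplied by a lexical resource
    weight:    optional callable term -> float (Emphasized variants)
    normalize: divide by the number of terms (Normalized variants)
    """
    total = 0.0
    for t in terms:
        w = weight(t) if weight is not None else 1.0
        total += score(t) * w
    if normalize and terms:
        total /= len(terms)  # assumption: |m_i| is the term count
    return total


def pol_tweet(microphrases, score, weight=None, normalize=False):
    """pol(T): sum of the polarities of the microphrases of the Tweet."""
    return sum(pol_microphrase(m, score, weight, normalize) for m in microphrases)


# Hypothetical toy lexicon and weighting, for illustration only.
TOY_LEXICON = {"like": 0.5, "terrible": -1.0}


def toy_score(term):
    return TOY_LEXICON.get(term, 0.0)


def toy_weight(term):
    # Pretend "terrible" belongs to an emphasized category (adjective).
    return 1.5 if term == "terrible" else 1.0


tweet = [["i", "like", "this", "food"], ["it", "is", "terrible"]]
basic = pol_tweet(tweet, toy_score)
normalized = pol_tweet(tweet, toy_score, normalize=True)
emphasized = pol_tweet(tweet, toy_score, weight=toy_weight)
norm_emph = pol_tweet(tweet, toy_score, weight=toy_weight, normalize=True)
print(basic, normalized, emphasized, norm_emph)
```

Passing `score` as a parameter keeps the aggregation independent of the lexical resource, which is the quantity the rest of the talk varies.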
  • 29. Methodology: the four variants (Basic, Normalized, Emphasized, Normalized-Emphasized). We have a problem
  • 30. Methodology: the four variants (Basic, Normalized, Emphasized, Normalized-Emphasized). We have a problem: how to calculate $score(t_j)$?
  • 31. Solution
  • 32. Lexical Resources: State of the art. We evaluated four state-of-the-art resources for sentiment analysis: SentiWordNet (http://sentiwordnet.isti.cnr.it), WordNet Affect (http://wndomains.fbk.eu/wnaffect.html), SenticNet (http://sentic.net), MPQA (http://mpqa.cs.pitt.edu)
  • 33. Lexical Resources: SentiWordNet (*). Each WordNet synset is provided with three different sentiment scores (positivity, negativity, objectivity). (*) Baccianella, Stefano, Andrea Esuli, and Fabrizio Sebastiani. "SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining." LREC. Vol. 10. 2010.
  • 34. Lexical Resources: WordNet Affect (*). WordNet extension. Affect-related synsets are mapped to an A-Label, e.g. euphoria -> positive-emotion, illness -> physical state. (*) Strapparava, Carlo, and Alessandro Valitutti. "WordNet Affect: an Affective Extension of WordNet." LREC. Vol. 4. 2004.
  • 35. Lexical Resources: SenticNet (*). Inspired by the Hourglass of Emotions model. Each term is represented on the basis of the intensity of four basic emotional dimensions (sensitivity, aptitude, attention, pleasantness). The activation level of each dimension defines 16 basic emotions. (*) Cambria, Erik, Daniel Olsher, and Dheeraj Rajagopal. "SenticNet 3: a common and common-sense knowledge base for cognition-driven sentiment analysis." Twenty-eighth AAAI Conference on Artificial Intelligence. 2014.
  • 36. Lexical Resources: SenticNet (*). According to the triggered emotions, each term is provided with an aggregated polarity score
  • 37. Lexical Resources: SenticNet (*). SenticNet models a sentiment score for some bigrams and trigrams as well!
  • 38. Lexical Resources: MPQA (*). Each term is (manually) provided with a discrete sentiment score: +1 positive, 0 neutral, -1 negative. (*) Wilson, Theresa, Janyce Wiebe, and Paul Hoffmann. "Recognizing contextual polarity in phrase-level sentiment analysis." Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2005.
  • 39. Lexical Resources: Comparison. Coverage (terms) per resource: SentiWordNet 117,659; WordNet Affect 200; SenticNet 14,000; MPQA 8,222
  • 41. Lexical Resources: Score calculation, SentiWordNet. Given a term $t_j$, $score(t_j)$ is the mean of the sentiment scores of all the possible synsets of $t_j$. Example: $score(good) = \frac{0.75 + 0 + 1 + 1}{4} \approx 0.687$
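The averaging on slide 41 can be checked with a few lines. This sketch assumes each synset has already been reduced to a single sentiment score (the slides do not say how the positivity, negativity and objectivity values are combined), and returning 0.0 for a term with no synsets is also an assumption; `term_score` is a hypothetical name.

```python
from statistics import mean


def term_score(synset_scores):
    """Mean of the per-synset sentiment scores of a term (0.0 if it has none)."""
    return mean(synset_scores) if synset_scores else 0.0


# Per-synset scores for "good" as listed on the slide.
good_synsets = [0.75, 0.0, 1.0, 1.0]
score_good = term_score(good_synsets)
print(score_good)  # 0.6875, which the slide reports as 0.687
```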
  • 42. Lexical Resources: Score calculation, WordNet Affect. Given a term $t_j$, the WordNet Affect hierarchy is climbed until an A-Label that occurs in SentiWordNet is found; $t_j$ inherits the sentiment score of that A-Label. Example: $score(good) = score(benevolence) = 0.339$
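The climbing procedure on slide 42 might look like the sketch below. The hierarchy here is a toy stand-in: only the final link to "benevolence" and its score 0.339 come from the slide, while the intermediate label, the dictionary layout and the function name `inherited_score` are hypothetical.

```python
def inherited_score(term, parent_of, label_scores, default=0.0):
    """Climb from a term through its A-Labels until one has a known score."""
    seen = set()
    node = parent_of.get(term)
    while node is not None and node not in seen:
        if node in label_scores:
            return label_scores[node]
        seen.add(node)  # guard against cycles in a malformed hierarchy
        node = parent_of.get(node)
    return default  # assumption: terms with no scored ancestor count as neutral


# Toy hierarchy; the intermediate label is a placeholder, not a real A-Label.
PARENT_OF = {
    "good": "hypothetical-intermediate-label",
    "hypothetical-intermediate-label": "benevolence",
}
# Score of the A-Label as reported on the slide.
LABEL_SCORES = {"benevolence": 0.339}

score_good_wna = inherited_score("good", PARENT_OF, LABEL_SCORES)
print(score_good_wna)
```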
  • 43. Lexical Resources: Score calculation, SenticNet. Given a term $t_j$, the SenticNet APIs are queried and the sentiment score is extracted. Example: $score(good) = 0.883$
  • 44. Lexical Resources: Score calculation, MPQA. Given a term $t_j$, the MPQA Lexicon is queried and the sentiment score is extracted. Example: $score(good) = 1$
  • 45. Methodology
  • 46. Experimental Evaluation: Research Hypothesis. 1. How do the different versions of the algorithm perform with respect to state-of-the-art datasets? 2. What is the best lexical resource to detect the polarity of microblog posts?
  • 47. Experimental Evaluation: Description of the datasets. SemEval-2013: 14,435 Tweets (8,180 training, 3,255 test); classes: Positive, Negative, Neutral. STS Dataset: 1,600,000 Tweets (only 359 test); classes: Positive, Negative
  • 48. Experimental Evaluation: Statistics about coverage (SemEval-2013-Test / STS-Test). Vocabulary size: 18,309 / 6,711. SentiWordNet: 4,314 / 883. WordNet-Affect: 149 / 48. MPQA: 897 / 224. SenticNet: 1,497 / 326
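The coverage figures on slide 48 are easier to compare as fractions of the vocabulary. The sketch below assumes each lexicon row counts the vocabulary terms of the test set that the lexicon contains; the slide does not state this explicitly, and the variable and function names are hypothetical.

```python
# Counts copied from slide 48.
VOCABULARY = {"SemEval-2013-Test": 18309, "STS-Test": 6711}
COVERED = {
    "SentiWordNet": {"SemEval-2013-Test": 4314, "STS-Test": 883},
    "WordNet-Affect": {"SemEval-2013-Test": 149, "STS-Test": 48},
    "MPQA": {"SemEval-2013-Test": 897, "STS-Test": 224},
    "SenticNet": {"SemEval-2013-Test": 1497, "STS-Test": 326},
}


def coverage_ratios(covered, vocabulary):
    """Fraction of each dataset's vocabulary found in each lexicon."""
    return {
        lexicon: {ds: n / vocabulary[ds] for ds, n in per_dataset.items()}
        for lexicon, per_dataset in covered.items()
    }


ratios = coverage_ratios(COVERED, VOCABULARY)
for lexicon, per_dataset in ratios.items():
    print(lexicon, {ds: f"{r:.1%}" for ds, r in per_dataset.items()})
```

Under this reading, the ranking by coverage is the same on both test sets: SentiWordNet, then SenticNet, then MPQA, then WordNet-Affect.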
  • 49. Experiment 1: Intra-Lexicons evaluation
  • 50. Experiment 1: SemEval :: SentiWordNet. Accuracy (%): Basic 57.67, Normalized 58.10, Emphasized 58.65, Norm-Emph 58.99. Norm vs Norm-Emph gap significant (p < 0.0001). Emphasis and Normalization improve the accuracy
  • 51. Experiment 1: SemEval :: WordNet Affect. Accuracy (%): Basic 53.92, Normalized 55.05, Emphasized 53.95, Norm-Emph 55.08. Gaps not significant. Emphasis and Normalization improve the accuracy
  • 52. Experiment 1: SemEval :: MPQA. Accuracy (%): Basic 58.03, Normalized 57.97, Emphasized 58.25, Norm-Emph 58.10. Gaps not significant. Emphasis improves the accuracy. Normalization doesn’t.
  • 53. Experiment 1: SemEval :: SenticNet. Accuracy (%): Basic 48.69, Normalized 47.25, Emphasized 48.29, Norm-Emph 48.08. Norm vs Norm-Emph gap significant (p < 0.0001). No improvement
  • 54. Experiment 1: General Outcomes (SemEval; lexicons: SentiWordNet, WordNet Affect, MPQA, SenticNet). 1. Emphasis leads to improvements (7 out of 8 comparisons). 2. Normalization doesn’t (1 out of 4 comparisons).
  • 55. Experiment 1: STS :: SentiWordNet. Accuracy (%): Basic 71.87, Normalized 72.42, Emphasized 71.31, Norm-Emph 71.59. Gaps not significant. Normalization improves the accuracy. Emphasis doesn’t
  • 56. Experiment 1: STS :: WordNet Affect. Accuracy (%): Basic 62.95, Normalized 62.67, Emphasized 62.96, Norm-Emph 62.95. Gaps not significant. Emphasis improves the accuracy. Normalization doesn’t
  • 57. Experiment 1: STS :: MPQA. Accuracy (%): Basic 69.54, Normalized 70.75, Emphasized 69.92, Norm-Emph 70.76. Gaps not significant. Both Emphasis and Normalization improve the accuracy.
  • 58. Experiment 1: STS :: SenticNet. Accuracy (%): Basic 74.37, Normalized 74.65, Emphasized 74.65, Norm-Emph 73.82. Gaps not significant. Normalization improves the accuracy. Emphasis doesn’t
  • 59. Experiment 1: General Outcomes (STS; lexicons: SentiWordNet, WordNet Affect, MPQA, SenticNet). 1. Controversial behavior (normalization typically improves, emphasis doesn’t). 2. Little statistical significance (small dataset)
  • 60. Experiment 2: Inter-Lexicons evaluation
  • 61. Experiment 2: Comparison between lexicons. Accuracy (%) of the best configuration of each lexicon (SemEval-2013 / STS): SentiWordNet 58.99 / 72.42; SenticNet 48.69 / 74.65; WordNet-Affect 55.08 / 62.96; MPQA 58.25 / 70.76
  • 62. Experiment 2: Comparison between lexicons. SentiWordNet is the best-performing configuration on SemEval data
  • 63. Experiment 2: Comparison between lexicons. MPQA performs well on SemEval data
  • 64. Experiment 2: Comparison between lexicons. SenticNet has a controversial behavior: worst on SemEval, best on STS
  • 65. Experiment 2: Comparison between lexicons. Reason: SenticNet can hardly classify neutral Tweets (threshold learning?)
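The remark about neutral Tweets and threshold learning suggests that the aggregated polarity has to be mapped to a class label. The slides do not say how this mapping is done, so the symmetric threshold below, its default value and the function name `classify` are purely illustrative assumptions.

```python
def classify(polarity, threshold=0.0):
    """Map an aggregated polarity score pol(T) to a class label.

    Scores within [-threshold, +threshold] are treated as neutral; the
    symmetric band is an assumption, not the method used in the talk.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# With threshold 0.0 only an exactly-zero score is neutral, so a lexicon that
# rarely returns zero would almost never predict the neutral class.
print(classify(0.05), classify(0.05, threshold=0.1), classify(-0.3, threshold=0.1))
```

A wider neutral band trades positive and negative recall for neutral recall, which is one way the three-class SemEval setting differs from the two-class STS setting.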
  • 66. Experiment 2: Comparison between lexicons. SentiWordNet and MPQA confirm their performance on STS
  • 67. Experiment 2: Comparison between lexicons. Poor coverage negatively influences WordNet-Affect performance
  • 68. Experiment 2: Statistical Analysis. Chart annotations, in extraction order: best, p < 0.0001, p < 0.001, p < 0.50, p < 0.42, best, p < 0.0001, p < 0.11. Legend: significant gap; not significant gap
  • 69. Experiment 2: Conclusions. Same chart and annotations, with the best-performing lexicons highlighted
  • 70. Conclusions
  • 71. Lessons Learned. Investigation of the effectiveness of lexical resources in polarity classification of microblog posts. Comparison of 4 state-of-the-art resources: SentiWordNet, SenticNet, MPQA, WordNet Affect. Evaluation. Research question: what is the impact of each lexical resource on the task of polarity classification? 1. MPQA and SentiWordNet typically outperform the other resources (an interesting result, given the smaller coverage of MPQA). 2. SenticNet's behavior deserves deeper investigation.
  • 72. Future Research. Evaluation against different datasets and with more lexical resources; better tuning of parameters (classification threshold), integration of more complex syntactic structures, merging of lexical resources; integration of the algorithm into a recommendation framework to exploit sentiment-based information to model user interests
  • 73. Questions? Cataldo Musto, Ph.D., cataldo.musto@uniba.it