DART 2014 
8th International Workshop on
Information Filtering and Retrieval 
Pisa (Italy) 
December 10, 2014 
A comparison of lexicon-based 
approaches for Sentiment Analysis 
of microblog posts 
Cataldo Musto, Giovanni Semeraro, Marco Polignano 
(Università degli Studi di Bari 'Aldo Moro', Italy - SWAP Research Group)
Outline 
• Background
• Sentiment Analysis
• Lexicon-based approaches
• Methodology
• State-of-the-art lexicons
• Experiments
• Conclusions
Cataldo Musto, Giovanni Semeraro, Marco Polignano 2 
A comparison of lexicon-based approaches for sentiment analysis of microblog posts. DART 2014 Workshop, Pisa(Italy) 10.12.2014
Background
One minute on the Web
Information Overload
Obstacle or Opportunity?
Opportunities 
(Social) Content Analytics 
Insight: aggregate raw human-generated data to obtain valuable people-based findings
Social Content Analytics
Applications
- Real-time polls
- Social CRM
- Online brand monitoring
All these applications share a common denominator:
they all need a methodology to automatically associate an opinion and/or a polarity to each piece of content.
Solution: Sentiment Analysis
Sentiment Analysis
Definition
"It is the field of study that analyzes people's opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes" (*)
(*) Pang, Bo, and Lillian Lee. "Opinion mining and sentiment analysis." Foundations and Trends in Information Retrieval, 2008.
We will focus on the polarity detection task
Sentiment Analysis 
State of the art 
Supervised Approaches (Machine Learning-based)
Unsupervised Approaches (Lexicon-based)
Sentiment Analysis
Supervised approaches
Learn a classification model relying on labeled examples
[Figure: labeled example images (e.g. "Dog") and an unlabeled image to classify ("Man ?")]
Sentiment Analysis
Unsupervised approaches
Rely on external lexical resources that associate a polarity score with each term (e.g. frustration: - -, joy: +++).
The sentiment of a piece of content depends on the sentiment of the terms that compose it.
Sentiment Analysis
Supervised vs Unsupervised
Supervised
  Pros: higher accuracy (*) (**)
  Cons: requires pre-labeled examples
Unsupervised
  Pros: no training; several lexical resources available
  Cons: accuracy depends on lexical resources
(*) Nakov, Preslav, et al. "SemEval-2013 Task 2: Sentiment Analysis in Twitter." Proceedings of SemEval 2013.
(**) Rosenthal, Sara, et al. "SemEval-2014 Task 9: Sentiment Analysis in Twitter." Proceedings of SemEval 2014.
We focus on lexicon-based approaches
Contributions
1. We propose a novel unsupervised lexicon-based approach for sentiment analysis
2. We provide a comparison of lexical resources for sentiment analysis of microblog posts
Methodology
Lexicon-based approach
Insight:
The polarity of a piece of textual content (e.g. a microblog post) depends on the polarity of the microphrases which compose it.
A microphrase is built whenever a splitting cue is found in the text.
Conjunctions, adverbs and punctuation marks are used as splitting cues.
Example: "I don't like this food, it's terrible"
  m1 = "I don't like this food"
  splitting cue = ","
  m2 = "it's terrible"
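The splitting step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the slides name only the cue categories (conjunctions, adverbs, punctuation), so the concrete cue lists and the function name `split_microphrases` are hypothetical.

```python
import re

# Hypothetical cue lists: the slides give the categories of splitting
# cues, not the actual cues used in the experiments.
PUNCTUATION_CUES = ",.;:!?"
WORD_CUES = ("but", "and", "however")

_CUE_PATTERN = re.compile(
    r"[" + re.escape(PUNCTUATION_CUES) + r"]|\b(?:" + "|".join(WORD_CUES) + r")\b",
    flags=re.IGNORECASE,
)

def split_microphrases(text):
    """Cut the text at every splitting cue and drop empty fragments."""
    parts = _CUE_PATTERN.split(text)
    return [p.strip() for p in parts if p.strip()]

print(split_microphrases("I don't like this food, it's terrible"))
```

The apostrophe is deliberately left out of the cue list so that contractions such as "don't" stay inside one microphrase.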
Methodology
Lexicon-based approach
Insight:
The polarity of a piece of textual content (e.g. a microblog post) depends on the polarity of the microphrases which compose it. For a tweet $T = \{m_1, \dots, m_k\}$:
$pol(T) = \sum_{i=1}^{k} pol(m_i)$
The polarity of a microphrase depends on the polarity of the terms which compose it. For a microphrase $m_i = \{t_1, \dots, t_n\}$:
$pol(m_i) = \sum_{j=1}^{n} score(t_j)$
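The two sums can be read as a pair of nested loops. The sketch below assumes a toy lexicon (`TOY_LEXICON`, with made-up scores) and assumes that terms missing from the lexicon score 0; neither choice comes from the slides.

```python
# Toy lexicon with made-up scores, for illustration only.
TOY_LEXICON = {"like": 0.5, "terrible": -0.75, "good": 0.75}

def score(term, lexicon=TOY_LEXICON):
    """score(t_j): lexicon lookup; unknown terms count as 0 (an assumption)."""
    return lexicon.get(term.lower(), 0.0)

def pol_microphrase(terms):
    """pol(m_i): sum of score(t_j) over the terms of the microphrase."""
    return sum(score(t) for t in terms)

def pol_tweet(microphrases):
    """pol(T): sum of pol(m_i) over the microphrases of the tweet."""
    return sum(pol_microphrase(m) for m in microphrases)

# A tweet already split into two microphrases of tokenized terms.
tweet = [["good", "food"], ["terrible", "service"]]
print(pol_tweet(tweet))  # 0.75 + (-0.75) = 0.0
```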
Methodology
Four variants proposed
In all four variants, $pol(T) = \sum_{i=1}^{k} pol(m_i)$.
Basic
$pol(m_i) = \sum_{j=1}^{n} score(t_j)$
Normalized
$pol(m_i) = \sum_{j=1}^{n} \frac{score(t_j)}{|m_i|}$
The score of each microphrase is normalized according to its length.
Emphasized
$pol(m_i) = \sum_{j=1}^{n} score(t_j) \cdot w(t_j)$
Specific categories are provided with a higher weight: categories = adverbs, verbs, adjectives and valence shifters (intensifiers and downtoners).
Several weights have been evaluated.
Normalized-Emphasized
$pol(m_i) = \sum_{j=1}^{n} \frac{score(t_j) \cdot w(t_j)}{|m_i|}$
Combination of the Normalized and Emphasized variants.
We have a problem: how to calculate $score(t_j)$?
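The four variants differ only in how a microphrase is scored, which a short sketch makes concrete. The weight value and category names below are hypothetical: the slides state that several weights were evaluated but do not report them.

```python
# Each term is a (score, category) pair. The weight and the category
# names are assumptions made for illustration.
EMPHASIS_WEIGHT = 1.5
EMPHASIZED_CATEGORIES = {"adverb", "verb", "adjective", "valence_shifter"}

def w(category):
    """w(t_j): higher weight for emphasized categories, 1 otherwise."""
    return EMPHASIS_WEIGHT if category in EMPHASIZED_CATEGORIES else 1.0

def pol_basic(m):
    return sum(s for s, _ in m)

def pol_normalized(m):
    return sum(s / len(m) for s, _ in m) if m else 0.0

def pol_emphasized(m):
    return sum(s * w(c) for s, c in m)

def pol_norm_emph(m):
    return sum(s * w(c) / len(m) for s, c in m) if m else 0.0

def pol_tweet(microphrases, pol_m=pol_basic):
    """pol(T): sum of microphrase polarities under the chosen variant."""
    return sum(pol_m(m) for m in microphrases)

m = [(0.5, "adjective"), (0.25, "noun")]
print(pol_basic(m), pol_normalized(m), pol_emphasized(m), pol_norm_emph(m))
```

Passing the variant as the `pol_m` argument keeps the tweet-level sum identical across variants, mirroring the slides, where only the microphrase formula changes.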
Solution 
Lexical Resources 
State of the art 
We evaluated four state-of-the-art 
resources for sentiment analysis 
SentiWordNet 
http://sentiwordnet.isti.cnr.it 
WordNet Affect 
http://wndomains.fbk.eu/wnaffect.html 
SenticNet 
http://sentic.net 
MPQA 
http://mpqa.cs.pitt.edu 
Lexical Resources SentiWordNet(*) 
Each WordNet synset is provided with three different 
sentiment scores (positivity, negativity, objectivity) 
(*) Baccianella, Stefano, Andrea Esuli, and Fabrizio 
Sebastiani. "SentiWordNet 3.0: An Enhanced Lexical 
Resource for Sentiment Analysis and Opinion Mining." 
LREC. Vol. 10. 2010. 
Lexical Resources WordNet Affect(*) 
WordNet extension 
Affective-related synsets 
are mapped with an A-Label 
e.g. euphoria -> positive-emotion
illness -> physical state
(*) Strapparava, Carlo, and Alessandro Valitutti. "WordNet 
Affect: an Affective Extension of WordNet." LREC. Vol. 4. 
2004. 
Lexical Resources SenticNet(*) 
Inspired by the Hourglass of 
Emotions model 
Each term is represented on the basis of the intensity of four basic emotional dimensions (sensitivity, aptitude, attention, pleasantness)
The activation level of each dimension 
defines 16 basic emotions 
(*) Cambria, Erik, Daniel Olsher, and Dheeraj Rajagopal. 
"SenticNet 3: a common and common-sense knowledge 
base for cognition-driven sentiment analysis." Twenty-eighth 
AAAI conference on artificial intelligence. 2014. 
According to the triggered emotions, each term is provided with an aggregated polarity score.
SenticNet also models a sentiment score for some bigrams and trigrams.
Lexical Resources MPQA(*) 
(*) Wilson, Theresa, Janyce Wiebe, and Paul Hoffmann. 
"Recognizing contextual polarity in phrase-level 
sentiment analysis." Proceedings of the conference on 
human language technology and empirical methods in 
natural language processing. Association for Computational 
Linguistics, 2005. 
Each term is (manually) provided with a discrete sentiment score:
+1 positive
0 neutral
-1 negative
Lexical Resources Comparison 
Resource Coverage (terms) 
SentiWordNet 117,659 
WordNet Affect 200 
SenticNet 14,000 
MPQA 8,222 
Lexical Resources 
Score calculation 
SentiWordNet 
Given a term $t_j$, $score(t_j)$ is the mean of the sentiment scores of all the possible synsets of $t_j$.
$score(good) = \frac{0.75 + 0 + 1 + 1}{4} = 0.6875$
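The averaging step for the SentiWordNet example can be checked with a few lines. This sketch takes the per-synset scores as given; how each synset score is derived from the positivity, negativity and objectivity values is not stated on the slide, and the function name `term_score` is hypothetical.

```python
def term_score(synset_scores):
    """Mean sentiment score over all synsets of a term (0.0 if there are none)."""
    if not synset_scores:
        return 0.0
    return sum(synset_scores) / len(synset_scores)

# Synset scores for "good" as listed on the slide.
print(term_score([0.75, 0, 1, 1]))  # 0.6875
```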
Lexical Resources
Score calculation
WordNet Affect
Given a term $t_j$, the WordNet Affect hierarchy is climbed until an A-Label which occurs in SentiWordNet is found. $t_j$ inherits the sentiment score of that A-Label.
score(good) = score(benevolence) = 0.339
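Climbing the hierarchy until a scored label is found amounts to following parent links. In the sketch below the `PARENT` map and the fallback of 0.0 for terms with no scored ancestor are assumptions; only the link from "good" to "benevolence" and the score 0.339 come from the slide.

```python
# Hypothetical fragment of the WordNet Affect hierarchy (child -> parent)
# and of the A-Label scores available in SentiWordNet.
PARENT = {"good": "benevolence", "benevolence": "positive-emotion"}
A_LABEL_SCORES = {"benevolence": 0.339}

def inherited_score(term, parent=PARENT, scores=A_LABEL_SCORES):
    """Climb the hierarchy until a label with a known score is found."""
    node = parent.get(term)
    seen = set()  # guards against cycles in a malformed hierarchy
    while node is not None and node not in seen:
        if node in scores:
            return scores[node]
        seen.add(node)
        node = parent.get(node)
    return 0.0  # assumption: no scored ancestor means a neutral term

print(inherited_score("good"))  # 0.339
```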
Lexical Resources 
Score calculation 
SenticNet 
Given a term $t_j$, the SenticNet APIs are queried and the sentiment score is extracted.
score(good) = 0.883
Lexical Resources 
Score calculation 
MPQA 
Given a term $t_j$, the MPQA lexicon is queried and the sentiment score is extracted.
score(good) = 1
Methodology 
Experimental Evaluation
Research Hypothesis
1. How do the different versions of the algorithm perform with respect to state-of-the-art datasets?
2. What is the best lexical resource to detect the polarity of microblog posts?
Experimental Evaluation
Description of the datasets
• SemEval-2013
  • 14,435 Tweets
  • 8,180 training
  • 3,255 test
  • Positive, Negative, Neutral
• STS Dataset
  • 1,600,000 Tweets
  • only 359 test
  • Positive, Negative
Experimental Evaluation
Statistics about Coverage
Lexicon          SemEval-2013-Test  STS-Test
Vocabulary Size  18,309             6,711
SentiWordNet     4,314              883
WordNet-Affect   149                48
MPQA             897                224
SenticNet        1,497              326
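The raw counts are easier to compare as a share of each test vocabulary. The sketch below assumes that each lexicon row counts the vocabulary terms found in that lexicon, which is how the table title reads; the function name `coverage_pct` is hypothetical.

```python
# Figures copied from the coverage table.
VOCABULARY = {"SemEval-2013-Test": 18309, "STS-Test": 6711}
COVERED = {
    "SentiWordNet": {"SemEval-2013-Test": 4314, "STS-Test": 883},
    "WordNet-Affect": {"SemEval-2013-Test": 149, "STS-Test": 48},
    "MPQA": {"SemEval-2013-Test": 897, "STS-Test": 224},
    "SenticNet": {"SemEval-2013-Test": 1497, "STS-Test": 326},
}

def coverage_pct(lexicon, dataset):
    """Share of the dataset vocabulary covered by the lexicon, in percent."""
    return 100.0 * COVERED[lexicon][dataset] / VOCABULARY[dataset]

for lexicon in COVERED:
    row = ", ".join(f"{d}: {coverage_pct(lexicon, d):.1f}%" for d in VOCABULARY)
    print(f"{lexicon}: {row}")
```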
Experiment 1
Intra-Lexicons evaluation
Experiment 1
SemEval :: SentiWordNet
Variant     Accuracy
Basic       57.67
Normalized  58.1
Emphasized  58.65
Norm-Emph   58.99
norm vs norm+emph: significant (p < 0.0001)
Emphasis and Normalization improve the accuracy
Experiment 1
SemEval :: WordNet Affect
Variant     Accuracy
Basic       53.92
Normalized  55.05
Emphasized  53.95
Norm-Emph   55.08
Gaps: not significant
Emphasis and Normalization improve the accuracy
Experiment 1
SemEval :: MPQA
Variant     Accuracy
Basic       58.03
Normalized  57.97
Emphasized  58.25
Norm-Emph   58.1
Gaps: not significant
Emphasis improves the accuracy. Normalization doesn't.
Experiment 1
SemEval :: SenticNet
Variant     Accuracy
Basic       48.69
Normalized  47.25
Emphasized  48.29
Norm-Emph   48.08
norm vs norm+emph: significant (p < 0.0001)
No improvement
Experiment 1
General Outcomes
Lexicons: SentiWordNet, WordNet Affect, MPQA, SenticNet
1. Emphasis leads to improvements (7 out of 8 comparisons).
2. Normalization doesn't (1 out of 4 comparisons).
Experiment 1
STS :: SentiWordNet
Variant     Accuracy
Basic       71.87
Normalized  72.42
Emphasized  71.31
Norm-Emph   71.59
Gaps: not significant
Normalization improves the accuracy. Emphasis doesn't.
Experiment 1
STS :: WordNet Affect
Variant     Accuracy
Basic       62.95
Normalized  62.67
Emphasized  62.96
Norm-Emph   62.95
Gaps: not significant
Emphasis improves the accuracy. Normalization doesn't.
Experiment 1
STS :: MPQA
Variant     Accuracy
Basic       69.54
Normalized  70.75
Emphasized  69.92
Norm-Emph   70.76
Gaps: not significant
Both Emphasis and Normalization improve the accuracy.
Experiment 1
STS :: SenticNet
Variant     Accuracy
Basic       74.37
Normalized  74.65
Emphasized  74.65
Norm-Emph   73.82
Gaps: not significant
Normalization improves the accuracy. Emphasis doesn't.
Experiment 1
General Outcomes
Lexicons: SentiWordNet, WordNet Affect, MPQA, SenticNet
1. Controversial behavior (normalization typically improves, emphasis doesn't)
2. Little statistical significance (small dataset)
Experiment 2
Inter-Lexicons evaluation
Experiment 2
Comparison between lexicons
Accuracy
Lexicon         SemEval-2013  STS
SentiWordNet    58.99         72.42
SenticNet       48.69         74.65
WordNet-Affect  55.08         62.96
MPQA            58.25         70.76
• SentiWordNet is the best-performing configuration on SemEval data
• MPQA performs well on SemEval data
• SenticNet has a controversial behavior: worst on SemEval, best on STS
• Reason: SenticNet can hardly classify neutral Tweets (threshold learning?)
• SentiWordNet and MPQA confirm their performance on STS
• Poor coverage negatively influences WordNet-Affect performance
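The remark about neutral Tweets and threshold learning suggests that the final label depends on where the polarity score falls relative to a cut-off. The slides do not describe the decision rule, so the sketch below is only one plausible reading: the symmetric neutral band, the default threshold and the function name `classify` are all assumptions.

```python
def classify(polarity, threshold=0.1):
    """Map a polarity score to a label; scores within the band are neutral.

    The threshold value is hypothetical: the slides do not report one.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

print([classify(s) for s in (0.6, -0.4, 0.05)])
```

Under this reading, a lexicon whose scores rarely fall near zero would seldom predict the neutral class, which would hurt it on the three-class SemEval data and not on the two-class STS data.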
Experiment 2
Statistical Analysis
Accuracy, with the significance of the gap from the best lexicon on each dataset
Lexicon         SemEval-2013        STS
SentiWordNet    58.99 (best)        72.42 (p < 0.42)
SenticNet       48.69 (p < 0.0001)  74.65 (best)
WordNet-Affect  55.08 (p < 0.001)   62.96 (p < 0.0001)
MPQA            58.25 (p < 0.50)    70.76 (p < 0.11)
Each gap is marked as significant or not significant.
Conclusions
[Figure: the same accuracy chart, highlighting the best-performing lexicons]
Conclusions 
Lessons Learned
INVESTIGATION ABOUT THE EFFECTIVENESS OF LEXICAL RESOURCES IN POLARITY CLASSIFICATION OF MICROBLOG POSTS
Comparison of 4 state-of-the-art resources: SentiWordNet - SenticNet - MPQA - WordNet Affect
Evaluation.
Research Question: What is the impact of each lexical resource in the task of polarity classification?
1. MPQA and SentiWordNet typically outperform the other resources (an interesting result, given the smaller coverage of MPQA)
2. SenticNet's behavior is worth investigating in more depth
Future Research
Evaluation against different datasets and with more lexical resources
Better tuning of parameters (classification threshold), integration of more complex syntactic structures, merging of lexical resources
Integration of the algorithm in a recommendation framework to exploit sentiment-based information to model user interests
questions? 
Cataldo Musto, Ph.D 
cataldo.musto@uniba.it

More Related Content

What's hot

Twitter sentiment analysis ppt
Twitter sentiment analysis pptTwitter sentiment analysis ppt
Twitter sentiment analysis pptAntaraBhattacharya12
Ā 
Introduction to Basics of Python
Introduction to Basics of PythonIntroduction to Basics of Python
Introduction to Basics of PythonElewayte
Ā 
Tkinter Python Tutorial | Python GUI Programming Using Tkinter Tutorial | Pyt...
A comparison of Lexicon-based approaches for Sentiment Analysis of microblog posts

  • 1. DART 2014, 8th International Workshop on Information Filtering and Retrieval, Pisa (Italy), December 10, 2014. A comparison of lexicon-based approaches for Sentiment Analysis of microblog posts. Cataldo Musto, Giovanni Semeraro, Marco Polignano (Università degli Studi di Bari 'Aldo Moro', Italy - SWAP Research Group)
  • 2. Outline: Background; Sentiment Analysis; Lexicon-based approaches; Methodology; State-of-the-art lexicons; Experiments; Conclusions. Cataldo Musto, Giovanni Semeraro, Marco Polignano. A comparison of lexicon-based approaches for sentiment analysis of microblog posts. DART 2014 Workshop, Pisa (Italy), 10.12.2014
  • 3. Background: One minute on the Web
  • 4. Background: One minute on the Web. Information Overload
  • 5. Background: Information Overload. Obstacle or Opportunity?
  • 6. Opportunities: (Social) Content Analytics. Insight: aggregate rough human-generated data to get valuable people-based findings
  • 7. Social Content Analytics, Applications: Real-time polls; Social CRM; Online brand monitoring. All these applications share a common denominator
  • 8. Social Content Analytics, Applications: Real-time polls; Social CRM; Online brand monitoring. All these applications share a common denominator: they all need a methodology to automatically associate an opinion and/or a polarity to each piece of content
  • 9. Social Content Analytics, Applications: Real-time polls; Social CRM; Online brand monitoring. They all need a methodology to automatically associate an opinion and/or a polarity to each piece of content. Solution: Sentiment Analysis
  • 10. Sentiment Analysis, Definition: "It is the field of study that analyzes people's opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes" (*). (*) Pang, Bo, and Lillian Lee. "Opinion mining and sentiment analysis." Foundations and Trends in Information Retrieval, 2008
  • 11. Sentiment Analysis, Definition: We will focus on the polarity detection task
  • 12. Sentiment Analysis, State of the art: Supervised Approaches (Machine Learning-based); Unsupervised Approaches (Lexicon-based)
  • 13. Sentiment Analysis, Supervised approaches: Learn a classification model relying on labeled examples (slide illustration: examples labeled "Man" and "Dog", and an unlabeled "?")
  • 14. Sentiment Analysis, Unsupervised approaches: Rely on external lexical resources that associate a polarity score to each term (e.g. frustration: - -, joy: +++). The sentiment of the content depends on the sentiment of the terms which compose it.
  • 15. Sentiment Analysis, Supervised vs Unsupervised. Supervised: pro: higher accuracy (*) (**); con: pre-labeled examples. Unsupervised: pro: no training; con: accuracy depends on lexical resources. Several lexical resources are available. (*) Nakov, Preslav, et al. "SemEval-2013 task 2: Sentiment analysis in Twitter." Proceedings of SemEval 2013. (**) Rosenthal, Sara, et al. "SemEval-2014 task 9: Sentiment analysis in Twitter." Proceedings of SemEval 2014.
  • 16. Sentiment Analysis, Supervised vs Unsupervised: We focus on lexicon-based approaches
  • 17. Contributions: 1. We propose a novel unsupervised lexicon-based approach for sentiment analysis. 2. We provide a comparison of lexical resources for sentiment analysis of microblog posts.
  • 18. Methodology, Lexicon-based approach. Insight: the polarity of a textual content (e.g. a microblog post) depends on the polarity of the microphrases which compose it.
  • 19. Methodology, Lexicon-based approach. A microphrase is built whenever a splitting cue is found in the text.
  • 20. Methodology, Lexicon-based approach. Conjunctions, adverbs and punctuation are used as splitting cues.
  • 21. Methodology, Lexicon-based approach. Example: "I don't like this food, it's terrible"
  • 22. Methodology, Lexicon-based approach. Example: "I don't like this food, it's terrible" is split at the comma (the splitting cue) into m1 = "I don't like this food" and m2 = "it's terrible".
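The splitting step described on these slides could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cue inventories (`PUNCTUATION_CUES`, `WORD_CUES`) and the tokenizer regex are hypothetical, since the slides only say that conjunctions, adverbs and punctuation act as splitting cues.

```python
import re

# Hypothetical cue inventories (assumption): the slides do not list the
# actual conjunctions, adverbs and punctuation marks used as cues.
PUNCTUATION_CUES = set(",.;:!?")
WORD_CUES = {"and", "but", "or", "however", "although"}


def split_microphrases(text):
    """Split text into microphrases (lists of lowercase tokens) at each cue."""
    # Words (keeping apostrophes) or single punctuation characters.
    tokens = re.findall(r"[\w']+|[^\w\s]", text.lower())
    microphrases, current = [], []
    for token in tokens:
        if token in PUNCTUATION_CUES or token in WORD_CUES:
            # A cue closes the current microphrase and is itself discarded.
            if current:
                microphrases.append(current)
            current = []
        else:
            current.append(token)
    if current:
        microphrases.append(current)
    return microphrases


example = split_microphrases("I don't like this food, it's terrible")
```

With the slide's example, `example` holds two microphrases, one on each side of the comma.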
  • 23. Methodology, Lexicon-based approach. Insight: the polarity of a textual content (e.g. a microblog post) depends on the polarity of the microphrases which compose it. For a Tweet $T = \{m_1, \dots, m_k\}$ made of $k$ microphrases: $pol(T) = \sum_{i=1}^{k} pol(m_i)$
  • 24. Methodology, Lexicon-based approach. Insight: the polarity of a microphrase depends on the polarity of the terms which compose it. For a microphrase $m_i = \{t_1, \dots, t_n\}$ made of $n$ terms: $pol(m_i) = \sum_{j=1}^{n} score(t_j)$, with $pol(T) = \sum_{i=1}^{k} pol(m_i)$
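The two sums above can be read as a pair of nested aggregations. The sketch below is only an illustration: `TOY_LEXICON` and its scores are made up, and treating unknown terms as 0 is an assumption. Note that a plain sum ignores negation, so "don't like" still contributes a positive score here.

```python
# Toy lexicon with made-up scores (assumption, not from any real resource).
TOY_LEXICON = {"like": 0.5, "terrible": -1.0}


def score(term):
    """score(t_j): polarity of one term; unknown terms count as 0 (assumption)."""
    return TOY_LEXICON.get(term, 0.0)


def pol_microphrase(microphrase):
    """pol(m_i): sum of score(t_j) over the terms of the microphrase."""
    return sum(score(term) for term in microphrase)


def pol_tweet(microphrases):
    """pol(T): sum of pol(m_i) over the microphrases of the Tweet."""
    return sum(pol_microphrase(m) for m in microphrases)


tweet = [["i", "don't", "like", "this", "food"], ["it's", "terrible"]]
tweet_polarity = pol_tweet(tweet)  # 0.5 + (-1.0)
```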
  • 25. Methodology, Four variants proposed. Basic: $pol(T) = \sum_{i=1}^{k} pol(m_i)$, $pol(m_i) = \sum_{j=1}^{n} score(t_j)$
  • 26. Methodology, Four variants proposed. Normalized: $pol(T) = \sum_{i=1}^{k} pol(m_i)$, $pol(m_i) = \frac{1}{|m_i|} \sum_{j=1}^{n} score(t_j)$. The score of each microphrase is normalized according to its length.
  • 27. Methodology, Four variants proposed. Emphasized: $pol(T) = \sum_{i=1}^{k} pol(m_i)$, $pol(m_i) = \sum_{j=1}^{n} score(t_j) \cdot w(t_j)$. Specific categories (adverbs, verbs, adjectives) and valence shifters (intensifiers and downtoners) are given a higher weight. Several weights have been evaluated.
  • 28. Methodology, Four variants proposed. Normalized-Emphasized (combination): $pol(T) = \sum_{i=1}^{k} pol(m_i)$, $pol(m_i) = \frac{1}{|m_i|} \sum_{j=1}^{n} score(t_j) \cdot w(t_j)$
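One way to see how the four variants relate is a single function with two switches, one for normalization and one for emphasis. This is a sketch under stated assumptions: the toy scores, the `EMPHASIZED` set (a stand-in for a part-of-speech check) and the weight 2.0 are hypothetical, since the slides only say that several weights were evaluated.

```python
def pol_microphrase(microphrase, score, weight=None, normalize=False):
    """pol(m_i) under the chosen variant (weight -> emphasis, normalize -> / |m_i|)."""
    if not microphrase:
        return 0.0
    total = sum(score(t) * (weight(t) if weight else 1.0) for t in microphrase)
    return total / len(microphrase) if normalize else total


def pol_tweet(microphrases, score, weight=None, normalize=False):
    """pol(T): sum of the microphrase polarities."""
    return sum(pol_microphrase(m, score, weight, normalize) for m in microphrases)


# Toy data (assumptions): made-up scores and a hand-picked emphasized set.
TOY_SCORES = {"like": 0.5, "terrible": -1.0}
EMPHASIZED = {"like", "terrible"}


def toy_score(term):
    return TOY_SCORES.get(term, 0.0)


def toy_weight(term):
    return 2.0 if term in EMPHASIZED else 1.0


tweet = [["like", "food"], ["terrible"]]
variants = {
    "basic": pol_tweet(tweet, toy_score),
    "normalized": pol_tweet(tweet, toy_score, normalize=True),
    "emphasized": pol_tweet(tweet, toy_score, weight=toy_weight),
    "normalized-emphasized": pol_tweet(tweet, toy_score, weight=toy_weight, normalize=True),
}
```

On this toy input the four variants give different totals, which is the effect the intra-lexicon experiment measures on real data.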
  • 29. Methodology: We have a problem
  • 30. Methodology: We have a problem. How to calculate $score(t_j)$?
  • 31. Solution
  • 32. Lexical Resources, State of the art. We evaluated four state-of-the-art resources for sentiment analysis: SentiWordNet (http://sentiwordnet.isti.cnr.it); WordNet Affect (http://wndomains.fbk.eu/wnaffect.html); SenticNet (http://sentic.net); MPQA (http://mpqa.cs.pitt.edu)
  • 33. Lexical Resources, SentiWordNet (*). Each WordNet synset is provided with three different sentiment scores (positivity, negativity, objectivity). (*) Baccianella, Stefano, Andrea Esuli, and Fabrizio Sebastiani. "SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining." LREC. Vol. 10. 2010.
  • 34. Lexical Resources, WordNet Affect (*). WordNet extension. Affective-related synsets are mapped with an A-Label, e.g. euphoria -> positive-emotion, illness -> physical state. (*) Strapparava, Carlo, and Alessandro Valitutti. "WordNet Affect: an Affective Extension of WordNet." LREC. Vol. 4. 2004.
  • 35. Lexical Resources, SenticNet (*). Inspired by the Hourglass of Emotions model. Each term is represented on the basis of the intensity of four basic emotional dimensions (sensitivity, aptitude, attention, pleasantness). The activation level of each dimension defines 16 basic emotions. (*) Cambria, Erik, Daniel Olsher, and Dheeraj Rajagopal. "SenticNet 3: a common and common-sense knowledge base for cognition-driven sentiment analysis." Twenty-eighth AAAI Conference on Artificial Intelligence. 2014.
  • 36. Lexical Resources, SenticNet (*). According to the triggered emotions, each term is provided with an aggregated polarity score.
  • 37. Lexical Resources, SenticNet (*). SenticNet models a sentiment score for some bigrams and trigrams as well!
  • 38. Lexical Resources, MPQA (*). Each term is (manually) provided with a discrete sentiment score: +1 positive, 0 neutral, -1 negative. (*) Wilson, Theresa, Janyce Wiebe, and Paul Hoffmann. "Recognizing contextual polarity in phrase-level sentiment analysis." Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2005.
  • 39. Lexical Resources, Comparison. Coverage (terms): SentiWordNet 117,659; WordNet Affect 200; SenticNet 14,000; MPQA 8,222
  • 41. Lexical Resources, Score calculation, SentiWordNet. Given a term $t_j$, $score(t_j)$ is the mean of the sentiment scores of all the possible synsets of $t_j$: $score(good) = \frac{0.75 + 0 + 1 + 1}{4} \approx 0.687$
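The SentiWordNet rule is an average over synsets, so the slide's example can be reproduced with a small lookup table. This sketch hard-codes the four per-synset values shown for "good". How each per-synset value is derived from SentiWordNet's positivity and negativity scores is not stated on the slide, and the fallback of 0 for unknown terms is an assumption.

```python
from statistics import mean

# Per-synset scores for "good" as shown on the slide; any other entry would
# have to come from SentiWordNet itself.
SYNSET_SCORES = {"good": [0.75, 0.0, 1.0, 1.0]}


def sentiwordnet_score(term, synset_scores=SYNSET_SCORES):
    """Mean of the sentiment scores of all synsets of the term (0 if none)."""
    scores = synset_scores.get(term, [])
    return mean(scores) if scores else 0.0


good_score = sentiwordnet_score("good")  # 2.75 / 4
```

The exact mean is 0.6875, which the slide reports as 0.687.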
  • 42. Lexical Resources, Score calculation, WordNet Affect. Given a term $t_j$, the WordNet Affect hierarchy is climbed until an A-Label which occurs in SentiWordNet is found; $t_j$ inherits the sentiment score of that A-Label: $score(good) = score(benevolence) = 0.339$
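The climbing procedure can be sketched as a walk up a parent map until a scored label is reached. Everything in the tables below is a toy assumption except the pair good -> benevolence with score 0.339, which comes from the slide. The real hierarchy and label scores would come from WordNet Affect and SentiWordNet.

```python
# Toy tables (assumptions): only good -> benevolence = 0.339 is from the slide.
A_LABEL = {"good": "benevolence", "cheerful": "cheerfulness"}  # term -> A-Label
PARENT = {  # A-Label -> parent A-Label in the hierarchy
    "cheerfulness": "joy",
    "joy": "positive-emotion",
    "benevolence": "positive-emotion",
}
LABEL_SCORE = {"benevolence": 0.339, "joy": 0.5}  # A-Labels found in SentiWordNet


def wordnet_affect_score(term):
    """Climb from the term's A-Label until a label with a known score is found."""
    label = A_LABEL.get(term)
    while label is not None:
        if label in LABEL_SCORE:
            return LABEL_SCORE[label]
        label = PARENT.get(label)  # one step up; None at the root
    return 0.0  # term not covered, or no scored ancestor (assumption)
```

In the toy data, "cheerful" has no score at its own label and inherits the one of its parent "joy".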
  • 43. Lexical Resources, Score calculation, SenticNet. Given a term $t_j$, the SenticNet APIs are queried and the sentiment score is extracted: $score(good) = 0.883$
  • 44. Lexical Resources, Score calculation, MPQA. Given a term $t_j$, the MPQA Lexicon is queried and the sentiment score is extracted: $score(good) = 1$
  • 45. Methodology
  • 46. Experimental Evaluation, Research Hypothesis. 1. How do the different versions of the algorithm perform with respect to state-of-the-art datasets? 2. What is the best lexical resource to detect the polarity of microblog posts?
  • 47. Experimental Evaluation, Description of the datasets. SemEval-2013: 14,435 Tweets (8,180 training, 3,255 test); classes: Positive, Negative, Neutral. STS Dataset: 1,600,000 Tweets (only 359 test); classes: Positive, Negative
  • 48. Experimental Evaluation, Statistics about Coverage (SemEval-2013-Test / STS-Test). Vocabulary size: 18,309 / 6,711; SentiWordNet: 4,314 / 883; WordNet-Affect: 149 / 48; MPQA: 897 / 224; SenticNet: 1,497 / 326
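To compare lexicons across the two test sets, the raw counts above can be turned into percentages of the vocabulary. The sketch below assumes that each figure is the number of vocabulary terms found in the lexicon, which is how the slide reads but is not stated explicitly.

```python
# Counts copied from the coverage slide.
VOCABULARY = {"SemEval-2013-Test": 18309, "STS-Test": 6711}
COVERED = {
    "SentiWordNet": {"SemEval-2013-Test": 4314, "STS-Test": 883},
    "WordNet-Affect": {"SemEval-2013-Test": 149, "STS-Test": 48},
    "MPQA": {"SemEval-2013-Test": 897, "STS-Test": 224},
    "SenticNet": {"SemEval-2013-Test": 1497, "STS-Test": 326},
}


def coverage_percent(lexicon, dataset):
    """Share of the dataset vocabulary covered by the lexicon, in percent."""
    return 100.0 * COVERED[lexicon][dataset] / VOCABULARY[dataset]


for lexicon in COVERED:
    row = ", ".join(
        f"{dataset}: {coverage_percent(lexicon, dataset):.1f}%" for dataset in VOCABULARY
    )
    print(f"{lexicon}: {row}")
```

Even the largest resource covers less than a quarter of either vocabulary, and the ranking by coverage is the same on both test sets.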
  • 49. Experiment 1: Intra-Lexicons evaluation
  • 50. Experiment 1, SemEval :: SentiWordNet. Accuracy (%): Basic 57.67; Normalized 58.1; Emphasized 58.65; Norm-Emph 58.99. Norm vs norm+emph: significant (p < 0.0001). Emphasis and Normalization improve the accuracy.
  • 51. Experiment 1, SemEval :: WordNet Affect. Accuracy (%): Basic 53.92; Normalized 55.05; Emphasized 53.95; Norm-Emph 55.08. Not significant. Emphasis and Normalization improve the accuracy.
  • 52. Experiment 1, SemEval :: MPQA. Accuracy (%): Basic 58.03; Normalized 57.97; Emphasized 58.25; Norm-Emph 58.1. Not significant. Emphasis improves the accuracy. Normalization doesn't.
  • 53. Experiment 1, SemEval :: SenticNet. Accuracy (%): Basic 48.69; Normalized 47.25; Emphasized 48.29; Norm-Emph 48.08. Norm vs norm+emph: significant (p < 0.0001). No improvement.
  • 54. Experiment 1, General Outcomes on SemEval (SentiWordNet, WordNet Affect, MPQA, SenticNet). 1. Emphasis leads to improvements (7 out of 8 comparisons). 2. Normalization doesn't (1 out of 4 comparisons).
  • 55. Experiment 1, STS :: SentiWordNet. Accuracy (%): Basic 71.87; Normalized 72.42; Emphasized 71.31; Norm-Emph 71.59. Not significant gaps. Normalization improves the accuracy. Emphasis doesn't.
  • 56. Experiment 1, STS :: WordNet Affect. Accuracy (%): Basic 62.95; Normalized 62.67; Emphasized 62.96; Norm-Emph 62.95. Not significant gaps. Emphasis improves the accuracy. Normalization doesn't.
  • 57. Experiment 1, STS :: MPQA. Accuracy (%): Basic 69.54; Normalized 70.75; Emphasized 69.92; Norm-Emph 70.76. Not significant gaps. Both Emphasis and Normalization improve the accuracy.
  • 58. Experiment 1, STS :: SenticNet. Accuracy (%): Basic 74.37; Normalized 74.65; Emphasized 74.65; Norm-Emph 73.82. Not significant. Normalization improves the accuracy. Emphasis doesn't.
  • 59. Experiment 1, General Outcomes on STS (SentiWordNet, WordNet Affect, MPQA, SenticNet). 1. Controversial behavior (normalization typically improves, emphasis doesn't). 2. Little statistical significance (small dataset).
  • 60. Experiment 2: Inter-Lexicons evaluation
  • 61. Experiment 2, Comparison between lexicons. Accuracy (%), SemEval-2013 / STS: SentiWordNet 58.99 / 72.42; SenticNet 48.69 / 74.65; WordNet-Affect 55.08 / 62.96; MPQA 58.25 / 70.76
  • 62. Experiment 2, Comparison between lexicons. SentiWordNet is the best-performing configuration on SemEval data.
  • 63. Experiment 2, Comparison between lexicons. MPQA performs well on SemEval data.
  • 64. Experiment 2, Comparison between lexicons. SenticNet has a controversial behavior: worst on SemEval, best on STS.
  • 65. Experiment 2, Comparison between lexicons. Reason: SenticNet can hardly classify neutral Tweets (threshold learning?)
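The remark about neutral Tweets suggests that the mapping from a polarity score to a class matters on the three-class SemEval data. A minimal sketch of such a mapping is given below. The symmetric band and the default threshold of 0.1 are purely hypothetical, as the slides do not say how the neutral class is assigned or how a threshold would be learned.

```python
def classify(polarity, threshold=0.1):
    """Map a polarity score to a class using a symmetric neutral band.

    The threshold value is a hypothetical placeholder; in practice it would
    have to be tuned (e.g. on training data) for each lexicon.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


labels = [classify(p) for p in (0.8, 0.05, -0.02, -0.6)]
```

A lexicon whose scores are rarely close to zero would almost never land in the neutral band, which is one possible reading of the behavior noted on the slide.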
  • 66. Experiment 2, Comparison between lexicons. SentiWordNet and MPQA confirm their performance on STS.
  • 67. Experiment 2, Comparison between lexicons. Poor coverage negatively influences WordNet-Affect performance.
  • 68. Experiment 2, Statistical Analysis. Annotations on the accuracy chart: best, p < 0.0001, p < 0.001, p < 0.50, p < 0.42, best, p < 0.0001, p < 0.11. Legend: significant gap; not significant gap.
  • 69. Experiment 2, Conclusions. Legend: best-performing lexicons (highlighted on the accuracy chart).
  • 70. Conclusions
  • 71. Lessons Learned. Investigation about the effectiveness of lexical resources in polarity classification of microblog posts. Comparison of 4 state-of-the-art resources: SentiWordNet, SenticNet, MPQA, WordNet Affect. Evaluation. Research Question: What is the impact of each lexical resource in the task of polarity classification? 1. MPQA and SentiWordNet typically outperform the other resources (an interesting result, given the smaller coverage of MPQA). 2. SenticNet's behavior is worth investigating further.
  • 72. Future Research. Evaluation against different datasets and with more lexical resources; better tuning of parameters (classification threshold), integration of more complex syntactic structures, merging lexical resources. Integration of the algorithm in a recommendation framework to exploit sentiment-based information to model user interests.
  • 73. Questions? Cataldo Musto, Ph.D, cataldo.musto@uniba.it