A comparison of Lexicon-based approaches for Sentiment Analysis of microblog posts

DART 2014
8th International Workshop on
Information Filtering and Retrieval
Pisa (Italy)
December 10, 2014
Cataldo Musto, Giovanni Semeraro, Marco Polignano
(Università degli Studi di Bari ‘Aldo Moro’, Italy - SWAP Research Group)

Outline
• Background
• Sentiment Analysis
• Lexicon-based approaches
• Methodology
• State-of-the-art lexicons
• Experiments
• Conclusions
Cataldo Musto, Giovanni Semeraro, Marco Polignano
A comparison of lexicon-based approaches for sentiment analysis of microblog posts. DART 2014 Workshop, Pisa (Italy), 10.12.2014

Background
One minute on the Web

Information
Overload

Background
Information Overload
Obstacle or Opportunity?

Opportunities
(Social) Content Analytics
Insight: aggregate raw human-generated data to obtain valuable people-based findings

Social Content Analytics
Applications
- Real-time polls
- Social CRM
- Online brand monitoring
All these applications share a common denominator:
they all need a methodology to automatically associate an opinion and/or a polarity to each piece of content
Solution: Sentiment Analysis

Sentiment Analysis
Definition
“It is the field of study that analyzes people’s opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes” (*)
(*) Pang, Bo, and Lillian Lee. "Opinion mining and sentiment analysis." Foundations and Trends in Information Retrieval, 2008.

We will focus on the polarity detection task

Sentiment Analysis
State of the art
• Supervised approaches (Machine Learning-based)
• Unsupervised approaches (Lexicon-based)

Sentiment Analysis
Supervised approaches
Learn a classification model relying on labeled examples
[Figure: a labeled example ("Dog") and an unlabeled one to classify ("Man ?")]

Sentiment Analysis
Unsupervised approaches
Rely on external lexical resources that associate a polarity score with each term.
The sentiment of the content depends on the sentiment of the terms which compose it.
e.g. frustration (- -), joy (+++)

Sentiment Analysis
Supervised vs Unsupervised
Supervised
  Pros: Higher accuracy (*) (**)
  Cons: Pre-labeled examples
Unsupervised
  Pros: No training
  Cons: Accuracy depends on lexical resources
Several lexical resources available
We focus on lexicon-based approaches
(*) Nakov, Preslav, et al. "SemEval-2013 Task 2: Sentiment Analysis in Twitter." Proceedings of SemEval 2013.
(**) Rosenthal, Sara, et al. "SemEval-2014 Task 9: Sentiment Analysis in Twitter." Proceedings of SemEval 2014.

Contributions
1. We propose a novel unsupervised lexicon-based approach for sentiment analysis
2. We provide a comparison of lexical resources for sentiment analysis of microblog posts

Methodology
Lexicon-based approach
Insight:
The polarity of a textual content (e.g. a microblog post) depends on the polarity of the microphrases which compose it.

Methodology
Insight:
A microphrase is built whenever a splitting cue is found in the text.
Conjunctions, adverbs and punctuation are used as splitting cues.
Example: “I don’t like this food, it’s terrible”
m1 = “I don’t like this food”, splitting cue = “,”, m2 = “it’s terrible”
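The splitting step can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the slides name only the cue categories (conjunctions, adverbs, punctuation), so the concrete cue lists and the function name `split_microphrases` are assumptions made for the example.

```python
import re

# Hypothetical cue lists: the slides give the cue categories,
# not the actual words or symbols used.
PUNCTUATION_CUES = r"[,.;:!?]"
WORD_CUES = ["and", "but", "or", "however"]

def split_microphrases(text):
    """Split a text into microphrases at every splitting cue."""
    word_pattern = r"\b(?:" + "|".join(WORD_CUES) + r")\b"
    pattern = re.compile(PUNCTUATION_CUES + "|" + word_pattern, re.IGNORECASE)
    pieces = pattern.split(text)
    # Drop empty fragments produced by adjacent cues or trailing punctuation.
    return [p.strip() for p in pieces if p.strip()]

microphrases = split_microphrases("I don't like this food, it's terrible")
```

On the example from the slide, the comma is the only cue, so the text is split into the two microphrases m1 and m2.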

Methodology
Insight:
The polarity of a microphrase depends on the polarity of the terms which compose it.
$pol(T) = \sum_{i=1}^{k} pol(m_i)$, with tweet $T = \{m_1, \dots, m_k\}$ and $m_i$ a microphrase
$pol(m_i) = \sum_{j=1}^{n} score(t_j)$, with microphrase $m_i = \{t_1, \dots, t_n\}$ and $t_j$ a term

Methodology
Four variants proposed
In every variant the polarity of the tweet is $pol(T) = \sum_{i=1}^{k} pol(m_i)$; the variants differ in how $pol(m_i)$ is computed.
• Basic: $pol(m_i) = \sum_{j=1}^{n} score(t_j)$
• Normalized: $pol(m_i) = \sum_{j=1}^{n} \frac{score(t_j)}{|m_i|}$
  The score of each microphrase is normalized according to its length.
• Emphasized: $pol(m_i) = \sum_{j=1}^{n} score(t_j) \cdot w(t_j)$
  Specific categories are provided with a higher weight: adverbs, verbs, adjectives and valence shifters (intensifiers and downtoners). Several weights have been evaluated.
• Normalized-Emphasized (combination): $pol(m_i) = \sum_{j=1}^{n} \frac{score(t_j) \cdot w(t_j)}{|m_i|}$
We have a problem: how to calculate $score(t_j)$?
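The four variants can be expressed with a single function and two switches. The sketch below is only an illustration under stated assumptions: the toy lexicon, the weight of 1.5 and the set of emphasized terms are invented for the example (the slides say several weights were evaluated but do not list them), and tokenization is assumed to have been done already.

```python
def pol_microphrase(terms, score, weight=None, normalize=False):
    """Polarity of one microphrase, given as a list of terms.

    score:     function term -> lexicon score, i.e. score(t_j)
    weight:    optional function term -> emphasis weight w(t_j)
    normalize: if True, divide by the microphrase length |m_i|
    """
    if not terms:
        return 0.0
    total = 0.0
    for term in terms:
        w = weight(term) if weight is not None else 1.0
        total += score(term) * w
    return total / len(terms) if normalize else total

def pol_tweet(microphrases, score, weight=None, normalize=False):
    """pol(T): sum of the polarities of the microphrases of a tweet."""
    return sum(pol_microphrase(m, score, weight, normalize) for m in microphrases)

# Toy data, purely illustrative (not taken from any real lexicon).
TOY_LEXICON = {"like": 0.5, "terrible": -1.0}
EMPHASIZED_TERMS = {"terrible"}  # stands in for "adjectives, adverbs, ..."

def score(term):
    return TOY_LEXICON.get(term, 0.0)

def weight(term):
    return 1.5 if term in EMPHASIZED_TERMS else 1.0  # hypothetical weight

tweet = [["i", "like", "this", "food"], ["it", "is", "terrible"]]
basic = pol_tweet(tweet, score)
normalized = pol_tweet(tweet, score, normalize=True)
emphasized = pol_tweet(tweet, score, weight=weight)
norm_emph = pol_tweet(tweet, score, weight=weight, normalize=True)
```

Keeping `score` as a parameter mirrors the structure of the talk: the aggregation scheme is fixed, while the lexical resource behind `score` can be swapped.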

Solution

Lexical Resources
State of the art
We evaluated four state-of-the-art resources for sentiment analysis
• SentiWordNet: http://sentiwordnet.isti.cnr.it
• WordNet Affect: http://wndomains.fbk.eu/wnaffect.html
• SenticNet: http://sentic.net
• MPQA: http://mpqa.cs.pitt.edu

Lexical Resources: SentiWordNet (*)
Each WordNet synset is provided with three different sentiment scores (positivity, negativity, objectivity)
(*) Baccianella, Stefano, Andrea Esuli, and Fabrizio Sebastiani. "SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining." LREC. Vol. 10. 2010.

Lexical Resources: WordNet Affect (*)
WordNet extension
Affective-related synsets are mapped with an A-Label
e.g. euphoria → positive-emotion
     illness → physical state
(*) Strapparava, Carlo, and Alessandro Valitutti. "WordNet Affect: an Affective Extension of WordNet." LREC. Vol. 4. 2004.

Lexical Resources: SenticNet (*)
Inspired by the Hourglass of Emotions model
Each term is represented on the basis of the intensity of four basic emotional dimensions (sensitivity, aptitude, attention, pleasantness)
The activation level of each dimension defines 16 basic emotions
According to the triggered emotions, each term is provided with an aggregated polarity score
SenticNet also models a sentiment score for some bigrams and trigrams
(*) Cambria, Erik, Daniel Olsher, and Dheeraj Rajagopal. "SenticNet 3: a common and common-sense knowledge base for cognition-driven sentiment analysis." Twenty-eighth AAAI Conference on Artificial Intelligence. 2014.

Lexical Resources: MPQA (*)
Each term is (manually) provided with a discrete sentiment score:
+1 positive
 0 neutral
-1 negative
(*) Wilson, Theresa, Janyce Wiebe, and Paul Hoffmann. "Recognizing contextual polarity in phrase-level sentiment analysis." Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2005.

Lexical Resources: Comparison
Resource          Coverage (terms)
SentiWordNet      117,659
WordNet Affect    200
SenticNet         14,000
MPQA              8,222

Lexical Resources
Score calculation: SentiWordNet
Given a term tj, score(tj) is the mean of the sentiment scores of all the possible synsets of tj.
$score(good) = \frac{0.75 + 0 + 1 + 1}{4} = 0.6875$
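The averaging step for SentiWordNet can be reproduced in a few lines. This is a sketch only: the per-synset scores are taken as given from the slide, since the slides do not state how the positivity and negativity values of a synset are combined into one number. Returning 0.0 for a term with no synsets is an assumption of this example.

```python
from statistics import mean

def term_score(synset_scores):
    """Mean of the sentiment scores of all the synsets of a term.

    Assumption: a term without any synset is treated as neutral (0.0).
    """
    return mean(synset_scores) if synset_scores else 0.0

# Per-synset scores for "good" as shown on the slide.
good = term_score([0.75, 0, 1, 1])
```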

Lexical Resources
Score calculation: WordNet Affect
Given a term tj, the WordNet Affect hierarchy is climbed until an A-Label which occurs in SentiWordNet is found.
tj inherits the sentiment score of the A-Label.
score(good) = score(benevolence) = 0.339
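The climbing procedure for WordNet Affect can be sketched as a walk over parent links. Everything except the pair good/benevolence and the value 0.339 is hypothetical here: the `PARENT` dictionary is a made-up fragment rather than the real hierarchy, and falling back to 0.0 when no scored label is found is an assumption.

```python
# Hypothetical fragment of a hierarchy: node -> parent label.
# Only the link from "good" to "benevolence" is suggested by the slide.
PARENT = {"good": "benevolence", "benevolence": "positive-emotion"}

# Scores assumed to come from SentiWordNet; 0.339 is the value on the slide.
SWN_SCORE = {"benevolence": 0.339}

def wna_score(term, parent=PARENT, swn=SWN_SCORE):
    """Climb the hierarchy until a label with a SentiWordNet score is found."""
    node = parent.get(term)
    seen = set()  # guards against accidental cycles in the toy data
    while node is not None and node not in seen:
        if node in swn:
            return swn[node]
        seen.add(node)
        node = parent.get(node)
    return 0.0  # assumption: no scored ancestor means neutral

good = wna_score("good")
```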

Lexical Resources
Score calculation: SenticNet
Given a term tj, the SenticNet APIs are queried and the sentiment score is extracted.
score(good) = 0.883

Lexical Resources
Score calculation: MPQA
Given a term tj, the MPQA Lexicon is queried and the sentiment score is extracted.
score(good) = 1

Methodology

Experimental Evaluation
Research Hypothesis
1. How do the different versions of the algorithm perform with respect to state-of-the-art datasets?
2. What is the best lexical resource to detect the polarity of microblog posts?

Description of the datasets
• SemEval-2013
  • 14,435 Tweets
  • 8,180 training
  • 3,255 test
  • Positive, Negative, Neutral
• STS Dataset
  • 1,600,000 Tweets
  • only 359 test
  • Positive, Negative

Statistics about Coverage
Lexicon            SemEval-2013-Test   STS-Test
Vocabulary Size    18,309              6,711
SentiWordNet       4,314               883
WordNet-Affect     149                 48
MPQA               897                 224
SenticNet          1,497               326
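To make the coverage figures easier to compare, they can be turned into fractions of each vocabulary. The sketch below assumes that every lexicon row counts the vocabulary terms found in that lexicon, which is how the table reads but is not stated explicitly on the slide; the function name `coverage_ratio` is introduced for the example.

```python
VOCABULARY = {"SemEval-2013-Test": 18309, "STS-Test": 6711}

# Counts copied from the coverage table.
COVERED = {
    "SentiWordNet": {"SemEval-2013-Test": 4314, "STS-Test": 883},
    "WordNet-Affect": {"SemEval-2013-Test": 149, "STS-Test": 48},
    "MPQA": {"SemEval-2013-Test": 897, "STS-Test": 224},
    "SenticNet": {"SemEval-2013-Test": 1497, "STS-Test": 326},
}

def coverage_ratio(lexicon, dataset):
    """Fraction of the dataset vocabulary that the lexicon covers."""
    return COVERED[lexicon][dataset] / VOCABULARY[dataset]

for lexicon in COVERED:
    row = ", ".join(f"{d}: {coverage_ratio(lexicon, d):.1%}" for d in VOCABULARY)
    print(f"{lexicon:15s} {row}")
```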

Experiment 1
Intra-Lexicons evaluation

Experiment 1
SemEval :: SentiWordNet (accuracy, %)
Basic        57.67
Normalized   58.1
Emphasized   58.65
Norm-Emph    58.99
norm vs norm+emph: significant (p < 0.0001)
Emphasis and Normalization improve the accuracy

Experiment 1
SemEval :: WordNet Affect (accuracy, %)
Basic        53.92
Normalized   55.05
Emphasized   53.95
Norm-Emph    55.08
not significant
Emphasis and Normalization improve the accuracy

Experiment 1
SemEval :: MPQA (accuracy, %)
Basic        58.03
Normalized   57.97
Emphasized   58.25
Norm-Emph    58.1
not significant
Emphasis improves the accuracy. Normalization doesn’t.

Experiment 1
SemEval :: SenticNet (accuracy, %)
Basic        48.69
Normalized   47.25
Emphasized   48.29
Norm-Emph    48.08
norm vs norm+emph: significant (p < 0.0001)
No improvement

Experiment 1
General Outcomes
Lexicons: SentiWordNet, WordNet Affect, MPQA, SenticNet
1. Emphasis leads to improvements (7 out of 8 comparisons).
2. Normalization doesn’t (1 out of 4 comparisons).

Experiment 1
STS :: SentiWordNet (accuracy, %)
Basic        71.87
Normalized   72.42
Emphasized   71.31
Norm-Emph    71.59
not significant gaps
Normalization improves the accuracy. Emphasis doesn’t

Experiment 1
STS :: WordNet Affect (accuracy, %)
Basic        62.95
Normalized   62.67
Emphasized   62.96
Norm-Emph    62.95
not significant gaps
Emphasis improves the accuracy. Normalization doesn’t

Experiment 1
STS :: MPQA (accuracy, %)
Basic        69.54
Normalized   70.75
Emphasized   69.92
Norm-Emph    70.76
not significant gaps
Both Emphasis and Normalization improve the accuracy.

Experiment 1
STS :: SenticNet (accuracy, %)
Basic        74.37
Normalized   74.65
Emphasized   74.65
Norm-Emph    73.82
not significant
Normalization improves the accuracy. Emphasis doesn’t

Experiment 1
General Outcomes
Lexicons: SentiWordNet, WordNet Affect, MPQA, SenticNet
1. Mixed behavior (normalization typically improves, emphasis doesn’t)
2. Little statistical significance (small dataset)

Experiment 2
Inter-Lexicons evaluation

Experiment 2
Comparison between lexicons (accuracy, %)
Lexicon          SemEval-2013   STS
SentiWordNet     58.99          72.42
SenticNet        48.69          74.65
WordNet-Affect   55.08          62.96
MPQA             58.25          70.76

SentiWordNet is the best-performing configuration on SemEval data

MPQA performs well on SemEval data

SenticNet behaves inconsistently: worst on SemEval, best on STS

Reason: SenticNet can hardly classify neutral Tweets (threshold learning?)
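The difficulty with neutral Tweets, and the "threshold learning" idea hinted at on the slide, can be illustrated with a simple decision rule. This is a hypothetical sketch: the slides do not report which threshold was used or how it might be learned, so both `classify` and `best_threshold` are assumptions made for illustration.

```python
def classify(polarity, threshold=0.0):
    """Map a polarity score to a class label.

    Scores within [-threshold, +threshold] are labeled neutral.
    The threshold is a hypothetical parameter, not a value from the slides.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"

def best_threshold(scores, gold, candidates):
    """Pick the candidate threshold with the highest accuracy on labeled data."""
    def accuracy(threshold):
        hits = sum(classify(s, threshold) == g for s, g in zip(scores, gold))
        return hits / len(gold)
    return max(candidates, key=accuracy)

# Toy polarity scores and gold labels, invented for the example.
chosen = best_threshold([0.05, 0.6, -0.7],
                        ["neutral", "positive", "negative"],
                        candidates=[0.0, 0.1])
```

With a zero-width neutral band, a tweet is labeled neutral only when its score is exactly zero, which becomes unlikely when a lexicon assigns continuous scores to many terms; widening the band on labeled data is one way such a threshold could be tuned.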

SentiWordNet and MPQA confirm their performance on STS

Poor coverage negatively affects WordNet-Affect performance

Experiment 2
Statistical Analysis (accuracy, %, with the significance of the gap from the best lexicon on each dataset)
Lexicon          SemEval-2013          STS
SentiWordNet     58.99 (best)          72.42 (p < 0.42)
SenticNet        48.69 (p < 0.0001)    74.65 (best)
WordNet-Affect   55.08 (p < 0.001)     62.96 (p < 0.0001)
MPQA             58.25 (p < 0.50)      70.76 (p < 0.11)


Conclusions

Lessons Learned
Investigation of the effectiveness of lexical resources in polarity classification of microblog posts
Comparison of 4 state-of-the-art resources: SentiWordNet, SenticNet, MPQA, WordNet Affect
Evaluation.
Research Question: What is the impact of each lexical resource in the task of polarity classification?
1. MPQA and SentiWordNet typically outperform the other resources (an interesting result, given the smaller coverage of MPQA)
2. SenticNet's behavior deserves further investigation

Future Research
• Evaluation against different datasets and with more lexical resources
• Better tuning of parameters (classification threshold), integration of more complex syntactic structures, merging of lexical resources
• Integration of the algorithm in a recommendation framework to exploit sentiment-based information to model user interests

questions?
Cataldo Musto, Ph.D
cataldo.musto@uniba.it
