Rules for Inducing Hierarchies
from Social Tagging Data
iConference 2018, Sheffield, UK, March 25-28, 2018
Hang Dong, Wei Wang, Frans Coenen
Department of Computer Science,
University of Liverpool
Social Tagging Data (Folksonomies)
• Users collaboratively generate “keywords” (tags) for the resources they are interested in.
• These keywords form a bottom-up classification of online resources, called folksonomies (Vander Wal, 2007).
Social tags for the movie “Forrest Gump” in MovieLens
https://movielens.org/movies/356
Issues in social tagging data
• (i) Tags are noisy and ambiguous.
• Addressed by data cleaning (Dong, Wang & Coenen, 2017).
• (ii) Flat structure: there are no semantic relations among tags.
• This study focuses on hierarchical (subsumption) relations between tags.
• This is a challenging problem:
• it is a cognitive task requiring much human effort (Weller, 2010, p. 139);
• it is distinct from mining relations from sentences.
Research Questions
• 1. Which rule can effectively capture the hierarchical relations
between social tags?
• - Not systematically discussed in previous studies, although some approaches have been proposed and evaluated (Garcia-Silva, 2012; Strohmaier et al., 2012).
• - Definitions of hierarchical relations from information science and linguistics
• - Rules from previous studies
• - Two newly proposed rules: Fuzzy Set Inclusion and Probabilistic Association
Research Questions (2)
• 2. How do rules and data representations affect the quality of the
induced hierarchies?
• Data representation: resource-based representation and probabilistic topic modelling (PTM) representation
• Experimental design:
• Hierarchy generation algorithm
• Automated evaluation against three gold-standard hierarchies
Hierarchical Relations – information science
Acknowledgement to the image in Stock, W. G. (2010). Concepts and semantic relations in information science. Journal of the American Society for Information Science and Technology, 61(10), 1951–1969.
Hierarchical Relations
• Straightforward?
• (i) Apple is a [kind of] fruit. (ii) Library science is a part of Information Science.
• Abstraction, Generalisation.
• A type of paradigmatic relation: the related words can fill the same grammatical slot (Cruse, 2003).
• Tagging data directly provide only syntagmatic (co-occurrence) relations, but they are a rich source from which paradigmatic relations can be derived (Peters, 2009; Stock, 2010).
Hierarchical Relations – linguistics (1)
Definitions in Cruse (2003)
• Logical (extensional): X is a hyponym of Y iff the extension (set of objects) of X is included in the extension of Y.
• Logical (intensional, asymmetric): X is a hyponym of Y iff F(X) entails, but is not entailed by, F(Y), where F(-) is a sentential function satisfied by X or Y.
[Figure: Extension, apple nested within fruit; Intension, fruit nested within apple]
Hierarchical Relations – linguistics (2)
• Collocational: X is a hyponym of Y iff the normal context of X is a
subset of the normal context of Y.
“You shall know a word by the company it keeps” (Firth, 1957)
• Componential: X is a hyponym of Y iff the features defining Y are a
proper subset of features defining X.
Definitions in Cruse (2003)
Hierarchical relations from tags – computational rules
• Set Inclusion (Mika, 2007; De Meo, 2009)
• Graph Centrality (Heymann, 2006)
• Information-Theoretic Condition (Wang, 2010)
• Fuzzy Set Inclusion
• Probabilistic Association
[Figure: representing a tag as a vector, using either the resource-based (Res-based) representation or the probabilistic topic modelling (PTM) based representation]
Data Representation
• Resource-based Representation (Markines et al., 2009):
Vt[i] = number of times the tag t is assigned to the ith resource
tags \ resources   R1   R2   R3
news               1    0    0
Web2.0             1    1    1
knowledge          0    0    1
• Probabilistic Topic Modelling Representation:
A probabilistic generative model is used to infer p(tag | topic) and p(resource | topic); p(topic | tag) is then calculated from p(tag | topic) using Bayes' theorem.
tags \ topics      Topic 1   Topic 2   Topic 3
news               0.8       0.1       0.1
Web2.0             0.4       0.3       0.3
knowledge          0.2       0.2       0.6
Each row, p(topic | tag), sums to 1.
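The Bayes step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. The p(tag | topic) values are the exact fractions behind the rounded, column-normalised table on the "Set inclusion vs Fuzzy set inclusion" slide (4/7 ≈ 0.57, 1/6 ≈ 0.17, and so on). The topic prior p(topic) is an assumption, since the slides do not give one. It was chosen so that the output reproduces the p(topic | tag) table above.

```python
"""Sketch: derive p(topic | tag) from p(tag | topic) with Bayes' theorem."""

# p(tag | topic): one list per tag, one entry per topic (Topic 1..3).
# Each topic column sums to 1 over these three toy tags.
P_TAG_GIVEN_TOPIC = {
    "news": [4 / 7, 1 / 6, 0.1],
    "Web2.0": [2 / 7, 1 / 2, 0.3],
    "knowledge": [1 / 7, 1 / 3, 0.6],
}

# ASSUMPTION: topic prior p(topic); not given on the slide, chosen so the
# result matches the example p(topic | tag) table.
P_TOPIC = [1.4 / 3, 0.6 / 3, 1.0 / 3]


def topic_given_tag(p_tag_given_topic, p_topic):
    """Return {tag: [p(z | tag) for each topic z]}.

    Bayes' theorem: p(z | t) = p(t | z) p(z) / sum_z' p(t | z') p(z').
    """
    result = {}
    for tag, likelihoods in p_tag_given_topic.items():
        joint = [lik * prior for lik, prior in zip(likelihoods, p_topic)]
        evidence = sum(joint)  # p(t), the normalising constant
        result[tag] = [j / evidence for j in joint]
    return result


if __name__ == "__main__":
    for tag, row in topic_given_tag(P_TAG_GIVEN_TOPIC, P_TOPIC).items():
        print(tag, [round(v, 2) for v in row])
```

Each output row sums to 1 by construction, because Bayes' theorem normalises over topics. Under this toy prior, the printed rows match the table above.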
Rule 1: Set inclusion (Mika, 2007; De Meo, 2009)
• Tag A is a hyponym of Tag B if set-inc(A, B) >= p ∧ set-inc(B, A) < p ∧ sim(A, B) > s, with p = 0.5.
• sim(A, B) is a similarity measure: cosine similarity.
• set-inc(A, B) = |RA ∩ RB| / |RA|, where RA is the set of resources annotated with tag A.
[Figure: the resource sets of Information Science and of Library Science]
Assumption: the logical extension of a tag is measured by its resource context, i.e. the resources annotated with that tag.
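Below is a minimal Python sketch of Rule 1 on the toy tag-resource matrix from the Data Representation slide. It assumes set-inc(A, B) = |RA ∩ RB| / |RA|, the fraction of A's resources that are also tagged with B. The similarity threshold s = 0.3 is a placeholder, since the slides give no value for s. The function names are illustrative only.

```python
import math

# Toy tag-resource matrix from the Data Representation slide:
# VECTORS[t][i] = number of times tag t is assigned to resource RESOURCES[i].
RESOURCES = ["R1", "R2", "R3"]
VECTORS = {
    "news": [1, 0, 0],
    "Web2.0": [1, 1, 1],
    "knowledge": [0, 0, 1],
}


def resource_set(tag):
    """R_t: the resources annotated with tag t at least once."""
    return {r for r, count in zip(RESOURCES, VECTORS[tag]) if count > 0}


def cosine(u, v):
    """Cosine similarity between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def set_inc(a, b):
    """ASSUMED definition: |R_A ∩ R_B| / |R_A|."""
    ra, rb = resource_set(a), resource_set(b)
    return len(ra & rb) / len(ra) if ra else 0.0


def is_hypo_set_inclusion(a, b, p=0.5, s=0.3):
    """Rule 1: is A a hyponym of B? p = 0.5 follows the slide; s = 0.3 is a placeholder."""
    return (set_inc(a, b) >= p
            and set_inc(b, a) < p
            and cosine(VECTORS[a], VECTORS[b]) > s)


if __name__ == "__main__":
    for a, b in [("news", "Web2.0"), ("Web2.0", "news"), ("knowledge", "Web2.0")]:
        print(a, "->", b, is_hypo_set_inclusion(a, b))
```

In this toy data, news and knowledge both come out as hyponyms of Web2.0, the tag attached to every resource. The reverse direction fails because set-inc(Web2.0, news) = 1/3 < 0.5.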
Rule 2: Graph Centrality (Heymann, 2006)
• Tag A is a hyponym of Tag B if graph-cent(A) < graph-cent(B) ∧ sim(A, B) > s.
Tag similarity graph: each node is a tag, and an edge links two tags whose similarity exceeds a threshold.
Assumption (collocational): popularity implies generality; the more popular/influential a tag, the more general it is.
graph-cent(A) is a graph centrality measure (e.g. centrality or betweenness) of tag A in the tag similarity graph.
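Below is a minimal Python sketch of Rule 2. It builds a tag similarity graph from the toy resource vectors and uses degree centrality as graph-cent. Degree centrality is chosen here only for brevity; the slide allows other centrality measures. Both the graph threshold and s are set to a placeholder value of 0.3.

```python
import itertools
import math

# Toy resource-based vectors (counts over R1, R2, R3) from the example table.
VECTORS = {
    "news": [1, 0, 0],
    "Web2.0": [1, 1, 1],
    "knowledge": [0, 0, 1],
}


def cosine(u, v):
    """Cosine similarity between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_graph(vectors, threshold):
    """Adjacency sets: an edge joins two tags whose cosine similarity exceeds threshold."""
    adj = {tag: set() for tag in vectors}
    for a, b in itertools.combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) > threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree_centrality(adj):
    """Degree divided by the maximum possible degree (n - 1)."""
    n = len(adj)
    return {tag: (len(nbrs) / (n - 1) if n > 1 else 0.0) for tag, nbrs in adj.items()}


def is_hypo_graph_centrality(a, b, vectors, centrality, s=0.3):
    """Rule 2: A is a hyponym of B if graph-cent(A) < graph-cent(B) and sim(A, B) > s."""
    return centrality[a] < centrality[b] and cosine(vectors[a], vectors[b]) > s


if __name__ == "__main__":
    cent = degree_centrality(similarity_graph(VECTORS, threshold=0.3))
    print(cent)
    print(is_hypo_graph_centrality("news", "Web2.0", VECTORS, cent))
```

Web2.0 is linked to both other tags, so it receives the highest centrality. Under the popularity-generality assumption, it is therefore treated as the more general tag.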
Rule 3: Information-Theoretic Condition (Wang, 2010)
• Tag A is a hyponym of Tag B if DKL(PB||PA) − DKL(PA||PB) < f ∧ sim(A, B) > s. Here PA and PB are the probability distributions of A and B over topics, and f is a small noise factor (f = 0.05 in this study).
• The Kullback-Leibler divergence DKL(PB||PA) measures the “surprise” of receiving PB when PA is expected.
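Below is a minimal Python sketch of Rule 3. It takes PA and PB as p(topic | tag) rows, such as those in the example PTM table. The noise factor f = 0.05 follows the slide, and s = 0.3 is a placeholder. The code assumes strictly positive topic probabilities wherever the first argument is non-zero, as smoothed topic models usually provide.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) = sum_z P(z) log(P(z) / Q(z)).

    Assumes Q(z) > 0 wherever P(z) > 0; terms with P(z) = 0 contribute 0.
    """
    return sum(pz * math.log(pz / qz) for pz, qz in zip(p, q) if pz > 0)


def is_hypo_information_theoretic(p_a, p_b, sim_ab, f=0.05, s=0.3):
    """Rule 3: A is a hyponym of B if
    D_KL(P_B || P_A) - D_KL(P_A || P_B) < f and sim(A, B) > s.
    f = 0.05 follows the slide; s = 0.3 is a placeholder.
    """
    return kl_divergence(p_b, p_a) - kl_divergence(p_a, p_b) < f and sim_ab > s


if __name__ == "__main__":
    p_news = [0.8, 0.1, 0.1]  # p(topic | news), from the example table
    p_web = [0.4, 0.3, 0.3]   # p(topic | Web2.0)
    print(kl_divergence(p_web, p_news), kl_divergence(p_news, p_web))
```

For the news/Web2.0 pair, the two divergences differ by about 0.047, which is below f = 0.05. The condition as stated therefore holds in both directions. It singles out one direction only when the divergences differ by at least f.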
Rule 4: Fuzzy set inclusion
• An extension of Set Inclusion, based on probabilistic topic
representation:
• Tag A is a hyponym of tag B if fuzzy-set-inc(SA, SB) >= p ∧ fuzzy-set-inc(SB, SA) < p ∧ sim(A, B) > s, where p = 0.5.
• Here SA is the fuzzy set for tag A: a pair (U, m), where U is the set of topics and m: U → [0, 1] is a membership function; for each topic z ∈ U, m(z) = p(A|z).
Set inclusion vs Fuzzy set inclusion
• Resource-based Representation:
Vt[i] = number of times the tag t is assigned to the ith resource
tags \ resources   R1   R2   R3
news               1    0    0
Web2.0             1    1    1
knowledge          0    0    1
• Probabilistic Topic Modelling Representation:
A probabilistic generative model is used to infer p(tag | topic) and p(resource | topic); here p(tag | topic) is used directly.
Note: this differs from the p(topic | tag) used in the earlier PTM representation.
tags \ topics      Topic 1   Topic 2   Topic 3
news               0.57      0.17      0.1
Web2.0             0.29      0.5       0.3
knowledge          0.14      0.33      0.6
Each column, p(tag | topic), sums to 1.
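The slide text does not reproduce the fuzzy-set-inc formula. The Python sketch below therefore assumes the common sigma-count inclusion degree Σz min(mA(z), mB(z)) / Σz mA(z). Under this assumption, the measure reduces to the crisp resource-set inclusion ratio when all memberships are 0 or 1. Memberships m(z) = p(A|z) are read from the rows of the example table above, and s = 0.3 is a placeholder.

```python
def fuzzy_set_inc(m_a, m_b):
    """ASSUMED inclusion degree of fuzzy set S_A in S_B (sigma-count form):
    sum_z min(m_A(z), m_B(z)) / sum_z m_A(z).
    """
    denom = sum(m_a)
    if denom == 0:
        return 0.0
    return sum(min(a, b) for a, b in zip(m_a, m_b)) / denom


def is_hypo_fuzzy(m_a, m_b, sim_ab, p=0.5, s=0.3):
    """Rule 4: p = 0.5 follows the slide; s = 0.3 is a placeholder."""
    return (fuzzy_set_inc(m_a, m_b) >= p
            and fuzzy_set_inc(m_b, m_a) < p
            and sim_ab > s)


# Membership m(z) = p(tag | z) for Topic 1..3, from the example table (rounded).
MEMBERSHIP = {
    "news": [0.57, 0.17, 0.1],
    "Web2.0": [0.29, 0.5, 0.3],
    "knowledge": [0.14, 0.33, 0.6],
}

if __name__ == "__main__":
    for a in MEMBERSHIP:
        for b in MEMBERSHIP:
            if a != b:
                print(a, "in", b, round(fuzzy_set_inc(MEMBERSHIP[a], MEMBERSHIP[b]), 3))
```

With these toy memberships, the news/Web2.0 inclusion degrees are about 0.667 and 0.514. Both are at least p, so neither tag is induced as the other's hyponym.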
Rule 5: Probabilistic Association
• Based on the PTM representation.
• Tag A is a hyponym of Tag B if p(A|B) < p(B|A) ∧ sim(A, B) > s,
• where p(A|B) = Σz p(A|z) p(z|B) (Griffiths & Steyvers, 2002) and z ranges over the set of topics.
• Assumption: a componential measure of the hierarchical relation.
An example: A = Information science, B = Information literacy.
p(A|B) = 1 and p(B|A) = 0.25; since p(B|A) < p(A|B), Information literacy is induced as a hyponym of Information science (given sim(A, B) > s).
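Below is a minimal Python sketch of Rule 5. It computes p(A|B) = Σz p(A|z) p(z|B), obtaining p(z|B) by Bayes' theorem. The two-topic model, its prior and all probabilities are hypothetical numbers made up for illustration; in each topic, other tags hold the remaining probability mass. The placeholder threshold is s = 0.3.

```python
def topic_posterior(tag, p_tag_given_topic, p_topic):
    """p(z | tag) via Bayes' theorem."""
    joint = [lik * pz for lik, pz in zip(p_tag_given_topic[tag], p_topic)]
    total = sum(joint)
    return [j / total for j in joint]


def association(a, b, p_tag_given_topic, p_topic):
    """p(a | b) = sum_z p(a | z) p(z | b)."""
    p_z_given_b = topic_posterior(b, p_tag_given_topic, p_topic)
    return sum(p_az * p_zb for p_az, p_zb in zip(p_tag_given_topic[a], p_z_given_b))


def is_hypo_prob_assoc(a, b, p_tag_given_topic, p_topic, sim_ab, s=0.3):
    """Rule 5: A is a hyponym of B if p(A|B) < p(B|A) and sim(A, B) > s."""
    return (association(a, b, p_tag_given_topic, p_topic)
            < association(b, a, p_tag_given_topic, p_topic)
            and sim_ab > s)


# HYPOTHETICAL two-topic model: p(tag | z) for z1, z2, and a uniform prior.
P_TAG_GIVEN_TOPIC = {
    "information science": [0.4, 0.4],   # broad: spread over both topics
    "information literacy": [0.0, 0.2],  # narrow: only in the second topic
}
P_TOPIC = [0.5, 0.5]

if __name__ == "__main__":
    a, b = "information literacy", "information science"
    print(association(a, b, P_TAG_GIVEN_TOPIC, P_TOPIC))  # p(literacy | science)
    print(association(b, a, P_TAG_GIVEN_TOPIC, P_TOPIC))  # p(science | literacy)
    print(is_hypo_prob_assoc(a, b, P_TAG_GIVEN_TOPIC, P_TOPIC, sim_ab=0.9))
```

The narrower tag occurs only within part of the broader tag's topics. As a result, p(broad | narrow) exceeds p(narrow | broad), which is the same direction as the slide's Information science example.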
Methodology: Hierarchy Generation Algorithm
• For RQ1 (rules): replace the isHypo() function with each of the five rules in turn, and compare the results.
• For RQ2 (data representations):
• use different representations to calculate sim(ti, tj);
• use the compatible data representation for each rule.
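The slides describe plugging a rule into isHypo() and a representation into sim(ti, tj), but the generation algorithm itself is not reproduced in the text. The Python sketch below is a hypothetical greedy stand-in, not the authors' algorithm. It shows only that pluggable design: each tag is attached to the most similar tag that the chosen rule accepts as broader. The names induce_hierarchy, sim and is_hypo are illustrative.

```python
from typing import Callable, Dict, List, Optional


def induce_hierarchy(
    tags: List[str],
    sim: Callable[[str, str], float],
    is_hypo: Callable[[str, str], bool],
) -> Dict[str, Optional[str]]:
    """HYPOTHETICAL greedy sketch.

    is_hypo(a, b) is the pluggable rule (any of Rules 1-5), sim(a, b) the
    pluggable similarity. Each tag gets as parent its most similar candidate
    broader tag; tags with no candidate become roots (None).
    No cycle checking is done.
    """
    parent: Dict[str, Optional[str]] = {}
    for t in tags:
        broader = [u for u in tags if u != t and is_hypo(t, u)]
        parent[t] = max(broader, key=lambda u: sim(t, u)) if broader else None
    return parent


if __name__ == "__main__":
    hypo_pairs = {("news", "Web2.0"), ("knowledge", "Web2.0")}
    print(induce_hierarchy(
        ["news", "Web2.0", "knowledge"],
        sim=lambda a, b: 0.58,
        is_hypo=lambda a, b: (a, b) in hypo_pairs,
    ))
```

Because rule and similarity are plain function arguments, comparing rules (RQ1) or representations (RQ2) amounts to passing a different function.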
Experiments
• Data Collection and Processing
BibSonomy dataset 2003–2015: 3,794,882 annotations, 868,015 resources, 283,858 tags, 11,103 users.
We used a pipeline to clean academic social tagging data (Dong, Wang & Coenen, 2017):
• Unified different variants of tags.
• Selected tags with a user frequency >= 4.
• Removed resources with fewer than 3 tags.
The cleaned dataset contains 7,846 tag concepts and 128,782 resources.
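Below is a minimal Python sketch of the two frequency filters listed above, assuming annotations arrive as (user, tag, resource) triples. The normalise() function is a hypothetical and deliberately crude stand-in for the variant-unification step, which the slide does not detail. The filters are applied once, in the order listed.

```python
from collections import defaultdict


def normalise(tag):
    """HYPOTHETICAL stand-in for tag-variant unification: lower-case and drop
    '-' and '_' so that e.g. 'Web-2.0' and 'web_2.0' collapse together."""
    return tag.lower().replace("-", "").replace("_", "")


def clean(annotations, min_users=4, min_tags=3):
    """annotations: iterable of (user, tag, resource) triples.

    Single pass: unify tags, keep tags used by >= min_users distinct users,
    then drop resources left with fewer than min_tags distinct tags.
    """
    triples = [(u, normalise(t), r) for u, t, r in annotations]

    users_per_tag = defaultdict(set)
    for u, t, _ in triples:
        users_per_tag[t].add(u)
    kept_tags = {t for t, users in users_per_tag.items() if len(users) >= min_users}
    triples = [(u, t, r) for u, t, r in triples if t in kept_tags]

    tags_per_resource = defaultdict(set)
    for _, t, r in triples:
        tags_per_resource[r].add(t)
    kept_resources = {r for r, ts in tags_per_resource.items() if len(ts) >= min_tags}
    return [(u, t, r) for u, t, r in triples if r in kept_resources]
```

Because the filters run once, a tag's user count can drop below the threshold after resources are removed. Iterating until nothing changes would be one alternative, but this sketch does not assume it.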
[Figure: the Users–Tags–Resources structure and the corresponding Users–Standard tags–Resources structure]
Reference-based evaluation
Measuring the similarity of a learned hierarchy, L, to gold-standard hierarchies.
• Gold-standard hierarchies, denoted G:
• DBpedia (6,616 overlapping concepts)
• Microsoft Concept Graph (6,029 overlapping concepts)
• ACM Computing Classification System (691 overlapping concepts)
Acknowledgement to images in http://dbpedia.org/page/Category:Information_retrieval and https://dl.acm.org/ccs/ccs.cfm?id=10003317&lid=0.10002951.10003317
[Figure: gold-standard hierarchy G and learned hierarchy L]
Evaluation metrics (Dellschaft & Staab, 2006)
(i) Find common concepts between the learned hierarchy L and the gold-standard hierarchy G,
(ii) Extract a characteristic excerpt for each concept. We use common direct subsumption (cdsub)
as the characteristic excerpt.
(iii) The similarity of hierarchies is defined based on the characteristic excerpts.
[Figure: excerpts of L and G around the concept “Information retrieval”, showing the concepts indexing, web information retrieval, cross-language and information needs]
cdsub(Information retrieval, L, G) = {indexing, web information retrieval, cross-language}
cdsub(Information retrieval, G, L) = {web information retrieval}
…
• Taxonomic Precision (TP), Taxonomic Recall (TR) and Taxonomic F-measure (TF)
• Taxonomic Overlap (TO)
• Taxonomic F’-measure (TF’)
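Below is a Python sketch in the spirit of Dellschaft and Staab's local taxonomic precision and recall, using cdsub as the characteristic excerpt. It assumes that cdsub(c, L, G) is the set of direct children of c in L that also occur in G, and that TP and TR are averages over the common concepts. The exact aggregation used in the study is not given in the slide text, and TO and TF’ are not shown.

```python
def local_precision(ce_learned, ce_gold):
    """|ce_L ∩ ce_G| / |ce_L| for one concept; 0.0 if the learned excerpt is empty."""
    return len(ce_learned & ce_gold) / len(ce_learned) if ce_learned else 0.0


def taxonomic_scores(common, cdsub_l, cdsub_g):
    """ASSUMED aggregation: average local precision (TP) and local recall (TR)
    over the common concepts, with TF as their harmonic mean.

    cdsub_l[c] / cdsub_g[c]: direct children of c in L / G, restricted to
    concepts common to both hierarchies.
    """
    tp = sum(local_precision(cdsub_l[c], cdsub_g[c]) for c in common) / len(common)
    # Local recall is local precision with the roles of L and G swapped.
    tr = sum(local_precision(cdsub_g[c], cdsub_l[c]) for c in common) / len(common)
    tf = 2 * tp * tr / (tp + tr) if tp + tr else 0.0
    return tp, tr, tf


if __name__ == "__main__":
    c = "Information retrieval"
    cdsub_l = {c: {"indexing", "web information retrieval", "cross-language"}}
    cdsub_g = {c: {"web information retrieval"}}
    print(taxonomic_scores([c], cdsub_l, cdsub_g))
```

For the single “Information retrieval” example above, this gives a local precision of 1/3 and a recall of 1, so TF = 0.5.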
Results – DBpedia
Results – Microsoft Concept Graph
Results – ACM Computing Classification System
Learned Hierarchies
[Figure: hierarchy learned with the Probabilistic Association rule, with the resource-based representation]
Discussions
• Q1: regarding the rules
• The Set Inclusion rule produced the best and most stable hierarchies overall in most experimental settings.
• The Fuzzy Set Inclusion and Probabilistic Association rules achieved competitive results.
• Q2: regarding the data representation techniques
• The Res-based representation performs best in most experimental settings.
• The exception: the PTM representation with the Set Inclusion rule gave the best overall results (TF and TF’).
• Issue:
• Results are not consistent across the three gold-standard hierarchies, reflecting differences in the nature of the chosen gold standards.
Future Studies
• Evaluation: go beyond automated evaluation.
• Higher-quality hierarchies through machine learning:
• Use the rules together as features in supervised learning to induce hierarchies.
• Add further information/context: resource contents, external lexical resources, transfer learning, etc.
• Use deep learning approaches:
• Forget about the rules?
• Use very rich data representations, e.g. word embeddings.
References
• Benz, D., Hotho, A., Stumme, G., Stutzer, S.: Semantics made by you and me: Self-emerging ontologies can capture the diversity of shared knowledge. In: Proceedings of the 2nd Web Science Conference (WebSci10) (2010)
• Cruse, D.A.: Hyponymy and its varieties. In: Green, R., Bean, C.A., Myaeng, S.H. (eds.) The Semantics of Relationships: An Interdisciplinary Perspective, pp. 3–21. Springer, Dordrecht (2002). https://doi.org/10.1007/978-94-017-0073-3_1
• De Meo, P., Quattrone, G., Ursino, D.: Exploitation of semantic relationships and hierarchical data structures to support a user in his annotation and browsing activities in folksonomies. Inf. Syst. 34(6), 511–535 (2009)
• Dellschaft, K., Staab, S.: On how to perform a gold standard based evaluation of ontology learning. In: Cruz, I., Decker, S., Allemang, D., Preist, C., Schwabe, D., Mika, P., Uschold, M., Aroyo, L.M. (eds.) ISWC 2006. LNCS, vol. 4273, pp. 228–241. Springer, Heidelberg (2006). https://doi.org/10.1007/11926078_17
• Dong, H., Wang, W., Coenen, F.: Deriving dynamic knowledge from academic social tagging data: a novel research direction. In: iConference 2017 Proceedings (2017)
• Griffiths, T.L., Steyvers, M.: Prediction and semantic association. In: Proceedings of the 15th International Conference on Neural Information Processing Systems, pp. 11–18. MIT Press (2002)
• Heymann, P., Garcia-Molina, H.: Collaborative creation of communal hierarchical taxonomies in social tagging systems. Technical report, Stanford University (2006)
• Markines, B., Cattuto, C., Menczer, F., Benz, D., Hotho, A., Stumme, G.: Evaluating similarity measures for emergent semantics of social tagging. In: Proceedings of the 18th International Conference on World Wide Web, pp. 641–650. ACM (2009)
• Mika, P.: Ontologies are us: a unified model of social networks and semantics. Web Semant.: Sci. Serv. Agents World Wide Web 5(1), 5–15 (2007)
• Peters, I., Becker, P.: Folksonomies: Indexing and Retrieval in Web 2.0. De Gruyter/Saur, Berlin (2009)
• Strohmaier, M., Helic, D., Benz, D., Körner, C., Kern, R.: Evaluation of folksonomy induction algorithms. ACM Trans. Intell. Syst. Technol. 3(4), 1–22 (2012)
• Vander Wal, T.: Folksonomy Coinage and Definition. http://www.vanderwal.net/folksonomy.html (2007)
• Wang, W., Barnaghi, P.M., Bargiela, A.: Probabilistic topic models for learning terminological ontologies. IEEE Trans. Knowl. Data Eng. 22(7), 1028–1040 (2010)
• Weller, K.: Knowledge Representation in the Social Semantic Web. De Gruyter Saur, Berlin/New York (2010)
Thank you for your attention.
Hang Dong’s Home page: http://cgi.csc.liv.ac.uk/~hang/
Contact: hangdong@liverpool.ac.uk

Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 

Recently uploaded (20)

What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

Rules for inducing hierarchies from social tagging data

  • 1. Rules for Inducing Hierarchies from Social Tagging Data. iConference 2018, Sheffield, UK, March 25–28, 2018. Hang Dong, Wei Wang, Frans Coenen, Department of Computer Science, University of Liverpool
  • 2. Social Tagging Data (Folksonomies) • Users collaboratively generate “key words” for their interests. • The “key words” form a taxonomy of resources online, called Folksonomies (Vander Wal, 2007). [Figure: social tags for the movie “Forrest Gump” in MovieLens, https://movielens.org/movies/356]
  • 3. Issues in social tagging data • (i) Noisy and ambiguous. • Data cleaning (Dong, Wang & Coenen, 2017). • (ii) Flat structure: a lack of semantic relations among tags. • This study focuses on hierarchical/subsumption relations between tags. • This is a challenging problem: • a cognitive task requiring much human effort (Weller, 2010, p. 139); • distinct from mining relations from sentences.
  • 4. Research Questions • 1. Which rule can effectively capture the hierarchical relations between social tags? • Not systematically discussed in previous studies, although some approaches have been proposed and evaluated (Garcia-Silva, 2012; Strohmaier et al., 2012). • Definitions of hierarchical relations from information science and linguistics • Rules from previous studies • Two new rules proposed: Fuzzy Set Inclusion and Probabilistic Association
  • 5. Research Questions (2) • 2. How do rules and data representations affect the quality of the induced hierarchies? • Data representations: resource-based representation, probabilistic topic representation • Experimental design: • Hierarchy generation algorithm • Automated evaluation against three gold-standard hierarchies
  • 6. Hierarchical Relations – information science. Image credit: Stock, W. G. (2010). Concepts and semantic relations in information science. Journal of the American Society for Information Science and Technology, 61(10), 1951–1969.
  • 7. Hierarchical Relations • Straightforward? • (i) Apple is a [kind of] fruit. (ii) Library science is a part of Information Science. • Abstraction, generalisation. • A type of paradigmatic relation: the related words fit into the same grammatical slot (Cruse, 2003). • Tagging data provide only syntagmatic relations, but are a rich source for deriving paradigmatic relations (Peters, 2009; Stock, 2010).
  • 8. Hierarchical Relations – linguistics (1): definitions in Cruse (2003) • Logical (extensional): X is a hyponym of Y iff the extension (the set of objects) of X’ is included in the extension of Y’. • Asymmetrical (intensional): X is a hyponym of Y iff F(X) entails, but is not entailed by, F(Y), where F(-) is a sentential function satisfied by X or Y. [Figure: Extension, apple within fruit; Intension, fruit within apple]
  • 9. Hierarchical Relations – linguistics (2): definitions in Cruse (2003) • Collocational: X is a hyponym of Y iff the normal context of X is a subset of the normal context of Y (cf. “You shall know a word by the company it keeps…”, Firth, 1957). • Componential: X is a hyponym of Y iff the features defining Y are a proper subset of the features defining X.
  • 10. Hierarchical relations from tags – computational rules • Set Inclusion (Mika, 2007; De Meo, 2009) • Graph Centrality (Heymann, 2006) • Information-Theoretic Condition (Wang, 2010) • Fuzzy Set Inclusion (proposed) • Probabilistic Association (proposed) • Each rule operates on a data representation that represents a tag as a vector: the resource-based (Res-based) representation or the Probabilistic Topic Modelling (PTM) based representation.
  • 11. Data Representation
    • Resource-based representation (Markines et al., 2009): Vt[i] = number of times the tag t is annotated to the ith resource.
        tags \ resources   R1   R2   R3
        news                1    0    0
        Web2.0              1    1    1
        knowledge           0    0    1
    • Probabilistic Topic Modelling (PTM) representation: a probabilistic generative model infers p(tag | topic) and p(resource | topic); p(topic | tag) is then calculated from p(tag | topic) using Bayes' theorem. Each row sums to 1.
        tags \ topics   Topic 1   Topic 2   Topic 3
        news            0.8       0.1       0.1
        Web2.0          0.4       0.3       0.3
        knowledge       0.2       0.2       0.6
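A minimal Python sketch of the resource-based representation and of the cosine similarity used for sim(A, B). The annotation list is hypothetical and chosen only so that it reproduces the toy matrix above; a real pipeline would read (tag, resource) pairs from the cleaned dataset.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical (tag, resource) annotation events that reproduce the toy matrix;
# a repeated pair would be counted twice.
ANNOTATIONS = [
    ("news", "R1"),
    ("Web2.0", "R1"), ("Web2.0", "R2"), ("Web2.0", "R3"),
    ("knowledge", "R3"),
]


def resource_vectors(annotations, resources):
    """Return {tag: V_t}, where V_t[i] counts annotations of tag t on resources[i]."""
    index = {r: i for i, r in enumerate(resources)}
    vectors = defaultdict(lambda: [0] * len(resources))
    for tag, res in annotations:
        vectors[tag][index[res]] += 1
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


V = resource_vectors(ANNOTATIONS, ["R1", "R2", "R3"])
print(V)  # {'news': [1, 0, 0], 'Web2.0': [1, 1, 1], 'knowledge': [0, 0, 1]}
print(round(cosine(V["news"], V["Web2.0"]), 3))  # 0.577
```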
  • 12. Rule 1: Set inclusion (Mika, 2007; De Meo, 2009) • Tag A is a hyponym of Tag B if set-inc(A, B) >= p ∧ set-inc(B, A) < p ∧ sim(A, B) > s, with p = 0.5, where set-inc(A, B) = |RA ∩ RB| / |RA| and RA is the set of resources annotated with tag A. sim(A, B) is a similarity measure (cosine similarity). [Figure: resources of Information Science and resources of Library Science] • Assumption: the logical extension of a tag is measured by its resource context, i.e. the resources to which the tag is annotated.
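A sketch of Rule 1, assuming resources are identified by IDs and that sim(A, B) is the cosine of the two tags' binary resource vectors (a simplification of the count vectors). The similarity threshold s = 0.1 and the resource sets are hypothetical.

```python
from math import sqrt


def set_inc(ra, rb):
    """|RA ∩ RB| / |RA|: share of A's resources that are also tagged with B."""
    return len(ra & rb) / len(ra) if ra else 0.0


def set_cosine(ra, rb):
    """Cosine similarity of the binary resource vectors of two tags."""
    return len(ra & rb) / sqrt(len(ra) * len(rb)) if ra and rb else 0.0


def is_hypo_set_inclusion(ra, rb, p=0.5, s=0.1):
    """Rule 1: A is a hyponym of B if set-inc(A,B) >= p, set-inc(B,A) < p
    and sim(A,B) > s. The value of s is a placeholder, not taken from the slides."""
    return set_inc(ra, rb) >= p and set_inc(rb, ra) < p and set_cosine(ra, rb) > s


# Hypothetical resource sets (resource IDs) for two tags.
R_LIBRARY_SCIENCE = {1, 2, 3}
R_INFORMATION_SCIENCE = {1, 2, 3, 4, 5, 6, 7}

print(set_inc(R_LIBRARY_SCIENCE, R_INFORMATION_SCIENCE))  # 1.0
print(is_hypo_set_inclusion(R_LIBRARY_SCIENCE, R_INFORMATION_SCIENCE))  # True
print(is_hypo_set_inclusion(R_INFORMATION_SCIENCE, R_LIBRARY_SCIENCE))  # False
```

The second condition, set-inc(B, A) < p, is what stops two heavily overlapping tags from each being classified as the other's hyponym.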
  • 13. Rule 2: Graph Centrality (Heymann, 2006) • Tag A is a hyponym of Tag B if graph-cent(A) < graph-cent(B) ∧ sim(A, B) > s, where graph-cent(A) is a graph centrality measure (e.g. betweenness) of tag A in the tag similarity graph. • Tag similarity graph: each node is a tag, and an edge joins two tags whose similarity exceeds a threshold. • Assumption (collocational): popularity implies generality, i.e. the more popular/influential a tag, the more general it is.
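A sketch of Rule 2. Degree centrality stands in for graph-cent() here because it needs no library; betweenness or another measure (for instance NetworkX's `betweenness_centrality`) could be swapped in. The pairwise similarities, the edge threshold of 0.3 and s = 0.3 are hypothetical values for illustration.

```python
# Hypothetical pairwise tag similarities (e.g. cosine over resource vectors).
SIMS = {
    ("information retrieval", "indexing"): 0.6,
    ("information retrieval", "web search"): 0.5,
    ("information retrieval", "ranking"): 0.55,
    ("indexing", "web search"): 0.1,
    ("indexing", "ranking"): 0.2,
    ("web search", "ranking"): 0.4,
}


def sim(a, b):
    """Symmetric lookup into SIMS (0.0 for unknown pairs)."""
    return SIMS.get((a, b), SIMS.get((b, a), 0.0))


def similarity_graph(sims, threshold):
    """Adjacency sets: an edge joins two tags whose similarity exceeds threshold."""
    adj = {}
    for (a, b), value in sims.items():
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if value > threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree_centrality(adj):
    """Degree centrality: number of neighbours divided by (n - 1)."""
    n = len(adj)
    return {t: (len(nbrs) / (n - 1) if n > 1 else 0.0) for t, nbrs in adj.items()}


def is_hypo_centrality(a, b, cent, s=0.3):
    """Rule 2: A is a hyponym of B if graph-cent(A) < graph-cent(B) and sim(A,B) > s."""
    return cent[a] < cent[b] and sim(a, b) > s


CENT = degree_centrality(similarity_graph(SIMS, threshold=0.3))
print(is_hypo_centrality("indexing", "information retrieval", CENT))  # True
print(is_hypo_centrality("information retrieval", "indexing", CENT))  # False
print(is_hypo_centrality("web search", "ranking", CENT))  # False: equal centrality
```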
  • 14. Rule 3: Information-Theoretic Condition (Wang, 2010) • Tag A is a hyponym of Tag B if DKL(PB||PA) − DKL(PA||PB) < f ∧ sim(A, B) > s, where PA and PB are the probability distributions of A and B over topics and f is a small noise factor (f = 0.05 in this study). • The Kullback-Leibler divergence DKL(PB||PA) measures the “surprise” of receiving PB when PA is expected.
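A sketch of Rule 3, with KL divergence computed in nats. The operand order follows the slide exactly; because KL divergence is asymmetric, swapping the two terms changes which tag of a pair passes the test, so the order is worth checking against Wang et al. (2010). The topic distributions and s = 0.1 are hypothetical.

```python
from math import log


def kl(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_z P(z) * log(P(z) / Q(z)), in nats.
    Terms with P(z) = 0 contribute nothing; eps avoids division by zero when Q(z) = 0."""
    return sum(pz * log(pz / max(qz, eps)) for pz, qz in zip(p, q) if pz > 0)


def is_hypo_info_theoretic(pa, pb, sim_ab, f=0.05, s=0.1):
    """Rule 3, operands in the order written on the slide:
    D_KL(PB || PA) - D_KL(PA || PB) < f and sim(A, B) > s.
    f = 0.05 follows the slide; s = 0.1 is a placeholder."""
    return kl(pb, pa) - kl(pa, pb) < f and sim_ab > s


# Hypothetical topic distributions: one concentrated, one spread out.
P_NARROW = [0.9, 0.05, 0.05]
P_BROAD = [1 / 3, 1 / 3, 1 / 3]

print(round(kl(P_NARROW, P_BROAD), 3), round(kl(P_BROAD, P_NARROW), 3))  # 0.704 0.934
```

The noise factor f also lets pairs with near-identical distributions pass the test, since their divergence difference is close to 0.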
  • 15. Rule 4: Fuzzy set inclusion • An extension of Set Inclusion, based on the probabilistic topic representation: • Tag A is a hyponym of tag B if fuzzy-set-inc(SA, SB) >= p ∧ fuzzy-set-inc(SB, SA) < p ∧ sim(A, B) > s, where p is set to 0.5 and SA is a fuzzy set for tag A, given as a pair (U, m): U is the set of topics and m: U → [0, 1] is a membership function such that, for each topic z ∈ U, m(z) = p(A|z).
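The fuzzy-set-inc formula is not reproduced in the text above, so this sketch assumes the common sigma-count definition of fuzzy subsethood, |SA ∩ SB| / |SA| = Σz min(mA(z), mB(z)) / Σz mA(z), which reduces to set-inc when all memberships are 0 or 1. Memberships are m(z) = p(tag | z) as on the slide; the numbers and s = 0.1 are hypothetical.

```python
def fuzzy_set_inc(ma, mb):
    """Sigma-count subsethood |SA ∩ SB| / |SA|, with min() as fuzzy intersection.
    ma[z] and mb[z] are memberships m(z) = p(tag | z) for topic z."""
    total = sum(ma)
    return sum(min(a, b) for a, b in zip(ma, mb)) / total if total else 0.0


def is_hypo_fuzzy(ma, mb, sim_ab, p=0.5, s=0.1):
    """Rule 4: A is a hyponym of B if fuzzy-set-inc(SA,SB) >= p,
    fuzzy-set-inc(SB,SA) < p and sim(A,B) > s (s is a placeholder)."""
    return fuzzy_set_inc(ma, mb) >= p and fuzzy_set_inc(mb, ma) < p and sim_ab > s


# Hypothetical memberships p(tag | z) over three topics.
M_INFO_LITERACY = [0.3, 0.0, 0.05]
M_INFO_SCIENCE = [0.4, 0.3, 0.2]

print(fuzzy_set_inc(M_INFO_LITERACY, M_INFO_SCIENCE))  # 1.0
print(round(fuzzy_set_inc(M_INFO_SCIENCE, M_INFO_LITERACY), 3))  # 0.389
print(is_hypo_fuzzy(M_INFO_LITERACY, M_INFO_SCIENCE, sim_ab=0.5))  # True
```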
  • 16. Set inclusion vs Fuzzy set inclusion
    • Resource-based representation (Set Inclusion): Vt[i] = number of times the tag t is annotated to the ith resource.
        tags \ resources   R1   R2   R3
        news                1    0    0
        Web2.0              1    1    1
        knowledge           0    0    1
    • Probabilistic Topic Modelling representation (Fuzzy Set Inclusion): a probabilistic generative model infers p(tag | topic) and p(resource | topic), and p(tag | topic) is used directly. Note: this differs from p(topic | tag), used in the previous PTM representation. Each column sums to 1.
        tags \ topics   Topic 1   Topic 2   Topic 3
        news            0.57      0.17      0.1
        Web2.0          0.29      0.5       0.3
        knowledge       0.14      0.33      0.6
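The two example topic tables are related by Bayes' theorem. Assuming a uniform prior over the three tags (an assumption; the slides do not give p(tag)), normalising each column of the p(topic | tag) table from slide 11 reproduces the values in this table, whose columns sum to 1, i.e. p(tag | topic).

```python
def tag_given_topic(topic_given_tag, tag_prior=None):
    """Bayes' theorem: p(tag | z) = p(z | tag) p(tag) / sum_t' p(z | t') p(t').
    tag_prior defaults to uniform, which is an assumption of this sketch."""
    tags = list(topic_given_tag)
    if tag_prior is None:
        tag_prior = {t: 1 / len(tags) for t in tags}
    n_topics = len(next(iter(topic_given_tag.values())))
    result = {t: [0.0] * n_topics for t in tags}
    for z in range(n_topics):
        evidence = sum(topic_given_tag[t][z] * tag_prior[t] for t in tags)
        for t in tags:
            result[t][z] = topic_given_tag[t][z] * tag_prior[t] / evidence
    return result


P_TOPIC_GIVEN_TAG = {  # rows of the slide-11 table; each row sums to 1
    "news": [0.8, 0.1, 0.1],
    "Web2.0": [0.4, 0.3, 0.3],
    "knowledge": [0.2, 0.2, 0.6],
}
P_TAG_GIVEN_TOPIC = tag_given_topic(P_TOPIC_GIVEN_TAG)
for tag, row in P_TAG_GIVEN_TOPIC.items():
    print(tag, [round(x, 2) for x in row])
# news [0.57, 0.17, 0.1]
# Web2.0 [0.29, 0.5, 0.3]
# knowledge [0.14, 0.33, 0.6]
```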
  • 17. Rule 5: Probabilistic Association • Based on the PTM representation. • Tag A is a hyponym of Tag B if p(A|B) < p(B|A) ∧ sim(A, B) > s, where p(A|B) = Σz p(A|z) p(z|B) (Griffiths & Steyvers, 2002) and z ranges over the set of topics. • Assumption: a componential measure of the hierarchical relation. • Example: for Information science (A) and Information literacy (B), p(A|B) = 1 and p(B|A) = 0.25, so (given sim(A, B) > s) Information literacy is a hyponym of Information science.
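A sketch of Rule 5 using the association formula p(A|B) = Σz p(A|z) p(z|B), where p(z | tag) is obtained from p(tag | z) and a topic prior by Bayes' theorem. The topic model output (PHI), the uniform topic prior and s = 0.1 are hypothetical.

```python
def topic_given_tag(phi, topic_prior):
    """Bayes' theorem: p(z | t) = p(t | z) p(z) / sum_z' p(t | z') p(z')."""
    theta = {}
    for tag, row in phi.items():
        joint = [p_t_z * p_z for p_t_z, p_z in zip(row, topic_prior)]
        total = sum(joint)
        theta[tag] = [j / total for j in joint]
    return theta


def association(a, b, phi, theta):
    """p(A | B) = sum_z p(A | z) p(z | B), as in Griffiths & Steyvers (2002)."""
    return sum(p_a_z * p_z_b for p_a_z, p_z_b in zip(phi[a], theta[b]))


def is_hypo_prob_assoc(a, b, phi, theta, sim_ab, s=0.1):
    """Rule 5: A is a hyponym of B if p(A|B) < p(B|A) and sim(A,B) > s."""
    return association(a, b, phi, theta) < association(b, a, phi, theta) and sim_ab > s


# Hypothetical p(tag | z) for two equally likely topics (each column sums to 1).
PHI = {
    "information science": [0.4, 0.4],
    "information literacy": [0.2, 0.0],
    "other": [0.4, 0.6],
}
THETA = topic_given_tag(PHI, topic_prior=[0.5, 0.5])

print(association("information literacy", "information science", PHI, THETA))  # 0.1
print(association("information science", "information literacy", PHI, THETA))  # 0.4
print(is_hypo_prob_assoc("information literacy", "information science", PHI, THETA, sim_ab=0.5))  # True
```

A property of this sketch worth knowing: when p(z | B) is derived from the same p(tag | z) and p(z), the ratio p(A|B) / p(B|A) equals p(A) / p(B), so the comparison ranks a pair by the tags' marginal probabilities (here 0.1 versus 0.4).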
  • 18. Methodology: Hierarchy Generation Algorithm • For RQ1 (rules): replace the isHypo() function with each of the five rules in turn and compare the results. • For RQ2 (data representations): • use different representations to calculate sim(ti, tj); • use the compatible data representation for each rule.
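The methodology treats isHypo() as a pluggable predicate. The sketch below illustrates that design with a hypothetical greedy procedure in which each tag takes its most similar qualifying tag as parent; the authors' actual generation algorithm is not reproduced in the text, so the loop, the threshold s = 0.1 and the toy data are assumptions. Any function with the signature is_hypo(a, b) -> bool can be passed in.

```python
from math import sqrt
from typing import Callable, Dict, Optional

IsHypo = Callable[[str, str], bool]
Sim = Callable[[str, str], float]


def generate_hierarchy(tags, sim: Sim, is_hypo: IsHypo, s=0.1) -> Dict[str, Optional[str]]:
    """Hypothetical greedy procedure: each tag's parent is the most similar tag b
    (sim > s) for which is_hypo(tag, b) holds; None marks a root.
    Swapping `is_hypo` swaps the rule; cycles are not checked here."""
    parents = {}
    for a in tags:
        candidates = [b for b in tags if b != a and sim(a, b) > s and is_hypo(a, b)]
        parents[a] = max(candidates, key=lambda b: sim(a, b)) if candidates else None
    return parents


# Toy data: hypothetical resource sets per tag.
RES = {
    "information science": {1, 2, 3, 4, 5, 6, 7},
    "library science": {1, 2, 3},
    "information literacy": {4, 5},
    "cooking": {9},
}


def res_sim(a, b):
    """Cosine similarity of binary resource vectors."""
    return len(RES[a] & RES[b]) / sqrt(len(RES[a]) * len(RES[b]))


def set_inclusion_rule(a, b, p=0.5):
    """Set inclusion without the similarity test (generate_hierarchy applies it)."""
    def inc(x, y):
        return len(RES[x] & RES[y]) / len(RES[x])
    return inc(a, b) >= p and inc(b, a) < p


H = generate_hierarchy(list(RES), res_sim, set_inclusion_rule)
print(H)
```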
  • 19. Experiments • Data collection and processing: BibSonomy dataset 2003–2015: 3,794,882 annotations, 868,015 resources, 283,858 tags, 11,103 users. We used a pipeline to clean academic social tagging data (Dong, Wang & Coenen, 2017): • Unified different variants of tags. • Selected tags with user frequency >= 4. • Removed resources with fewer than 3 tags. The cleaned dataset contains 7,846 tag concepts and 128,782 resources. [Figure: Tags, Users, Resources; Standard tags, Users, Resources (with 3 concepts)]
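A sketch of the two frequency filters, assuming annotations arrive as a list of (user, tag, resource) triples and that the filters run once, in the order listed; the tag-variant unification step is not shown. The toy data are hypothetical.

```python
from collections import defaultdict


def clean(annotations, min_users=4, min_tags=3):
    """Filter a list of (user, tag, resource) triples in two passes:
    1. keep tags used by at least `min_users` distinct users;
    2. keep resources that still carry at least `min_tags` distinct tags."""
    users_per_tag = defaultdict(set)
    for user, tag, _ in annotations:
        users_per_tag[tag].add(user)
    kept_tags = {t for t, users in users_per_tag.items() if len(users) >= min_users}
    step1 = [(u, t, r) for u, t, r in annotations if t in kept_tags]

    tags_per_res = defaultdict(set)
    for _, tag, res in step1:
        tags_per_res[res].add(tag)
    kept_res = {r for r, tags in tags_per_res.items() if len(tags) >= min_tags}
    return [(u, t, r) for u, t, r in step1 if r in kept_res]


# Hypothetical toy data: four users tag two resources; "rare" has one user.
USERS = ["u1", "u2", "u3", "u4"]
ANN = [(u, t, "r1") for u in USERS for t in ["ir", "indexing", "search"]]
ANN += [("u1", "rare", "r1")]
ANN += [(u, t, "r2") for u in USERS for t in ["ir", "indexing"]]

CLEANED = clean(ANN)
print(sorted({r for _, _, r in CLEANED}))  # ['r1']
print(sorted({t for _, t, _ in CLEANED}))  # ['indexing', 'ir', 'search']
```

Running the filters once means a resource dropped in step 2 does not feed back into the tag counts of step 1; repeating both steps until nothing changes is a possible alternative.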
  • 20. Reference-based evaluation: measuring the similarity of a learned hierarchy, L, to gold-standard hierarchies. • Gold standards, denoted as G: • DBpedia (6,616 overlapping concepts) • Microsoft Concept Graph (6,029 overlapping concepts) • ACM Computing Classification System (691 overlapping concepts). Image credits: http://dbpedia.org/page/Category:Information_retrieval and https://dl.acm.org/ccs/ccs.cfm?id=10003317&lid=0.10002951.10003317
  • 21. Evaluation metrics (Dellschaft & Staab, 2006) • (i) Find the common concepts between the learned hierarchy L and the gold-standard hierarchy G. • (ii) Extract a characteristic excerpt for each concept; we use common direct subsumption (cdsub) as the characteristic excerpt. • (iii) Define the similarity of the hierarchies based on the characteristic excerpts. [Figure: the gold-standard hierarchy G and the learned hierarchy L below “Information retrieval”, with the concepts indexing, web information retrieval, cross-language and information needs] • Example: cdsub(Information retrieval, L, G) = {indexing, web information retrieval, cross-language}; cdsub(Information retrieval, G, L) = {web information retrieval} …
  • 22. • Taxonomic Precision (TP), Taxonomic Recall (TR) and Taxonomic F-measure (TF) • Taxonomic Overlap (TO) • Taxonomic F’-measure (TF’)
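A sketch of TP, TR and TF computed from cdsub excerpts, following our reading of Dellschaft & Staab (2006): local precision is |ce(c, L, G) ∩ ce(c, G, L)| / |ce(c, L, G)|, averaged over the common concepts; TR is TP with L and G swapped, and TF is their harmonic mean. TO and TF′ are not covered. Applied to the single-concept example from slide 21, it gives TP = 1/3, TR = 1 and TF = 0.5.

```python
def local_precision(ce_1, ce_2):
    """|ce(c,O1,O2) ∩ ce(c,O2,O1)| / |ce(c,O1,O2)|; 0.0 for an empty excerpt."""
    return len(ce_1 & ce_2) / len(ce_1) if ce_1 else 0.0


def taxonomic_scores(ce_l, ce_g):
    """TP, TR and TF over the concepts common to L and G.
    ce_l[c] = cdsub(c, L, G) and ce_g[c] = cdsub(c, G, L); TR is TP with L and G swapped."""
    common = ce_l.keys() & ce_g.keys()
    if not common:
        return 0.0, 0.0, 0.0
    tp = sum(local_precision(ce_l[c], ce_g[c]) for c in common) / len(common)
    tr = sum(local_precision(ce_g[c], ce_l[c]) for c in common) / len(common)
    tf = 2 * tp * tr / (tp + tr) if tp + tr else 0.0
    return tp, tr, tf


# The slide's example, restricted to the single concept "Information retrieval".
CDSUB_L = {"Information retrieval": {"indexing", "web information retrieval", "cross-language"}}
CDSUB_G = {"Information retrieval": {"web information retrieval"}}

print(taxonomic_scores(CDSUB_L, CDSUB_G))
```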
  • 24. Results – Microsoft Concept Graph
  • 25. Results – ACM Computing Classification System
  • 26. Learned Hierarchies: Probabilistic Association rule, with resource-based representation.
  • 27.
  • 28.
  • 29. Discussions • Q1, regarding the rules: • The Set Inclusion rule produces the best and most stable hierarchies overall in most experimental settings. • The Fuzzy Set Inclusion and Probabilistic Association rules give competitive results. • Q2, regarding the data representation techniques: • The Res-based representation performs best in most experimental settings. • The exception is the PTM representation with the Set Inclusion rule, which had the best overall results (TF and TF’). • Issue: • Results are not consistent across the three gold-standard hierarchies, demonstrating differences in the nature of the chosen gold standards.
  • 30. Future Studies • Evaluation: go beyond automated evaluation. • Higher-quality hierarchies through machine learning: • Combine the rules as features in supervised learning to induce hierarchies. • Add further information/context: resource contents, external lexical resources, transfer learning, etc. • Use deep learning approaches: • Forget about the rules? • Use very rich data representations, e.g. word embeddings.
  • 31. References
    • Benz, D., Hotho, A., Stumme, G., Stutzer, S.: Semantics made by you and me: Self-emerging ontologies can capture the diversity of shared knowledge. In: Proceedings of the 2nd Web Science Conference (WebSci10) (2010)
    • Cruse, D.A.: Hyponymy and its varieties. In: Green, R., Bean, C.A., Myaeng, S.H. (eds.) The Semantics of Relationships: An Interdisciplinary Perspective, pp. 3–21. Springer, Dordrecht (2002). https://doi.org/10.1007/978-94-017-0073-3_1
    • Dellschaft, K., Staab, S.: On how to perform a gold standard based evaluation of ontology learning. In: Cruz, I., Decker, S., Allemang, D., Preist, C., Schwabe, D., Mika, P., Uschold, M., Aroyo, L.M. (eds.) ISWC 2006. LNCS, vol. 4273, pp. 228–241. Springer, Heidelberg (2006). https://doi.org/10.1007/11926078_17
    • Dong, H., Wang, W., Coenen, F.: Deriving dynamic knowledge from academic social tagging data: a novel research direction. In: iConference 2017 Proceedings (2017)
    • Griffiths, T.L., Steyvers, M.: Prediction and semantic association. In: Proceedings of the 15th International Conference on Neural Information Processing Systems, pp. 11–18. MIT Press (2002)
    • Heymann, P., Garcia-Molina, H.: Collaborative creation of communal hierarchical taxonomies in social tagging systems. Technical report, Stanford University (2006)
    • Markines, B., Cattuto, C., Menczer, F., Benz, D., Hotho, A., Stumme, G.: Evaluating similarity measures for emergent semantics of social tagging. In: Proceedings of the 18th International Conference on World Wide Web, pp. 641–650. ACM (2009)
    • De Meo, P., Quattrone, G., Ursino, D.: Exploitation of semantic relationships and hierarchical data structures to support a user in his annotation and browsing activities in folksonomies. Inf. Syst. 34(6), 511–535 (2009)
    • Mika, P.: Ontologies are us: a unified model of social networks and semantics. Web Semant.: Sci. Serv. Agents World Wide Web 5(1), 5–15 (2007)
    • Peters, I., Becker, P.: Folksonomies: Indexing and Retrieval in Web 2.0. De Gruyter/Saur, Berlin (2009)
    • Strohmaier, M., Helic, D., Benz, D., Körner, C., Kern, R.: Evaluation of folksonomy induction algorithms. ACM Trans. Intell. Syst. Technol. 3(4), 1–22 (2012)
    • Vander Wal, T.: Folksonomy Coinage and Definition. http://www.vanderwal.net/folksonomy.html (2007)
    • Wang, W., Barnaghi, P.M., Bargiela, A.: Probabilistic topic models for learning terminological ontologies. IEEE Trans. Knowl. Data Eng. 22(7), 1028–1040 (2010)
    • Weller, K.: Knowledge Representation in the Social Semantic Web. De Gruyter Saur, Berlin/New York (2010)
  • 32. Thank you for your attention. Hang Dong’s Home page: http://cgi.csc.liv.ac.uk/~hang/ Contact: hangdong@liverpool.ac.uk

Editor's Notes

  1. There were 3,794,882 (around 3.8 million) annotations on BibSonomy up to July 2015. The right-hand side shows only a graphical form of the knowledge structure; we will build a knowledge base on it and make it useful for real applications.
  2. https://en.wikipedia.org/wiki/Extensional_and_intensional_definitions In logic and mathematics, an intensional definition gives the meaning of a term by specifying necessary and sufficient conditions for when the term should be used. In the case of nouns, this is equivalent to specifying the properties that an object needs to have in order to be counted as a referent of the term. An extensional definition of a concept or term formulates its meaning by specifying its extension, that is, every object that falls under the definition of the concept or term in question.
  3. We did not use deep learning.