



                An Empirical Study of Vocabulary Relatedness
                and Its Application to Recommender Systems




                   Gong Cheng, Saisai Gong, Yuzhong Qu
State Key Laboratory for Novel Software Technology, Nanjing University, China
                             gcheng@nju.edu.cn



                          Presented at ISWC2011
ws .nju.edu.cn
Measuring term similarity

        [Figure: vocabulary matching assigns similarity scores to pairs of terms from two vocabularies: FacultyMember and Faculty (0.9), FullProfessor and Professor (0.8), AssistantProfessor and AssistantProfessor (1.0).]

Gong Cheng (程龚) gcheng@nju.edu.cn                                                         2 of 36
Measuring vocabulary similarity

        [Figure: layers shown are vocabulary matching and vocabulary distance. Pairwise scores (0.8, 0.5, 0.6, 0.02, 0.5) connect five vocabularies: Semantic Web for Research Communities (SWRC), Foundational Model of Anatomy (FMA), GALEN, eBiquity Person, and NCBI organismal classification (NCBITaxon).]

Measuring vocabulary relatedness

        [Figure: layers shown are vocabulary matching, vocabulary distance, and vocabulary relatedness. One vocabulary contains FacultyMember, FullProfessor, and AssistantProfessor; the other contains Postgraduate-Research-Degree, PhD, and EngD. Caption: "not that similar, but somewhat related".]

Contributions

        How to measure vocabulary relatedness?
            6 measures, from 4 aspects


        How about vocabulary relatedness in real-life cases?
            Empirical analysis of 2,996 vocabularies and 4 billion other RDF triples


        Where to apply vocabulary relatedness?
            Post-selection vocabulary recommendation in vocabulary search




Outline

        Data set
        Vocabulary relatedness
        Post-selection vocabulary recommendation
        Conclusions




Data set statistics

        Crawled from February 2010 to May 2011 by




Data set distributions

        RDF documents over pay-level domains




Data set distributions

        Vocabularies over top-level domains




Vocabulary relatedness

        6 numerical measures, from 4 aspects
            Semantic relatedness
                Explicit
                Implicit
                Hybrid
            Content similarity
            Expressivity closeness
            Distributional relatedness
        Comparison




Measure 1: explicit semantic relatedness
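
        $$R_S^E(v_i, v_j) = \frac{1}{\text{weight of a shortest path between } v_i \text{ and } v_j \text{ in } G_E}$$

        [Figure: example graph G_E with edges v1 - v2 (weight 1) and v2 - v3 (weight 2), drawn above the underlying explicit relations: owl:imports and rdfs:seeAlso between v1 and v2, owl:priorVersion between v2 and v3.]
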
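The reciprocal-of-shortest-path definition on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is stored as a plain adjacency dictionary, edge weights are positive, disconnected pairs are given relatedness 0.0, and `v4` is a hypothetical isolated vocabulary added for the example. How edge weights are derived from the explicit relations is not shown on the slide, so the weights 1 and 2 are simply copied from the example graph.

```python
import heapq


def shortest_path_weight(graph, source, target):
    """Dijkstra over an undirected weighted graph given as {node: {neighbor: weight}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return None  # target is not reachable from source


def semantic_relatedness(graph, vi, vj):
    """Reciprocal of the shortest-path weight; 0.0 for disconnected pairs (assumption)."""
    if vi == vj:
        raise ValueError("self-relatedness is left undefined in this sketch")
    weight = shortest_path_weight(graph, vi, vj)
    return 0.0 if weight is None else 1.0 / weight


# Example graph from the slide, plus a hypothetical isolated vocabulary v4.
G_E = {
    "v1": {"v2": 1.0},
    "v2": {"v1": 1.0, "v3": 2.0},
    "v3": {"v2": 2.0},
    "v4": {},
}

print(semantic_relatedness(G_E, "v1", "v2"))  # direct edge of weight 1
print(semantic_relatedness(G_E, "v1", "v3"))  # path v1 - v2 - v3 of weight 3
print(semantic_relatedness(G_E, "v1", "v4"))  # no path
```

The same function applies unchanged to G_I or G_E+I, since only the input graph differs between the three semantic measures.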
Measure 2: implicit semantic relatedness
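
        $$R_S^I(v_i, v_j) = \frac{1}{\text{weight of a shortest path between } v_i \text{ and } v_j \text{ in } G_I}$$

        [Figure: example graph G_I with edges v2 - v3 (weight 1) and v3 - v4 (weight 2), drawn above the underlying term-level relations: two owl:inverseOf links between t2 and t3 and one rdfs:subClassOf link between t3 and t4, where t2, t3, and t4 belong to v2, v3, and v4 respectively.]
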
Measure 3: hybrid semantic relatedness

        $$R_S^{E+I}(v_i, v_j) = \frac{1}{\text{weight of a shortest path between } v_i \text{ and } v_j \text{ in } G_{E+I}}$$

        [Figure: example graph G_{E+I} over v1, v2, v3, and v4 with edge weights 1, 1, and 2.]

Empirical analysis (1)

        Statistical properties of G_E, G_I, and G_{E+I}




Empirical analysis (2)

        Explicit relations between vocabularies




Measure 4: content similarity




        [Formula annotations: harmonic mean; maximum similarity between their labels]




Empirical analysis (3)

        86 label-like properties
            rdfs:label, dc:title, and their subproperties (e.g. skos:prefLabel)
        and local name




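        [Pie charts: "Terms and their labels" splits terms 36.33% / 63.67% between those with (w/) and without (w/o) labels; "Vocabulary distribution" splits vocabularies 36.21% / 63.79% between w/ and w/o.]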




Measure 5: expressivity closeness


        [Figure: terms tp, tq, and tr and the meta-level term owl:TransitiveProperty; the MetaTerms set shown contains rdfs:domain, owl:TransitiveProperty, owl:inverseOf, and rdf:type.]

        Jaccard

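The slide labels this measure only with "Jaccard" and a MetaTerms set. A minimal sketch, assuming that expressivity closeness is the Jaccard coefficient between the meta-level term sets of two vocabularies, could look as follows. The first set is the one listed on the slide; the second set and the handling of two empty sets are assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # assumption: two empty sets are treated as unrelated
    return len(a & b) / len(union)


# MetaTerms set shown on the slide.
meta_terms_a = {"rdfs:domain", "owl:TransitiveProperty", "owl:inverseOf", "rdf:type"}
# Hypothetical MetaTerms set of a second vocabulary.
meta_terms_b = {"rdf:type", "rdfs:domain", "rdfs:range"}

closeness = jaccard(meta_terms_a, meta_terms_b)
print(closeness)  # 2 shared terms out of 5 distinct terms
```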
Empirical analysis (4)

        4,978 meta-level terms, 469 (9.42%) in >1 vocabulary
        Most popular meta-level terms
         1.   rdf:type
         2.   rdfs:domain
         3.   rdfs:range
         4.   …
        and after excluding language constructs




        10.13 meta-level terms per vocabulary
        ≤20 meta-level terms in 92.96% of vocabularies
        but hundreds in Cyc



Measure 6: distributional relatedness

        Distributional profile

        $$DP(v) = \bigl(p(v_1 \mid v),\ p(v_2 \mid v),\ \ldots,\ p(v_n \mid v)\bigr)^{\top}$$

        $$R_D(v_i, v_j) = \cos\bigl(DP(v_i), DP(v_j)\bigr)$$

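Distributional relatedness is the cosine between two distributional profiles, each a vector of conditional probabilities p(v_k | v). The sketch below shows only the cosine step. The slide does not say how the probabilities are estimated, so the two profiles are hypothetical numbers, and returning 0.0 for an all-zero profile is an assumption.

```python
import math


def cosine(p, q):
    """Cosine of the angle between two equal-length vectors."""
    if len(p) != len(q):
        raise ValueError("profiles must have the same length")
    dot = sum(x * y for x, y in zip(p, q))
    norm_p = math.sqrt(sum(x * x for x in p))
    norm_q = math.sqrt(sum(y * y for y in q))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0  # assumption: an all-zero profile is related to nothing
    return dot / (norm_p * norm_q)


# Hypothetical distributional profiles over three vocabularies v1, v2, v3.
dp_vi = [0.5, 0.5, 0.0]
dp_vj = [0.0, 0.5, 0.5]

relatedness = cosine(dp_vi, dp_vj)
print(relatedness)
```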
Empirical analysis (5)

        Instantiation found for 1,874 (62.55%) vocabularies

        Most popular vocabularies (excluding languages)




Empirical analysis (6)

        Co-instantiation found for 9,763 pairs of vocabularies

        Most popular vocabulary co-instantiation (excluding languages)




Agreement between measures

        Spearman’s rank correlation coefficient (ρ∈[-1,1])




        Single-link hierarchical clustering
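Spearman's rank correlation coefficient, used here to compare measures, is the Pearson correlation of the rank-transformed scores. The following is a small pure-Python sketch. The two score lists are hypothetical, ties receive average ranks, and nothing here reflects how the authors computed their figures.

```python
import math


def average_ranks(values):
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two items")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical relatedness scores of the same four vocabulary pairs under two measures.
scores_a = [0.9, 0.5, 0.1, 0.3]
scores_b = [0.8, 0.1, 0.4, 0.2]

rho = spearman_rho(scores_a, scores_b)
print(rho)
```

A value near 1 means the two measures rank vocabulary pairs almost identically, a value near -1 means they rank them in opposite order.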




Relatedness-based ranking

        Ranking by single measure:




        Ranking by multiple measures:




Popularity-based re-ranking




        [Formula annotations: degree of influence of popularity; number of pay-level domains instantiating v_i]




Evaluation settings

        20 “selections” drawn at random from 1,302 moderate-sized vocabularies
        Depth-10 pooling with

        2 experts
        Ratings
            Closely related: 2
            Somewhat related: 1
            Unrelated: 0


        Metric: NDCG
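For the 0/1/2 ratings above, NDCG can be sketched as follows. The slides do not say which NDCG variant was used, so this sketch assumes linear gain with a log2(rank + 1) discount, and it takes the ideal ordering from the supplied ratings unless a separate list of all judged ratings is passed. The function names and sample ratings are hypothetical.

```python
import math


def dcg_at_k(ratings, k):
    """Discounted cumulative gain: rating at rank r is divided by log2(r + 1)."""
    return sum(r / math.log2(pos + 1) for pos, r in enumerate(ratings[:k], start=1))


def ndcg_at_k(ranked_ratings, k, judged_ratings=None):
    """DCG@k of the ranking divided by DCG@k of the best possible ordering."""
    pool = ranked_ratings if judged_ratings is None else judged_ratings
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    if ideal == 0.0:
        return 0.0  # assumption: no related item exists, so score the ranking 0
    return dcg_at_k(ranked_ratings, k) / ideal


# Hypothetical expert ratings of three recommended vocabularies, in ranked order:
# somewhat related (1), closely related (2), unrelated (0).
ranked = [1, 2, 0]

print(ndcg_at_k(ranked, 1))  # only the top recommendation counts
print(ndcg_at_k(ranked, 3))
```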




Gold standard

        739 assessments
        [Pie chart "Assessments": shares of 7.85%, 10.55%, and 81.60% over the ratings Closely related, Somewhat related, and Unrelated.]


        Agreement between experts
            80%
            or 91% when “closely related = somewhat related = related”




Evaluation results --- individual measures


        56.88% isolated vocabularies in G_E
        37.45% uninstantiated vocabularies




Evaluation results --- combinations of measures




Relatedness vs. popularity

        NDCG@1 vs. number of pay-level domains instantiating it




Conclusions

        Vocabulary-level relatedness
            4 aspects, 6 measures
        Empirical analysis
            Statistical findings
            Comparison
        Post-selection vocabulary recommendation
            Relatedness-based ranking
            Popularity-based re-ranking
            Evaluation


        Falcons Ontology Search
            http://ws.nju.edu.cn/falcons/ontologysearch/




Take away

        Vocabulary meta-descriptions are incomplete.
        Terms lack labels.
        Co-instantiated ∝ explicitly related




                        http://ws.nju.edu.cn/falcons/ontologysearch/




Gong Cheng (程龚) gcheng@nju.edu.cn                                      36 of 36

More Related Content

Viewers also liked

s1140177ChiemiHanyu_Thesis
s1140177ChiemiHanyu_Thesiss1140177ChiemiHanyu_Thesis
s1140177ChiemiHanyu_Thesischiemihanyu
 
Summarizing Semantic Data
Summarizing Semantic DataSummarizing Semantic Data
Summarizing Semantic DataGong Cheng
 
Web的图结构分析
Web的图结构分析Web的图结构分析
Web的图结构分析Gong Cheng
 
NJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary RepositoryNJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary RepositoryGong Cheng
 
Term Dependence on the Semantic Web
Term Dependence on the Semantic WebTerm Dependence on the Semantic Web
Term Dependence on the Semantic WebGong Cheng
 
知识的摘要
知识的摘要知识的摘要
知识的摘要Gong Cheng
 
Taking up the Gaokao Challenge: An Information Retrieval Approach
Taking up the Gaokao Challenge: An Information Retrieval ApproachTaking up the Gaokao Challenge: An Information Retrieval Approach
Taking up the Gaokao Challenge: An Information Retrieval ApproachGong Cheng
 
Explass: Exploring Associations between Entities via Top-K Ontological Patter...
Explass: Exploring Associations between Entities via Top-K Ontological Patter...Explass: Exploring Associations between Entities via Top-K Ontological Patter...
Explass: Exploring Associations between Entities via Top-K Ontological Patter...Gong Cheng
 
Surviving (and Thriving in) the Online Identity Wars
Surviving (and Thriving in) the Online Identity WarsSurviving (and Thriving in) the Online Identity Wars
Surviving (and Thriving in) the Online Identity WarsJohn McCrea
 
What an "RP" Wants
What an "RP" WantsWhat an "RP" Wants
What an "RP" WantsJohn McCrea
 
Falcon-AO: Results for OAEI 2007
Falcon-AO: Results for OAEI 2007Falcon-AO: Results for OAEI 2007
Falcon-AO: Results for OAEI 2007Gong Cheng
 
Searching Semantic Web Objects Based on Class Hierarchies
Searching Semantic Web Objects Based on Class HierarchiesSearching Semantic Web Objects Based on Class Hierarchies
Searching Semantic Web Objects Based on Class HierarchiesGong Cheng
 

Viewers also liked (14)

s1140177ChiemiHanyu_Thesis
s1140177ChiemiHanyu_Thesiss1140177ChiemiHanyu_Thesis
s1140177ChiemiHanyu_Thesis
 
Summarizing Semantic Data
Summarizing Semantic DataSummarizing Semantic Data
Summarizing Semantic Data
 
Web的图结构分析
Web的图结构分析Web的图结构分析
Web的图结构分析
 
How
HowHow
How
 
NJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary RepositoryNJVR: The NanJing Vocabulary Repository
NJVR: The NanJing Vocabulary Repository
 
Term Dependence on the Semantic Web
Term Dependence on the Semantic WebTerm Dependence on the Semantic Web
Term Dependence on the Semantic Web
 
知识的摘要
知识的摘要知识的摘要
知识的摘要
 
Taking up the Gaokao Challenge: An Information Retrieval Approach
Taking up the Gaokao Challenge: An Information Retrieval ApproachTaking up the Gaokao Challenge: An Information Retrieval Approach
Taking up the Gaokao Challenge: An Information Retrieval Approach
 
Explass: Exploring Associations between Entities via Top-K Ontological Patter...
Explass: Exploring Associations between Entities via Top-K Ontological Patter...Explass: Exploring Associations between Entities via Top-K Ontological Patter...
Explass: Exploring Associations between Entities via Top-K Ontological Patter...
 
Surviving (and Thriving in) the Online Identity Wars
Surviving (and Thriving in) the Online Identity WarsSurviving (and Thriving in) the Online Identity Wars
Surviving (and Thriving in) the Online Identity Wars
 
What an "RP" Wants
What an "RP" WantsWhat an "RP" Wants
What an "RP" Wants
 
Falcon-AO: Results for OAEI 2007
Falcon-AO: Results for OAEI 2007Falcon-AO: Results for OAEI 2007
Falcon-AO: Results for OAEI 2007
 
Searching Semantic Web Objects Based on Class Hierarchies
Searching Semantic Web Objects Based on Class HierarchiesSearching Semantic Web Objects Based on Class Hierarchies
Searching Semantic Web Objects Based on Class Hierarchies
 
Aflp
AflpAflp
Aflp
 

Similar to An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

BipRank: Ranking and Summarizing RDF Vocabulary Descriptions
BipRank: Ranking and Summarizing RDF Vocabulary DescriptionsBipRank: Ranking and Summarizing RDF Vocabulary Descriptions
BipRank: Ranking and Summarizing RDF Vocabulary DescriptionsGong Cheng
 
Continuous bag of words cbow word2vec word embedding work .pdf
Continuous bag of words cbow word2vec word embedding work .pdfContinuous bag of words cbow word2vec word embedding work .pdf
Continuous bag of words cbow word2vec word embedding work .pdfdevangmittal4
 
Semantic Relatedness for Evaluation of Course Equivalencies
Semantic Relatedness for Evaluation of Course EquivalenciesSemantic Relatedness for Evaluation of Course Equivalencies
Semantic Relatedness for Evaluation of Course EquivalenciesBeibei Yang
 
Context Representation for the Semantic Web
Context Representation for the Semantic Web Context Representation for the Semantic Web
Context Representation for the Semantic Web Jie Bao
 
Latent Topic-semantic Indexing based Automatic Text Summarization
Latent Topic-semantic Indexing based Automatic Text SummarizationLatent Topic-semantic Indexing based Automatic Text Summarization
Latent Topic-semantic Indexing based Automatic Text SummarizationElaheh Barati
 

Similar to An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems (7)

BipRank: Ranking and Summarizing RDF Vocabulary Descriptions
BipRank: Ranking and Summarizing RDF Vocabulary DescriptionsBipRank: Ranking and Summarizing RDF Vocabulary Descriptions
BipRank: Ranking and Summarizing RDF Vocabulary Descriptions
 
RESUME
RESUMERESUME
RESUME
 
Continuous bag of words cbow word2vec word embedding work .pdf
Continuous bag of words cbow word2vec word embedding work .pdfContinuous bag of words cbow word2vec word embedding work .pdf
Continuous bag of words cbow word2vec word embedding work .pdf
 
Semantic Relatedness for Evaluation of Course Equivalencies
Semantic Relatedness for Evaluation of Course EquivalenciesSemantic Relatedness for Evaluation of Course Equivalencies
Semantic Relatedness for Evaluation of Course Equivalencies
 
Ontology Dev
Ontology DevOntology Dev
Ontology Dev
 
Context Representation for the Semantic Web
Context Representation for the Semantic Web Context Representation for the Semantic Web
Context Representation for the Semantic Web
 
Latent Topic-semantic Indexing based Automatic Text Summarization
Latent Topic-semantic Indexing based Automatic Text SummarizationLatent Topic-semantic Indexing based Automatic Text Summarization
Latent Topic-semantic Indexing based Automatic Text Summarization
 

More from Gong Cheng

Towards Content-Based Dataset Search - Test Collections and Beyond
Towards Content-Based Dataset Search - Test Collections and BeyondTowards Content-Based Dataset Search - Test Collections and Beyond
Towards Content-Based Dataset Search - Test Collections and BeyondGong Cheng
 
从元数据到内容——新一代知识图谱搜索引擎初探
从元数据到内容——新一代知识图谱搜索引擎初探从元数据到内容——新一代知识图谱搜索引擎初探
从元数据到内容——新一代知识图谱搜索引擎初探Gong Cheng
 
知识图谱中的实体摘要:基于神经网络的方法
知识图谱中的实体摘要:基于神经网络的方法知识图谱中的实体摘要:基于神经网络的方法
知识图谱中的实体摘要:基于神经网络的方法Gong Cheng
 
Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...
Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...
Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...Gong Cheng
 
知识图谱中的关联搜索
知识图谱中的关联搜索知识图谱中的关联搜索
知识图谱中的关联搜索Gong Cheng
 
面向高考机器人的知识表示与推理初探
面向高考机器人的知识表示与推理初探面向高考机器人的知识表示与推理初探
面向高考机器人的知识表示与推理初探Gong Cheng
 
知识图谱中的实体关联搜索
知识图谱中的实体关联搜索知识图谱中的实体关联搜索
知识图谱中的实体关联搜索Gong Cheng
 
Semantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationSemantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationGong Cheng
 
Semantic Web related top conference review
Semantic Web related top conference reviewSemantic Web related top conference review
Semantic Web related top conference reviewGong Cheng
 
Relatedness-based Multi-Entity Summarization
Relatedness-based Multi-Entity SummarizationRelatedness-based Multi-Entity Summarization
Relatedness-based Multi-Entity SummarizationGong Cheng
 
Generating Illustrative Snippets for Open Data on the Web
Generating Illustrative Snippets for Open Data on the WebGenerating Illustrative Snippets for Open Data on the Web
Generating Illustrative Snippets for Open Data on the WebGong Cheng
 
常识推理在地理自动答题中的需求分析
常识推理在地理自动答题中的需求分析常识推理在地理自动答题中的需求分析
常识推理在地理自动答题中的需求分析Gong Cheng
 
Efficient Algorithms for Association Finding and Frequent Association Pattern...
Efficient Algorithms for Association Finding and Frequent Association Pattern...Efficient Algorithms for Association Finding and Frequent Association Pattern...
Efficient Algorithms for Association Finding and Frequent Association Pattern...Gong Cheng
 
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset SummarizationHIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset SummarizationGong Cheng
 
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...Gong Cheng
 
Facilitating Human Intervention in Coreference Resolution with Comparative En...
Facilitating Human Intervention in Coreference Resolution with Comparative En...Facilitating Human Intervention in Coreference Resolution with Comparative En...
Facilitating Human Intervention in Coreference Resolution with Comparative En...Gong Cheng
 
RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization
RELIN: Relatedness and Informativeness-based Centrality for Entity SummarizationRELIN: Relatedness and Informativeness-based Centrality for Entity Summarization
RELIN: Relatedness and Informativeness-based Centrality for Entity SummarizationGong Cheng
 
Browsing Linked Data with MyView
Browsing Linked Data with MyViewBrowsing Linked Data with MyView
Browsing Linked Data with MyViewGong Cheng
 

More from Gong Cheng (18)

Towards Content-Based Dataset Search - Test Collections and Beyond
Towards Content-Based Dataset Search - Test Collections and BeyondTowards Content-Based Dataset Search - Test Collections and Beyond
Towards Content-Based Dataset Search - Test Collections and Beyond
 
从元数据到内容——新一代知识图谱搜索引擎初探
从元数据到内容——新一代知识图谱搜索引擎初探从元数据到内容——新一代知识图谱搜索引擎初探
从元数据到内容——新一代知识图谱搜索引擎初探
 
知识图谱中的实体摘要:基于神经网络的方法
知识图谱中的实体摘要:基于神经网络的方法知识图谱中的实体摘要:基于神经网络的方法
知识图谱中的实体摘要:基于神经网络的方法
 
Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...
Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...
Generating Compact and Relaxable Answers to Keyword Queries over Knowledge Gr...
 
知识图谱中的关联搜索
知识图谱中的关联搜索知识图谱中的关联搜索
知识图谱中的关联搜索
 
面向高考机器人的知识表示与推理初探
面向高考机器人的知识表示与推理初探面向高考机器人的知识表示与推理初探
面向高考机器人的知识表示与推理初探
 
知识图谱中的实体关联搜索
知识图谱中的实体关联搜索知识图谱中的实体关联搜索
知识图谱中的实体关联搜索
 
Semantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and SummarizationSemantic Data Retrieval: Search, Ranking, and Summarization
Semantic Data Retrieval: Search, Ranking, and Summarization
 
Semantic Web related top conference review
Semantic Web related top conference reviewSemantic Web related top conference review
Semantic Web related top conference review
 
Relatedness-based Multi-Entity Summarization
Relatedness-based Multi-Entity SummarizationRelatedness-based Multi-Entity Summarization
Relatedness-based Multi-Entity Summarization
 
Generating Illustrative Snippets for Open Data on the Web
Generating Illustrative Snippets for Open Data on the WebGenerating Illustrative Snippets for Open Data on the Web
Generating Illustrative Snippets for Open Data on the Web
 
常识推理在地理自动答题中的需求分析
常识推理在地理自动答题中的需求分析常识推理在地理自动答题中的需求分析
常识推理在地理自动答题中的需求分析
 
Efficient Algorithms for Association Finding and Frequent Association Pattern...
Efficient Algorithms for Association Finding and Frequent Association Pattern...Efficient Algorithms for Association Finding and Frequent Association Pattern...
Efficient Algorithms for Association Finding and Frequent Association Pattern...
 
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset SummarizationHIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
HIEDS: A Generic and Efficient Approach to Hierarchical Dataset Summarization
 
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
Summarizing Entity Descriptions for Effective and Efficient Human-centered En...
 
Facilitating Human Intervention in Coreference Resolution with Comparative En...
Facilitating Human Intervention in Coreference Resolution with Comparative En...Facilitating Human Intervention in Coreference Resolution with Comparative En...
Facilitating Human Intervention in Coreference Resolution with Comparative En...
 
RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization
RELIN: Relatedness and Informativeness-based Centrality for Entity SummarizationRELIN: Relatedness and Informativeness-based Centrality for Entity Summarization
RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization
 
Browsing Linked Data with MyView
Browsing Linked Data with MyViewBrowsing Linked Data with MyView
Browsing Linked Data with MyView
 

Recently uploaded

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems

  • 1. An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems. Gong Cheng, Saisai Gong, Yuzhong Qu. State Key Laboratory for Novel Software Technology, Nanjing University, China (ws.nju.edu.cn). gcheng@nju.edu.cn. Presented at ISWC2011.
  • 2. Measuring term similarity. Vocabulary matching. [Figure: term pairs with similarity scores: FacultyMember / Faculty 0.9, FullProfessor / Professor 0.8, AssistantProfessor / AssistantProfessor 1.0.]
  • 3. Measuring vocabulary similarity. Vocabulary matching; vocabulary distance. [Figure: pairwise scores (0.8, 0.5, 0.6, 0.02, 0.5) among Semantic Web for Research Communities (SWRC), eBiquity Person, Foundational Model of Anatomy (FMA), GALEN and NCBI organismal classification (NCBITaxon).]
  • 4. Measuring vocabulary relatedness. Vocabulary matching; vocabulary distance; vocabulary relatedness. [Figure: FacultyMember, FullProfessor, AssistantProfessor versus Postgraduate-Research-Degree, PhD, EngD.] Not that similar, but somewhat related.
  • 5. Contributions. How to measure vocabulary relatedness? 6 measures, from 4 aspects. How about vocabulary relatedness in real-life cases? Empirical analysis of 2,996 vocabularies and 4 billion other RDF triples. Where to apply vocabulary relatedness? Post-selection vocabulary recommendation in vocabulary search.
  • 6. Outline. Data set; Vocabulary relatedness; Post-selection vocabulary recommendation; Conclusions.
  • 7. Data set statistics. Crawled from February 2010 to May 2011 by [image]. [Table not captured.]
  • 8. Data set distributions. RDF documents over pay-level domains. [Chart not captured.]
  • 9. Data set distributions. Vocabularies over top-level domains. [Chart not captured.]
  • 10. Outline. Data set; Vocabulary relatedness; Post-selection vocabulary recommendation; Conclusions.
  • 11. Vocabulary relatedness. 6 numerical measures, from 4 aspects: semantic relatedness (explicit, implicit, hybrid); content similarity; expressivity closeness; distributional relatedness. Comparison.
  • 12. Measure 1: explicit semantic relatedness. $R_S^E(v_i, v_j) = \dfrac{1}{\text{weight of a shortest path between } v_i \text{ and } v_j \text{ in } G_E}$. [Figure: explicit links among vocabularies v1, v2, v3 (owl:imports, owl:priorVersion, rdfs:seeAlso) and the resulting graph $G_E$ with edge weights 1 and 2.]
  • 13. Measure 2: implicit semantic relatedness. $R_S^I(v_i, v_j) = \dfrac{1}{\text{weight of a shortest path between } v_i \text{ and } v_j \text{ in } G_I}$. [Figure: terms t2, t3, t4 of vocabularies v2, v3, v4 linked by owl:inverseOf and rdfs:subClassOf, and the resulting graph $G_I$ with edge weights 1 and 2.]
  • 14. Measure 3: hybrid semantic relatedness. $R_S^{E+I}(v_i, v_j) = \dfrac{1}{\text{weight of a shortest path between } v_i \text{ and } v_j \text{ in } G_{E+I}}$. [Figure: graph $G_{E+I}$ over vocabularies v1, v2, v3, v4 with edge weights 1, 1 and 2.]
  • 15. Empirical analysis (1). Statistical properties of $G_E$, $G_I$ and $G_{E+I}$. [Table not captured.]
  • 16. Empirical analysis (2). Explicit relations between vocabularies. [Chart not captured.]
  • 17. Measure 4: content similarity. [Formula not captured; its annotations read "harmonic mean" and "maximum similarity between their labels".]
  • 18. Empirical analysis (3). 86 label-like properties: rdfs:label, dc:title, and their subproperties (e.g. skos:prefLabel), plus the local name. [Pie charts "Terms and their labels" and "Vocabulary distribution", each split into w/ and w/o, with values 36.33%, 63.67%, 36.21% and 63.79%.]
  • 19. Measure 5: expressivity closeness. Jaccard over MetaTerms. [Figure: terms tp, tq, tr described with owl:TransitiveProperty, rdfs:domain, owl:inverseOf and rdf:type, which form the MetaTerms.]
  • 20. Empirical analysis (4). 4,978 meta-level terms, 469 (9.42%) in >1 vocabulary. Most popular meta-level terms: 1. rdf:type, 2. rdfs:domain, 3. rdfs:range, 4. …; and after excluding language constructs [list not captured]. 10.13 meta-level terms per vocabulary. ≤20 meta-level terms in 92.96% of vocabularies, but hundreds in Cyc.
  • 21. Measure 6: distributional relatedness. Distributional profile: $DP(v) = \langle p(v_1 \mid v), p(v_2 \mid v), \ldots, p(v_n \mid v) \rangle$. $R_D(v_i, v_j) = \cos(DP(v_i), DP(v_j))$.
  • 22. Empirical analysis (5). Instantiation found for 1,874 (62.55%) vocabularies. Most popular vocabularies (excluding languages). [Table not captured.]
  • 23. Empirical analysis (6). Co-instantiation found for 9,763 pairs of vocabularies. Most popular vocabulary co-instantiation (excluding languages). [Table not captured.]
  • 24. Vocabulary relatedness. 6 numerical measures, from 4 aspects: semantic relatedness (explicit, implicit, hybrid); content similarity; expressivity closeness; distributional relatedness. Comparison.
  • 25. Agreement between measures. Spearman's rank correlation coefficient ($\rho \in [-1, 1]$). Single-link hierarchical clustering. [Results not captured.]
  • 26. Outline. Data set; Vocabulary relatedness; Post-selection vocabulary recommendation; Conclusions.
  • 27. Relatedness-based ranking. Ranking by single measure: [formula not captured]. Ranking by multiple measures: [formula not captured].
  • 28. Popularity-based re-ranking. [Formula not captured; its annotations read "degree of influence of popularity" and "number of pay-level domains instantiating $v_i$".]
  • 29. Evaluation settings. 20 "selections" randomly selected from 1,302 moderate-sized vocabularies. Depth-10 pooling with 2 experts. Ratings: closely related = 2, somewhat related = 1, unrelated = 0. Metric: NDCG.
  • 30. Gold standard. 739 assessments: 7.85% closely related, 10.55% somewhat related, 81.60% unrelated. Agreement between experts: 80%, or 91% when "closely related = somewhat related = related".
  • 31. Evaluation results: individual measures. 56.88% isolated vocabularies in $G_E$. 37.45% uninstantiated vocabularies. [Chart not captured.]
  • 32. Evaluation results: combinations of measures. [Chart not captured.]
  • 33. Relatedness vs. popularity. NDCG@1 vs. number of pay-level domains instantiating it. [Chart not captured.]
  • 34. Outline. Data set; Vocabulary relatedness; Post-selection vocabulary recommendation; Conclusions.
  • 35. Conclusions. Vocabulary-level relatedness: 4 aspects, 6 measures. Empirical analysis: statistical findings, comparison. Post-selection vocabulary recommendation: relatedness-based ranking, popularity-based re-ranking, evaluation. Falcons Ontology Search: http://ws.nju.edu.cn/falcons/ontologysearch/
  • 36. Take away. Vocabulary meta-descriptions are incomplete. Terms lack labels. Co-instantiated ∝ explicitly related. http://ws.nju.edu.cn/falcons/ontologysearch/
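
Three of the measures in the slides (shortest-path semantic relatedness, Jaccard-based expressivity closeness, and cosine-based distributional relatedness) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy graph, its edge weights, and the treatment of unreachable or identical vocabularies are all assumptions made for illustration, and the slides do not preserve how edge weights or the probabilities p(v_k | v) are actually derived.

```python
import heapq
import math


def path_relatedness(graph, source, target):
    """1 / (weight of a shortest path between source and target).

    `graph` maps a vocabulary to {neighbour: positive edge weight}.
    Returning 0.0 for unreachable pairs and 1.0 for identical
    vocabularies is an assumption; the slides do not cover these cases.
    """
    if source == target:
        return 1.0
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:  # Dijkstra's algorithm
        d, node = heapq.heappop(heap)
        if node == target:
            return 1.0 / d
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        for neighbour, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbour, math.inf):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return 0.0


def jaccard(meta_terms_a, meta_terms_b):
    """Jaccard coefficient of two sets of meta-level terms."""
    a, b = set(meta_terms_a), set(meta_terms_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(profile_a, profile_b):
    """Cosine of two sparse distributional profiles {vocabulary: p}."""
    dot = sum(p * profile_b.get(k, 0.0) for k, p in profile_a.items())
    norm_a = math.sqrt(sum(p * p for p in profile_a.values()))
    norm_b = math.sqrt(sum(p * p for p in profile_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical undirected graph; the weights 1 and 2 are illustrative only.
toy_graph = {
    "v1": {"v2": 1.0},
    "v2": {"v1": 1.0, "v3": 2.0},
    "v3": {"v2": 2.0},
}

explicit = path_relatedness(toy_graph, "v1", "v3")
closeness = jaccard({"rdf:type", "rdfs:domain"}, {"rdf:type", "owl:inverseOf"})
distributional = cosine({"v1": 0.5, "v2": 0.5}, {"v1": 0.5, "v3": 0.5})
```

In the toy graph the only path from v1 to v3 has weight 1 + 2 = 3, so the path-based score is 1/3. The same function applies to any of the three graphs (explicit, implicit, hybrid) once its weighted edges are available.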