.nju.edu.cn




                RELIN: Relatedness and Informativeness-based
                    Centrality for Entity Summarization




                     Gong Cheng1, Thanh Tran2, Yuzhong Qu1
1 State Key Laboratory for Novel Software Technology, Nanjing University, China

          2 Institute AIFB, Karlsruhe Institute of Technology, Germany

                               gcheng@nju.edu.cn



                           Presented at ISWC 2011
Motivation

        DBpedia describes 3.64M entities with 1B RDF triples.
            1B/3.64M ≈ 281 RDF triples per entity on average
        A lengthy entity description is unacceptable in tasks that require quick
        identification of the underlying entity.




Gong Cheng (程龚) gcheng@nju.edu.cn                                                           2 of 30
Entity search --- find entities that match an information need




Pay-as-you-go data integration --- judge whether two entity descriptions denote the same entity




                                     sameAs?




Motivation
        Problem: to summarize lengthy entity descriptions




Outline

        Problem statement
        The RELIN model
        Implementation
        Experiments
        Conclusions




Data graph




Feature set




Entity summarization

        Entity summarization = feature ranking
        Entity summary = k top-ranked features
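The problem statement (rank the features, keep the top k) can be sketched in Python. This is a minimal sketch under stated assumptions: the entity description is given as (subject, property, value) triples, and the scoring function is a placeholder for whatever centrality a ranking model such as RELIN produces. The names `feature_set` and `summarize` and the `ex:` identifiers are hypothetical.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]   # (subject, property, value)
Feature = Tuple[str, str]       # (property, value)


def feature_set(entity: str, triples: List[Triple]) -> Set[Feature]:
    """Collect the property-value pairs describing `entity` (its feature set)."""
    return {(p, v) for (s, p, v) in triples if s == entity}


def summarize(features: Set[Feature], score: Callable[[Feature], float], k: int) -> List[Feature]:
    """Rank features by descending score (ties broken lexicographically) and keep the top k."""
    ranked = sorted(features, key=lambda f: (-score(f), f))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical toy data, not taken from DBpedia.
    triples = [
        ("ex:e1", "rdf:type", "ex:City"),
        ("ex:e1", "ex:country", "ex:China"),
        ("ex:e1", "ex:mayor", "ex:p1"),
        ("ex:e2", "rdf:type", "ex:City"),
    ]
    fs = feature_set("ex:e1", triples)
    # Placeholder score (length of the value string), only to make the example run.
    print(summarize(fs, lambda f: len(f[1]), k=2))
```

Any scoring function can be plugged in; the summary is always the k highest-scoring features of the entity.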




Centrality-based ranking: concepts

        Widely applied to text summarization and ontology summarization
        By constructing a graph
            Nodes: data elements to be ranked
            Edges: connecting related nodes
        and then measuring node centrality
            e.g. degree, PageRank, …



        [Figure: features f1 to f5 as nodes; two features are connected if their relatedness ≥ threshold, and not connected if relatedness < threshold]
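A sketch of the threshold-based graph construction, using degree as the centrality measure. The relatedness scores and the threshold are hypothetical; note how a pair scored 0.9 and a pair scored 0.5 both become identical edges once the threshold is applied.

```python
from itertools import combinations
from typing import Callable, Dict, List, Set


def threshold_graph(nodes: List[str], relatedness: Callable[[str, str], float],
                    threshold: float) -> Dict[str, Set[str]]:
    """Connect two nodes iff their relatedness is at least `threshold` (undirected)."""
    adj: Dict[str, Set[str]] = {n: set() for n in nodes}
    for a, b in combinations(nodes, 2):
        if relatedness(a, b) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def degree_centrality(adj: Dict[str, Set[str]]) -> Dict[str, int]:
    """Degree centrality: the number of neighbors of each node."""
    return {n: len(nbrs) for n, nbrs in adj.items()}


if __name__ == "__main__":
    # Hypothetical relatedness scores between five features; missing pairs score 0.
    scores = {
        frozenset(("f1", "f2")): 0.9, frozenset(("f1", "f4")): 0.6,
        frozenset(("f1", "f5")): 0.5, frozenset(("f4", "f5")): 0.7,
        frozenset(("f3", "f5")): 0.2,
    }
    rel = lambda a, b: scores.get(frozenset((a, b)), 0.0)
    adj = threshold_graph(["f1", "f2", "f3", "f4", "f5"], rel, threshold=0.5)
    print(degree_centrality(adj))
```

The float-valued scores are reduced to edge / no edge, which is the information loss that RELIN avoids by weighting moves with relatedness directly.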



PageRank

        Simulating the behavior of a random surfer who navigates from node to node
        Two types of action
            Following a random edge (with a uniform probability distribution)
            Jumping at random (with a uniform probability distribution)
        Ranking based on the stationary distribution of such a Markov chain




        [Figure: example graph over features f1 to f5]
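The random surfer can be simulated by power iteration. This is a textbook PageRank sketch, not code from the talk; the damping factor of 0.85 (the probability of following an edge rather than jumping) is a common default assumed here, and the edge lists are hypothetical.

```python
from typing import Dict, List


def pagerank(adj: Dict[str, List[str]], damping: float = 0.85,
             iters: int = 100) -> Dict[str, float]:
    """Power iteration for PageRank.

    With probability `damping` the surfer follows one of the current node's edges,
    chosen uniformly; otherwise it jumps to a node chosen uniformly at random.
    A node without edges spreads its mass uniformly over all nodes.
    """
    nodes = list(adj)
    n = len(nodes)
    x = {v: 1.0 / n for v in nodes}              # start from the uniform distribution
    for _ in range(iters):
        nxt = {v: (1.0 - damping) / n for v in nodes}   # uniform random jumps
        for u in nodes:
            out = adj[u]
            if out:
                share = damping * x[u] / len(out)     # uniform edge following
                for v in out:
                    nxt[v] += share
            else:
                for v in nodes:
                    nxt[v] += damping * x[u] / n
        x = nxt
    return x


if __name__ == "__main__":
    # Hypothetical star-shaped graph: f1 is related to every other feature.
    graph = {"f1": ["f2", "f3", "f4"], "f2": ["f1"], "f3": ["f1"], "f4": ["f1"]}
    print(pagerank(graph))
```

After enough iterations, `x` approximates the stationary distribution, and nodes are ranked by it.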



Centrality-based ranking for entity summarization: problems

        How to define a good feature
            Not only capturing the main themes of the entity description
            But also distinguishing the entity from others
        Loss of information
            Float-valued function → boolean-valued function (relatedness thresholded into edge / no edge)




        [Figure: features f1 to f5 as nodes; two features are connected if their relatedness ≥ threshold, and not connected if relatedness < threshold]



RELIN: concepts

        An extension of PageRank
            Following a random edge within a complete graph, with a probability proportional to the
            relatedness between the two associated nodes (i.e. no threshold needed)
            Jumping at random, with a probability proportional to the amount of information carried
            by the target that helps to identify the entity




RELIN: RELatedness and INformativeness-based centrality

        Two kinds of action
            Relational move --- more likely to go to a feature that carries related information about
            the theme currently under investigation
            Informational jump --- more likely to go to a feature that provides a large amount of new
            information for clarifying the identity of the underlying entity
        Two non-uniform probability distributions




Formalization

        Actions (given the current feature fq)
            P(M|fq): the probability of performing a relational move from fq
            P(J|fq): the probability of performing an informational jump from fq
            subject to P(M|fq) + P(J|fq) = 1
        Targets for actions (given FS the feature set)
            P(fp|fq,M): the probability of performing a relational move from fq to fp
            P(fp|fq,J): the probability of performing an informational jump from fq to fp
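            subject to $\sum_{f_p \in FS} P(f_p \mid f_q, M) = 1$ and $\sum_{f_p \in FS} P(f_p \mid f_q, J) = 1$
        Result
            x(t): |FS|-dimensional vector
            xp(t): the probability that the surfer visits fp at step t
            Finally,
                $x_p(t+1) = \sum_{f_q \in FS} x_q(t) \left( P(M \mid f_q)\, P(f_p \mid f_q, M) + P(J \mid f_q)\, P(f_p \mid f_q, J) \right)$
            and
                $\lim_{t \to \infty} x(t) = x$, the stationary distribution by which features are ranked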
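The update rule for x_p(t+1) can be iterated directly until x(t) stops changing. A minimal sketch, assuming features are identified by strings and that the probability functions are supplied by the caller; the function name `relin_scores` and its parameters are hypothetical. Passing a constant `p_move` of 1 - λ corresponds to the action probabilities P(M|fq) = 1 - λ and P(J|fq) = λ.

```python
from typing import Callable, Dict, List

Feature = str  # hypothetical: features identified by name for brevity


def relin_scores(
    features: List[Feature],
    p_move: Callable[[Feature], float],                  # P(M | fq); P(J | fq) = 1 - P(M | fq)
    p_move_target: Callable[[Feature, Feature], float],  # P(fp | fq, M), called as (fp, fq)
    p_jump_target: Callable[[Feature, Feature], float],  # P(fp | fq, J), called as (fp, fq)
    iters: int = 1000,
    tol: float = 1e-12,
) -> Dict[Feature, float]:
    """Iterate x_p(t+1) = sum_q x_q(t) * (P(M|fq) P(fp|fq,M) + P(J|fq) P(fp|fq,J))."""
    n = len(features)
    x = {f: 1.0 / n for f in features}          # uniform start, x(0)
    for _ in range(iters):
        nxt = {f: 0.0 for f in features}
        for q in features:
            pm = p_move(q)
            pj = 1.0 - pm                        # the two actions are exhaustive
            for p in features:
                nxt[p] += x[q] * (pm * p_move_target(p, q) + pj * p_jump_target(p, q))
        delta = sum(abs(nxt[f] - x[f]) for f in features)
        x = nxt
        if delta < tol:                          # x(t+1) ≈ x(t): approximately stationary
            break
    return x


if __name__ == "__main__":
    # Hypothetical two-feature example with lambda = 0.5.
    scores = relin_scores(
        ["a", "b"],
        p_move=lambda q: 0.5,
        p_move_target=lambda p, q: 0.0 if p == q else 1.0,
        p_jump_target=lambda p, q: {"a": 0.8, "b": 0.2}[p],
    )
    print(scores)
```

In this toy setting the fixed point can be checked by hand: x_a = 0.5 x_b + 0.4 with x_a + x_b = 1 gives x = (0.6, 0.4).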




Actions

        P(M|fq) = 1 – λ
        P(J|fq) = λ
        λ: to be tuned in experiments




Relatedness --- P(fp|fq,M)

        Relatedness between features (i.e. property-value pairs) combines
            Relatedness between properties (i.e. resources)
            Relatedness between values (i.e. resources)
        Relatedness between resources = relatedness between resource names
            URI: label or local name
            Literal: lexical form
        Distributional relatedness between resource names
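            More related = co-occurring more often in certain contexts (e.g. documents)
        Estimated via pointwise mutual information (PMI), using Google hit counts
            $\mathrm{PMI}(s_i, s_j) = \log \frac{P(s_i, s_j)}{P(s_i)\, P(s_j)}$
            $P(s_i, s_j) = \frac{\mathrm{Hits}(s_i, s_j)}{N}$, $P(s_j) = \frac{\mathrm{Hits}(s_j)}{N}$
            N: total number of indexed pages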
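A sketch of PMI-based relatedness and the resulting move distribution P(fp|fq,M). Several parts are assumptions rather than details from the talk: hit counts come from a hypothetical in-memory `HitCounts` table instead of a search engine, negative PMI is clipped to 0 so that it can serve as a weight, feature relatedness is taken as the average of property and value relatedness, and a relational move never stays on fq itself.

```python
import math
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

Feature = Tuple[str, str]  # (property name, value name)


@dataclass
class HitCounts:
    """Hypothetical stand-in for search-engine hit counts; no real API is queried."""
    single: Dict[str, int]             # Hits(s)
    pair: Dict[FrozenSet[str], int]    # Hits(si, sj)
    n: int                             # N


def pmi(hits_ij: int, hits_i: int, hits_j: int, n: int) -> float:
    """log(P(si,sj) / (P(si) P(sj))), each probability estimated as hits / N."""
    return math.log((hits_ij / n) / ((hits_i / n) * (hits_j / n)))


def name_relatedness(si: str, sj: str, hc: HitCounts) -> float:
    """Relatedness of two resource names; negative PMI is clipped to 0 (assumption)."""
    h_i, h_j = hc.single.get(si, 0), hc.single.get(sj, 0)
    h_ij = h_i if si == sj else hc.pair.get(frozenset((si, sj)), 0)
    if h_ij == 0 or h_i == 0 or h_j == 0:
        return 0.0  # no evidence of co-occurrence
    return max(0.0, pmi(h_ij, h_i, h_j, hc.n))


def feature_relatedness(fp: Feature, fq: Feature, hc: HitCounts) -> float:
    """Assumption: average of property-name and value-name relatedness."""
    return 0.5 * (name_relatedness(fp[0], fq[0], hc) + name_relatedness(fp[1], fq[1], hc))


def move_distribution(fq: Feature, features: List[Feature], hc: HitCounts) -> Dict[Feature, float]:
    """P(fp | fq, M): relatedness to fq normalized over the other features (needs >= 2 features)."""
    others = [f for f in features if f != fq]
    weights = {f: feature_relatedness(f, fq, hc) for f in others}
    total = sum(weights.values())
    if total == 0.0:
        return {f: 1.0 / len(others) for f in others}  # no signal: uniform
    return {f: w / total for f, w in weights.items()}
```

Because no threshold is applied, a weakly related feature still receives a proportionally small move probability instead of being cut off.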




Informativeness --- P(fp|fq,J)

        Self-information: $I(o) = -\log P(o)$

        o: informational jump from fq to fp, so $P(o) = P(f_p \mid f_q)$
        P(fp|fq): the probability that fp belongs to a feature set given that fq also belongs to it
        Estimated via a statistical analysis of the data set

        Approximation: P(fp|fq) = P(fp)
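Under the approximation P(fp|fq) = P(fp), the informational jump no longer depends on fq. A minimal sketch, assuming P(fp) is estimated as the fraction of entities in the data set whose feature set contains fp, and that P(fp|fq,J) is obtained by normalizing self-information over the entity's features; both the estimator and the normalization are assumptions, and the toy data set is hypothetical.

```python
import math
from typing import Dict, Set, Tuple

Feature = Tuple[str, str]  # (property, value)


def self_information(f: Feature, feature_sets: Dict[str, Set[Feature]]) -> float:
    """I(f) = -log P(f), with P(f) = fraction of entities whose feature set contains f."""
    count = sum(1 for fs in feature_sets.values() if f in fs)
    if count == 0:
        raise ValueError(f"feature {f!r} does not occur in the data set")
    return -math.log(count / len(feature_sets))


def jump_distribution(entity: str, feature_sets: Dict[str, Set[Feature]]) -> Dict[Feature, float]:
    """P(fp | fq, J) for every fq: self-information normalized over the entity's features.

    Falls back to uniform if every feature has zero self-information
    (i.e. every feature occurs in every entity).
    """
    features = sorted(feature_sets[entity])
    info = {f: self_information(f, feature_sets) for f in features}
    total = sum(info.values())
    if total == 0.0:
        return {f: 1.0 / len(features) for f in features}
    return {f: i / total for f, i in info.items()}


if __name__ == "__main__":
    # Hypothetical data set of four entities.
    data = {
        "e1": {("type", "City"), ("country", "China")},
        "e2": {("type", "City")},
        "e3": {("type", "City"), ("country", "Japan")},
        "e4": {("type", "Person")},
    }
    print(jump_distribution("e1", data))
```

Here ("type", "City") is shared by three of four entities and carries little information, so jumps favor the rarer ("country", "China").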




Experiments

        Intrinsic evaluation
        Extrinsic evaluation




Intrinsic evaluation --- design

        Task
            To manually construct ideal entity summaries as the gold standard
        Participants
            24 students majoring in computer science
        Test cases
            149 entity descriptions randomly selected from DBpedia 3.4
        Assignment
            4.43 participants per entity description on average
        Output
            Top-5 features
            Top-10 features




Intrinsic evaluation --- results

        Metric: overlap between summaries

        Agreement between participants about ideal summaries
            2.91 when k=5
            7.86 when k=10
        Quality of summaries computed under different approach settings




        [Chart: summary quality of the baselines vs. ours]
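The overlap metric can be made concrete as follows. This sketch assumes that agreement is the average number of features shared by each pair of ideal summaries of an entity, and that the quality of a computed summary is its average overlap with the ideal summaries; averaging over all test entities is left out. The feature identifiers are hypothetical.

```python
from itertools import combinations
from typing import Iterable, List, Set


def overlap(a: Iterable[str], b: Iterable[str]) -> int:
    """Number of features shared by two summaries."""
    return len(set(a) & set(b))


def agreement(ideal: List[Set[str]]) -> float:
    """Average pairwise overlap among the ideal summaries of one entity (needs >= 2)."""
    pairs = list(combinations(ideal, 2))
    return sum(overlap(a, b) for a, b in pairs) / len(pairs)


def quality(computed: Set[str], ideal: List[Set[str]]) -> float:
    """Average overlap between a computed summary and each ideal summary of the entity."""
    return sum(overlap(computed, s) for s in ideal) / len(ideal)


if __name__ == "__main__":
    # Hypothetical top-3 ideal summaries from three participants.
    ideal = [{"f1", "f2", "f3"}, {"f1", "f2", "f4"}, {"f1", "f5", "f6"}]
    print(agreement(ideal), quality({"f1", "f2", "f7"}, ideal))
```

Under this reading, agreement bounds what a computed summary can realistically reach, since the participants themselves only partially agree.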




Extrinsic evaluation --- design

        Task
            To manually confirm entity mappings by using summaries
        Participants
            19 students majoring in computer science
        Test cases
            47 pairs of entity descriptions (DBpedia 3.4 ↔ Freebase Dec. 2009)
            Gold-standard judgments based on owl:sameAs links
                 24 correct and 23 incorrect
        Assignment
            3.62 participants per pair per approach setting, on average
        Output
            Judgment: correct or incorrect




Extrinsic evaluation --- results

        Metrics
            Accuracy of the judgments
                  1.0 = consistent with the gold standard
                  0.0 = inconsistent
            Time spent
                  Normalized by the average time per judgment spent by the participant
                  1.0 = the participant's average efficiency
                  Smaller value = higher efficiency


        Results
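Both extrinsic metrics can be computed per judgment. A sketch, assuming each judgment is recorded with its participant, time in seconds, answer, and gold-standard label (the `Judgment` record and `score_judgments` function are hypothetical); time is divided by that participant's average time per judgment, so 1.0 means average speed for that participant.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple, Tuple


class Judgment(NamedTuple):
    participant: str
    seconds: float   # time spent on this judgment
    answer: bool     # participant's verdict: True = "mapping is correct"
    gold: bool       # gold standard derived from owl:sameAs links


def score_judgments(judgments: List[Judgment]) -> List[Tuple[float, float]]:
    """Return (accuracy, normalized time) for each judgment, in input order."""
    times: Dict[str, List[float]] = defaultdict(list)
    for j in judgments:
        times[j.participant].append(j.seconds)
    avg = {p: mean(ts) for p, ts in times.items()}   # average time per judgment, per participant
    return [
        (1.0 if j.answer == j.gold else 0.0, j.seconds / avg[j.participant])
        for j in judgments
    ]
```

Normalizing per participant removes differences in individual reading speed, so times from fast and slow participants can be averaged per approach setting.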




Discussion

        Automatically computed summaries are still not as good as handcrafted ones.

                                                                         k=5   k=10
         Agreement between ideal summaries                              2.91 7.86
         Agreement between computed summaries and ideal summaries       2.40 4.88

        User-specific notion of informativeness
            Longitude and latitude are highly informative, but …
        Information redundancy
            Longitude + latitude = point
            What if multiple sources …
        Summarization = what + how (to present)




Conclusions

        Problem of entity summarization
            Extractive
            About identifying the entity that underlies a lengthy description
        The RELIN model
            Variant of the random surfer model
            Non-uniform probability distributions
            Informativeness + relatedness
        Implementation
            Based on linguistic and information theory concepts
            Using information captured by the labels of nodes and edges in the data graph
        Experiments
            Summaries closer to the handcrafted ideal summaries than those of the baselines
            Assisting users in confirming entity mappings more accurately




Future work --- application-specific entity summarization




                                    sameAs?




Related work --- summarization



        Paradigm: extractive (text, ontology) vs. non-extractive (database, graph)
        Approach: centrality-based vs. centroid-based
        Measure: PageRank-like vs. others (degree, betweenness, …)
        Model: RELIN (relatedness, informativeness, non-uniform probability distribution)
               vs. PageRank (relatedness, uniform probability distribution)




Related work --- ranking

        Different goals --- to best identify the underlying entity
            B. Aleman-Meza et al., Ranking Complex Relationships on the Semantic Web. IEEE
            Internet Computing 2005.
            R. Delbru et al., Hierarchical Link Analysis for Ranking Web Data. ESWC 2010.
            T. Franz et al., TripleRank: Ranking Semantic Web Data by Tensor Decomposition. ISWC 2009.
            …
        Exploitation of data semantics at different levels --- use labels of nodes and edges
            T. Penin et al., Snippet Generation for Semantic Web Search Engines. ASWC 2009.
            X. Zhang et al., Ontology Summarization Based on RDF Sentence Graph. WWW 2007.
            …




Install Stable Diffusion in windows machine
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization

  • 1. RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization. Gong Cheng¹, Thanh Tran², Yuzhong Qu¹ (¹ State Key Laboratory for Novel Software Technology, Nanjing University, China; ² Institute AIFB, Karlsruhe Institute of Technology, Germany). gcheng@nju.edu.cn. Presented at ISWC2011.
  • 2. Motivation ws.nju.edu.cn DBpedia describes 3.64M entities with 1B RDF triples, i.e. on average 1B / 3.64M ≈ 275 RDF triples per entity. A lengthy entity description is impractical in tasks that require quick identification of the underlying entity. Gong Cheng (程龚) gcheng@nju.edu.cn 2 of 30
  • 3. Entity search --- find entities that match an information need
  • 4. Pay-as-you-go data integration --- judge whether two entity descriptions denote the same entity. sameAs?
  • 5. Motivation. DBpedia describes 3.64M entities with 1B RDF triples, i.e. on average 1B / 3.64M ≈ 275 RDF triples per entity. A lengthy entity description is impractical in tasks that require quick identification of the underlying entity. Problem: summarizing lengthy entity descriptions.
  • 6. Outline: Problem statement; The RELIN model; Implementation; Experiments; Conclusions
  • 7. Data graph
  • 8. Feature set
  • 9. Entity summarization: entity summarization = feature ranking; entity summary = the k top-ranked features
  • 10. Outline: Problem statement; The RELIN model; Implementation; Experiments; Conclusions
  • 11. Centrality-based ranking: concepts. Widely applied to text summarization and ontology summarization. First construct a graph (nodes: the data elements to be ranked; edges: connect related nodes), then measure node centrality, e.g. degree, PageRank, … [Figure: features f1 to f5 as nodes; pairs with relatedness ≥ threshold are connected, pairs with relatedness < threshold are not]
  • 12. PageRank: simulates the behavior of a random surfer who navigates from node to node. Two types of action: following a random edge (with a uniform probability distribution) and jumping at random (with a uniform probability distribution). Ranking is based on the stationary distribution of this Markov chain. [Figure: graph over features f1 to f5]
  • 13. Centrality-based ranking for entity summarization: problems. How to define a good feature: it should not only capture the main themes of the entity description but also distinguish the entity from others. Loss of information: a float-valued function (relatedness) is reduced to a boolean-valued one (whether relatedness ≥ threshold). [Figure: features f1 to f5; pairs with relatedness ≥ threshold are connected, pairs with relatedness < threshold are not]
  • 14. RELIN: concepts. An extension of PageRank. Following a random edge within a complete graph, with a probability proportional to the relatedness between the two associated nodes, i.e. no threshold is needed. Jumping at random, with a probability proportional to the amount of information carried by the target that helps to identify the entity.
  • 15. RELIN: RELatedness and INformativeness-based centrality. Two kinds of action: relational move --- more likely to move to a feature that carries information related to the theme currently under investigation; informational jump --- more likely to jump to a feature that provides a large amount of new information for clarifying the identity of the underlying entity. Two non-uniform probability distributions.
  • 16. Formalization. Actions (given the current feature $f_q$): $P(M \mid f_q)$ is the probability of performing a relational move from $f_q$, and $P(J \mid f_q)$ is the probability of performing an informational jump from $f_q$, subject to $P(M \mid f_q) + P(J \mid f_q) = 1$. Targets for actions (given the feature set $FS$): $P(f_p \mid f_q, M)$ is the probability of performing a relational move from $f_q$ to $f_p$, and $P(f_p \mid f_q, J)$ is the probability of performing an informational jump from $f_q$ to $f_p$, subject to $\sum_{f_p \in FS} P(f_p \mid f_q, M) = 1$ and $\sum_{f_p \in FS} P(f_p \mid f_q, J) = 1$. Result: $\mathbf{x}(t)$ is an $|FS|$-dimensional vector whose entry $x_p(t)$ is the probability that the surfer visits $f_p$ at step $t$. Finally,
    $$x_p(t+1) = \sum_{f_q \in FS} x_q(t) \left[ P(M \mid f_q)\, P(f_p \mid f_q, M) + P(J \mid f_q)\, P(f_p \mid f_q, J) \right]$$
    and $\mathbf{x}^{*} = \lim_{t \to \infty} \mathbf{x}(t)$.
  • 17. Outline: Problem statement; The RELIN model; Implementation; Experiments; Conclusions
  • 18. Actions: $P(M \mid f_q) = 1 - \lambda$ and $P(J \mid f_q) = \lambda$, where $\lambda$ is tuned in experiments.
  • 19. Relatedness --- $P(f_p \mid f_q, M)$. Relatedness between features (i.e. property-value pairs) combines the relatedness between their properties (i.e. resources) and the relatedness between their values (i.e. resources). Relatedness between resources = relatedness between resource names (URI: label or local name; literal: lexical form). Distributional relatedness between resource names: two names are more related the more often they co-occur in certain contexts (e.g. documents). Estimated via "pointwise mutual information + Google", with $P(s_i, s_j) = \mathrm{Hits}(s_i, s_j) / N$ and $P(s_j) = \mathrm{Hits}(s_j) / N$.
  • 20. Informativeness --- $P(f_p \mid f_q, J)$. Self-information of the informational jump from $f_q$ to $f_p$: $-\log P(f_p \mid f_q)$, where $P(f_p \mid f_q)$ is the probability that $f_p$ belongs to a feature set given that $f_q$ also does. Estimated via a statistical analysis of the data set. Approximation: $P(f_p \mid f_q) = P(f_p)$.
  • 21. Outline: Problem statement; The RELIN model; Implementation; Experiments; Conclusions
  • 22. Experiments: intrinsic evaluation; extrinsic evaluation
  • 23. Intrinsic evaluation --- design. Task: to manually construct ideal entity summaries as the gold standard. Participants: 24 students majoring in computer science. Test cases: 149 entity descriptions randomly selected from DBpedia 3.4. Assignment: 4.43 participants per entity description on average. Output: top-5 features and top-10 features.
  • 24. Intrinsic evaluation --- results. Metric: overlap between summaries. Agreement between participants on ideal summaries: 2.91 when k=5; 7.86 when k=10. Quality of summaries computed under different approach settings: [chart comparing the baselines with our approach]
  • 25. Extrinsic evaluation --- design. Task: to manually confirm entity mappings using summaries. Participants: 19 students majoring in computer science. Test cases: 47 pairs of entity descriptions (DBpedia 3.4 ↔ Freebase Dec. 2009). Gold-standard judgments based on owl:sameAs links: 24 correct and 23 incorrect. Assignment: 3.62 participants per pair, per approach setting. Output: a judgment, correct or incorrect.
  • 26. Extrinsic evaluation --- results. Metrics: accuracy of the judgments (1.0 = consistent with the gold standard; 0.0 = inconsistent); time spent, normalized by the participant's average time per judgment (1.0 = average efficiency; a smaller value = higher efficiency). Results: [chart]
  • 27. Discussion. Automatically computed summaries are still not as good as handcrafted ones. Agreement between ideal summaries: 2.91 (k=5), 7.86 (k=10). Agreement between computed summaries and ideal summaries: 2.40 (k=5), 4.88 (k=10). User-specific notion of informativeness: longitude and latitude are highly informative, but … Information redundancy: longitude + latitude = point; what if multiple sources … Summarization = what + how (to present).
  • 28. Outline: Problem statement; The RELIN model; Implementation; Experiments; Conclusions
  • 29. Conclusions. Problem of entity summarization: extractive; about identifying the entity that underlies a lengthy description. The RELIN model: a variant of the random surfer model with non-uniform probability distributions (informativeness + relatedness). Implementation: based on concepts from linguistics and information theory, using information captured by the labels of nodes and edges in the data graph. Experiments: computed summaries are closer to handcrafted ideal summaries than the baselines, and they help users confirm entity mappings more accurately.
  • 30. Future work --- application-specific entity summarization. sameAs?
  • 31. Related work --- summarization. RELIN compared with other summarization work (text, ontology, database, and graph summarization) in terms of paradigm, approach, measure, and model. RELIN: extractive; centrality-based, PageRank-like; relatedness and informativeness; non-uniform probability distribution. Others: non-extractive; centroid-based; PageRank; degree, betweenness, relatedness, …; uniform probability distribution.
  • 32. Related work --- ranking. Different goals --- to best identify the underlying entity: B. Aleman-Meza et al., Ranking Complex Relationships on the Semantic Web. IEEE Internet Computing, 2005; R. Delbru et al., Hierarchical Link Analysis for Ranking Web Data. ESWC 2010; T. Franz et al., TripleRank: Ranking Semantic Web Data by Tensor Decomposition. ISWC 2009; … Exploitation of data semantics at different levels --- use labels of nodes and edges: T. Penin et al., Snippet Generation for Semantic Web Search Engines. ASWC 2009; X. Zhang et al., Ontology Summarization Based on RDF Sentence Graph. WWW 2007; …
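Slide 9 reduces entity summarization to feature ranking: a summary is the k top-ranked features. The sketch below shows only that truncation step. It assumes every feature (a property-value pair, per slide 19) already has a score from some ranking method. The function name `entity_summary`, the tie-breaking rule and the feature scores are hypothetical.

```python
from typing import Dict, List, Tuple

# A feature is a property-value pair from the entity description.
Feature = Tuple[str, str]


def entity_summary(scores: Dict[Feature, float], k: int) -> List[Feature]:
    """Return the k top-ranked features; ties are broken by feature order (an assumption)."""
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return ranked[:k]


# Hypothetical scores for illustration only.
scores = {
    ("type", "City"): 0.12,
    ("country", "Germany"): 0.30,
    ("name", "Example City"): 0.25,
}
print(entity_summary(scores, 2))  # [('country', 'Germany'), ('name', 'Example City')]
```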
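Slide 11 describes the generic centrality-based recipe. It builds a graph whose edges link nodes with relatedness ≥ threshold and then measures centrality. The sketch below uses degree, one of the measures the slide names. The relatedness values and the threshold are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List


def degree_centrality(
    nodes: List[str],
    relatedness: Callable[[str, str], float],
    threshold: float,
) -> Dict[str, int]:
    """Connect two nodes iff relatedness >= threshold, then count each node's edges."""
    degree = {n: 0 for n in nodes}
    for a, b in combinations(nodes, 2):
        if relatedness(a, b) >= threshold:  # the float score becomes a yes/no edge
            degree[a] += 1
            degree[b] += 1
    return degree


# Hypothetical symmetric relatedness scores between three features.
REL = {
    frozenset(("f1", "f2")): 0.8,
    frozenset(("f1", "f3")): 0.2,
    frozenset(("f2", "f3")): 0.6,
}
print(degree_centrality(["f1", "f2", "f3"], lambda a, b: REL[frozenset((a, b))], 0.5))
# {'f1': 1, 'f2': 2, 'f3': 1}
```

Here 0.8 and 0.6 both become an identical edge. This is the "loss of information" that slide 13 attributes to the float-to-boolean conversion.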
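Slide 12's random surfer can be sketched as power iteration. At each step the surfer follows a uniformly chosen edge or jumps to a uniformly chosen node, and the iteration approximates the stationary distribution. The following are assumptions, not taken from the slides:

- the damping factor of 0.85 (probability of following an edge);
- the convergence tolerance;
- spreading a dangling node's mass uniformly.

```python
from typing import Dict, List


def pagerank(
    adj: Dict[str, List[str]],
    damping: float = 0.85,
    tol: float = 1e-12,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Power iteration for the random-surfer model.

    With probability `damping` the surfer follows an out-edge of the current
    node chosen uniformly; otherwise it jumps to a node chosen uniformly.
    """
    nodes = list(adj)
    n = len(nodes)
    x = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        nxt = {v: (1.0 - damping) / n for v in nodes}  # uniform random jumps
        for u in nodes:
            targets = adj[u] if adj[u] else nodes  # dangling node: spread mass uniformly
            share = damping * x[u] / len(targets)
            for v in targets:
                nxt[v] += share  # uniform choice among the edges of u
        delta = sum(abs(nxt[v] - x[v]) for v in nodes)
        x = nxt
        if delta < tol:
            break
    return x


# Undirected graphs are given as symmetric adjacency lists.
star = {"hub": ["x", "y", "z"], "x": ["hub"], "y": ["hub"], "z": ["hub"]}
print(pagerank(star))
```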
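The update rule on slide 16 can be computed by power iteration. The sketch below represents the two conditional distributions as dense row-stochastic matrices; that representation, the uniform starting vector and the stopping rule are illustrative assumptions. A unique limit is guaranteed, for example, when every $P(J \mid f_q) > 0$ and every jump row is strictly positive, because every transition probability is then positive. The slides do not state convergence conditions.

```python
from typing import List

Matrix = List[List[float]]


def relin_scores(
    p_move: List[float],
    move: Matrix,
    jump: Matrix,
    tol: float = 1e-12,
    max_iter: int = 10_000,
) -> List[float]:
    """Iterate x_p(t+1) = sum_q x_q(t) * [P(M|f_q) P(f_p|f_q,M) + P(J|f_q) P(f_p|f_q,J)].

    p_move[q]  = P(M|f_q); P(J|f_q) = 1 - p_move[q], so the two actions sum to 1.
    move[q][p] = P(f_p|f_q,M) and jump[q][p] = P(f_p|f_q,J); every row sums to 1.
    """
    n = len(p_move)
    x = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [0.0] * n
        for q in range(n):
            pm, pj = p_move[q], 1.0 - p_move[q]
            for p in range(n):
                nxt[p] += x[q] * (pm * move[q][p] + pj * jump[q][p])
        delta = sum(abs(a - b) for a, b in zip(nxt, x))
        x = nxt
        if delta < tol:
            break
    return x


# Hypothetical two-feature example: moves always switch features, jumps favor f2.
x = relin_scores([0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]], [[0.25, 0.75], [0.25, 0.75]])
print(x)  # approximately [5/12, 7/12]
```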
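Substituting the constant action probabilities of slide 18 into the update rule of slide 16 makes the role of $\lambda$ explicit. This is a derivation from the slide formulas, not a formula stated on the slides:

$$x_p(t+1) = (1-\lambda) \sum_{f_q \in FS} x_q(t)\, P(f_p \mid f_q, M) + \lambda \sum_{f_q \in FS} x_q(t)\, P(f_p \mid f_q, J)$$

Suppose, in addition, that the jump target does not depend on the current feature, i.e. $P(f_p \mid f_q, J) = \pi_p$ for some distribution $\pi$ over $FS$ (the symbol $\pi$ is introduced here). Since $\mathbf{x}(t)$ is a probability distribution, $\sum_{f_q \in FS} x_q(t) = 1$, and the second term collapses:

$$x_p(t+1) = (1-\lambda) \sum_{f_q \in FS} x_q(t)\, P(f_p \mid f_q, M) + \lambda\, \pi_p$$

This has the shape of PageRank with a non-uniform jump vector. With $\lambda = 0$ only relational moves remain; with $\lambda = 1$ every step is an informational jump, and $\mathbf{x}(t+1) = \pi$.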
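Slide 19 estimates relatedness between resource names with pointwise mutual information, using $P = \mathrm{Hits}/N$. Slide 14 says a relational move is made with probability proportional to relatedness. The sketch below combines the two. Its assumptions, none of which the slides specify, are:

- the natural logarithm;
- clipping negative PMI to 0 before normalizing;
- a uniform fallback when no feature is related.

How property relatedness and value relatedness are combined into feature relatedness is not given on the slides and is not shown. The hit counts are hypothetical.

```python
import math
from typing import List


def pmi(hits_ij: int, hits_i: int, hits_j: int, n: int) -> float:
    """PMI(s_i, s_j) = log(P(s_i, s_j) / (P(s_i) * P(s_j))), each P estimated as Hits / N."""
    if hits_ij == 0:
        return float("-inf")  # the two names never co-occur
    p_ij = hits_ij / n
    p_i = hits_i / n
    p_j = hits_j / n
    return math.log(p_ij / (p_i * p_j))


def move_distribution(relatedness_row: List[float]) -> List[float]:
    """Normalize relatedness from f_q to every f_p into P(f_p | f_q, M).

    Assumption: negative scores (e.g. negative PMI) are clipped to 0 first.
    """
    clipped = [max(0.0, r) for r in relatedness_row]
    total = sum(clipped)
    if total == 0.0:  # nothing related: fall back to a uniform move
        return [1.0 / len(clipped)] * len(clipped)
    return [r / total for r in clipped]


# Hypothetical hit counts: 10 joint hits, 100 hits each, N = 10,000.
print(round(pmi(10, 100, 100, 10_000), 4))  # 2.3026 (= ln 10)
print(move_distribution([2.0, -1.0, 2.0]))  # [0.5, 0.0, 0.5]
```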
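Slide 20 bases informational jumps on self-information, with the approximation $P(f_p \mid f_q) = P(f_p)$. Slide 14 makes the jump probability proportional to the information carried by the target. The sketch below estimates $P(f)$ as the fraction of entity feature sets containing $f$, one plausible reading of "a statistical analysis of the data set". It then normalizes $\log(1/P(f_p))$ into a jump distribution. The data set and feature names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Set


def feature_probabilities(feature_sets: List[Set[Hashable]]) -> Dict[Hashable, float]:
    """Estimate P(f) as the fraction of entities whose feature set contains f."""
    counts = Counter(f for fs in feature_sets for f in fs)
    return {f: c / len(feature_sets) for f, c in counts.items()}


def jump_distribution(features: List[Hashable], prob: Dict[Hashable, float]) -> List[float]:
    """P(f_p | f_q, J) proportional to the self-information log(1 / P(f_p)).

    Uses the approximation P(f_p | f_q) = P(f_p), so the result does not depend on f_q.
    """
    info = [math.log(1.0 / prob[f]) for f in features]
    total = sum(info)
    if total == 0.0:  # every feature is universal: fall back to a uniform jump
        return [1.0 / len(features)] * len(features)
    return [i / total for i in info]


# Hypothetical data set of four entity feature sets.
data = [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"a", "d"}]
probs = feature_probabilities(data)  # P(a)=1.0, P(b)=0.5, P(c)=P(d)=0.25
print([round(p, 3) for p in jump_distribution(["a", "b", "c"], probs)])  # [0.0, 0.333, 0.667]
```

A feature shared by every entity, like `a` here, carries no identifying information, so the surfer never jumps to it.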
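Slide 24 measures overlap between summaries, with agreement scores that are at most k (2.91 for k=5). This suggests overlap counts the shared features. The sketch below makes these assumptions, none confirmed by the slides:

- overlap is the size of the intersection;
- agreement is the average pairwise overlap among one entity's ideal summaries;
- quality is the average overlap between a computed summary and each ideal summary.

The summaries are hypothetical.

```python
from itertools import combinations
from typing import Hashable, List, Set


def overlap(a: Set[Hashable], b: Set[Hashable]) -> int:
    """Number of features shared by two summaries."""
    return len(a & b)


def agreement(ideal: List[Set[Hashable]]) -> float:
    """Average pairwise overlap among the ideal summaries of one entity (needs >= 2)."""
    pairs = list(combinations(ideal, 2))
    return sum(overlap(a, b) for a, b in pairs) / len(pairs)


def quality(computed: Set[Hashable], ideal: List[Set[Hashable]]) -> float:
    """Average overlap between a computed summary and each ideal summary."""
    return sum(overlap(computed, s) for s in ideal) / len(ideal)


# Hypothetical top-3 summaries from three participants.
ideal = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "e", "f"}]
print(agreement(ideal))  # (2 + 1 + 1) / 3
print(quality({"a", "b", "x"}, ideal))  # (2 + 2 + 1) / 3
```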
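Slide 26 scores each judgment 1.0 if it is consistent with the gold standard and 0.0 otherwise. It also normalizes time by the participant's average time per judgment. The sketch below averages per-judgment accuracy; that aggregation is an assumption. The participant ID and timings are hypothetical.

```python
from typing import Dict, List


def accuracy(judgments: List[bool], gold: List[bool]) -> float:
    """Mean score where each judgment counts 1.0 if it matches the gold standard, else 0.0."""
    if len(judgments) != len(gold):
        raise ValueError("judgments and gold must have the same length")
    return sum(1.0 if j == g else 0.0 for j, g in zip(judgments, gold)) / len(gold)


def normalized_times(times: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Divide each judgment time by that participant's average time per judgment.

    1.0 = the participant's average efficiency; smaller = faster (higher efficiency).
    """
    out = {}
    for participant, ts in times.items():
        avg = sum(ts) / len(ts)
        out[participant] = [t / avg for t in ts]
    return out


# Hypothetical data: True = "correct mapping".
print(accuracy([True, False, True], [True, True, True]))  # 2/3
print(normalized_times({"p1": [10.0, 30.0]}))  # {'p1': [0.5, 1.5]}
```

Normalizing per participant removes differences in overall reading speed between participants. The time metric then reflects how a given summary compares with that participant's own average.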