SlideShare a Scribd company logo
1 of 14
Analyzing Statistics
with Background Knowledge
from Linked Open Data

10/22/13 Ristoski, Heiko Paulheim
Petar Ristoski, Heiko Paulheim
Petar

1
Idea
•

Background knowledge from LOD can help
– finding explanations
– creating more sophisticated visualizations

•

Steps taken
– linking the statistics datasets to LOD datasets
– DBpedia, Eurostat, GADM, Linked Geo Data
– Extracting features

•

Finding correlations with unemployment rate
– using only one target variable for demonstration purposes
– works for arbitrary target variables

10/22/13

Petar Ristoski, Heiko Paulheim

2
Linking to LOD Datasets
•

Linking to DBpedia
– using DBpedia Lookup
– restricting results to Place and AdministrativeArea
– select from many results by minimum edit distance

•

Linking to Eurostat
– using SPARQL to query for labels
– querying for word 1-grams, 2-grams, … from original labels
– selecting by minimum edit distance

10/22/13

Petar Ristoski, Heiko Paulheim

3
Linking to LOD Datasets
•

Linking to GADM
– searching by name turned out to be error-prone
– searching by coordinates (from DBpedia) is precise
• but suffers from low recall
– two-stage approach:
• searching by coordinates
• searching by average coordinates of all linked objects (in DBpedia)

•

Total figures:
– France regions: 27/27 DBpedia, 26/27 Eurostat, 27/27 GADM
– France departments: 101/101 DBpedia, 101/101 GADM
– Australia states: 8/9 DBpedia, 9/9 GADM
– Australia SA3/SA4: no satisfying results, discarded

10/22/13

Petar Ristoski, Heiko Paulheim

4
Feature Extraction
•

Once the links have been created
– get polygon shapes from GADM (for visualization)
– get datatype properties from Eurostat/DBpedia
– get direct types from DBpedia (incl. YAGO types)
– get qualified relations from DBpedia

•

Using information from Linked Geo Data
– extract objects within GADM polygon, aggregate by type
(e.g., region contains 125 police stations)
– spatial queries only possible with rectangles
– workaround: use minimum enclosing rectangle and filter afterwards

10/22/13

Petar Ristoski, Heiko Paulheim

5
Visualization with GADM Polygons
•

Polygons from GADM allow for visualization
of unemployment on maps

10/22/13

Petar Ristoski, Heiko Paulheim

6
Finding Correlations
•

Using extracted features to find interesting correlations

•

Example correlation for unemployment in France:
– African islands, Islands in the Indian Ocean,
Outermost regions of the EU (positive)
– GDP (negative)
– Disposable income (negative)
– Hospital beds/inhabitants (negative)
– RnD spendings (negative)
– Energy consumption (negative)
– Population growth (positive)
– Casualties in traffic accidents (negative)
– Fast food restaurants (positive)
– Police stations (positive)

10/22/13

Petar Ristoski, Heiko Paulheim

7
Visualization Correlations
•

e.g., unemployment rate ~ number of police stations

10/22/13

Petar Ristoski, Heiko Paulheim

8
Tools
•

FeGeLOD/Explain-a-LOD (ESWC 2012: best demo award)

10/22/13

Petar Ristoski, Heiko Paulheim

9
Tools
•

RapidMiner Linked Open Data Extension (2013)

10/22/13

Petar Ristoski, Heiko Paulheim

10
Assets
•

New modules for RapidMiner LOD extension
– DBpedia Lookup linker
– Label-based linker

•

New links for DBpedia 3.9 release
– DBpedia – GADM (39,000 links)

10/22/13

Petar Ristoski, Heiko Paulheim

11
Conclusions & Lessons Learned
•

Linked Open Data provides useful background knowledge
– For finding explanations
– For creating visualizations

•

Some data sources are more suitable than others
– official data sources (e.g. Eurostat) provide best results

10/22/13

Petar Ristoski, Heiko Paulheim

12
Conclusions & Lessons Learned
•

Negative correlation: traffic accident casualties ~ unemployment
– Fight unemployment by increasing traffic accidents?

http://xkcd.com/552/

10/22/13

Petar Ristoski, Heiko Paulheim

13
Analyzing Statistics
with Background Knowledge
from Linked Open Data

10/22/13 Ristoski, Heiko Paulheim
Petar Ristoski, Heiko Paulheim
Petar

14

More Related Content

Similar to Analyzing Statistics with Background Knowledge from Linked Open Data

Mining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerMining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerHeiko Paulheim
 
Osm presentation workshop 19 sept 2018
Osm presentation workshop 19 sept 2018Osm presentation workshop 19 sept 2018
Osm presentation workshop 19 sept 2018osimod
 
INSPIRE Data harmonisation : methodology and tools
INSPIRE Data harmonisation : methodology and toolsINSPIRE Data harmonisation : methodology and tools
INSPIRE Data harmonisation : methodology and toolsGIM_nv
 
EMOS 2018 Big Data methods and techniques
EMOS 2018 Big Data methods and techniquesEMOS 2018 Big Data methods and techniques
EMOS 2018 Big Data methods and techniquesPiet J.H. Daas
 
PaNOSC Overview - ExPaNDS kick-off meeting - September 2019
PaNOSC Overview - ExPaNDS kick-off meeting - September 2019PaNOSC Overview - ExPaNDS kick-off meeting - September 2019
PaNOSC Overview - ExPaNDS kick-off meeting - September 2019PaNOSC
 
P. Struijs, Toward the Use of Big Data for European Statistics
P. Struijs, Toward the Use of Big Data for European StatisticsP. Struijs, Toward the Use of Big Data for European Statistics
P. Struijs, Toward the Use of Big Data for European StatisticsIstituto nazionale di statistica
 
Open Research Data: Present and planned EC Policy, Jean-Claude Burgelman impl...
Open Research Data: Present and planned EC Policy, Jean-Claude Burgelman impl...Open Research Data: Present and planned EC Policy, Jean-Claude Burgelman impl...
Open Research Data: Present and planned EC Policy, Jean-Claude Burgelman impl...Platforma Otwartej Nauki
 
Jean claude burgelman implications of open data
Jean claude burgelman implications of open dataJean claude burgelman implications of open data
Jean claude burgelman implications of open dataPlatforma Otwartej Nauki
 
08 wp7 progresses&results-20130221
08 wp7 progresses&results-2013022108 wp7 progresses&results-20130221
08 wp7 progresses&results-20130221fruitbreedomics
 
Statistics project outline team the oni
Statistics project outline team the oniStatistics project outline team the oni
Statistics project outline team the oniPark JunPyo
 
DS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesDS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesPetar Ristoski
 
4. empirical and practical issues
4. empirical and practical issues4. empirical and practical issues
4. empirical and practical issuesLandDegradation
 
Data accessibility and the role of informatics in predicting the biosphere
Data accessibility and the role of informatics in predicting the biosphereData accessibility and the role of informatics in predicting the biosphere
Data accessibility and the role of informatics in predicting the biosphereAlex Hardisty
 
Data in Switzerland: BFS at OKCon 2013
Data in Switzerland: BFS at OKCon 2013Data in Switzerland: BFS at OKCon 2013
Data in Switzerland: BFS at OKCon 2013CH_Bundesarchiv
 
Item 2 : Results of the Spectral Soil Data - Needs and capacities questionnaires
Item 2 : Results of the Spectral Soil Data - Needs and capacities questionnairesItem 2 : Results of the Spectral Soil Data - Needs and capacities questionnaires
Item 2 : Results of the Spectral Soil Data - Needs and capacities questionnairesSoils FAO-GSP
 
BigInsight seminar on Practical Privacy-Preserving Distributed Statistical Co...
BigInsight seminar on Practical Privacy-Preserving Distributed Statistical Co...BigInsight seminar on Practical Privacy-Preserving Distributed Statistical Co...
BigInsight seminar on Practical Privacy-Preserving Distributed Statistical Co...Statistisk sentralbyrå
 
TFMAP: Optimizing MAP for Top-N Context-aware Recommendation
TFMAP: Optimizing MAP for Top-N Context-aware RecommendationTFMAP: Optimizing MAP for Top-N Context-aware Recommendation
TFMAP: Optimizing MAP for Top-N Context-aware RecommendationAlexandros Karatzoglou
 
DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in ...
DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in ...DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in ...
DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in ...Heiko Paulheim
 

Similar to Analyzing Statistics with Background Knowledge from Linked Open Data (20)

Mining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMinerMining the Web of Linked Data with RapidMiner
Mining the Web of Linked Data with RapidMiner
 
Osm presentation workshop 19 sept 2018
Osm presentation workshop 19 sept 2018Osm presentation workshop 19 sept 2018
Osm presentation workshop 19 sept 2018
 
INSPIRE Data harmonisation : methodology and tools
INSPIRE Data harmonisation : methodology and toolsINSPIRE Data harmonisation : methodology and tools
INSPIRE Data harmonisation : methodology and tools
 
EMOS 2018 Big Data methods and techniques
EMOS 2018 Big Data methods and techniquesEMOS 2018 Big Data methods and techniques
EMOS 2018 Big Data methods and techniques
 
PaNOSC Overview - ExPaNDS kick-off meeting - September 2019
PaNOSC Overview - ExPaNDS kick-off meeting - September 2019PaNOSC Overview - ExPaNDS kick-off meeting - September 2019
PaNOSC Overview - ExPaNDS kick-off meeting - September 2019
 
P. Struijs, Toward the Use of Big Data for European Statistics
P. Struijs, Toward the Use of Big Data for European StatisticsP. Struijs, Toward the Use of Big Data for European Statistics
P. Struijs, Toward the Use of Big Data for European Statistics
 
Open Research Data: Present and planned EC Policy, Jean-Claude Burgelman impl...
Open Research Data: Present and planned EC Policy, Jean-Claude Burgelman impl...Open Research Data: Present and planned EC Policy, Jean-Claude Burgelman impl...
Open Research Data: Present and planned EC Policy, Jean-Claude Burgelman impl...
 
Jean claude burgelman implications of open data
Jean claude burgelman implications of open dataJean claude burgelman implications of open data
Jean claude burgelman implications of open data
 
08 wp7 progresses&results-20130221
08 wp7 progresses&results-2013022108 wp7 progresses&results-20130221
08 wp7 progresses&results-20130221
 
Statistics project outline team the oni
Statistics project outline team the oniStatistics project outline team the oni
Statistics project outline team the oni
 
DS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spacesDS2014: Feature selection in hierarchical feature spaces
DS2014: Feature selection in hierarchical feature spaces
 
4. empirical and practical issues
4. empirical and practical issues4. empirical and practical issues
4. empirical and practical issues
 
Open statistics Belgium
Open statistics BelgiumOpen statistics Belgium
Open statistics Belgium
 
Data accessibility and the role of informatics in predicting the biosphere
Data accessibility and the role of informatics in predicting the biosphereData accessibility and the role of informatics in predicting the biosphere
Data accessibility and the role of informatics in predicting the biosphere
 
Data in Switzerland: BFS at OKCon 2013
Data in Switzerland: BFS at OKCon 2013Data in Switzerland: BFS at OKCon 2013
Data in Switzerland: BFS at OKCon 2013
 
Item 2 : Results of the Spectral Soil Data - Needs and capacities questionnaires
Item 2 : Results of the Spectral Soil Data - Needs and capacities questionnairesItem 2 : Results of the Spectral Soil Data - Needs and capacities questionnaires
Item 2 : Results of the Spectral Soil Data - Needs and capacities questionnaires
 
Querying Patent Data for Empirical Scholarship : Tools and Strategies
Querying Patent Data for Empirical Scholarship : Tools and StrategiesQuerying Patent Data for Empirical Scholarship : Tools and Strategies
Querying Patent Data for Empirical Scholarship : Tools and Strategies
 
BigInsight seminar on Practical Privacy-Preserving Distributed Statistical Co...
BigInsight seminar on Practical Privacy-Preserving Distributed Statistical Co...BigInsight seminar on Practical Privacy-Preserving Distributed Statistical Co...
BigInsight seminar on Practical Privacy-Preserving Distributed Statistical Co...
 
TFMAP: Optimizing MAP for Top-N Context-aware Recommendation
TFMAP: Optimizing MAP for Top-N Context-aware RecommendationTFMAP: Optimizing MAP for Top-N Context-aware Recommendation
TFMAP: Optimizing MAP for Top-N Context-aware Recommendation
 
DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in ...
DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in ...DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in ...
DBpediaNYD - A Silver Standard Benchmark Dataset for Semantic Relatedness in ...
 

More from Heiko Paulheim

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...Heiko Paulheim
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfHeiko Paulheim
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vecHeiko Paulheim
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vecHeiko Paulheim
 
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsKnowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsHeiko Paulheim
 
From Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsFrom Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsHeiko Paulheim
 
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Heiko Paulheim
 
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph BlockHeiko Paulheim
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Heiko Paulheim
 
Machine Learning & Embeddings for Large Knowledge Graphs
Machine Learning & Embeddings  for Large Knowledge GraphsMachine Learning & Embeddings  for Large Knowledge Graphs
Machine Learning & Embeddings for Large Knowledge GraphsHeiko Paulheim
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphFrom Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphHeiko Paulheim
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Heiko Paulheim
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Heiko Paulheim
 
Machine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsMachine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsHeiko Paulheim
 
Weakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterWeakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterHeiko Paulheim
 
Towards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingTowards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingHeiko Paulheim
 
Knowledge Graphs on the Web
Knowledge Graphs on the WebKnowledge Graphs on the Web
Knowledge Graphs on the WebHeiko Paulheim
 
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and OntologyData-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and OntologyHeiko Paulheim
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine LearningHeiko Paulheim
 

More from Heiko Paulheim (20)

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdf
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsKnowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
 
From Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsFrom Wikis to Knowledge Graphs
From Wikis to Knowledge Graphs
 
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
 
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
 
Machine Learning & Embeddings for Large Knowledge Graphs
Machine Learning & Embeddings  for Large Knowledge GraphsMachine Learning & Embeddings  for Large Knowledge Graphs
Machine Learning & Embeddings for Large Knowledge Graphs
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphFrom Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!
 
How much is a Triple?
How much is a Triple?How much is a Triple?
How much is a Triple?
 
Machine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsMachine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge Graphs
 
Weakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on TwitterWeakly Supervised Learning for Fake News Detection on Twitter
Weakly Supervised Learning for Fake News Detection on Twitter
 
Towards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingTowards Knowledge Graph Profiling
Towards Knowledge Graph Profiling
 
Knowledge Graphs on the Web
Knowledge Graphs on the WebKnowledge Graphs on the Web
Knowledge Graphs on the Web
 
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and OntologyData-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine Learning
 

Recently uploaded

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 

Recently uploaded (20)

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 

Analyzing Statistics with Background Knowledge from Linked Open Data

  • 1. Analyzing Statistics with Background Knowledge from Linked Open Data 10/22/13 Ristoski, Heiko Paulheim Petar Ristoski, Heiko Paulheim Petar 1
  • 2. Idea • Background knowledge from LOD can help – finding explanations – creating more sophisticated visualizations • Steps taken – linking the statistics datasets to LOD datasets – DBpedia, Eurostat, GADM, Linked Geo Data – Extracting features • Finding correlations with unemployment rate – using only one target variable for demonstration purposes – works for arbitrary target variables 10/22/13 Petar Ristoski, Heiko Paulheim 2
  • 3. Linking to LOD Datasets • Linking to DBpedia – using DBpedia Lookup – restricting results to Place and AdministrativeArea – select from many results by minimum edit distance • Linking to Eurostat – using SPARQL to query for labels – querying for word 1-grams, 2-grams, … from original labels – selecting by minimum edit distance 10/22/13 Petar Ristoski, Heiko Paulheim 3
  • 4. Linking to LOD Datasets • Linking to GADM – searching by name turned out to be error-prone – searching by coordinates (from DBpedia) is precise • but suffers from low recall – two-stage approach: • searching by coordinates • searching by average coordinates of all linked objects (in DBpedia) • Total figures: – France regions: 27/27 DBpedia, 26/27 Eurostat, 27/27 GADM – France departments: 101/101 DBpedia, 101/101 GADM – Australia states: 8/9 DBpedia, 9/9 GADM – Australia SA3/SA4: no satisfying results, discarded 10/22/13 Petar Ristoski, Heiko Paulheim 4
  • 5. Feature Extraction • Once the links have been created – get polygon shapes from GADM (for visualization) – get datatype properties from Eurostat/DBpedia – get direct types from DBpedia (incl. YAGO types) – get qualified relations from DBpedia • Using information from Linked Geo Data – extract objects within GADM polygon, aggregate by type (e.g., region contains 125 police stations) – spatial queries only possible with rectangles – workaround: use minimum enclosing rectangle and filter afterwards 10/22/13 Petar Ristoski, Heiko Paulheim 5
  • 6. Visualization with GADM Polygons • Polygons from GADM allow for visualization of unemployment on maps 10/22/13 Petar Ristoski, Heiko Paulheim 6
  • 7. Finding Correlations • Using extracted features to find interesting correlations • Example correlation for unemployment in France: – African islands, Islands in the Indian Ocean, Outermost regions of the EU (positive) – GDP (negative) – Disposable income (negative) – Hospital beds/inhabitants (negative) – RnD spendings (negative) – Energy consumption (negative) – Population growth (positive) – Casualties in traffic accidents (negative) – Fast food restaurants (positive) – Police stations (positive) 10/22/13 Petar Ristoski, Heiko Paulheim 7
  • 8. Visualization Correlations • e.g., unemployment rate ~ number of police stations 10/22/13 Petar Ristoski, Heiko Paulheim 8
  • 9. Tools • FeGeLOD/Explain-a-LOD (ESWC 2012: best demo award) 10/22/13 Petar Ristoski, Heiko Paulheim 9
  • 10. Tools • RapidMiner Linked Open Data Extension (2013) 10/22/13 Petar Ristoski, Heiko Paulheim 10
  • 11. Assets • New modules for RapidMiner LOD extension – DBpedia Lookup linker – Label-based linker • New links for DBpedia 3.9 release – DBpedia – GADM (39,000 links) 10/22/13 Petar Ristoski, Heiko Paulheim 11
  • 12. Conclusions & Lessons Learned • Linked Open Data provides useful background knowledge – For finding explanations – For creating visualizations • Some data sources are more suitable than others – official data sources (e.g. Eurostat) provide best results 10/22/13 Petar Ristoski, Heiko Paulheim 12
  • 13. Conclusions & Lessons Learned • Negative correlation: traffic accident casualties ~ unemployment – Fight unemployment by increasing traffic accidents? http://xkcd.com/552/ 10/22/13 Petar Ristoski, Heiko Paulheim 13
  • 14. Analyzing Statistics with Background Knowledge from Linked Open Data 10/22/13 Ristoski, Heiko Paulheim Petar Ristoski, Heiko Paulheim Petar 14