Apache Any23
Anything to Triples
Michele Mostarda <me@michelemostarda.it>, Twitter: @micmos v1.1
Outline
• Scenario
• From Web 1.0 to Web 3.0
• What is RDF
• Machine Readable Annotations
• What is Apache Any23
• Supported Formats
• How does it work
• Web and REST UI
• CLI
• Next Steps
• History of Any23
• How to Contribute
• Slowdown of Web 3.0 and rise of the Knowledge Graph
Scenario
Thanks to the promotion of a common set of machine-readable annotation standards such as Microformats and RDFa, sponsored by the main global Web players, we have observed since 2009 a wide adoption of embedded markup for data structures (products, reviews, posts, people, organizations, events) on websites worldwide.
The adoption of machine-readable markup is the key enabler of the Web of Data.
For further details about trends please see reference [1].
From Web 1.0 to Web 3.0
• Web 1.0: the Web of interconnected, read-only content.
• Web 2.0: the read/write Web. Pages are scriptable and dynamic; APIs enable cross-site interaction.
• Web 3.0: the Web of Data. Data is completely decoupled from presentation, enabling the read/write/understand mode.
What is RDF (1)
The Resource Description Framework is a formalism to describe
structured data as a graph.
In this graph, nodes are either Web entities (URIs) or literal data (primitive data types), and edges are ontology properties identified by URIs.
Every graph is a composition of statements <Subject, Predicate, Object>, where the Subject is either an Entity or a Blank Node and the Object can be an Entity, a Blank Node or a Literal.
RDF was adopted as a W3C recommendation in 1999. The RDF 1.0
specification was published in 2004, the RDF 1.1 specification in
2014.
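The statement model described above can be sketched in a few lines of Python. This is an illustration only: the names `URI`, `BNode`, `Literal` and `make_statement` are hypothetical and belong neither to Any23 nor to any RDF library, and the example.org identifiers are placeholders.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None


Subject = Union[URI, BNode]
Object = Union[URI, BNode, Literal]
Statement = Tuple[Subject, URI, Object]


def make_statement(s, p, o) -> Statement:
    """Build a statement, enforcing which node kinds each position accepts."""
    if not isinstance(s, (URI, BNode)):
        raise TypeError("subject must be a URI or a blank node")
    if not isinstance(p, URI):
        raise TypeError("predicate must be a URI")
    if not isinstance(o, (URI, BNode, Literal)):
        raise TypeError("object must be a URI, a blank node or a literal")
    return (s, p, o)


# A graph is simply a set of statements (placeholder data).
FOAF = "http://xmlns.com/foaf/0.1/"
me = URI("http://example.org/data#me")
graph = {
    make_statement(me, URI(FOAF + "name"), Literal("Alice")),
    make_statement(me, URI(FOAF + "knows"), URI("http://example.org/data#bob")),
}
```

Keeping the node kinds as distinct types makes the rule "a Literal can only appear as Object" a check at construction time rather than a convention.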
What is RDF (2)
[Figure: RDF graph centered on http://chucknorris.com/data#me, with rdf:type, foaf:name, foaf:knows and foaf:based_near edges; the blank node _:loc1 carries the geo:lat and geo:long literals.]
Namespaces
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
foaf: http://xmlns.com/foaf/0.1/
geo: http://www.w3.org/2003/01/geo/wgs84_pos#
xsd: http://www.w3.org/2001/XMLSchema#
Statements
<http://chucknorris.com/data#me> rdf:type foaf:Person .
<http://chucknorris.com/data#me> foaf:name "Carlos Ray Norris" .
<http://chucknorris.com/data#me> foaf:based_near _:loc1 .
_:loc1 rdf:type geo:Point .
_:loc1 geo:lat "10.19273"^^xsd:float .
_:loc1 geo:long "45.45385"^^xsd:float .
<http://chucknorris.com/data#me> foaf:knows <http://stevenseagal.com/data#me> .
<http://chucknorris.com/data#me> foaf:knows <http://brucelee.com/data#me> .
<http://stevenseagal.com/data#me> rdf:type foaf:Person .
<http://brucelee.com/data#me> rdf:type foaf:Person .
Machine Readable Annotations
Enable machines (web crawlers, web scrapers, browsers, etc.) to understand page contents in an unambiguous way.
Usually expressed as invisible HTML markup around visible content.
Microformats
• First attempt to standardize structured markup in Web content.
• Do not specify any ontology, just a simplified data model.
http://microformats.org/
RDFa
• RDF in Attributes: a set of standard (W3C) XML attributes to embed RDF statements within visual markup.
• W3C Specification with several variants:
  • RDFa 1.0: applicable only to XML formats (SVG, XHTML).
  • RDFa 1.1: extends RDFa 1.0 to non-XML languages.
  • RDFa Lite: minimal subset of attributes for general markup operations.
https://www.w3.org/TR/rdfa-syntax/
Microdata
• W3C Specification that competes with (and simplifies) RDFa.
• Adopted as an open standard mainly by the major search engines with the publication of Schema.org.
https://dev.w3.org/html5/md-LC
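To make the idea of invisible markup around visible content concrete, here is a minimal Python sketch that reads the `itemscope`, `itemtype` and `itemprop` attributes of a flat Microdata item using only the standard library. The HTML snippet and the `MicrodataSketch` class are made up for illustration; real Microdata processing also has to handle nested items, `itemref` and attribute-valued properties, which this sketch ignores.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: the visible text is wrapped in Microdata attributes.
HTML = """
<div itemscope itemtype="http://schema.org/Restaurant">
  <span itemprop="name">Trattoria Example</span>
  <span itemprop="telephone">+39 000 000000</span>
</div>
"""


class MicrodataSketch(HTMLParser):
    """Collects the itemprop text values of a single, flat Microdata item."""

    def __init__(self):
        super().__init__()
        self.item_type = None
        self.properties = {}
        self._current_prop = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.item_type = attrs.get("itemtype")
        # Remember which property (if any) the upcoming text belongs to.
        self._current_prop = attrs.get("itemprop")

    def handle_data(self, data):
        if self._current_prop and data.strip():
            self.properties[self._current_prop] = data.strip()

    def handle_endtag(self, tag):
        self._current_prop = None


parser = MicrodataSketch()
parser.feed(HTML)
print(parser.item_type, parser.properties)
```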
Schema.org (1)
Joint effort of Google, Microsoft, Yahoo and Yandex to define a shared, general-purpose vocabulary for RDFa, Microdata and JSON-LD.
http://schema.org/
Schema.org (2)
http://schema.org/Restaurant
What is Apache Any23
Any23 is a library, a Web Service and a Command Line
Tool written in Java (1.6), that extracts structured RDF
data from a variety of Web documents and markup
formats.
Any23 is an Apache Software Foundation top level project.
http://any23.apache.org/
Purpose
Any23 serves several purposes; its principal applications fall into the following areas:
• Semantic Web scraping: scraping of entire websites (or contiguous parts of them) to materialize the underlying Web 3.0 data structures.
• Page navigation enhancement and enrichment: identification of relevant entities during page navigation, used to perform live content enhancement (point out relevant information) and enrichment (add relevant information from other sources).
• Data mashup: mixing content from different pages and websites to obtain a unified, intelligent data view.
Supported Formats
Any23 currently supports the following input formats.
• RDF Specific formats: RDF/XML, Turtle, Notation3.
• RDFa with RDFa1.1 (prefix mechanism).
• Microformats: Adr, Geo, hCalendar, hCard, hListing, hRecipe, hReview, License, XFN and
Species.
• HTML5 Microdata, such as Schema.org.
• JSON-LD (JSON for Linking Data): a lightweight Linked Data format, based on the already successful JSON format, that helps JSON data interoperate at Web scale.
• CSV: Comma Separated Values.
• Vocabularies: Extraction support for CSV, Dublin Core Terms, Description of a Career,
Description Of A Project, Friend Of A Friend, GEO Names, ICAL, lkif-core, Open Graph
Protocol, BBC Programmes Ontology, RDF Review Vocabulary, schema.org, VCard, BBC
Wildlife Ontology and XHTML.
How does it work
[Figure: Any23 architecture. Pipeline stages and their packages:]
• Data Fetching (org.apache.any23.source)
• MIME Type Detection (org.apache.any23.mime)
• Content Validation and Patching (org.apache.any23.validator), driven by Validation and Fixing Rules (About NotUriRule, MetaName MisuseRule, OpenGraph Namespace Fix)
• Data Extraction (org.apache.any23.extractor) through the activated Extractors (RDFa, Microdata, H*), with Reporting
• Filtering Chain (org.apache.any23.filter): Filter 1 ... Filter N
• Serialization to RDF (org.apache.any23.writer): Turtle Writer, NQuads Writer
[Entry points and support modules:]
• Web Service with REST API (org.apache.any23.service)
• CLI (org.apache.any23.cli): rover, extractor, crawler, vocab, ...
• Plugin Management (org.apache.any23.plugin)
Data Fetching
Responsible for negotiating content retrieval
with data source.
[Figure: Data Fetching module (org.apache.any23.source)]
MIME Type Detection
• Identify data MIME Type if not explicitly
specified during fetching phase.
• Uses Apache Tika, a rule-based content
analysis toolkit.
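The fallback logic of this step (trust the declared type, otherwise inspect the content) can be sketched as follows. This is not how Apache Tika works internally and does not use its API; the `SNIFF_RULES` table and the `detect_mime` function are hypothetical and only illustrate rule-based detection on the leading bytes of a document.

```python
from typing import Optional

# Hypothetical sniffing rules: (leading marker, MIME type), checked in order.
SNIFF_RULES = [
    (b"<?xml", "application/xml"),
    (b"<!doctype html", "text/html"),
    (b"<html", "text/html"),
    (b"@prefix", "text/turtle"),
    (b"{", "application/json"),
]


def detect_mime(content: bytes, declared: Optional[str] = None) -> str:
    """Return the declared MIME type if present, otherwise sniff the content."""
    if declared:
        # Drop parameters such as "; charset=utf-8".
        return declared.split(";")[0].strip().lower()
    head = content.lstrip()[:64].lower()
    for marker, mime in SNIFF_RULES:
        if head.startswith(marker):
            return mime
    return "application/octet-stream"
```

For example, `detect_mime(b"<!DOCTYPE html><html>")` falls through to the sniffing rules, while `detect_mime(b"...", "text/html; charset=utf-8")` returns the declared type without looking at the content.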
[Figure: MIME Type Detection module (org.apache.any23.mime), which determines the activated Extractors: RDFa, Microdata, H*]
Content Validation and Patching
• Most Web markup contains errors or well-known structural issues, which need to be fixed before parsing.
• Uses an extensible rule-based engine to perform some common fixes.
For further details about common web markup issues please refer to reference [3].
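A rule-based validate-and-fix engine of this kind can be sketched as a list of (rule, fix) pairs. The example below is loosely inspired by the idea of a namespace fix for Open Graph markup, but it is an assumption-laden toy: the function names are hypothetical, the string matching is deliberately naive, and it does not reproduce the actual Any23 rules or their API.

```python
from typing import Callable, List, Tuple

OG_PREFIX = 'prefix="og: http://ogp.me/ns#"'


def og_used_but_undeclared(html: str) -> bool:
    """Rule: Open Graph properties appear, but no og prefix is declared."""
    return 'property="og:' in html and "http://ogp.me/ns#" not in html


def declare_og_prefix(html: str) -> str:
    """Fix: declare the og prefix on the first <html> start tag."""
    return html.replace("<html", "<html " + OG_PREFIX, 1)


# Each entry pairs a rule (detects an issue) with the fix to apply.
RULES: List[Tuple[Callable[[str], bool], Callable[[str], str]]] = [
    (og_used_but_undeclared, declare_og_prefix),
]


def validate_and_fix(html: str) -> Tuple[str, List[str]]:
    """Apply every matching fix and report which rules were activated."""
    activated = []
    for rule, fix in RULES:
        if rule(html):
            html = fix(html)
            activated.append(rule.__name__)
    return html, activated
```

Returning the list of activated rules alongside the patched content mirrors the idea of reporting rule activations next to the extracted data.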
[Figure: Content Validation and Patching module (org.apache.any23.validator), with Validation and Fixing Rules: About NotUriRule, MetaName MisuseRule, OpenGraph Namespace Fix]
Data Extraction
• All Extractors associated with the detected MIME type are activated.
• These Extractors inspect the source content, detect specific data structures and emit them as RDF statements.
• Extractors also report warnings and errors when they cannot accomplish their tasks.
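The activation-by-MIME-type behaviour described above can be sketched as a registry that maps a MIME type to the extractors to run. Everything here is hypothetical (`REGISTRY`, `title_extractor`, `extract`); it illustrates the dispatch and the collection of statements and issues, not the Any23 extractor API.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Extractor = Callable[[str, str], Tuple[List[Triple], List[str]]]

DC_TITLE = "http://purl.org/dc/elements/1.1/title"


def title_extractor(url: str, content: str):
    """Toy extractor: emits the <title> text as a dc:title statement."""
    start, end = content.find("<title>"), content.find("</title>")
    if start == -1 or end == -1:
        return [], ["title-extractor: no <title> element found"]
    title = content[start + len("<title>"):end].strip()
    return [(url, DC_TITLE, title)], []


# Hypothetical registry: MIME type -> extractors to activate.
REGISTRY: Dict[str, List[Extractor]] = {"text/html": [title_extractor]}


def extract(url: str, mime_type: str, content: str):
    """Run every extractor registered for the MIME type; gather output and issues."""
    triples, issues = [], []
    for extractor in REGISTRY.get(mime_type, []):
        found, problems = extractor(url, content)
        triples.extend(found)
        issues.extend(problems)
    return triples, issues
```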
[Figure: Data Extraction module (org.apache.any23.extractor), with the activated Extractors (RDFa, Microdata, H*) and Reporting]
Filtering
• Optionally, some types of information that are not relevant to the purpose of the extraction can be excluded.
• Filtering is based on RDF ontology prefixes or on programmatic Filter logic.
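A filtering chain of this kind can be sketched as a list of predicates that every statement must pass. The names `exclude_namespace` and `apply_chain` are hypothetical, and statements are simplified to tuples of strings; the sketch shows prefix-based filtering and how arbitrary programmatic filters fit the same shape.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]
TripleFilter = Callable[[Triple], bool]


def exclude_namespace(prefix: str) -> TripleFilter:
    """Keep only statements whose predicate is outside the given namespace."""
    return lambda triple: not triple[1].startswith(prefix)


def apply_chain(triples: Iterable[Triple], chain: List[TripleFilter]) -> Iterator[Triple]:
    """Yield the statements accepted by every filter in the chain."""
    for triple in triples:
        if all(keep(triple) for keep in chain):
            yield triple
```

Any function that takes a statement and returns a boolean can be appended to the chain, which is one way to read "programmatic Filter logic".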
[Figure: Filtering Chain (org.apache.any23.filter): Filter 1 ... Filter N]
Serialization
• The statements generated along the pipeline can finally be converted to one of the available RDF transport formats.
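As an illustration of what a writer does, the sketch below serializes statements as N-Triples lines, one `<subject> <predicate> object .` per statement. It is a simplified stand-in, not the Any23 writer API: the `to_ntriples` function and its four-element input tuples are assumptions, and blank nodes, language tags and datatypes are left out.

```python
from typing import Iterable, Tuple

# (subject URI, predicate URI, object value, object_is_literal)
Row = Tuple[str, str, str, bool]


def escape(text: str) -> str:
    """Escape the characters that cannot appear raw in an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(statements: Iterable[Row]) -> str:
    """Serialize statements as N-Triples, one statement per line."""
    lines = []
    for subject, predicate, obj, obj_is_literal in statements:
        rendered = f'"{escape(obj)}"' if obj_is_literal else f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {rendered} .\n")
    return "".join(lines)
```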
[Figure: Serialization module (org.apache.any23.writer) producing RDF through the Turtle Writer and the NQuads Writer]
Service Module
• Provides the REST API and the Web UI.
[Figure: Service module (org.apache.any23.service), exposing the Web UI and the REST API]
CLI Module
• Command Line Interface for Any23.
• Exposes the main functionalities as commands:
  • rover: equivalent to the Any23 REST Service
  • extractor: supports CLI testing of a specific extractor
  • vocab: access to the entire RDF vocabulary supported by Any23 modules
  • crawler: simple Web crawler to scrape markup content from small websites.
[Figure: CLI module (org.apache.any23.cli) with its commands: rover, extractor, crawler, vocab, ...]
Plugin Management
Most Any23 components (Extractors, Filters, ValidationRules, Fixes) can be added as external plugins simply by registering them on the Any23 classpath.
[Figure: Plugin Management module (org.apache.any23.plugin)]
Web and REST UI
Web UI (1)
Web UI (2)
REST UI (1)
REST UI (2)
REST UI (3)
Plain Output Format
@prefix : <http://xmlns.com/foaf/0.1/> .
@prefix B: <https://www.w3.org/People/Berners-Lee/> .
@prefix Be: <http://www.w3.org/People/Berners-Lee/> .
@prefix blog: <http://dig.csail.mit.edu/breadcrumbs/blog/> .
@prefix card: <https://www.w3.org/People/Berners-Lee/card#> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix cert: <http://www.w3.org/ns/auth/cert#> .
@prefix con: <http://www.w3.org/2000/10/swap/pim/contact#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dct: <http://purl.org/dc/terms/> .
@prefix doap: <http://usefulinc.com/ns/doap#> .
[…]
blog:4 dc:title "timbl's blog" ;
    s:seeAlso <http://dig.csail.mit.edu/breadcrumbs/blog/feed/4> ;
    :maker card:i .
<http://dig.csail.mit.edu/data#DIG> :member card:i .
<http://wiki.ontoworld.org/index.php/_IRW2006> dc:title "Identity, Reference and the Web workshop 2006" ;
    con:participant card:i .
<http://www.ecs.soton.ac.uk/~dt2/dlstuff/www2006_data#panel-panelk01> s:label "The Next Wave of the Web (Plenary Panel)" ;
    con:participant card:i .
<http://www.w3.org/2000/10/swap/data#Cwm> doap:developer card:i .
<http://www.w3.org/2011/Talks/0331-hyderabad-tbl/data#talk> dct:title "Designing the Web for an Open Society" ;
    :maker card:i .
<http://www4.wiwiss.fu-berlin.de/booksMeshup/books/006251587X> dc:creator card:i ;
    dc:title "Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web" .
<https://timbl.com/timbl/Public/friends.ttl> a :PersonalProfileDocument ;
    cc:license <http://creativecommons.org/licenses/by-nc/3.0/> ;
    dc:title "Tim Berners-Lee's editable FOAF file" ;
    :maker card:i ;
    :primaryTopic card:i .
[…]
Report Output Format
<?xml version="1.0" encoding="UTF-8" ?>
<response>
<extractors>
<extractor>extractor1-name</extractor>
<extractor>extractor2-name</extractor>
</extractors>
<report>
<message/>
<error/>
<issueReport>
</issueReport>
<validationReport>
<errors>
</errors>
<issues>
</issues>
<ruleActivations>
</ruleActivations>
</validationReport>
</report>
<data>
<![CDATA[
plain-output-format-here
]]>
</data>
</response>
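A client consuming this report format could read it with any XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` on a sample that follows the structure shown above; the `parse_report` helper is hypothetical, and the sample values are the same placeholders used on the slide.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <extractors>
    <extractor>extractor1-name</extractor>
    <extractor>extractor2-name</extractor>
  </extractors>
  <report><message/><error/></report>
  <data><![CDATA[
plain-output-format-here
]]></data>
</response>
"""


def parse_report(xml_bytes: bytes):
    """Return the activated extractor names and the embedded plain output."""
    root = ET.fromstring(xml_bytes)
    extractors = [e.text for e in root.findall("./extractors/extractor")]
    # The CDATA section is exposed as ordinary element text.
    data = (root.findtext("./data") or "").strip()
    return extractors, data


extractors, data = parse_report(SAMPLE.encode("utf-8"))
print(extractors, data)
```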
CLI (1)
bin/any23 --help
Get help
Usage: any23 [options] [command] [command options]
Options:
-h, --help
Display help information.
Default: false
--plugins-dir
The Any23 plugins directory.
Default: /Users/hardest/.any23/plugins
-X, --verbose
Produce execution verbose output.
Default: false
-v, --version
Display version information.
Default: false
Commands:
extractor Utility for obtaining documentation about metadata extractors.
Usage: extractor [options] Extractor name
Options:
-a, --all
shows a report about all available extractors
Default: false
-i, --input
shows example input for the given extractor
Default: false
-l, --list
shows the names of all available extractors
Default: false
[…]
CLI (2)
bin/any23 extractor -a
Get input output samples for every registered Extractor
[…]
Extractor: rdf-jsonld
type: ContentExtractor
-------- Example Input --------
{
"@context": {
"name": "http://xmlns.com/foaf/0.1/name",
"knows": "http://xmlns.com/foaf/0.1/knows"
},
"@id": "http://me.markus-lanthaler.com/",
"name": "Markus Lanthaler",
"knows": [
{
"@id": "http://manu.sporny.org/about#manu",
"name": "Manu Sporny"
},
{
"name": "Dave Longley"
}
]
}
[…]
-------- Example Output --------
<http://me.markus-lanthaler.com/> <http://xmlns.com/foaf/0.1/knows> <http://manu.sporny.org/about#manu> ,
    _:b0 ;
    <http://xmlns.com/foaf/0.1/name> "Markus Lanthaler"^^<http://www.w3.org/2001/XMLSchema#string> .
<http://manu.sporny.org/about#manu> <http://xmlns.com/foaf/0.1/name> "Manu Sporny"^^<http://www.w3.org/2001/XMLSchema#string> .
_:b0 <http://xmlns.com/foaf/0.1/name> "Dave Longley"^^<http://www.w3.org/2001/XMLSchema#string> .
[…]
CLI (3)
bin/any23 rover http://www.w3.org/People/Berners-Lee/card
Extract all possible data from a specific URL
{
"quads": [
[{
"type": "uri",
"value": "http://www.w3.org/DesignIssues/Overview.html"
}, "http://purl.org/dc/elements/1.1/title", {
"type": "literal",
"value": "Design Issues for the World Wide Web",
"lang": null,
"datatype": null
}, null],
[{
"type": "uri",
"value": "http://www.w3.org/DesignIssues/Overview.html"
}, "http://xmlns.com/foaf/0.1/maker", {
"type": "uri",
"value": "https://www.w3.org/People/Berners-Lee/card#i"
}, null],
[{
"type": "uri",
"value": "http://www.w3.org/People/Berners-Lee/card"
}, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", {
"type": "uri",
"value": "http://xmlns.com/foaf/0.1/PersonalProfileDocument"
}, null],
[…]
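The JSON shape shown above (a `quads` array whose entries are `[subject, predicate, object, graph]`) is easy to consume downstream. The sketch below flattens it into plain tuples with Python's standard `json` module. The `load_quads` helper is hypothetical, the sample is a shortened copy of the output above, and only the `uri` and `literal` node types visible in that output are assumed.

```python
import json
from typing import List, Optional, Tuple

Quad = Tuple[str, str, str, Optional[str]]

SAMPLE = """{"quads": [
  [{"type": "uri", "value": "http://www.w3.org/DesignIssues/Overview.html"},
   "http://purl.org/dc/elements/1.1/title",
   {"type": "literal", "value": "Design Issues for the World Wide Web",
    "lang": null, "datatype": null},
   null]
]}"""


def load_quads(document: str) -> List[Quad]:
    """Flatten each [subject, predicate, object, graph] entry into a tuple of values."""
    quads = []
    for subject, predicate, obj, graph in json.loads(document)["quads"]:
        # Subject and object are objects with a "value" field; the predicate is a string.
        quads.append((subject["value"], predicate, obj["value"], graph))
    return quads


quads = load_quads(SAMPLE)
print(quads)
```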
CLI (4)
bin/any23 vocab -f turtle
Dump the Any23 internal vocabulary (in Turtle format)
<http://www.estrellaproject.org/lkif-core/role.owl#Role> a <http://www.w3.org/2000/01/rdf-schema#Class> ;
    <http://www.w3.org/2000/01/rdf-schema#member> <http://www.estrellaproject.org/lkif-core/role.owl#> .
<http://www.estrellaproject.org/lkif-core/role.owl#Function> a <http://www.w3.org/2000/01/rdf-schema#Class> ;
    <http://www.w3.org/2000/01/rdf-schema#member> <http://www.estrellaproject.org/lkif-core/role.owl#> .
<http://www.estrellaproject.org/lkif-core/role.owl#Social_Role> a <http://www.w3.org/2000/01/rdf-schema#Class> ;
    <http://www.w3.org/2000/01/rdf-schema#member> <http://www.estrellaproject.org/lkif-core/role.owl#> .
[…]
Next Steps
• Introduce a scriptable DSL to speed up the development of Extractors, Rules and Fixes.
• Introduce a public repository to crowdsource plugin development, leveraging a clone / pull mechanism.
How to Contribute
http://any23.apache.org/developers.html
http://www.apache.org/foundation/how-it-works.html
Bit of history
• Any23 started in 2008 at DERI (now Insight Centre, insight-centre.org) and became part of the sindice.com data acquisition pipeline.
• In 2010 it was promoted to an Apache Incubator project (incubator.apache.org/projects/any23.html); in 2012 it was promoted to a top-level project (any23.apache.org).
Sindice.com and SindiceTech
• Sindice.com was an attempt to build the first Semantic Search Engine. It operated from 2007 to 2014 under Insight Centre (formerly DERI); by the end it had indexed 700M semantically annotated Web pages, with a refresh rate of 20M pages per day, for a total of 6B statements.
• SindiceTech is a startup whose mission is to commercialize the technologies developed during the Sindice.com experience.
Slowdown of Web 3.0 and Rise of the Knowledge Graph
• Many issues hinder the mass adoption of machine-readable embedded markup in Web content.
• Most data publishers lack a culture of structured data.
• Even worse, most data publishers add markup only for a small subset of the data available in their Web pages because they fear losing traffic.
• The Web of Data still needs to define a viable business model.
• Since 2012, thanks to the Schema.org initiative and HTTP-friendly formats like JSON-LD, the Web of Data has gained new momentum.
• Schema.org and other collective initiatives like Wikidata are the pillars of the Google Knowledge Graph, which is a proprietary alternative to DBpedia and Linked Data.
References
• [1] Web Data Commons - RDFa, Microdata, and Microformat Data Sets (http://webdatacommons.org/structureddata)
• [2] Deployment of RDFa, Microdata, and Microformats on the Web – A Quantitative Analysis (http://hannes.muehleisen.org/Bizer-etal-DeploymentRDFaMicrodataMicroformats-ISWC-InUse-2013.pdf)
• [3] Heuristics for Fixing Common Errors in Deployed schema.org Microdata (http://dws.informatik.uni-mannheim.de/fileadmin/lehrstuehle/ki/pub/MeuselPaulheim-HeuristicsForFixingCommonErrorsInDeployedSchemaOrgMicrodata-ESWC2015.pdf)

Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookmanojkuma9823
 

Recently uploaded (20)

04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
 

Apache Any23 - Anything to Triples

  • 1. Apache Any23 Anything to Triples Michele Mostarda <me@michelemostarda.it>, Twitter: @micmos v1.1
  • 2. Outline • Scenario • From Web 1.0 to Web 3.0 • What is RDF • Machine Readable Annotations • What is Apache Any23 • Supported Formats • How does it work • Web and REST UI • CLI • Next Steps • History of Any23 • How to Contribute • Slowdown of Web 3.0 and Rise of the Knowledge Graph
  • 3. Scenario Thanks to the promotion of a common set of machine readable annotation standards such as Microformats and RDFa, sponsored by the main global Web players, we’ve observed, starting from 2009, a large adoption of embedded markup for data structures (products, reviews, posts, people, organizations, events) on websites worldwide. The adoption of machine readable markup is the key enabler of the Web of Data. For further details about trends please see reference [1].
  • 4. From Web 1.0 to Web 3.0 • Web 1.0: Web of interconnected read-only content. • Web 2.0: the read/write Web. Pages are scriptable and dynamic, with APIs enabling cross interactivity. • Web 3.0: the Web of Data. Complete decoupling of data from presentation. Enables the read/write/understand mode.
  • 5. What is RDF (1) The Resource Description Framework is a formalism to describe structured data as a graph. In this graph, nodes are either Web entities (URIs) or literal data (primitive data types), and edges are ontology properties identified by URIs. Every graph is a composition of statements <Subject, Predicate, Object>, where the Subject is either an Entity or a Blank Node and the Object can be an Entity, a Blank Node or a Literal. RDF was adopted as a W3C recommendation in 1999. The RDF 1.0 specification was published in 2004, the RDF 1.1 specification in 2014.
  • 6. What is RDF (2) [Diagram: an RDF graph in which http://chucknorris.com/data#me (rdf:type foaf:Person, foaf:name "Carlos Ray Norris") is linked by foaf:knows to http://stevenseagal.com/data#me and http://brucelee.com/data#me (both rdf:type foaf:Person) and by foaf:based_near to the blank node _:loc1 (rdf:type geo:Point), which carries geo:lat and geo:long values typed as xsd:float.]
      Namespaces
      rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
      foaf: http://xmlns.com/foaf/0.1/
      geo: http://www.w3.org/2003/01/geo/wgs84_pos#
      Statements
      http://chucknorris.com/data#me rdf:type foaf:Person .
      http://chucknorris.com/data#me foaf:name "Carlos Ray Norris" .
      http://chucknorris.com/data#me foaf:based_near _:loc1 .
      _:loc1 rdf:type geo:Point .
      _:loc1 geo:lat "10.19273"^^xsd:float .
      _:loc1 geo:long "45.45385"^^xsd:float .
      http://chucknorris.com/data#me foaf:knows http://stevenseagal.com/data#me .
      http://chucknorris.com/data#me foaf:knows http://brucelee.com/data#me .
      http://stevenseagal.com/data#me rdf:type foaf:Person .
      http://brucelee.com/data#me rdf:type foaf:Person .
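The statement model on this slide can be sketched in a few lines of Python. This is only an illustration of the <Subject, Predicate, Object> idea, not how Any23 represents RDF internally; the `Triple` type and the `objects` helper are hypothetical names introduced here.

```python
from typing import NamedTuple

FOAF = "http://xmlns.com/foaf/0.1/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


class Triple(NamedTuple):
    """One RDF statement (hypothetical helper type, not an Any23 class)."""
    subject: str    # URI or blank node label
    predicate: str  # always a URI
    obj: str        # URI, blank node label or literal


CHUCK = "http://chucknorris.com/data#me"
STEVEN = "http://stevenseagal.com/data#me"
BRUCE = "http://brucelee.com/data#me"

# A subset of the statements listed on the slide.
graph = [
    Triple(CHUCK, RDF + "type", FOAF + "Person"),
    Triple(CHUCK, FOAF + "name", "Carlos Ray Norris"),
    Triple(CHUCK, FOAF + "knows", STEVEN),
    Triple(CHUCK, FOAF + "knows", BRUCE),
    Triple(STEVEN, RDF + "type", FOAF + "Person"),
    Triple(BRUCE, RDF + "type", FOAF + "Person"),
]


def objects(statements, subject, predicate):
    """Return the objects of all statements matching subject and predicate."""
    return [t.obj for t in statements
            if t.subject == subject and t.predicate == predicate]


known = objects(graph, CHUCK, FOAF + "knows")
```

Querying the graph is then a matter of matching on the triple positions: `known` holds the two URIs linked by foaf:knows.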
  • 7. Machine Readable Annotations Enable machines (web crawlers, web scrapers, browsers, etc.) to understand page contents in an unambiguous way. Usually expressed as invisible HTML markup around visible content.
  • 8. Microformats • First attempt to standardize structured markup in Web content. • Do not specify any ontology, just a simplified data model. http://microformats.org/
  • 9. RDFa • RDF in Attributes. A set of standard (W3C) XML attributes to embed RDF statements within visual markup. • W3C Specification with many variants: • RDFa 1.0: applicable only to XML formats (SVG, XHTML) • RDFa 1.1: extension of RDFa 1.0 to non-XML language domains. • RDFa Lite: minimal subset of attributes for general markup operations. https://www.w3.org/TR/rdfa-syntax/
  • 10. Microdata • W3C Specification concurrent with (and simpler than) RDFa. • Adopted as an open standard mainly by the major search engines, with the publication of Schema.org. https://dev.w3.org/html5/md-LC
  • 11. Schema.org (1) Joint effort of Google, Microsoft, Yahoo and Yandex to define a general purpose aligned vocabulary for RDFa, Microdata and JSON-LD. http://schema.org/
  • 13. What is Apache Any23 Any23 is a library, a Web Service and a Command Line Tool written in Java (1.6), that extracts structured RDF data from a variety of Web documents and markup formats. Any23 is an Apache Software Foundation top level project. http://any23.apache.org/
  • 14. Purpose The purpose of Any23 is manifold; its principal applications fall into the following areas: • Semantic Web scraping: scraping of entire websites (or contiguous parts of them) to get a materialization of the underlying Web 3.0 data structures. • Page navigation enhancement and enrichment: identification of relevant entities during page navigation, to be used for live content enhancement (pointing out relevant information) and enrichment (adding relevant information from other sources). • Data mashup: mixing contents from different pages and websites to get a unified intelligent data view.
  • 15. Supported Formats Any23 currently supports the following input formats. • RDF-specific formats: RDF/XML, Turtle, Notation3. • RDFa, including RDFa 1.1 (prefix mechanism). • Microformats: Adr, Geo, hCalendar, hCard, hListing, hRecipe, hReview, License, XFN and Species. • HTML5 Microdata, such as Schema.org. • JSON-LD (JSON for Linking Data): a lightweight Linked Data format, based on the already successful JSON format, that provides a way to help JSON data interoperate at Web scale. • CSV: Comma Separated Values. • Vocabularies: Extraction support for CSV, Dublin Core Terms, Description of a Career, Description Of A Project, Friend Of A Friend, GEO Names, ICAL, lkif-core, Open Graph Protocol, BBC Programmes Ontology, RDF Review Vocabulary, schema.org, VCard, BBC Wildlife Ontology and XHTML.
  • 16. How does it work [Architecture diagram. Numbered components: (1) Data Fetching, org.apache.any23.source; (2) MIME Type Detection, org.apache.any23.mime; (3) Content Validation and Patching, org.apache.any23.validator; (4) Activated Extractors: RDFa Extractor, Microdata Extractor, H* Extractor; (5) Validation and Fixing Rules: AboutNotUriRule, MetaNameMisuseRule, OpenGraphNamespaceFix; (6) Serialization, org.apache.any23.writer: Turtle Writer, NQuads Writer; (7) Filtering Chain, org.apache.any23.filter: Filter 1 ... Filter N; (8) Data Extraction, org.apache.any23.extractor, with Reporting; (9) CLI commands: rover, extractor, crawler, vocab... Modules: (a) Web Service, org.apache.any23.service, exposing the REST API; (b) CLI, org.apache.any23.cli; (c) Plugin Management, org.apache.any23.plugin.]
  • 17. Data Fetching Responsible for negotiating content retrieval with the data source. (Diagram: org.apache.any23.source, component 1.)
  • 18. MIME Type Detection • Identifies the data MIME type if it was not explicitly specified during the fetching phase. • Uses Apache Tika, a rule-based content analysis toolkit. (Diagram: org.apache.any23.mime, component 2, feeding the Activated Extractors: RDFa Extractor, Microdata Extractor, H* Extractor.)
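The fallback logic described on this slide (trust an explicitly declared type, otherwise inspect the content) can be sketched as follows. Any23 delegates this step to Apache Tika; the hand-written sniffing rules below are a deliberately tiny stand-in, and `detect_mime_type` is a hypothetical name, not an Any23 or Tika function.

```python
import mimetypes


def detect_mime_type(name, head, declared=None):
    """Pick a MIME type: declared header first, then content, then file name.

    `head` is the first bytes of the document; `declared` is an optional
    Content-Type header value. The sniffing rules are illustrative only.
    """
    if declared:
        # Drop parameters such as "; charset=utf-8".
        return declared.split(";")[0].strip().lower()
    sniffed = head.lstrip().lower()
    if sniffed.startswith(b"<!doctype html") or sniffed.startswith(b"<html"):
        return "text/html"
    if sniffed.startswith(b"<?xml"):
        return "application/xml"
    if sniffed.startswith(b"@prefix"):
        return "text/turtle"
    guessed, _ = mimetypes.guess_type(name)
    return guessed or "application/octet-stream"
```

A real detector needs far more rules than these three; that is the reason a toolkit such as Tika is used for this step.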
  • 19. Content Validation and Patching • Most Web markup contains errors or well-known structural issues, which should be fixed before parsing. • An extensible rule-based engine performs some common fixes. For further details about common Web markup issues, see reference [3]. (Diagram: org.apache.any23.validator, component 3, with the Validation and Fixing Rules, component 5: AboutNotUriRule, MetaNameMisuseRule, OpenGraphNamespaceFix.)
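A rule-based patching engine of the kind described above pairs a rule (detect a known issue) with a fix (repair it). The sketch below shows the pattern for one issue, Open Graph properties used without a declared `og` prefix. It works on raw strings with regular expressions and is not the Any23 implementation; the function names, the `RULES` table and the choice of `http://ogp.me/ns#` as the namespace are assumptions made for this example.

```python
import re

OG_NS = "http://ogp.me/ns#"  # assumed namespace for this sketch


def og_namespace_missing(html):
    """Rule: og: properties are used but no xmlns:og is declared."""
    uses_og = re.search(r'property\s*=\s*["\']og:', html) is not None
    declares_og = re.search(r"xmlns:og\s*=", html) is not None
    return uses_og and not declares_og


def add_og_namespace(html):
    """Fix: declare the og prefix on the first <html> tag."""
    return re.sub(r"<html\b", f'<html xmlns:og="{OG_NS}"', html, count=1)


# Each entry pairs a rule with the fix to apply when the rule fires.
RULES = [("og-namespace", og_namespace_missing, add_og_namespace)]


def patch(html):
    """Run every rule; return the patched markup and the activated rules."""
    applied = []
    for name, rule, fix in RULES:
        if rule(html):
            html = fix(html)
            applied.append(name)
    return html, applied


doc = '<html><head><meta property="og:title" content="Hi"/></head></html>'
patched, applied = patch(doc)
```

Keeping the list of activated rules alongside the patched markup mirrors the idea of reporting rule activations to the caller.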
  • 20. Data Extraction • All Extractors associated with the detected MIME type are activated. • These Extractors inspect the source content, detect specific data structures and emit them as RDF statements. • Extractors also report warnings and errors when they cannot accomplish their tasks. (Diagram: org.apache.any23.extractor, component 8, with the Activated Extractors RDFa Extractor, Microdata Extractor and H* Extractor, plus Reporting.)
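To make the extractor contract concrete (inspect content, emit statements, report problems), here is a minimal sketch of an extractor for Open Graph `<meta>` tags built on Python's standard `html.parser`. It is not one of the Any23 extractors; the class name, the issue messages and the `http://ogp.me/ns#` namespace are assumptions of this example.

```python
from html.parser import HTMLParser

OG_NS = "http://ogp.me/ns#"  # assumed namespace for this sketch


class OpenGraphExtractor(HTMLParser):
    """Collects (subject, predicate, object) triples from og: meta tags."""

    def __init__(self, document_uri):
        super().__init__()
        self.document_uri = document_uri
        self.triples = []
        self.issues = []  # warnings for markup that could not be used

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if not prop.startswith("og:"):
            return
        content = attributes.get("content")
        if content is None:
            self.issues.append(f"{prop}: missing content attribute")
            return
        # "og:title" becomes the predicate URI <OG_NS>title.
        self.triples.append((self.document_uri, OG_NS + prop[3:], content))


extractor = OpenGraphExtractor("http://example.org/page")
extractor.feed(
    '<html><head>'
    '<meta property="og:title" content="Any23"/>'
    '<meta property="og:type"/>'
    '<meta name="author" content="ignored"/>'
    '</head></html>'
)
```

After `feed`, `extractor.triples` holds one statement for og:title and `extractor.issues` holds one warning for the og:type tag that has no content.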
  • 21. Filtering • Optionally, some types of information that are not of interest for the purpose of the extraction can be excluded. • Filtering is based on RDF ontology prefixes or on programmatic Filter logic. (Diagram: org.apache.any23.filter, component 7: Filtering Chain, Filter 1 ... Filter N.)
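Both filtering styles mentioned above, by ontology prefix and by programmatic logic, can be modelled as predicates chained over the statement stream. The following is a sketch under that assumption; `exclude_namespaces` and `apply_filters` are hypothetical helpers, not Any23 classes.

```python
def exclude_namespaces(*namespaces):
    """Build a filter that drops triples whose predicate is in a namespace."""
    blocked = tuple(namespaces)

    def keep(triple):
        return not triple[1].startswith(blocked)

    return keep


def apply_filters(triples, filters):
    """Keep only the triples accepted by every filter in the chain."""
    return [t for t in triples if all(f(t) for f in filters)]


triples = [
    ("http://example.org/a", "http://xmlns.com/foaf/0.1/name", "Alice"),
    ("http://example.org/a", "http://www.w3.org/1999/xhtml/vocab#stylesheet",
     "http://example.org/style.css"),
    ("http://example.org/a", "http://xmlns.com/foaf/0.1/nick", ""),
]

chain = [
    exclude_namespaces("http://www.w3.org/1999/xhtml/vocab#"),  # prefix-based
    lambda t: t[2] != "",  # programmatic: drop empty objects
]
kept = apply_filters(triples, chain)
```

Here only the foaf:name statement survives: the XHTML vocabulary triple is removed by prefix and the empty foaf:nick by the programmatic filter.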
  • 22. Serialization • The statements generated along the pipeline are finally converted into one of the available RDF transport formats. (Diagram: org.apache.any23.writer, component 6: Turtle Writer, NQuads Writer.)
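As an illustration of what a writer does, the sketch below serializes statements to N-Triples, the line-based format that N-Quads extends with a fourth term. It handles only plain literals with a few common escapes, and the `(kind, value)` node encoding is an assumption of this example, not the Any23 writer API.

```python
def format_node(node):
    """Render a (kind, value) node; kind is 'uri', 'bnode' or 'literal'."""
    kind, value = node
    if kind == "uri":
        return f"<{value}>"
    if kind == "bnode":
        return f"_:{value}"
    # Plain literal: escape backslashes, quotes and newlines.
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    return "".join(
        f"{format_node(s)} {format_node(p)} {format_node(o)} .\n"
        for s, p, o in triples
    )


output = to_ntriples([
    (("uri", "http://chucknorris.com/data#me"),
     ("uri", "http://xmlns.com/foaf/0.1/name"),
     ("literal", "Carlos Ray Norris")),
    (("uri", "http://chucknorris.com/data#me"),
     ("uri", "http://xmlns.com/foaf/0.1/based_near"),
     ("bnode", "loc1")),
])
```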
  • 23. Service Module • Provides the REST API and the Web UI. (Diagram: org.apache.any23.service, module a.)
  • 24. CLI Module • Command Line Interface for Any23. • Exposes the main functionalities, such as: • rover: equivalent to the Any23 REST Service • extractor: supports CLI testing of a specific extractor • vocab: access to the entire RDF vocabulary supported by the Any23 modules • crawler: simple Web crawler to scrape markup content from small websites. (Diagram: org.apache.any23.cli, module b, with the commands rover, extractor, crawler, vocab..., component 9.)
  • 25. Plugin Management Most Any23 components (Extractors, Filters, ValidationRules, Fixes) can be added as external plugins simply by registering them in the Any23 classpath. (Diagram: org.apache.any23.plugin, module c.)
  • 32. Plain Output Format @prefix : <http://xmlns.com/foaf/0.1/> . @prefix B: <https://www.w3.org/People/Berners-Lee/> . @prefix Be: <http://www.w3.org/People/Berners-Lee/> . @prefix blog: <http://dig.csail.mit.edu/breadcrumbs/blog/> . @prefix card: <https://www.w3.org/People/Berners-Lee/card#> . @prefix cc: <http://creativecommons.org/ns#> . @prefix cert: <http://www.w3.org/ns/auth/cert#> . @prefix con: <http://www.w3.org/2000/10/swap/pim/contact#> . @prefix dc: <http://purl.org/dc/elements/1.1/> . @prefix dct: <http://purl.org/dc/terms/> . @prefix doap: <http://usefulinc.com/ns/doap#> . […] blog:4 dc:title "timbl's blog" ; s:seeAlso <http://dig.csail.mit.edu/breadcrumbs/blog/feed/4> ; :maker card:i . <http://dig.csail.mit.edu/data#DIG> :member card:i . <http://wiki.ontoworld.org/index.php/_IRW2006> dc:title "Identity, Reference and the Web workshop 2006" ; con:participant card:i . <http://www.ecs.soton.ac.uk/~dt2/dlstuff/www2006_data#panel-panelk01> s:label "The Next Wave of the Web (Plenary Panel)" ; con:participant card:i . <http://www.w3.org/2000/10/swap/data#Cwm> doap:developer card:i . <http://www.w3.org/2011/Talks/0331-hyderabad-tbl/data#talk> dct:title "Designing the Web for an Open Society" ; :maker card:i . <http://www4.wiwiss.fu-berlin.de/booksMeshup/books/006251587X> dc:creator card:i ; dc:title "Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web" . <https://timbl.com/timbl/Public/friends.ttl> a :PersonalProfileDocument ; cc:license <http://creativecommons.org/licenses/by-nc/3.0/> ; dc:title "Tim Berners-Lee's editable FOAF file" ; :maker card:i ; :primaryTopic card:i . […]
  • 33. Report Output Format <?xml version="1.0" encoding="UTF-8" ?> <response> <extractors> <extractor>extractor1-name</extractor> <extractor>extractor2-name</extractor> </extractors> <report> <message/> <error/> <issueReport> </issueReport> <validationReport> <errors> </errors> <issues> </issues> <ruleActivations> </ruleActivations> </validationReport> </report> <data> <![CDATA[ plain-output-format-here ]]> </data> </response>
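A client consuming the report format above mostly needs the list of activated extractors and the embedded data payload. The sketch below reads both with Python's standard `xml.etree.ElementTree`; the sample document follows the element names shown on the slide, while the payload triple and the `parse_report` helper are made up for the example.

```python
import xml.etree.ElementTree as ET

# Sample response shaped like the slide; the CDATA payload is invented.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <extractors>
    <extractor>extractor1-name</extractor>
    <extractor>extractor2-name</extractor>
  </extractors>
  <report>
    <message/>
    <error/>
  </report>
  <data><![CDATA[
<http://example.org/a> <http://example.org/p> "v" .
]]></data>
</response>
"""


def parse_report(xml_bytes):
    """Return (extractor names, plain-output payload) from a report."""
    root = ET.fromstring(xml_bytes)
    extractors = [e.text for e in root.findall("./extractors/extractor")]
    data = root.findtext("./data", default="").strip()
    return extractors, data


extractors, data = parse_report(SAMPLE)
```

The CDATA section is returned as ordinary element text, so the payload can be handed directly to an RDF parser for the chosen plain output format.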
  • 34. CLI (1) bin/any23 --help Get help Usage: any23 [options] [command] [command options] Options: -h, --help Display help information. Default: false --plugins-dir The Any23 plugins directory. Default: /Users/hardest/.any23/plugins -X, --verbose Produce execution verbose output. Default: false -v, --version Display version information. Default: false Commands: extractor Utility for obtaining documentation about metadata extractors. Usage: extractor [options] Extractor name Options: -a, --all shows a report about all available extractors Default: false -i, --input shows example input for the given extractor Default: false -l, --list shows the names of all available extractors Default: false […]
  • 35. CLI (2) bin/any23 extractor -a Get input output samples for every registered Extractor […] Extractor: rdf-jsonld type: ContentExtractor -------- Example Input -------- { "@context": { "name": "http://xmlns.com/foaf/0.1/name", "knows": "http://xmlns.com/foaf/0.1/knows" }, "@id": "http://me.markus-lanthaler.com/", "name": "Markus Lanthaler", "knows": [ { "@id": "http://manu.sporny.org/about#manu", "name": "Manu Sporny" }, { "name": "Dave Longley" } ] } […] -------- Example Output -------- <http://me.markus-lanthaler.com/> <http://xmlns.com/foaf/0.1/knows> <http://manu.sporny.org/about#manu> , _:b0 ; <http://xmlns.com/foaf/0.1/name> "Markus Lanthaler"^^<http://www.w3.org/2001/XMLSchema#string> . <http://manu.sporny.org/about#manu> <http://xmlns.com/foaf/0.1/name> "Manu Sporny"^^<http://www.w3.org/2001/XMLSchema#string> . _:b0 <http://xmlns.com/foaf/0.1/name> "Dave Longley"^^<http://www.w3.org/2001/XMLSchema#string> . […]
  • 36. CLI (3) bin/any23 rover http://www.w3.org/People/Berners-Lee/card Extract all possible data from a specific URL { "quads": [ [{ "type": "uri", "value": "http://www.w3.org/DesignIssues/Overview.html" }, "http://purl.org/dc/elements/1.1/title", { "type": "literal", "value": "Design Issues for the World Wide Web", "lang": null, "datatype": null }, null], [{ "type": "uri", "value": "http://www.w3.org/DesignIssues/Overview.html" }, "http://xmlns.com/foaf/0.1/maker", { "type": "uri", "value": "https://www.w3.org/People/Berners-Lee/card#i" }, null], [{ "type": "uri", "value": "http://www.w3.org/People/Berners-Lee/card" }, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", { "type": "uri", "value": "http://xmlns.com/foaf/0.1/PersonalProfileDocument" }, null], […]
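The JSON shown on this slide encodes each quad as a four-element array: a subject object, a predicate string, an object node and a graph slot (null here). A minimal sketch for turning such output into plain tuples follows; the sample is trimmed from the slide, and `load_quads` is a hypothetical helper, not part of Any23.

```python
import json

# Two quads copied from the slide output, trimmed to a complete document.
SAMPLE = """
{ "quads": [
  [{ "type": "uri", "value": "http://www.w3.org/DesignIssues/Overview.html" },
   "http://purl.org/dc/elements/1.1/title",
   { "type": "literal", "value": "Design Issues for the World Wide Web",
     "lang": null, "datatype": null },
   null],
  [{ "type": "uri", "value": "http://www.w3.org/DesignIssues/Overview.html" },
   "http://xmlns.com/foaf/0.1/maker",
   { "type": "uri", "value": "https://www.w3.org/People/Berners-Lee/card#i" },
   null]
] }
"""


def load_quads(text):
    """Flatten the JSON quads into (subject, predicate, object, graph) tuples."""
    quads = []
    for subject, predicate, obj, graph in json.loads(text)["quads"]:
        quads.append((subject["value"], predicate, obj["value"], graph))
    return quads


quads = load_quads(SAMPLE)
titles = [o for _, p, o, _ in quads
          if p == "http://purl.org/dc/elements/1.1/title"]
```

This flattening drops the node type, language and datatype fields; code that must distinguish URIs from literals should keep the `type` value as well.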
  • 37. CLI (4) bin/any23 vocab -f turtle Dump the Any23 internal vocabulary (in Turtle format) <http://www.estrellaproject.org/lkif-core/role.owl#Role> a <http://www.w3.org/2000/01/rdf-schema#Class> ; <http://www.w3.org/2000/01/rdf-schema#member> <http://www.estrellaproject.org/lkif-core/role.owl#> . <http://www.estrellaproject.org/lkif-core/role.owl#Function> a <http://www.w3.org/2000/01/rdf-schema#Class> ; <http://www.w3.org/2000/01/rdf-schema#member> <http://www.estrellaproject.org/lkif-core/role.owl#> . <http://www.estrellaproject.org/lkif-core/role.owl#Social_Role> a <http://www.w3.org/2000/01/rdf-schema#Class> ; <http://www.w3.org/2000/01/rdf-schema#member> <http://www.estrellaproject.org/lkif-core/role.owl#> . […]
  • 38. Next Steps • Introduce a scriptable DSL to speed up the development of Extractors, Rules and Fixes. • Introduce a public repository to crowdsource plugin development, leveraging a clone/pull mechanism.
  • 40. Bit of history • Any23 started in 2008 at DERI (now the Insight Centre, insight-centre.org) and became part of the sindice.com data acquisition pipeline. • In 2010 it was accepted as an Apache Incubator project (incubator.apache.org/projects/any23.html); in 2012 it was promoted to a top-level project (any23.apache.org).
  • 41. Sindice.com and SindiceTech • Sindice.com was an attempt to build the first Semantic Search Engine. It operated from 2007 to 2014 under the Insight Centre (formerly DERI); by the end it had indexed 700M semantically annotated Web pages, for a total of 6B statements, with a refresh rate of 20M pages per day. • SindiceTech is a startup whose mission is to commercialize the technologies developed during the Sindice.com experience.
  • 42. Slowdown of Web 3.0 and Rise of the Knowledge Graph • The massive adoption of machine readable embedded markup in Web content faces many issues. • Most data publishers don’t have a culture of structured data. • Even worse, most data publishers add markup only for a small subset of the data available in their Web pages, because they are afraid of losing traffic. • The Web of Data still needs to define a viable business model. • Since 2012, thanks to the Schema.org initiative and HTTP-friendly formats like JSON-LD, the Web of Data has been gaining new momentum. • Schema.org and other collective initiatives like WikiData are the pillars of the Google Knowledge Graph, which is a proprietary alternative to DBpedia and Linked Data.
  • 43. References • [1] Web Data Commons - RDFa, Microdata, and Microformat Data Sets (http://webdatacommons.org/structureddata) • [2] Deployment of RDFa, Microdata, and Microformats on the Web – A Quantitative Analysis (http://hannes.muehleisen.org/Bizer-etal-DeploymentRDFaMicrodataMicroformats-ISWC-InUse-2013.pdf) • [3] Heuristics for Fixing Common Errors in Deployed schema.org Microdata (http://dws.informatik.uni-mannheim.de/fileadmin/lehrstuehle/ki/pub/MeuselPaulheim-HeuristicsForFixingCommonErrorsInDeployedSchemaOrgMicrodata-ESWC2015.pdf)