Linked Data: Opportunities for Entrepreneurs

Dr. David Wood
david@3roundstones.com
@prototypo
12 September 2013

David Wood
B.S. Mechanical Engineering
B.S. Electrical Engineering (equivalency)
M.S. Astronautical Engineering
Aeronautical & Astronautical Engineer
Ph.D. Software Engineering

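David Wood
[Table: companies founded (columns: company, founded, products, disposition). Recoverable entries: Plugged In Software; founding years 2002 and 2005; products RDF Database, RDF Database Management, RDF Usage and Linked Data Management; two dispositions marked "ongoing".]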

Data in the Physical World
Readable by people

Machine readable
Readable by motivated people

40% annual growth in data produced
5% annual growth in IT spending
Digital Information Produced: 1.8 ZB in 2012, 35 ZB in 2020
Daily (2013):
• Emails: 294B
• Tweets: 230M
• Online Ad Impressions: 4.8T

“Perfection is achieved, not when
there is nothing left to add, but when
there is nothing left to remove.”
-- Antoine de Saint-Exupéry

“The Web is the minimal concession to
hypertext that a sequence-and-hierarchy
chauvinist could possibly make.”
“HTML is precisely what we were trying to
PREVENT-- ever-breaking links, links
going outward only, quotes you can't
follow to their origins, no version
management, no rights management.”
“The 'Browser' is an extremely silly
concept-- a window for looking sequentially
at a large parallel structure. It does not
show this structure in a useful way.”
-- Ted Nelson

The Web makes graphs
out of hierarchies

New Data Requirements
• Global access
• Open format
• Record context
  • to allow sharing
  • to allow reuse
• Record provenance

Challenges
• Global access: Need to publish to the Web
• Open format: Most data is currently bound to proprietary tools/formats
• Context: Data is often structured for individual use without thought to sharing
• Provenance: Paradoxically easy, given solutions to the others
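Why provenance becomes easy once the other challenges are solved can be shown with a minimal sketch. Assuming a dataset already has a global URI (global access) and is expressed in RDF (open format), recording provenance amounts to a few more statements about that URI. The sketch writes them as N-Triples using the W3C PROV-O terms `prov:wasAttributedTo`, `prov:wasDerivedFrom` and `prov:generatedAtTime`; the choice of PROV-O, the function name `provenance_triples` and all example.org URIs are assumptions for illustration, not something this talk specifies.

```python
from datetime import datetime, timezone
from typing import List

# Assumption: W3C PROV-O is used as the provenance vocabulary.
PROV = "http://www.w3.org/ns/prov#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def provenance_triples(dataset_uri: str, agent_uri: str,
                       source_uri: str, generated_at: datetime) -> List[str]:
    """Return N-Triples lines stating who a dataset is attributed to,
    what it was derived from, and when it was generated."""
    # Normalize to UTC so the xsd:dateTime literal is unambiguous.
    stamp = generated_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        f"<{dataset_uri}> <{PROV}wasAttributedTo> <{agent_uri}> .",
        f"<{dataset_uri}> <{PROV}wasDerivedFrom> <{source_uri}> .",
        f'<{dataset_uri}> <{PROV}generatedAtTime> "{stamp}"^^<{XSD}dateTime> .',
    ]


if __name__ == "__main__":
    # Hypothetical URIs for a published temperature dataset.
    for line in provenance_triples(
        "http://example.org/data/temperatures",
        "http://example.org/people/collector-1",
        "http://example.org/raw/station-log",
        datetime(2013, 9, 12, 12, 0, tzinfo=timezone.utc),
    ):
        print(line)
```

Nothing here is specific to provenance: the same URI-plus-triples machinery that makes the data shareable also carries the statements about where it came from.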

Linked Data on the Web
[Figure: "my data" drawn as a graph. A node of type (a) measurement has date 2011-01-01, value 0, units of measure degrees Centigrade, and was collected at Galway Airport (...). It was collected by a collector of type (a) Person, with first name Michael and last name Hausenblas.]
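The figure's graph can be written out directly as RDF triples, one per edge. The sketch below is a minimal, dependency-free Python rendering of it in N-Triples; the http://example.org/ namespace and property names such as `collectedBy` and `unitsOfMeasure` are hypothetical stand-ins for whatever vocabulary a real publisher would choose, and a production system would more likely use an RDF library (rdflib, for example).

```python
from typing import List, NamedTuple, Optional, Tuple, Union

# Hypothetical namespace; the figure does not give real URIs.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


class URI(NamedTuple):
    value: str


class Literal(NamedTuple):
    value: str
    datatype: Optional[str] = None


Term = Union[URI, Literal]
Triple = Tuple[URI, URI, Term]

measurement = URI(EX + "measurement/1")
collector = URI(EX + "person/michael-hausenblas")

# The "my data" graph from the figure, one triple per edge.
MY_DATA: List[Triple] = [
    (measurement, URI(RDF_TYPE), URI(EX + "Measurement")),
    (measurement, URI(EX + "date"), Literal("2011-01-01", XSD + "date")),
    (measurement, URI(EX + "value"), Literal("0", XSD + "integer")),
    (measurement, URI(EX + "unitsOfMeasure"), URI(EX + "DegreesCentigrade")),
    (measurement, URI(EX + "collectedAt"), URI(EX + "GalwayAirport")),
    (measurement, URI(EX + "collectedBy"), collector),
    (collector, URI(RDF_TYPE), URI(EX + "Person")),
    (collector, URI(EX + "firstName"), Literal("Michael")),
    (collector, URI(EX + "lastName"), Literal("Hausenblas")),
]


def term_to_nt(term: Term) -> str:
    """Serialize one term in N-Triples syntax."""
    if isinstance(term, URI):
        return f"<{term.value}>"
    # Escape backslashes first, then quotes and newlines, inside literals.
    text = term.value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    if term.datatype:
        return f'"{text}"^^<{term.datatype}>'
    return f'"{text}"'


def to_ntriples(triples: List[Triple]) -> str:
    """One line per triple: subject, predicate, object, terminated by ' .'."""
    return "\n".join(" ".join(term_to_nt(t) for t in triple) + " ." for triple in triples)


if __name__ == "__main__":
    print(to_ntriples(MY_DATA))
```

Because every node is a URI, anyone else on the Web can add statements about the same measurement or the same person without coordinating with the original publisher.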

Appropriate Copy Problem
johnson@example.com

Schemas/Vocabularies
Someone else (we don’t know)

YouTube vs. HDTV
• HDTV: watch better videos
• YouTube: watch videos, publish videos, share videos, rate videos, discuss videos

Linked Data vs. RDBMS
• RDBMS: use data
• Linked Data: use data, publish data, share data, rate data, discuss data

Credit: Bradley P. Allen, Elsevier Labs

Active PURLs for Clinical Study Aggregation
David Wood¹ and Tom Plasterer²
¹david@3roundstones.com, ²Tom.Plasterer@astrazeneca.com
[Figure: Active PURL pipeline. (1) The user resolves a single URI to an Active PURL. (2) Multiple targets (HTTP-accessible endpoints capable of returning XML or textual content) are queried independently. (3) The XML or textual results are converted to RDF. (4) The RDF is rendered to HTML via a template.]
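The Active PURL pipeline (resolve one identifier, query several targets independently, convert each result, merge, render) can be sketched in a few dozen lines of Python. This is not the Callimachus implementation: the class `ActivePurl`, the transform helpers `xml_fields` and `text_fields`, and the example URLs are hypothetical; results are held as plain (subject, predicate, object) tuples rather than in a real RDF store; and `string.Template` stands in for the Callimachus template engine. Fetching is passed in as a function so the same code can run against live endpoints (`http_fetch`) or canned responses.

```python
import html
import urllib.request
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from string import Template
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]          # (subject, predicate, object)
Transform = Callable[[str], Set[Triple]]
Fetch = Callable[[str], str]


def http_fetch(url: str, timeout: float = 10.0) -> str:
    """Fetch a target over HTTP; raises on timeouts and HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


class ActivePurl:
    """Hypothetical Active PURL: one persistent identifier, many targets,
    and a per-target transformation of each result into triples."""

    def __init__(self, subject: str, targets: Dict[str, Transform]):
        self.subject = subject
        self.targets = targets

    def resolve(self, fetch: Fetch = http_fetch) -> Set[Triple]:
        def query(item: Tuple[str, Transform]) -> Set[Triple]:
            url, transform = item
            return transform(fetch(url))

        graph: Set[Triple] = set()
        # Each target is queried independently, in parallel threads.
        with ThreadPoolExecutor(max_workers=max(1, len(self.targets))) as pool:
            for triples in pool.map(query, self.targets.items()):
                graph |= triples  # merge into one temporary graph
        return graph


def xml_fields(subject: str) -> Transform:
    """Map each child element <name>text</name> to (subject, name, text)."""
    def transform(body: str) -> Set[Triple]:
        root = ET.fromstring(body)
        return {(subject, child.tag, (child.text or "").strip()) for child in root}
    return transform


def text_fields(subject: str) -> Transform:
    """Map each 'key: value' line to (subject, key, value)."""
    def transform(body: str) -> Set[Triple]:
        pairs = (line.split(":", 1) for line in body.splitlines() if ":" in line)
        return {(subject, k.strip(), v.strip()) for k, v in pairs}
    return transform


PAGE = Template("<html><body><h1>$title</h1><dl>$rows</dl></body></html>")


def render_html(subject: str, graph: Set[Triple]) -> str:
    """Render the subject's statements as an HTML definition list."""
    rows = "".join(
        f"<dt>{html.escape(p)}</dt><dd>{html.escape(o)}</dd>"
        for s, p, o in sorted(graph) if s == subject
    )
    return PAGE.substitute(title=html.escape(subject), rows=rows)


if __name__ == "__main__":
    study = "http://purl.example.org/study/123"  # hypothetical PURL
    canned = {  # stand-ins for an internal XML service and an external text feed
        "http://internal.example/study.xml": "<study><phase>II</phase></study>",
        "http://registry.example/study.txt": "status: recruiting\nsites: 12",
    }
    purl = ActivePurl(study, {
        "http://internal.example/study.xml": xml_fields(study),
        "http://registry.example/study.txt": text_fields(study),
    })
    print(render_html(study, purl.resolve(canned.__getitem__)))
```

Keeping the per-target transform separate from fetching is what lets an internal XML service and an external text feed land in the same merged graph; adding a source means adding one more entry to `targets`.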
The problem: No coordinated view of clinical study information. Information is distributed across departments, subsidiaries and government data sources.
The solution: Gather, convert, aggregate and format for display
3 Round Stones and AstraZeneca created a system that provides coordinated views of distributed clinical trial information. The system extended the Callimachus Project, an Open Source management system for Linked Data.
Persistent URLs, or PURLs, were used to provide globally unique and resolvable identifiers for each clinical study. The PURL concept was extended so that a PURL can have multiple targets and the results from each target can undergo arbitrary transformation. PURLs with these capabilities are called Active PURLs.
Information sources relevant to clinical studies were identified, regardless of whether they were located inside or outside the pharmaceutical company's network. Active PURLs were used to resolve data sources having HTTP endpoints capable of returning XML or textual results. The results from each information source are dynamically transformed into Resource Description Framework (RDF) formats, and all sources' results are then merged into a single, temporary graph of RDF data.
Information is rendered to end users as coordinated HTML descriptions of each clinical trial using the Callimachus template engine. Machine-readable versions of the data are also available.
How semantic technologies help
Linked Data techniques can help both to address the availability of clinical trial information and to provide a means of building effective information systems that use it. Linked Data techniques allow for "cooperation without coordination": publishers of data provide context for use by third parties in other parts of a distributed enterprise, and users of Linked Data can combine information from multiple sources. Subsequent publication can create a virtuous circle of positive feedback, allowing researchers, informaticists and support staff to collaboratively and distributively build a reusable knowledge base.
Challenges
Distributed queries have many known limitations, such as introducing multiple single points of failure into any given PURL resolution. HTTP timeouts, authentication/authorization errors or other network failures can slow a pipeline or stop it from returning correctly.
Similarly, distributed queries can show variable query-time performance because of complex variations in network and endpoint performance.
Next steps
Proactive caching and cache management strategies can improve runtime performance and protect end users from the limitations inherent in a distributed query architecture. Caching of intermediate results from endpoints has not yet been implemented.
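Since caching of intermediate endpoint results had not yet been implemented, one possible shape for it is sketched below, assuming a fetch function that maps a URL to response text. Fresh entries are served from memory; stale entries trigger a new request; if that request fails, the stale copy is served instead of failing the whole resolution. The class name `CachingFetcher`, the 300-second default time-to-live and the stale-on-error policy are illustrative assumptions, not part of the system described above.

```python
import time
from typing import Callable, Dict, Iterable, Tuple

Fetch = Callable[[str], str]


class CachingFetcher:
    """Hypothetical per-URL cache with a time-to-live (TTL) around a fetch function.

    Fresh entries are returned without contacting the endpoint. Stale or
    missing entries trigger a request; if that request fails with an
    OSError (urllib's URLError/HTTPError and socket timeouts are all
    subclasses of it), a stale copy is served when one exists.
    """

    def __init__(self, fetch: Fetch, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}  # url -> (stored_at, body)

    def __call__(self, url: str) -> str:
        now = self._clock()
        cached = self._entries.get(url)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh hit: no network round trip
        try:
            body = self._fetch(url)
        except OSError:
            if cached is not None:
                return cached[1]  # endpoint failed: fall back to the stale copy
            raise  # nothing cached: surface the failure
        self._entries[url] = (now, body)
        return body

    def warm(self, urls: Iterable[str]) -> None:
        """Proactively refresh entries, e.g. from a scheduled job."""
        for url in urls:
            try:
                self._entries[url] = (self._clock(), self._fetch(url))
            except OSError:
                pass  # keep any existing entry rather than evicting it
```

Because it keeps the same url-to-text signature, a `CachingFetcher` can wrap any such fetch function; `warm` covers the "proactive" half by refreshing known URLs before users ask for them.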
User experience
1. Users resolve a URL that provides a unique identifier for a clinical study, drug, chemical or other concept managed by this system. The URL may be presented to users on HTML pages, or users may search for it via full-text techniques or discover it via semantic search.
2. Users are presented with a dynamically generated Web page representing aggregated clinical study information. Users are isolated from the complex and distributed information environment.

Your Opportunity?
• Linked Data warehouses: 10B USD annually
• Linked Data supply chains: 205M USD annually (Web), 6B USD annually (enterprise)
• Linked Data analytics: 16B USD annually

This work is Copyright © 2011 3 Round Stones Inc.
It is licensed under the Creative Commons Attribution 3.0 Unported License
Full details at: http://creativecommons.org/licenses/by/3.0/
You are free:
to Share — to copy, distribute and transmit the work
to Remix — to adapt the work
Under the following conditions:
Attribution. You must attribute the work in the manner specified by the
author or licensor (but not in any way that suggests that they endorse
you or your use of the work).
Share Alike. If you alter, transform, or build upon this work, you may
distribute the resulting work only under the same or similar license to this
one.
