The document provides an overview of linked open data and the EPA's efforts to publish its environmental data as linked open data. It discusses the need for improved data platforms to share integrated environmental data. Linked open data uses international standards to publish and connect data on the web, providing context and allowing for improved access and reuse of data. The EPA publishes a large amount of data in CSV files and is now moving to a cloud-based linked open data system to publish facility, chemical, and pollution reports, making the data more reusable and helping more types of audiences use the data.
Structured Data on the Web: The Jargon-free Version
1. Linked Data:
Structured Data on the Web
(the jargon-free version)
US EPA Linked Data
Bernadette Hyland, CEO
bhyland@3RoundStones.com
@BernHyland
General:
info@3RoundStones.com
@3RoundStones
Main +1-877-290-2127
2. Agenda
• Intros ...
• What is the need?
• Jargon-free overview of Linked Open
Data
• Trends in data management
• Government data publication
• EPA is moving towards Linked Data
3. Demand for environmental data
• High demand for improved information platforms to publish, share and visualize integrated data
• e.g., chemicals, pollution, air quality, regulated facilities
• Goal: Increase data quality & comparability to facilitate access & re-use
4. Data Sharing & Management Snafu in 3 short acts:
https://www.youtube.com/watch?feature=player_embedded&v=N2zK3sAtr-4
7. • Linked Data is about publishing and consuming data using international data standards
• Based on a 20+ year old idea
• A system of linked information systems
10. What is driving us?
“We’re moving from managing
documents to managing discrete pieces of
open data and content which can be
tagged, shared, secured, mashed up and
presented in the way that is most useful
for the consumer of that information.”
-- Report on Digital Government: Building a 21st Century Platform to
Better Serve the American People
13. Daily (2013)
• 4.8T online ad impressions
• 294B emails
• 230M tweets
Digital Information Produced
• 1.8 ZB (2012)
• 35 ZB (2020)
5% annual growth in IT spending
40% annual growth in data produced
14. The United States
in 2012
314 million
Total population
90 million
software end users
55 million
users of spreadsheets/
databases
13 million
“end user programmers”
3 million
professional programmers
15. “Most programs today are written not by professional
software developers, but by people with expertise in
other domains working towards goals for which they
need computational support.”
28. Linked Data on the Web
[Diagram: a graph linking a Person (first name Michael, last name Hausenblas), Galway Airport, and "my data". Edge labels: a, collector, collected at, collected by, measurement. The measurement node carries date 2011-01-01, value 0, and units of measure degrees Centigrade.]
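The graph on this slide can be approximated in code as a list of subject-predicate-object triples. The sketch below is one possible reading of the slide's labels, not the presenter's own model: the example.org namespace, the property names, and the direction of each link are assumptions made for illustration.

```python
EX = "http://example.org/"  # hypothetical namespace, not taken from the slides

# Each fact is a (subject, predicate, object) triple.
# Property names are assumed spellings of the labels shown on the slide.
TRIPLES = [
    (EX + "michael", "type", EX + "Person"),
    (EX + "michael", "firstName", "Michael"),
    (EX + "michael", "lastName", "Hausenblas"),
    (EX + "my-data", "collectedBy", EX + "michael"),
    (EX + "my-data", "collectedAt", EX + "galway-airport"),
    (EX + "my-data", "measurement", EX + "measurement-1"),
    (EX + "measurement-1", "date", "2011-01-01"),
    (EX + "measurement-1", "value", 0),
    (EX + "measurement-1", "unitsOfMeasure", "degrees Centigrade"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def collector_last_name(triples, dataset):
    """Walk two links: dataset -> collectedBy -> lastName."""
    for _, _, person in match(triples, s=dataset, p="collectedBy"):
        for _, _, name in match(triples, s=person, p="lastName"):
            return name
    return None
```

Because the measurement, its collector, and its location are all nodes in one graph, a question such as "who collected this data?" becomes a matter of following links rather than joining separate files.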
29. Summary of Problems
• How can we archive our data in an open
manner?
• How can we record data context?
• How can we record data provenance?
• How can we know whether our data is up
to date?
• How can we share our data with others?
31. Linked Data
• Provides an international standard
mechanism to put reusable data on the
World Wide Web
• Provides a single data model with multiple
formats
• Provides context, provenance and access
• Allows for both human and machine reuse
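As a rough illustration of "a single data model with multiple formats", the sketch below writes the same two triples once as N-Triples (a W3C line-based RDF syntax) and once as a JSON document. The example.org URIs are hypothetical, and the JSON layout is an ad hoc grouping chosen for readability, not JSON-LD.

```python
import json

EX = "http://example.org/"  # hypothetical namespace

# (subject URI, predicate URI, object, object_is_uri)
TRIPLES = [
    (EX + "measurement-1", EX + "date", "2011-01-01", False),
    (EX + "measurement-1", EX + "collectedAt", EX + "galway-airport", True),
]


def to_ntriples(triples):
    """Serialize triples as N-Triples: one '<s> <p> o .' statement per line."""
    lines = []
    for s, p, o, is_uri in triples:
        if is_uri:
            obj = f"<{o}>"
        else:
            # Escape backslashes and quotes inside plain string literals.
            escaped = o.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def to_json(triples):
    """Group the same triples by subject and predicate (ad hoc JSON layout)."""
    grouped = {}
    for s, p, o, _ in triples:
        grouped.setdefault(s, {}).setdefault(p, []).append(o)
    return json.dumps(grouped, indent=2, sort_keys=True)
```

Both functions read the same list, so the format is a presentation choice made at output time rather than a property of the stored data.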
32. Linked Data Principles
• Name data files and elements with URIs
• Use HTTP URIs so people can resolve them on
the Web
• Provide useful information at those URIs, using
the standards (RDF, SPARQL)
• Include links to other URIs so people can
discover more information.
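A minimal sketch of what these principles can look like from a client's side: a thing is named with an HTTP URI, an RDF representation is requested from that URI through the Accept header, and a SPARQL query asks for its properties. The facility URI below is hypothetical, and the request is only built, not sent.

```python
from urllib.request import Request

# Hypothetical URI naming one facility; a real publisher would use its own domain.
FACILITY_URI = "http://example.org/id/facility/123"


def rdf_request(uri, media_type="text/turtle"):
    """Build (but do not send) an HTTP GET asking for an RDF representation."""
    return Request(uri, headers={"Accept": media_type})


def describe_query(uri):
    """Return a generic SPARQL query listing each property and value of a resource."""
    return f"SELECT ?property ?value WHERE {{ <{uri}> ?property ?value }}"
```

Passing the request to `urllib.request.urlopen` would send it; a server that supports content negotiation could then answer with Turtle instead of an HTML page. Any URIs appearing as values in the response are the links a client can follow to discover more.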
34. US EPA Linked Data
• Cloud-based Linked Data provision
• 2.9M Facilities (FRS)
• 100K substances (SRS)
• 25 years of toxic pollution reports (TRI)
• 3 years of chemical usage reports (CDR)
• Considering: Hazardous & non-hazardous waste
management (RCRA) & GHG data
• FISMA compliant
• Millions of pages driven by < 20 Web templates
• Launch Spring 2014
42. Audience for EPA Data
• Middle school student doing a science project
• Concerned citizen worried about local pollution
• Environmental Science PhD from EPA
• Doctor from NIH writing a research paper
53. Potential Audience
• X Middle school student doing a science project
• X Concerned citizen worried about local pollution
• ✔ Environmental Science PhD from EPA
• X Doctor from NIH writing a research paper
59. Potential Audience
• ✔ Middle school student doing a science project
• ✔ Concerned citizen worried about local pollution
• ✔ Environmental Science PhD from EPA
• ✔ Doctor from NIH writing a research paper
63. Increase re-use by publishing
Linked Data
• Empower users to create their own views of data to satisfy different applications
• Build a community around the data in which users help each other to curate and connect as needed
• Skip the supermodel - Leave data in the multiple “best of breed” systems; wrap and expose on the Web of Data
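One way to picture "wrap and expose" is a thin wrapper that leaves a source system's CSV export untouched and turns each row into triples on the way out. This is a sketch under stated assumptions, not EPA's implementation: the namespace, column names, and sample row are invented for illustration.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace

# Hypothetical extract; the column names and the row are invented.
SAMPLE_CSV = """facility_id,name,state
F001,Example Plant,VA
"""


def csv_to_triples(text, id_column, base=BASE):
    """Yield one (subject, predicate, value) triple per non-empty cell."""
    for row in csv.DictReader(io.StringIO(text)):
        # The identifier column becomes the subject URI for the whole row.
        subject = f"{base}facility/{row[id_column]}"
        for column, value in row.items():
            if column != id_column and value:
                yield (subject, f"{base}def/{column}", value)
```

The source file stays in its original system and format; only the wrapper needs to know how columns map to URIs.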