ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, and Synchronization of Resources on the Web
1. An overview of capabilities and real-world use cases for discovery,
harvesting, and synchronization of resources on the web
http://www.openarchives.org/rs #resourcesync
ResourceSync
ANSI/NISO Z39.99-2017
Martin Klein, Gretchen Gueguen, Mark Matienzo, Petr Knoth
2. ResourceSync was funded by the Sloan Foundation & JISC
Martin Klein
Los Alamos National Laboratory
@mart1nkle1n
3. ResourceSync - @mart1nkle1n
DPLAfest, Chicago, April 20 2017
Background - OAI-PMH
• Recurrent metadata exchange from a Data Provider to Service Providers
• XML metadata only
• Repository centric
• Devised 1999-2002, prior to REST, prior to dominance of web search engines
4. Revisit the Problem Domain - ResourceSync
• Synchronization of resources from a Source to Destinations
• Web resources, anything with an HTTP URI & representation
• Resource centric
• Devised 2012-2013, leverages key ingredients of web interoperability, existing specifications
• Updated in 2017 to v1.1
14. ResourceSync Change Notifications
• Notifications about change events to resources
• Source notifies subscribed Destinations (cf. recurrent pull)
• Push-based approach via WebSub
• Similar, sitemap-based payload
• Decrease synchronization latency between Source and Destination
• Change Notification Specification v1.0
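The push model on this slide relies on WebSub, where a Destination subscribes to a topic at a hub. Below is a minimal sketch of building such a subscription request. The hub, topic, and callback URLs are hypothetical placeholders that do not come from the slides; the form parameter names (`hub.mode`, `hub.topic`, `hub.callback`) follow the WebSub specification. The request is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_subscription(hub_url, topic_url, callback_url):
    """Build (but do not send) a WebSub subscription request."""
    body = urlencode({
        "hub.mode": "subscribe",       # ask the hub to start pushing
        "hub.topic": topic_url,        # what the Destination wants to follow
        "hub.callback": callback_url,  # where the hub delivers notifications
    }).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical URLs, for illustration only.
req = build_subscription(
    "https://hub.example.org/",
    "https://source.example.org/changelist.xml",
    "https://destination.example.org/callback",
)
print(req.data.decode("ascii"))
```

A real Destination would also have to answer the hub's verification request on its callback URL before notifications start to arrive.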
15. EHRI Use Case
• Aggregation of information about Holocaust collections
• held by 1,800+ organizations worldwide
• into a central service
• EAD as exchange format
• Diversity of data sources and locations
• databases, spreadsheets (“home collections”)
https://ehri-project.eu/
http://portal.ehri-project.eu
https://twitter.com/EHRIproject
16. EHRI Use Case
• Special ResourceSync implementation
• Bridges gap between local systems and ResourceSync
capability documents on a web server
• Filters local resources by subject, time period, etc.
• Set up by EHRI technical staff, run by contributing party
• Baseline synchronization: Resource Lists
• Incremental synchronization: Change Lists
• Together with EAD files moved from local system to web server
• Dropbox, FTP, USB stick
• Service: partners expose EADs, server collects and offers value-added services, e.g., graph database
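The split between baseline synchronization (Resource Lists) and incremental synchronization (Change Lists) can be sketched as follows. This is an illustration under stated assumptions, not the EHRI implementation: the EAD file URIs and timestamps are invented, and the Destination's state is reduced to a plain `{uri: lastmod}` dictionary. The element and attribute names (`rs:md`, `change="created|updated|deleted"`) are those of the ResourceSync specification.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}


def apply_change_list(xml_text, mirror):
    """Apply created/updated/deleted entries to a {uri: lastmod} mirror."""
    root = ET.fromstring(xml_text)
    for url in root.findall("sm:url", NS):
        uri = url.findtext("sm:loc", namespaces=NS)
        md = url.find("rs:md", NS)
        change = md.get("change") if md is not None else None
        if change == "deleted":
            mirror.pop(uri, None)
        elif change in ("created", "updated"):
            mirror[uri] = url.findtext("sm:lastmod", namespaces=NS)
    return mirror


# Hypothetical Change List; URIs and dates are made up for the example.
CHANGE_LIST = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="changelist" from="2017-04-01T00:00:00Z"/>
  <url><loc>http://example.org/ead/a.xml</loc>
       <lastmod>2017-04-02T10:00:00Z</lastmod><rs:md change="updated"/></url>
  <url><loc>http://example.org/ead/b.xml</loc>
       <lastmod>2017-04-03T10:00:00Z</lastmod><rs:md change="deleted"/></url>
  <url><loc>http://example.org/ead/c.xml</loc>
       <lastmod>2017-04-04T10:00:00Z</lastmod><rs:md change="created"/></url>
</urlset>
"""

# State after a baseline sync from a Resource List.
mirror = {
    "http://example.org/ead/a.xml": "2017-03-01T00:00:00Z",
    "http://example.org/ead/b.xml": "2017-03-01T00:00:00Z",
}
apply_change_list(CHANGE_LIST, mirror)
```

After the call, the mirror holds the updated `a.xml` and the new `c.xml`, while `b.xml` is gone; a real Destination would fetch or delete the corresponding files at the same points.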
17. CLARIAH Use Case
• Various institutions host evolving collections
• Make collection items uniformly available via RDF graph
• Central registry holds description of all collections
• Researchers use Virtual Research Environment to
• Discover collections (via registry)
• Collect graphs from respective institution
• Keep graphs up to date
https://www.clariah.nl/
https://twitter.com/CLARIAH_NL
18. CLARIAH Use Case
• Baseline synchronization
• Download graph from DB
• Serialized as one or more files, one RDF triple per line (+ s p o graph_name)
• “+” stands for “add”
• URIs of files listed in Resource List
• Incremental synchronization
• Changes logged in one or more files, one change per line (+/- s p o graph_name)
• “+” stands for “add”, “-” for “delete”
• URIs of files listed in Change List
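The line-based format on this slide can be replayed against a local copy of the graph. The sketch below is an assumption-laden illustration: the slide only gives the shape `+/- s p o graph_name`, so the code assumes the five fields are separated by whitespace and contain no spaces themselves (literals with spaces would need a real N-Quads parser). The function names and sample terms are hypothetical.

```python
def parse_change_line(line):
    """Split '+ s p o graph_name' into (op, (s, p, o, graph_name)).

    Assumes whitespace-separated fields without embedded spaces.
    """
    op, s, p, o, graph = line.split()
    if op not in ("+", "-"):
        raise ValueError(f"unknown operation: {op!r}")
    return op, (s, p, o, graph)


def apply_changes(lines, quads):
    """Replay add/delete lines, in order, against a set of quads."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        op, quad = parse_change_line(line)
        if op == "+":
            quads.add(quad)
        else:
            quads.discard(quad)
    return quads


# Baseline file: every line is an "add".
baseline = [
    "+ <urn:s1> <urn:p> <urn:o1> <urn:g>",
    "+ <urn:s2> <urn:p> <urn:o2> <urn:g>",
]
# Incremental file: one delete, one add.
changes = [
    "- <urn:s1> <urn:p> <urn:o1> <urn:g>",
    "+ <urn:s3> <urn:p> <urn:o3> <urn:g>",
]

graph = apply_changes(baseline, set())
graph = apply_changes(changes, graph)
```

Because baseline and incremental files share one format, the same routine handles both the files named in the Resource List and those named in the Change List.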
20. Hyku & DPLA
ResourceSync Implementations
Gretchen Gueguen, Data Services Coordinator
Digital Public Library of America, gretchen@dp.la
21. Project Background
● IMLS National Leadership Grant
(30 months)
● Foster a national digital
platform through
community-based repository
infrastructure
● Leverage & contribute to
Hydra, both in code and
community
22. Primary Project Goals
1. Develop turnkey (“easy to install, easy to maintain”)
Hydra-based application that leverages and improves on
core code components
2. Develop metadata aggregation & enrichment tools
3. Work toward a hosted service in the cloud
24. Metadata Aggregation @DPLA
Methods for Data Aggregation:
● OAI-PMH (21 providers)
● Custom APIs/other (8 providers)
● Direct file transfer (3 providers)
Biggest Drawbacks:
● Re-synchronizing entire data sets
● Relying on HTTP requests
25. ResourceSync and Hyku
● ResourceSync publishing support built into MVP
● Test application with 50,000 records to start
○ 50,000 is the limit for a single list; to add more, we would need to make a list of lists.
● Resource lists and change lists are supported
● Resource or change dumps not currently supported
● Content negotiation for JSON-LD, N-Triples, and Turtle
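The "list of lists" mentioned above corresponds to what ResourceSync calls a Resource List Index. The following is a rough sketch of how a publisher might split records into lists of at most 50,000 entries and point to them from an index. It is not Hyku's code: the function names, the example URIs, and the tiny chunk size used in the demo are assumptions for illustration. The namespaces, the `rs:md` element, and its `capability` and `at` attributes are taken from the ResourceSync specification.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS = "http://www.openarchives.org/rs/terms/"
ET.register_namespace("", SM)
ET.register_namespace("rs", RS)

MAX_ENTRIES = 50_000  # per-list limit inherited from Sitemaps


def chunk(items, size=MAX_ENTRIES):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def resource_list(entries, at):
    """Build one Resource List from (uri, lastmod) pairs."""
    root = ET.Element(f"{{{SM}}}urlset")
    ET.SubElement(root, f"{{{RS}}}md", {"capability": "resourcelist", "at": at})
    for uri, lastmod in entries:
        url = ET.SubElement(root, f"{{{SM}}}url")
        ET.SubElement(url, f"{{{SM}}}loc").text = uri
        ET.SubElement(url, f"{{{SM}}}lastmod").text = lastmod
    return root


def resource_list_index(list_uris, at):
    """Build the index ("list of lists") pointing at each Resource List."""
    root = ET.Element(f"{{{SM}}}sitemapindex")
    ET.SubElement(root, f"{{{RS}}}md", {"capability": "resourcelist", "at": at})
    for uri in list_uris:
        sitemap = ET.SubElement(root, f"{{{SM}}}sitemap")
        ET.SubElement(sitemap, f"{{{SM}}}loc").text = uri
    return root


# Demo with a chunk size of 2 instead of 50,000; all URIs are hypothetical.
AT = "2017-04-20T00:00:00Z"
records = [(f"https://hyku.example.org/works/{n}", AT) for n in range(5)]
chunks = chunk(records, size=2)
lists = [resource_list(c, AT) for c in chunks]
list_uris = [f"https://hyku.example.org/resourcelist-{i}.xml"
             for i in range(len(lists))]
index = resource_list_index(list_uris, AT)
print(ET.tostring(index, encoding="unicode"))
```

With the real limit, 120,000 records would yield three Resource Lists and one index document.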
26. ResourceSync and DPLA
Harvester developed for Hyku endpoint
● Development for this specific endpoint means that it’s
not a full test of all ResourceSync capabilities
● We suspect that we will prefer the Dump to the List
○ Using the List means making HTTP calls for each item in order to do
the content negotiation
○ Dump allows us to just download specifically what we need
○ We will still be downloading records that weren’t updated but given
the typical size of the diff for each provider this single download
may still be preferable to 100,000 HTTP requests
● Future implementations may require us to build on this
initial harvester if the specifics are different
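The cost concern with the List approach can be made concrete. The sketch below, which is not the DPLA harvester, reads the URIs out of a Resource List and prepares one content-negotiated request per item, so harvesting N records costs N requests on top of fetching the list itself. The function names and sample URIs are hypothetical, and the requests are only built, not sent. `application/ld+json` is the registered media type for JSON-LD; `application/n-triples` or `text/turtle` could be passed instead.

```python
import xml.etree.ElementTree as ET
from urllib.request import Request

SM = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def resource_uris(resource_list_xml):
    """Return every <loc> value from a Resource List document."""
    root = ET.fromstring(resource_list_xml)
    return [loc.text.strip() for loc in root.iter(SM + "loc")]


def negotiated_requests(uris, media_type="application/ld+json"):
    """Prepare one request per resource, asking for a given serialization."""
    return [Request(uri, headers={"Accept": media_type}) for uri in uris]


# Hypothetical two-entry Resource List.
SAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/">
  <rs:md capability="resourcelist" at="2017-04-20T00:00:00Z"/>
  <url><loc>https://hyku.example.org/works/1</loc></url>
  <url><loc>https://hyku.example.org/works/2</loc></url>
</urlset>
"""

uris = resource_uris(SAMPLE)
requests_needed = negotiated_requests(uris)
total_http_calls = 1 + len(requests_needed)  # the list itself plus one per item
```

A Dump replaces the per-item requests with a small number of package downloads, which is the trade-off the slide describes.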
27. Next Steps
Hyku:
● Possibly support Dump
● Increase test set over
50K
DPLA:
● Harvest from 3 DPLA
providers implementing
ResourceSync by end of
year
28. IIIF & ResourceSync:
Supporting discovery
Mark A. Matienzo, Stanford University Libraries
@anarchivist / https://orcid.org/0000-0003-3270-1306
DPLAFest — Chicago, Illinois — April 20, 2017
29. International Image Interoperability Framework
A community
that develops Shared APIs
implements them in Software
and exposes interoperable Content
http://iiif.io/
30. IIIF Community
http://iiif.io/community
● IIIF Consortium
○ Currently 38 state/national
libraries, universities, museums,
tech firms
○ Provides sustainability and steering
for the initiative
● Wider community
○ 80+ CH institutions, companies,
and projects using IIIF standards
○ iiif-discuss list = 670+ members
○ IIIF Slack = 300+ members
● Community & Technical
Specification Groups
31. Shared APIs
http://iiif.io/api/
● Image API
○ Transfer image pixels, regions, etc.
○ Image manipulation
● Presentation API
○ Presentation of an object (pixels +
navigation and metadata)
○ Easily share and re-use, mix and
match content
○ Annotate content
● Search API
○ Search annotations
● Authentication API
○ Provide interoperability for
access-restricted content
33. IIIF Content
All kinds of image resources:
artworks, photographs,
manuscripts, newspapers
Investigating AV and 3D
34. “Discovery”
in IIIF
Finding interoperable resources
Two main concerns:
● How can users find IIIF
resources?
● How can users then get those
resources into an environment
where they can use them?
35. Scoping the
problem
What resources
can be discovered?
Types of resources in IIIF:
● Content (Image API)
● Description (Presentation API)
The Image API does not provide
description of image content, just
technical and rights metadata.
Discovery requires Description
resources to provide information
about Content resources.
36. Presentation API
A Manifest provides
just enough metadata
(descriptive, structural,
etc.) to drive a viewer.
A Collection groups
Manifests or other
Collections.
http://iiif.io/api/presentation/2.1/
38. Presentation
API constraints
Informing decisions
The Presentation API does not
include semantic descriptions, but
can reference them using seeAlso.
IIIF (including the Presentation
API) has a resource-centric view of
the web, not a service-centric view
(cf. Sitemaps/ResourceSync vs.
OAI-PMH).
40. Basic Sitemaps
at NC State
● Example demonstrates use of
simple sitemaps without any
extensions, not even
ResourceSync
● Intended to expand upon
existing practice of publishing
sitemaps from digital collections
41. Sitemap entry for manifests
<url>
<loc>https://d.lib.ncsu.edu/collections/catalog/bh1141pnc004/manifest</loc>
<lastmod>2016-12-13T15:38:19Z</lastmod>
</url>
Sitemap entry for landing page
<url>
<loc>https://d.lib.ncsu.edu/collections/catalog/bh1141pnc004</loc>
<lastmod>2017-03-27T19:33:52Z</lastmod>
</url>
Sample of NCSU Sitemaps
Courtesy Jason Ronallo, North Carolina State University
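Even without extensions, `lastmod` is enough for a harvester to skip unchanged resources. The sketch below filters the two NCSU entries shown above by modification date. The enclosing `urlset` element and the cut-off date are assumptions added for the example, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_lastmod(value):
    """Parse a W3C datetime such as 2016-12-13T15:38:19Z or 2014-11-08."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:  # date-only values carry no offset
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def modified_since(sitemap_xml, since):
    """Return the <loc> of every entry whose lastmod is later than `since`."""
    root = ET.fromstring(sitemap_xml)
    hits = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        if lastmod and parse_lastmod(lastmod) > since:
            hits.append(loc)
    return hits


SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://d.lib.ncsu.edu/collections/catalog/bh1141pnc004/manifest</loc>
    <lastmod>2016-12-13T15:38:19Z</lastmod>
  </url>
  <url>
    <loc>https://d.lib.ncsu.edu/collections/catalog/bh1141pnc004</loc>
    <lastmod>2017-03-27T19:33:52Z</lastmod>
  </url>
</urlset>
"""

# Hypothetical date of the previous harvest.
last_harvest = datetime(2017, 1, 1, tzinfo=timezone.utc)
changed = modified_since(SITEMAP, last_harvest)
```

With this cut-off only the landing page is selected, since the manifest was last modified in December 2016.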
42. Prototyping at
Europeana
Exploring Sitemaps and
extensions for discovery of
IIIF resources for harvesting
● Partnership with University
College Dublin and National
Library of Wales
● ResourceSync satisfied key
needs identified within
requirements
● ResourceSync accommodated
additional metadata prototyped
in an IIIF Sitemap Extension
● Follows several synchronization
paradigms
43. Uses Sitemaps and IIIF Extension
<url>
<loc>http://newspapers.library.wales/view/3320640</loc>
<iiif:Manifest xmlns:iiif="http://iiif.io/api/presentation/2/">
http://dams.llgc.org.uk/iiif/newspaper/issue/3320640/manifest.json
</iiif:Manifest>
<dct:isPartOf>http://dams.llgc.org.uk/iiif/newspapers/3320639.json</dct:isPartOf>
<lastmod>2014-11-08</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
Example of NLW Sitemap Entry
Courtesy Nuno Freire, Europeana
44. Uses Sitemaps and ResourceSync and DCMES as Extensions
<url>
<loc>https://digital.ucd.ie/view/ucdlib:38491</loc>
<rs:ln rel="alternate" href="https://digital.ucd.ie/view/ucdlib:38491"
type="application/json" dcterms:conformsTo="http://iiif.io/api/presentation/2.1/"/>
<rs:ln rel="collection" href="https://digital.ucd.ie/view/ucdlib:38488"
type="application/json" dcterms:conformsTo="http://iiif.io/api/presentation/2.1/"/>
<lastmod>2014-08-24T04:09:09.716Z</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
Example of UCD Resource List Entry
Courtesy Nuno Freire, Europeana
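A harvester can use the `rs:ln` links in such an entry to find IIIF resources. The sketch below collects every link whose `dcterms:conformsTo` points at a IIIF Presentation API version. It is an illustration only: the slide omits namespace declarations, so the `urlset` wrapper and the `dcterms` namespace URI (`http://purl.org/dc/terms/`) are assumptions, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "rs": "http://www.openarchives.org/rs/terms/",
}
CONFORMS_TO = "{http://purl.org/dc/terms/}conformsTo"
IIIF_PRESENTATION = "http://iiif.io/api/presentation/"


def iiif_links(resource_list_xml):
    """Return (loc, rel, href) for each rs:ln that declares IIIF conformance."""
    root = ET.fromstring(resource_list_xml)
    found = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", namespaces=NS)
        for ln in url.findall("rs:ln", NS):
            if ln.get(CONFORMS_TO, "").startswith(IIIF_PRESENTATION):
                found.append((loc, ln.get("rel"), ln.get("href")))
    return found


ENTRY = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:rs="http://www.openarchives.org/rs/terms/"
        xmlns:dcterms="http://purl.org/dc/terms/">
  <url>
    <loc>https://digital.ucd.ie/view/ucdlib:38491</loc>
    <rs:ln rel="alternate" href="https://digital.ucd.ie/view/ucdlib:38491"
           type="application/json"
           dcterms:conformsTo="http://iiif.io/api/presentation/2.1/"/>
    <rs:ln rel="collection" href="https://digital.ucd.ie/view/ucdlib:38488"
           type="application/json"
           dcterms:conformsTo="http://iiif.io/api/presentation/2.1/"/>
    <lastmod>2014-08-24T04:09:09.716Z</lastmod>
  </url>
</urlset>
"""

links = iiif_links(ENTRY)
```

The `rel` value then tells the harvester whether a link is the manifest for the item itself (`alternate`) or the collection it belongs to (`collection`).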
45. Uses Sitemaps, ResourceSync, and Sitemap Image Extension
Sample of UCD Resource List
Courtesy John Howard, University College Dublin
46. Conclusions
Strengths
● ResourceSync addresses core requirements
for exposing IIIF resources for harvesting
● Can build on publication of existing
sitemaps easily
● Leverages Many-to-One, Selective
Synchronization, and Metadata Harvesting
paradigms
● Can adopt additional extensions to
implement needed features
● Plenty of opportunity to contribute; need
more prototypes
Challenges
● IIIF community’s needs for discovery are
not necessarily what other sitemap
consumers want (e.g. Google)
● Identifying the primary resource influences
structure
● Unclear whether search engines support
custom extensions, and what ranking
impact would be
47. Thank You!
Mark A. Matienzo, Stanford University Libraries
@anarchivist / https://orcid.org/0000-0003-3270-1306
DPLAFest — Chicago, Illinois — April 20, 2017
50. Use Case 1: What is CORE?
OA Repositories OA Journals
Mostly OAI-PMH
CORE aggregates and
provides free access to
millions of research
articles aggregated
from thousands of OA
repositories and
journals.
51. Use Case 1: What is CORE?
» Enrichment and
harmonisation of
aggregated data
» Products/services:
› Portal
› API
› Data dumps
› Recommendation
system for libraries
› Repository dashboard
› B2B and analytical
services
52. Use Case 1: What is CORE?
» 70 million+
metadata records
» Over 6 million full
texts hosted on
CORE
» ~1.5 million
monthly active
users
» Aggregating from
2,500 repositories
and 10k OA
journals
54. Use Case 1: Approach
OA Repositories OA Journals
Key publishers
(OA + hybrid OA)
Publisher connector
Mostly OAI-PMH
A range of bespoke APIs
+ many others
Provide seamless access over non-standardised APIs.
What protocol?
55. Use Case 1: Approach
What protocol?
» Why not OAI-PMH?
› Slow and very inefficient
for big repositories.
› Standardised for
metadata transfer but
not for content transfer.
› Very difficult to
represent the richness of
metadata from a broad
range of data providers.
58. Use Case 2: Subscribing to ResourceSync
OA Repositories OA Journals
Key publishers
(OA + hybrid OA)
Publisher connector
Mostly OAI-PMH
A range of bespoke APIs
ResourceSync
+ many others
» Other aggregators can
subscribe to the Publisher
connector to make use of their
ingestion pipelines and
enrichment technologies
59. Use Case 2: Content ingestion in OpenMinTeD
OA Repositories OA Journals
Key publishers
(OA + hybrid OA)
Publisher connector
ResourceSync
Mostly OAI-PMH
OMTD-SHARE
(over REST)
A range of bespoke APIs
+ many others
» CORE and OpenAIRE are content sources in the OpenMinTeD
TDM platform (EU infrastructure project) being developed to
enable the mining of scholarly literature.
60. Use Case 2: Exposing enriched data for TDM
OA Repositories OA Journals
Key publishers
(OA + hybrid OA)
Publisher connector
ResourceSync
Mostly OAI-PMH
A range of bespoke APIs
+ many others
ResourceSync
» But others want similar solutions … typically, they want to be
able to sync and host the data.
62. Use Case 3: Replace OAI-PMH with ResourceSync
OA Repositories OA Journals
Key publishers
(OA + hybrid OA)
Publisher connector
ResourceSync
Mostly OAI-PMH
OMTD-SHARE
(over REST)
A range of bespoke APIs
+ many others
ResourceSync
ResourceSync
» Will be a game changer …
» Advocated by COAR Next
Generation Repositories WG
67. An overview of capabilities and real-world use cases for discovery,
harvesting, and synchronization of resources on the web
http://www.openarchives.org/rs #resourcesync
ResourceSync
ANSI/NISO Z39.99-2017
@mart1nkle1n @G_AmSpinnrade @anarchivist @petrknoth