An introduction to SDshare
2011-03-15
Lars Marius Garshol, <larsga@bouvet.no>
http://twitter.com/larsga
Overview of SDshare
SDshare
- A protocol for tracking changes in a semantic datastore
  - essentially allows clients to keep track of all changes, for replication purposes
- Supports both Topic Maps and RDF
- Based on Atom
- Highly RESTful
- A CEN specification
Basic workings
[Diagram components: Server, Fragment (x4), Client]
- Server publishes fragments representing changes in datastore
- Client pulls these in, updates local copy of dataset
- There is, however, more to it than just this
What more is needed?
- Support for more than one dataset per server
  - this means: more than one fragment stream
- How do clients get started?
  - a change feed is nice once you've got a copy of the dataset, but how do you get a copy?
- What if you miss out on some changes and need to restart?
  - must be a way to reset local copy
- The protocol supports all this
Two new concepts
- Collection
  - essentially a dataset inside the server
  - exact meaning is not defined in spec
  - will generally be a topic map (TMs) or a graph (RDF)
- Snapshot
  - a complete copy of a collection at some point in time
Feeds in the server
[Diagram components: Overview feed, Collection feeds, Snapshot feed, Snapshot, Fragment feed, Fragment]
An overview feed
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:sdshare="http://www.egovpt.org/sdshare">
      <title>SDshare feeds from localhost</title>
      <updated>2011-03-15T18:55:38Z</updated>
      <author>
        <name>Ontopia SDshare server</name>
      </author>
      <id>http://localhost:8080/sdshare/</id>
      <link href="http://localhost:8080/sdshare/"></link>
      <entry>
        <title>beer.xtm</title>
        <updated>2011-03-15T18:55:38Z</updated>
        <id>http://localhost:8080/sdshare/beer.xtm</id>
        <link href="collection.jsp?topicmap=beer.xtm" type="application/atom+xml" rel="http://www.egovpt.org/sdshare/collectionfeed"></link>
      </entry>
      <entry>
        <title>metadata.xtm</title>
        <updated>2011-03-15T18:55:38Z</updated>
        <id>http://localhost:8080/sdshare/metadata.xtm</id>
        <link href="collection.jsp?topicmap=metadata.xtm" type="application/atom+xml" rel="http://www.egovpt.org/sdshare/collectionfeed"></link>
      </entry>
    </feed>
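A client starts from the overview feed and follows its links. The sketch below shows one way that first step might look in Python, using an abbreviated copy of the overview feed above. The function name `collection_feeds` and the idea of resolving the relative `href` values against the feed's own URL are assumptions made for illustration; nothing here is prescribed by the spec.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"
COLLECTION_REL = "http://www.egovpt.org/sdshare/collectionfeed"

# Abbreviated copy of the example overview feed.
OVERVIEW = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:sdshare="http://www.egovpt.org/sdshare">
  <title>SDshare feeds from localhost</title>
  <id>http://localhost:8080/sdshare/</id>
  <entry>
    <title>beer.xtm</title>
    <link href="collection.jsp?topicmap=beer.xtm"
          type="application/atom+xml"
          rel="http://www.egovpt.org/sdshare/collectionfeed"/>
  </entry>
  <entry>
    <title>metadata.xtm</title>
    <link href="collection.jsp?topicmap=metadata.xtm"
          type="application/atom+xml"
          rel="http://www.egovpt.org/sdshare/collectionfeed"/>
  </entry>
</feed>"""


def collection_feeds(overview_xml, base_url):
    """Map each collection title to the absolute URL of its collection feed."""
    root = ET.fromstring(overview_xml)
    feeds = {}
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title")
        for link in entry.findall(ATOM + "link"):
            # Only links with the collection feed relation are of interest.
            if link.get("rel") == COLLECTION_REL:
                # Assumption: hrefs are relative to the overview feed URL.
                feeds[title] = urljoin(base_url, link.get("href"))
    return feeds


feeds = collection_feeds(OVERVIEW, "http://localhost:8080/sdshare/")
for title, url in feeds.items():
    print(title, "->", url)
```

Because every other URL is discovered this way, the overview feed URL is the only address the client needs to be configured with.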
The snapshot feed
- A list of links to snapshots of the entire dataset (collection)
- The spec doesn't say anything about how and when snapshots are produced
  - it's up to implementations to decide how they want to do this
- It makes sense, though, to always have a snapshot for the current state of the dataset
Example snapshot feed
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:sdshare="http://www.egovpt.org/sdshare">
      <title>Snapshots feed for beer.xtm</title>
      <updated>2011-03-15T19:12:34Z</updated>
      <author>
        <name>Ontopia SDshare server</name>
      </author>
      <id>file:/Users/larsga/data/topicmaps/beer.xtm/snapshots</id>
      <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
      <entry>
        <title>Snapshot of beer.xtm</title>
        <updated>2011-03-15T19:12:34Z</updated>
        <id>file:/Users/larsga/data/topicmaps/beer.xtm/snapshot/0</id>
        <link href="snapshot.jsp?topicmap=beer.xtm" type="application/x-tm+xml; version=1.0" rel="alternate"></link>
      </entry>
    </feed>
The fragment feed
- For every change in the topic map, there is one fragment
  - the granularity of changes is not defined by the spec
  - it could be per transaction, or per topic changed
- The fragment is basically a link to a URL that produces a part of the dataset
An example fragment feed
    <feed xmlns="http://www.w3.org/2005/Atom" xmlns:sdshare="http://www.egovpt.org/sdshare">
      <title>Fragments feed for beer.xtm</title>
      <updated>2011-03-15T19:21:20Z</updated>
      <author>
        <name>Ontopia SDshare server</name>
      </author>
      <id>file:/Users/larsga/data/topicmaps/beer.xtm/fragments</id>
      <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
      <entry>
        <title>Topic with object ID 4521</title>
        <updated>2011-03-15T19:20:03Z</updated>
        <id>file:/Users/larsga/data/topicmaps/beer.xtm/4521/1300216803730</id>
        <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=rdf" type="application/rdf+xml" rel="alternate"/>
        <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=xtm" type="application/x-tm+xml; version=1.0" rel="alternate"/>
        <sdshare:TopicSI>http://psi.example.org/12</sdshare:TopicSI>
      </entry>
    </feed>
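To process a fragment feed, a client needs three things from it: the server prefix, the subject identifiers of the changed topics, and a fragment link in a syntax it understands. The following is a minimal sketch of extracting those from an abbreviated copy of the feed above. The `FragmentEntry` class and `parse_fragment_feed` function are hypothetical names introduced only for this example.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "sdshare": "http://www.egovpt.org/sdshare",
}

# Abbreviated copy of the example fragment feed.
FRAGMENT_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:sdshare="http://www.egovpt.org/sdshare">
  <title>Fragments feed for beer.xtm</title>
  <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
  <entry>
    <title>Topic with object ID 4521</title>
    <updated>2011-03-15T19:20:03Z</updated>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=rdf"
          type="application/rdf+xml" rel="alternate"/>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=xtm"
          type="application/x-tm+xml; version=1.0" rel="alternate"/>
    <sdshare:TopicSI>http://psi.example.org/12</sdshare:TopicSI>
  </entry>
</feed>"""


@dataclass
class FragmentEntry:
    topic_sis: list   # subject identifiers of the changed topic(s)
    updated: str      # timestamp text exactly as given in the feed
    links: dict       # MIME type -> href of the fragment in that syntax


def parse_fragment_feed(xml_text):
    """Return (server prefix, list of FragmentEntry) for a fragment feed."""
    root = ET.fromstring(xml_text)
    prefix = root.findtext("sdshare:ServerSrcLocatorPrefix", namespaces=NS)
    entries = []
    for entry in root.findall("atom:entry", NS):
        links = {
            link.get("type"): link.get("href")
            for link in entry.findall("atom:link", NS)
            if link.get("rel") == "alternate"
        }
        sis = [si.text for si in entry.findall("sdshare:TopicSI", NS)]
        updated = entry.findtext("atom:updated", namespaces=NS)
        entries.append(FragmentEntry(sis, updated, links))
    return prefix, entries


prefix, entries = parse_fragment_feed(FRAGMENT_FEED)
# A Topic Maps client would pick the XTM link, an RDF client the RDF/XML one.
rdf_href = entries[0].links["application/rdf+xml"]
```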
What is a fragment?
- Essentially, a piece of a topic map
  - that is, a complete XTM file that contains only part of a bigger topic map
  - typically, most of the topic references will point to topics not in the XTM file
- Downloading more fragments will yield a bigger subset of the topic map
  - the automatic merging in Topic Maps will cause the fragments to match up
- Exactly the same applies in RDF
An example fragment
    <topicMap xmlns="http://www.topicmaps.org/xtm/1.0/" xmlns:xlink="http://www.w3.org/1999/xlink">
      <topic id="id4521">
        <instanceOf>
          <subjectIndicatorRef xlink:href="http://psi.garshol.priv.no/beer/pub"></subjectIndicatorRef>
        </instanceOf>
        <subjectIdentity>
          <subjectIndicatorRef xlink:href="http://psi.example.org/12"></subjectIndicatorRef>
          <topicRef xlink:href="file:/Users/larsga/data/topicmaps/beer.xtm#id2662"></topicRef>
        </subjectIdentity>
        <baseName>
          <baseNameString>Amundsen Bryggeri og Spiseri</baseNameString>
        </baseName>
        <occurrence>
          <instanceOf>
            <subjectIndicatorRef xlink:href="http://psi.ontopia.net/ontology/latitude"></subjectIndicatorRef>
          </instanceOf>
          <resourceData>59.913816</resourceData>
        </occurrence>
        ...
      </topic>
      ...
    </topicMap>
Applying a fragment
- The feed contains a URI prefix
  - this is used to create item identifiers tagging statements with their origin
- For each TopicSI, find that topic, then
  - for each statement, remove matching item identifier
  - if statement now has no item identifiers, delete it
- Merge in the received fragment
  - then tag all statements in it with matching item identifier
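The steps on this slide can be sketched with a deliberately simplified in-memory model. It assumes that a topic is just a dictionary from statements (plain tuples) to the set of origin tags attached to them, and that the tag is the server prefix itself rather than a full item identifier derived from it. The function name `apply_fragment` and the store layout are illustrative assumptions, not the data model of any real Topic Maps engine.

```python
import copy


def apply_fragment(store, prefix, topic_si, fragment_statements):
    """Apply one fragment for one topic.

    store: {topic subject identifier: {statement: set of origin tags}}
    prefix: the server prefix from the feed, used here directly as the tag
    """
    statements = store.setdefault(topic_si, {})

    # Step 1: withdraw this source's tag from every existing statement and
    # delete statements that no source vouches for any more.
    for stmt in list(statements):
        tags = statements[stmt]
        tags.discard(prefix)
        if not tags:
            del statements[stmt]

    # Step 2: merge in the fragment and tag its statements with the origin.
    for stmt in fragment_statements:
        statements.setdefault(stmt, set()).add(prefix)


si = "http://psi.example.org/12"
store = {}

# Source A knows a name and a latitude; source B knows only the name.
apply_fragment(store, "A", si, [("name", "Amundsen Bryggeri og Spiseri"),
                                ("latitude", "59.913816")])
apply_fragment(store, "B", si, [("name", "Amundsen Bryggeri og Spiseri")])

# Source A later publishes the topic without the latitude.
apply_fragment(store, "A", si, [("name", "Amundsen Bryggeri og Spiseri")])
after_once = copy.deepcopy(store)

# Applying the same fragment again changes nothing.
apply_fragment(store, "A", si, [("name", "Amundsen Bryggeri og Spiseri")])
```

In this model the latitude disappears because only source A ever asserted it, while the name survives with both tags. That is what lets a client merge data from several sources and still retract statements per source.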
Properties of the protocol
- HATEOAS
  - uses hypertext principles
  - only endpoint is that of the overview feed
  - all other URLs available via hypertext
- Applying a fragment is idempotent
  - i.e., the result is the same no matter how many times you do it
- Loose binding
  - very loose binding between server and client
- Supports federation of data
  - client can safely merge data from different sources
SDshare push
- In normal SDshare, data receivers connect to the data source
  - basically, they poll the source with GET requests
- However, the receiver is not always allowed to make connections to the source
  - SDshare push is designed for this situation
- Solution is a slightly modified protocol
  - source POSTs Atom feeds with inline fragments to recipient
  - this flips the server/client relationship
- Not part of the spec; unofficial Ontopia extension
Uses of SDshare
Example use case #1
[Diagram components: Frontend, Database, Ontopia, DB2TM, JDBC, Portal]
Example use case #1
[Diagram components: Service #1, Frontend, Database, Ontopia, DB2TM, SDshare, Ontopia, SDshare, Service #3, Portal, ESB]
NRK/Skole today
[Diagram components: Production environment, Editorial server, MediaDB, Prod #1, Prod #2, DB2TM, Export, JDBC, JDBC, nrk-grep.xtm, Import, DB server 1, DB server 2, Database, Firewall, Server]
NRK/Skole with SDshare push
[Diagram components: Production environment, SDshare PUSH, Editorial server, MediaDB, Prod #1, Prod #2, DB2TM, JDBC, JDBC, DB server 1, DB server 2, Database, Firewall, Server]
Hafslund
[Diagram components: ERP, GIS, CRM, ..., UMIC, Search engine, Archive]
Hafslund architecture
- The beauty of this architecture is that SDshare insulates the different systems from one another
- More input systems can be added without hassle
- Any component can be replaced without affecting the others
- Essentially, a plug-and-play architecture
A Hafslund problem
- There are too many duplicates in the data
  - duplicates within each system
  - also duplication across systems
- How to get rid of the duplicates?
  - unrealistic to expect cleanup across systems
- So, we build a deduplicator and plug it in...
DuKe plugged in
[Diagram components: ERP, GIS, CRM, ..., UMIC, Search engine, Dupe Killer, Archive]
Implementations
Current implementations
- Web3
  - both client and server
- Ontopia
  - ditto + SDshare push
- Isidorus
  - don't know
- Atomico
  - server framework only; no actual implementation
Ontopia SDshare server
- Event tracker
  - taps into event API where it listens for changes
  - maintains in-memory list of changes
  - writes all changes to disk as well
  - removes duplicate changes and discards old changes
- Web application based on tracker
  - JSP pages producing feeds and fragments
  - one fragment per changed topic, sorted by time
  - only a single snapshot of current state of TM
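The event tracker described above could behave roughly as in the sketch below. This is not Ontopia's code: the `ChangeTracker` class, its method names, and the `max_age_seconds` expiry setting are assumptions made for illustration, the disk persistence is left out, and change events are assumed to arrive in time order.

```python
import time


class ChangeTracker:
    """Keeps one entry per changed topic, ordered by time of latest change."""

    def __init__(self, max_age_seconds):
        self.max_age = max_age_seconds
        # topic id -> timestamp of its latest change. Dicts keep insertion
        # order, so re-inserting a topic moves it to the end of the list.
        self._changes = {}

    def topic_changed(self, topic_id, now=None):
        """Record a change event; intended to be called by an event listener."""
        now = time.time() if now is None else now
        # Remove any earlier change to the same topic (duplicate removal).
        self._changes.pop(topic_id, None)
        self._changes[topic_id] = now
        self._expire(now)

    def _expire(self, now):
        """Discard changes older than the configured maximum age."""
        cutoff = now - self.max_age
        expired = [t for t, ts in self._changes.items() if ts < cutoff]
        for topic_id in expired:
            del self._changes[topic_id]

    def fragments(self):
        """Return (topic id, timestamp) pairs, oldest first."""
        return list(self._changes.items())


tracker = ChangeTracker(max_age_seconds=100)
tracker.topic_changed("4521", now=0)
tracker.topic_changed("2662", now=10)
tracker.topic_changed("4521", now=20)   # replaces the earlier entry for 4521
first = tracker.fragments()
tracker.topic_changed("77", now=115)    # the entry for 2662 is now too old
second = tracker.fragments()
```

Keeping only the latest change per topic is sufficient when each fragment URL returns the current state of the topic, since older entries would yield the same fragment.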
Ontopia SDshare client
- Web UI for management
- Pluggable frontends
- Pluggable backends
- Combine at will
- Frontends
  - Ontopia: event listener
  - SDshare: polls Atom feeds
- Backends
  - Ontopia: applies changes to Ontopia locally
  - SPARQL: writes changes to RDF repo via SPARUL
  - push: pushes changes over SDshare push
[Diagram components: Web UI, Ontopia events, Core logic, Ontopia backend, SPARQL Update, SDshare client, SDshare push]
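The "combine at will" design can be pictured as two small interfaces with core logic in between. The sketch below is only an illustration of that shape: the `Frontend` and `Backend` protocols, the stand-in classes, and the `sync` function are all hypothetical and do not reflect the actual Ontopia class names or APIs.

```python
from typing import Iterable, Protocol


class Frontend(Protocol):
    """A source of changes, e.g. a feed poller or an event listener."""
    def changes(self) -> Iterable[tuple]: ...


class Backend(Protocol):
    """A destination for changes, e.g. a local store or a push target."""
    def apply(self, change: tuple) -> None: ...


class ListFrontend:
    """Stand-in frontend that replays a fixed list of changes."""
    def __init__(self, changes):
        self._changes = list(changes)

    def changes(self):
        return iter(self._changes)


class RecordingBackend:
    """Stand-in backend that just remembers what it was asked to apply."""
    def __init__(self):
        self.applied = []

    def apply(self, change):
        self.applied.append(change)


def sync(frontend: Frontend, backends: Iterable[Backend]) -> int:
    """Core logic: hand every change from the frontend to every backend."""
    backends = list(backends)
    count = 0
    for change in frontend.changes():
        for backend in backends:
            backend.apply(change)
        count += 1
    return count


source = ListFrontend([("http://psi.example.org/12", "updated"),
                       ("http://psi.example.org/13", "updated")])
local_store = RecordingBackend()
push_target = RecordingBackend()
handled = sync(source, [local_store, push_target])
```

Since the core logic only depends on the two interfaces, any frontend can be paired with any set of backends without either side knowing about the other.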
Web UI to client
Problems with the spec
What if many fragments?
- The size of the fragments feed grows enormous
  - expensive if polled frequently
- Paging might be one solution
  - basically, end of feed contains pointer to more
- "since" parameter might be another
  - allows client to say "only show me changes since ..."
- Probably need both in practice
- http://projects.topicmapslab.de/issues/3675
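Both proposals amount to the server selecting a subset of the fragment entries before rendering the feed. The following is a minimal sketch of such a selection step, assuming entries are held newest first as (timestamp, id) pairs. The function `select_entries` and its parameters `since`, `page_size`, and `page` are hypothetical, since neither mechanism was defined by the spec at the time of the talk.

```python
from datetime import datetime, timezone


def parse_ts(text):
    """Parse an Atom-style UTC timestamp such as 2011-03-15T19:20:03Z."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def select_entries(entries, since=None, page_size=None, page=0):
    """Return (entries to show, whether a pointer to more is needed).

    entries: list of (updated timestamp text, entry id), newest first.
    """
    if since is not None:
        cutoff = parse_ts(since)
        # "only show me changes since ...": keep strictly newer entries.
        entries = [e for e in entries if parse_ts(e[0]) > cutoff]
    if page_size is None:
        return entries, False
    start = page * page_size
    chunk = entries[start:start + page_size]
    has_more = start + page_size < len(entries)
    return chunk, has_more


ENTRIES = [
    ("2011-03-15T19:21:20Z", "fragment-3"),
    ("2011-03-15T19:20:03Z", "fragment-2"),
    ("2011-03-15T18:55:38Z", "fragment-1"),
]

recent, _ = select_entries(ENTRIES, since="2011-03-15T19:00:00Z")
first_page, more_after_first = select_entries(ENTRIES, page_size=2, page=0)
second_page, more_after_second = select_entries(ENTRIES, page_size=2, page=1)
```

The two mechanisms compose: "since" keeps frequent polls cheap, while paging bounds the response size when a client has been away for a long time.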
Ordering of fragments
- Should the spec require that fragments be ordered?
  - not really necessary if all fragment URIs return current state (instead of state at time fragment entry was created)
RDF fragment algorithm
- The one given in the spec makes no sense
  - relies on Topic Maps constructs not found in RDF
  - really no way to make use of it
- http://projects.topicmapslab.de/issues/4013
Our interpretation
- Server prefix is URI of RDF named graph
- Fragment algorithm therefore becomes
  - delete all statements about changed resources
  - then add all statements in fragment
- Means each source gets a different graph
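This interpretation is short enough to sketch against an in-memory model of named graphs. The sketch assumes that "statements about a resource" means triples with that resource as subject, and it represents each graph as a Python set of (subject, predicate, object) tuples. The function `apply_rdf_fragment` and the example URIs are made up for illustration.

```python
def apply_rdf_fragment(graphs, server_prefix, changed_resources, fragment_triples):
    """Apply one fragment to the named graph belonging to one source.

    graphs: {graph URI: set of (subject, predicate, object) triples}
    server_prefix: the feed's server prefix, used as the graph URI
    """
    graph = graphs.setdefault(server_prefix, set())
    changed = set(changed_resources)

    # Delete all statements about the changed resources.
    # Assumption: "about" means the resource is the triple's subject.
    graph -= {triple for triple in graph if triple[0] in changed}

    # Then add all statements in the fragment.
    graph |= set(fragment_triples)


PUB = "http://psi.example.org/12"
graphs = {}

apply_rdf_fragment(graphs, "http://a.example.org/", [PUB], [
    (PUB, "name", "Amundsen Bryggeri og Spiseri"),
    (PUB, "latitude", "59.913816"),
])
apply_rdf_fragment(graphs, "http://b.example.org/", [PUB], [
    (PUB, "phone", "555-0100"),
])

# Source A republishes the resource without the latitude.
apply_rdf_fragment(graphs, "http://a.example.org/", [PUB], [
    (PUB, "name", "Amundsen Bryggeri og Spiseri"),
])
```

Because each source writes only to its own graph, an update from one source cannot remove statements contributed by another, and no per-statement origin tagging is needed.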
TopicSL/TopicII
- Currently, topics can only be identified by subject identifier
  - but not all topics have one
- Solution
  - add elements for subject locators and item identifiers
- http://projects.topicmapslab.de/issues/3667
Paging of snapshots?
- What if the snapshot is vast?
  - clients probably won't be able to download and store the entire thing in one go
- Could we page the snapshot into fragments?
- Or is there some other solution?
- http://projects.topicmapslab.de/issues/4307
How to tell if the fragment feed is complete?
- When reading the fragment feed, how can we tell if there are older fragments that are discarded?
  - and how can we tell which fragment was the newest to be thrown away?
- Without this there's no way to know for certain if you've lost fragments
  - if the feed stops before the newest fragment you've got
  - and if you're using since it always will stop before the newest fragment...
- Make new sdshare:foo element on feed level for this information?
- http://projects.topicmapslab.de/issues/4308
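If a feed-level element of the kind suggested here existed, the client-side check would be simple. The sketch below assumes a hypothetical value, `newest_discarded`, carrying the timestamp of the newest fragment the server has thrown away; no such element is defined by the spec, and the function name `must_resync` is likewise made up.

```python
def must_resync(last_seen, newest_discarded):
    """Decide whether the client has lost fragments and needs a snapshot.

    last_seen: timestamp of the newest fragment the client has applied,
               or None if it has applied none.
    newest_discarded: hypothetical feed-level value giving the timestamp of
               the newest fragment the server has discarded, or None.

    Both are UTC timestamps in the same fixed format, for example
    2011-03-15T19:20:03Z, so plain string comparison orders them correctly.
    """
    if newest_discarded is None:
        # Nothing has been thrown away, so the feed is complete.
        return False
    if last_seen is None:
        # Fragments were discarded and the client has applied none of them.
        return True
    # Lost fragments only if something newer than our position was discarded.
    return newest_discarded > last_seen


up_to_date = must_resync("2011-03-15T19:20:03Z", "2011-03-15T19:00:00Z")
fell_behind = must_resync("2011-03-15T18:00:00Z", "2011-03-15T19:00:00Z")
```

When the check returns True, the client would discard its local copy and start again from a snapshot, which is the reset path the protocol already provides.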
Blank nodes are not supported
- What to do?
- http://projects.topicmapslab.de/issues/4306
More information
- SDshare spec
  - http://www.egovpt.org/fg/CWA_Part_1b
- SDshare issue tracker
  - http://projects.topicmapslab.de/projects/sdshare
- SDshare use cases
  - http://www.garshol.priv.no/blog/215.html


  • 1. An introduction to SDshare 2011-03-15 Lars Marius Garshol, <larsga@bouvet.no> http://twitter.com/larsga
  • 3. SDshare: A protocol for tracking changes in a semantic datastore. Essentially, it allows clients to keep track of all changes, for replication purposes. Supports both Topic Maps and RDF. Based on Atom. Highly RESTful. A CEN specification.
  • 4. Basic workings: The server publishes fragments representing changes in the datastore. The client pulls these in and updates its local copy of the dataset. There is, however, more to it than just this.
  • 5. What more is needed? Support for more than one dataset per server; this means more than one fragment stream. How do clients get started? A change feed is nice once you've got a copy of the dataset, but how do you get a copy? What if you miss out on some changes and need to restart? There must be a way to reset the local copy. The protocol supports all this.
  • 6. Two new concepts: A collection is essentially a dataset inside the server; its exact meaning is not defined in the spec, but it will generally be a topic map (TMs) or a graph (RDF). A snapshot is a complete copy of a collection at some point in time.
  • 7. Feeds in the server Snapshot Snapshot feed Overview feed Fragment Fragment feed Collection feeds
  • 8. An overview feed <feed xmlns="http://www.w3.org/2005/Atom" xmlns:sdshare="http://www.egovpt.org/sdshare"> <title>SDshare feeds from localhost</title> <updated>2011-03-15T18:55:38Z</updated> <author> <name>Ontopia SDshare server</name> </author> <id>http://localhost:8080/sdshare/</id> <link href="http://localhost:8080/sdshare/"></link> <entry> <title>beer.xtm</title> <updated>2011-03-15T18:55:38Z</updated> <id>http://localhost:8080/sdshare/beer.xtm</id> <link href="collection.jsp?topicmap=beer.xtm" type="application/atom+xml" rel="http://www.egovpt.org/sdshare/collectionfeed"></link> </entry> <entry> <title>metadata.xtm</title> <updated>2011-03-15T18:55:38Z</updated> <id>http://localhost:8080/sdshare/metadata.xtm</id> <link href="collection.jsp?topicmap=metadata.xtm" type="application/atom+xml" rel="http://www.egovpt.org/sdshare/collectionfeed"></link> </entry> </feed>
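As a rough illustration of how a client might discover the collection feeds from an overview feed like the one above, here is a minimal Python sketch. The function name `collection_feeds` is hypothetical, the sample feed is trimmed to a single entry, and resolving the relative `href` against the overview feed's own URL is an assumption about client behaviour rather than something stated on the slide.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "{http://www.w3.org/2005/Atom}"
COLLECTION_REL = "http://www.egovpt.org/sdshare/collectionfeed"

def collection_feeds(overview_xml, base_url):
    """Return {collection title: absolute collection feed URL}."""
    root = ET.fromstring(overview_xml)
    feeds = {}
    for entry in root.findall(ATOM + "entry"):
        title = entry.findtext(ATOM + "title")
        for link in entry.findall(ATOM + "link"):
            # Only follow links marked as collection feeds.
            if link.get("rel") == COLLECTION_REL:
                # Assumption: relative hrefs resolve against the feed URL.
                feeds[title] = urljoin(base_url, link.get("href"))
    return feeds

# Trimmed copy of the overview feed shown on the slide.
OVERVIEW = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:sdshare="http://www.egovpt.org/sdshare">
  <title>SDshare feeds from localhost</title>
  <entry>
    <title>beer.xtm</title>
    <id>http://localhost:8080/sdshare/beer.xtm</id>
    <link href="collection.jsp?topicmap=beer.xtm" type="application/atom+xml"
          rel="http://www.egovpt.org/sdshare/collectionfeed"></link>
  </entry>
</feed>"""

feeds = collection_feeds(OVERVIEW, "http://localhost:8080/sdshare/")
print(feeds)
```

The only URL the client needs up front is that of the overview feed; everything else is found by following links.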
  • 9. The snapshot feed: A list of links to snapshots of the entire dataset (collection). The spec doesn't say anything about how and when snapshots are produced; it's up to implementations to decide how they want to do this. It makes sense, though, to always have a snapshot for the current state of the dataset.
  • 10. Example snapshot feed <feed xmlns="http://www.w3.org/2005/Atom" xmlns:sdshare="http://www.egovpt.org/sdshare"> <title>Snapshots feed for beer.xtm</title> <updated>2011-03-15T19:12:34Z</updated> <author> <name>Ontopia SDshare server</name> </author> <id>file:/Users/larsga/data/topicmaps/beer.xtm/snapshots</id> <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix> <entry> <title>Snapshot of beer.xtm</title> <updated>2011-03-15T19:12:34Z</updated> <id>file:/Users/larsga/data/topicmaps/beer.xtm/snapshot/0</id> <link href="snapshot.jsp?topicmap=beer.xtm" type="application/x-tm+xml; version=1.0" rel="alternate"></link> </entry> </feed>
  • 11. The fragment feed: For every change in the topic map, there is one fragment. The granularity of changes is not defined by the spec; it could be per transaction, or per topic changed. The fragment is basically a link to a URL that produces a part of the dataset.
  • 12. An example fragment feed <feed xmlns="http://www.w3.org/2005/Atom" xmlns:sdshare="http://www.egovpt.org/sdshare"> <title>Fragments feed for beer.xtm</title> <updated>2011-03-15T19:21:20Z</updated> <author> <name>Ontopia SDshare server</name> </author> <id>file:/Users/larsga/data/topicmaps/beer.xtm/fragments</id> <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix> <entry> <title>Topic with object ID 4521</title> <updated>2011-03-15T19:20:03Z</updated> <id>file:/Users/larsga/data/topicmaps/beer.xtm/4521/1300216803730</id> <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=rdf" type="application/rdf+xml" rel="alternate"/> <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=xtm" type="application/x-tm+xml; version=1.0" rel="alternate"/> <sdshare:TopicSI>http://psi.example.org/12</sdshare:TopicSI> </entry> </feed>
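To make the structure of the fragment feed concrete, the following Python sketch pulls out the three things a client needs from it: the server prefix, the TopicSI values, and the fragment links per syntax. The function name `read_fragment_feed` and the shape of the returned dictionaries are illustrative choices, not part of the spec; the sample feed is a trimmed copy of the one on the slide.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "sdshare": "http://www.egovpt.org/sdshare",
}

def read_fragment_feed(feed_xml):
    """Return (server prefix, list of fragment descriptions)."""
    root = ET.fromstring(feed_xml)
    prefix = root.findtext("sdshare:ServerSrcLocatorPrefix", namespaces=NS)
    fragments = []
    for entry in root.findall("atom:entry", NS):
        # One alternate link per syntax, keyed by media type.
        links = {
            link.get("type"): link.get("href")
            for link in entry.findall("atom:link", NS)
            if link.get("rel") == "alternate"
        }
        fragments.append({
            "updated": entry.findtext("atom:updated", namespaces=NS),
            "topic_sis": [e.text for e in entry.findall("sdshare:TopicSI", NS)],
            "links": links,
        })
    return prefix, fragments

FEED = """<feed xmlns="http://www.w3.org/2005/Atom" xmlns:sdshare="http://www.egovpt.org/sdshare">
  <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
  <entry>
    <title>Topic with object ID 4521</title>
    <updated>2011-03-15T19:20:03Z</updated>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=rdf" type="application/rdf+xml" rel="alternate"/>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=xtm" type="application/x-tm+xml; version=1.0" rel="alternate"/>
    <sdshare:TopicSI>http://psi.example.org/12</sdshare:TopicSI>
  </entry>
</feed>"""

prefix, fragments = read_fragment_feed(FEED)
print(prefix)
print(fragments[0]["links"]["application/rdf+xml"])
```

A client that prefers RDF would fetch the `application/rdf+xml` link; one that prefers Topic Maps would fetch the XTM link instead.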
  • 13. What is a fragment? Essentially, a piece of a topic map: that is, a complete XTM file that contains only part of a bigger topic map. Typically, most of the topic references will point to topics not in the XTM file. Downloading more fragments will yield a bigger subset of the topic map; the automatic merging in Topic Maps will cause the fragments to match up. Exactly the same applies in RDF.
  • 14. An example fragment <topicMap xmlns="http://www.topicmaps.org/xtm/1.0/" xmlns:xlink="http://www.w3.org/1999/xlink"> <topic id="id4521"> <instanceOf> <subjectIndicatorRef xlink:href="http://psi.garshol.priv.no/beer/pub"></subjectIndicatorRef> </instanceOf> <subjectIdentity> <subjectIndicatorRef xlink:href="http://psi.example.org/12"></subjectIndicatorRef> <topicRef xlink:href="file:/Users/larsga/data/topicmaps/beer.xtm#id2662"></topicRef> </subjectIdentity> <baseName> <baseNameString>Amundsen Bryggeri og Spiseri</baseNameString> </baseName> <occurrence> <instanceOf> <subjectIndicatorRef xlink:href="http://psi.ontopia.net/ontology/latitude"></subjectIndicatorRef> </instanceOf> <resourceData>59.913816</resourceData> </occurrence> ... </topic> ... </topicMap>
  • 15. Applying a fragment: The feed contains a URI prefix, which is used to create item identifiers tagging statements with their origin. For each TopicSI, find that topic; then, for each of its statements, remove the matching item identifier, and if the statement now has no item identifiers, delete it. Finally, merge in the received fragment, then tag all statements in it with the matching item identifier.
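The steps for applying a fragment can be sketched in Python over a toy in-memory store. This is a simplified model, not the Ontopia implementation: statements are reduced to hypothetical `(topic_si, property, value)` tuples, and the feed's URI prefix itself is used as the origin tag, whereas the slide only says the prefix is used to create item identifiers. The second source URI is made up for the example.

```python
def apply_fragment(store, prefix, topic_sis, fragment):
    """Apply one fragment to the store.

    store: dict mapping statement -> set of origin tags.
    A statement is a (topic_si, property, value) tuple (simplified model).
    """
    # Step 1: for each changed topic, drop this source's tag from its
    # statements; delete statements that no source vouches for any more.
    for statement in list(store):
        if statement[0] in topic_sis:
            store[statement].discard(prefix)
            if not store[statement]:
                del store[statement]
    # Step 2: merge in the fragment and tag its statements with the origin.
    for statement in fragment:
        store.setdefault(statement, set()).add(prefix)
    return store

pub = "http://psi.example.org/12"
src = "file:/Users/larsga/data/topicmaps/beer.xtm"
other = "http://example.org/other-source"  # hypothetical second source

store = {}
name = (pub, "name", "Amundsen Bryggeri og Spiseri")
latitude = (pub, "latitude", "59.913816")

apply_fragment(store, src, [pub], [name, latitude])
apply_fragment(store, other, [pub], [name])   # second source agrees on the name
apply_fragment(store, src, [pub], [name])     # first source dropped the latitude

snapshot = {k: set(v) for k, v in store.items()}
apply_fragment(store, src, [pub], [name])     # applying again changes nothing
print(store == snapshot)
```

In this model the latitude disappears because only the first source ever asserted it, while the name survives with both origin tags. Reapplying the same fragment leaves the store unchanged, which is the idempotence property the protocol relies on.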
  • 16. Properties of the protocol: HATEOAS: uses hypertext principles; the only endpoint is that of the overview feed, and all other URLs are available via hypertext. Applying a fragment is idempotent, i.e. the result is the same no matter how many times you do it. Loose binding: very loose binding between server and client. Supports federation of data: a client can safely merge data from different sources.
  • 17. SDshare push: In normal SDshare, data receivers connect to the data source; basically, they poll the source with GET requests. However, the receiver is not always allowed to make connections to the source, and SDshare push is designed for this situation. The solution is a slightly modified protocol: the source POSTs Atom feeds with inline fragments to the recipient, which flips the server/client relationship. Not part of the spec; an unofficial Ontopia extension.
  • 19. Example use case #1 Frontend Database Ontopia DB2TM JDBC Portal
  • 20. Example use case #1 Service #1 Frontend Database Ontopia DB2TM SDshare Ontopia SDshare Service #3 Portal ESB
  • 21. NRK/Skole today Production environment Editorial server MediaDB Prod #1 Prod #2 DB2TM Export JDBC JDBC nrk-grep.xtm Import DB server 1 DB server 2 Database Firewall Server
  • 22. NRK/Skole with SDshare push Production environment SDshare PUSH Editorial server MediaDB Prod #1 Prod #2 DB2TM JDBC JDBC DB server 1 DB server 2 Database Firewall Server
  • 23. Hafslund ERP GIS CRM ... UMIC Search engine Archive
  • 24. Hafslund architecture: The beauty of this architecture is that SDshare insulates the different systems from one another. More input systems can be added without hassle. Any component can be replaced without affecting the others. Essentially, a plug-and-play architecture.
  • 25. A Hafslund problem: There are too many duplicates in the data: duplicates within each system, and also duplication across systems. How to get rid of the duplicates? It is unrealistic to expect cleanup across systems. So, we build a deduplicator and plug it in...
  • 26. DuKe plugged in ERP GIS CRM ... UMIC Search engine Dupe Killer Archive
  • 28. Current implementations: Web3 (both client and server); Ontopia (ditto, plus SDshare push); Isidorus (don't know); Atomico (server framework only; no actual implementation).
  • 29. Ontopia SDshare server: An event tracker taps into the event API, where it listens for changes; it maintains an in-memory list of changes, writes all changes to disk as well, removes duplicate changes, and discards old changes. A web application is based on the tracker: JSP pages producing feeds and fragments, with one fragment per changed topic, sorted by time, and only a single snapshot of the current state of the TM.
  • 30. Ontopia SDshare client: Web UI for management. Pluggable frontends and pluggable backends; combine at will. Frontends: Ontopia (event listener); SDshare (polls Atom feeds). Backends: Ontopia (applies changes to Ontopia locally); SPARQL (writes changes to an RDF repo via SPARUL); push (pushes changes over SDshare push). Diagram labels: Web UI, Ontopia events, Core logic, Ontopia backend, SPARQL Update, SDshare client, SDshare push.
  • 31. Web UI to client
  • 33. What if many fragments? The size of the fragments feed grows enormous, which is expensive if it is polled frequently. Paging might be one solution: basically, the end of the feed contains a pointer to more. A "since" parameter might be another; it allows the client to say "only show me changes since ...". Probably need both in practice. http://projects.topicmapslab.de/issues/3675
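Neither paging nor "since" is defined in the spec at this point, so the following Python sketch is purely hypothetical: it shows one way a server could combine the two when selecting entries for the fragment feed. The function name `select_fragments`, the page size, the newest-first ordering, and all fragment IDs other than 4521 are made up for illustration.

```python
from datetime import datetime, timezone

def parse_ts(text):
    # Atom timestamps in the form used by the example feeds,
    # e.g. 2011-03-15T19:20:03Z
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

def select_fragments(entries, since=None, page=0, page_size=2):
    """entries: list of (updated, fragment_id) pairs.

    Returns (entries on this page, next page number or None).
    """
    if since is not None:
        cutoff = parse_ts(since)
        # "since": keep only changes strictly newer than the cutoff.
        entries = [e for e in entries if parse_ts(e[0]) > cutoff]
    # Newest first (an assumption; the spec does not mandate ordering).
    entries = sorted(entries, key=lambda e: parse_ts(e[0]), reverse=True)
    start = page * page_size
    chunk = entries[start:start + page_size]
    # Paging: the pointer to "more" is just the next page number here.
    next_page = page + 1 if start + page_size < len(entries) else None
    return chunk, next_page

entries = [
    ("2011-03-15T19:20:03Z", "4521"),
    ("2011-03-15T19:21:10Z", "4522"),
    ("2011-03-15T19:18:00Z", "4400"),
    ("2011-03-15T19:21:20Z", "4523"),
]

first, more = select_fragments(entries, since="2011-03-15T19:19:00Z")
second, done = select_fragments(entries, since="2011-03-15T19:19:00Z", page=more)
print(first, more)
print(second, done)
```

With "since" the server can skip old entries entirely, and with paging it can bound the size of each response even when many changes remain.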
  • 34. Ordering of fragments: Should the spec require that fragments be ordered? Not really necessary if all fragment URIs return the current state (instead of the state at the time the fragment entry was created).
  • 35. RDF fragment algorithm: The one given in the spec makes no sense. It relies on Topic Maps constructs not found in RDF, so there is really no way to make use of it. http://projects.topicmapslab.de/issues/4013
  • 36. Our interpretation: The server prefix is the URI of an RDF named graph. The fragment algorithm therefore becomes: delete all statements about changed resources, then add all statements in the fragment. This means each source gets a different graph.
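This interpretation can be sketched in Python with named graphs modelled as plain sets of triples. It is a toy model under stated assumptions: "statements about a changed resource" is read as triples with that resource as subject, the function name `apply_rdf_fragment` is hypothetical, and the second graph URI and the property names are made up.

```python
def apply_rdf_fragment(graphs, graph_uri, changed_resources, fragment):
    """graphs: dict mapping named-graph URI -> set of (s, p, o) triples."""
    graph = graphs.setdefault(graph_uri, set())
    changed = set(changed_resources)
    # Delete all statements about the changed resources, in this graph only.
    # Assumption: "about" means the resource is the triple's subject.
    graph.difference_update({t for t in graph if t[0] in changed})
    # Then add all statements in the fragment.
    graph.update(fragment)
    return graphs

pub = "http://psi.example.org/12"
beer_graph = "file:/Users/larsga/data/topicmaps/beer.xtm"
other_graph = "http://example.org/other-source"  # hypothetical second source

graphs = {}
apply_rdf_fragment(graphs, beer_graph, [pub],
                   [(pub, "name", "Amundsen Bryggeri og Spiseri"),
                    (pub, "latitude", "59.913816")])
apply_rdf_fragment(graphs, other_graph, [pub], [(pub, "city", "Oslo")])
# The first source changes the pub: the latitude is gone in the new fragment.
apply_rdf_fragment(graphs, beer_graph, [pub],
                   [(pub, "name", "Amundsen Bryggeri og Spiseri")])
print(graphs)
```

Because each source writes only to its own graph, the delete step for one source cannot remove statements contributed by another.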
  • 37. TopicSL/TopicII: Currently, topics can only be identified by subject identifier, but not all topics have one. Solution: add elements for subject locators and item identifiers. http://projects.topicmapslab.de/issues/3667
  • 38. Paging of snapshots? What if the snapshot is vast? Clients probably won't be able to download and store the entire thing in one go. Could we page the snapshot into fragments? Or is there some other solution? http://projects.topicmapslab.de/issues/4307
  • 39. How to tell if the fragment feed is complete? When reading the fragment feed, how can we tell if there are older fragments that have been discarded, and how can we tell which fragment was the newest to be thrown away? Without this there's no way to know for certain if you've lost fragments if the feed stops before the newest fragment you've got, and if you're using "since" it always will stop before the newest fragment... Make a new sdshare:foo element on feed level for this information? http://projects.topicmapslab.de/issues/4308
  • 40. Blank nodes are not supported: What to do? http://projects.topicmapslab.de/issues/4306
  • 41. More information: SDshare spec: http://www.egovpt.org/fg/CWA_Part_1b ; SDshare issue tracker: http://projects.topicmapslab.de/projects/sdshare ; SDshare use cases: http://www.garshol.priv.no/blog/215.html