3. SDshare
- A protocol for tracking changes in a semantic datastore
  - essentially allows clients to keep track of all changes, for replication purposes
- Supports both Topic Maps and RDF
- Based on Atom
- Highly RESTful
- A CEN specification
4. Basic workings
- [Diagram: the server publishes a stream of fragments, which the client pulls in]
- Server publishes fragments representing changes in the datastore
- Client pulls these in, updating its local copy of the dataset
- There is, however, more to it than just this
5. What more is needed?
- Support for more than one dataset per server
  - this means: more than one fragment stream
- How do clients get started?
  - a change feed is nice once you've got a copy of the dataset, but how do you get a copy?
- What if you miss out on some changes and need to restart?
  - there must be a way to reset the local copy
- The protocol supports all of this
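The bootstrap-then-poll cycle implied above can be sketched in a few lines of Python. This is a toy client, not a spec-mandated algorithm: the feed URLs, the `apply_snapshot`/`apply_fragment` callbacks, and the 60-second polling interval are all assumptions.

```python
import time
import urllib.request
from xml.etree import ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def entry_links(feed_xml):
    """Return (entry id, link href) pairs from an Atom feed, oldest entry
    first. Atom feeds conventionally list the newest entry first, hence
    the reversed()."""
    root = ET.fromstring(feed_xml)
    pairs = []
    for entry in reversed(root.findall(ATOM + "entry")):
        eid = entry.find(ATOM + "id").text
        link = entry.find(ATOM + "link")
        pairs.append((eid, link.get("href")))
    return pairs

def sync(snapshot_feed_url, fragment_feed_url, apply_snapshot, apply_fragment):
    """Bootstrap from the newest snapshot, then keep applying new fragments."""
    snapshots = entry_links(urllib.request.urlopen(snapshot_feed_url).read())
    apply_snapshot(urllib.request.urlopen(snapshots[-1][1]).read())  # newest
    seen = set()
    while True:
        feed = urllib.request.urlopen(fragment_feed_url).read()
        for eid, href in entry_links(feed):
            if eid not in seen:  # apply only fragments we haven't seen yet
                apply_fragment(urllib.request.urlopen(href).read())
                seen.add(eid)
        time.sleep(60)  # polling interval: entirely up to the client
```

Resetting the local copy after missed changes is then just re-running the bootstrap step: load the newest snapshot and continue from there.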
6. Two new concepts
- Collection
  - essentially a dataset inside the server
  - the exact meaning is not defined in the spec
  - will generally be a topic map (Topic Maps) or a graph (RDF)
- Snapshot
  - a complete copy of a collection at some point in time
7. Feeds in the server
- [Diagram: an overview feed links to per-collection feeds; each collection has a snapshot feed (linking to snapshots) and a fragment feed (linking to fragments)]
9. The snapshot feed
- A list of links to snapshots of the entire dataset (collection)
- The spec doesn't say anything about how and when snapshots are produced
  - it's up to implementations to decide how they want to do this
  - it makes sense, though, to always have a snapshot of the current state of the dataset
11. The fragment feed
- For every change in the topic map there is one fragment
  - the granularity of changes is not defined by the spec
  - it could be per transaction, or per topic changed
- Each fragment entry is basically a link to a URL that produces a part of the dataset
12. An example fragment feed

<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:sdshare="http://www.egovpt.org/sdshare">
  <title>Fragments feed for beer.xtm</title>
  <updated>2011-03-15T19:21:20Z</updated>
  <author>
    <name>Ontopia SDshare server</name>
  </author>
  <id>file:/Users/larsga/data/topicmaps/beer.xtm/fragments</id>
  <sdshare:ServerSrcLocatorPrefix>file:/Users/larsga/data/topicmaps/beer.xtm</sdshare:ServerSrcLocatorPrefix>
  <entry>
    <title>Topic with object ID 4521</title>
    <updated>2011-03-15T19:20:03Z</updated>
    <id>file:/Users/larsga/data/topicmaps/beer.xtm/4521/1300216803730</id>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=rdf"
          type="application/rdf+xml" rel="alternate"/>
    <link href="fragment.jsp?topicmap=beer.xtm&amp;topic=4521&amp;syntax=xtm"
          type="application/x-tm+xml; version=1.0" rel="alternate"/>
    <sdshare:TopicSI>http://psi.example.org/12</sdshare:TopicSI>
  </entry>
</feed>
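A feed like the one above can be consumed with nothing but the standard library. The sketch below pulls out the server prefix and, for each entry, the TopicSI plus the link matching the syntax the client wants; the exact-string MIME-type comparison is a simplification.

```python
from xml.etree import ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
SDSHARE = "{http://www.egovpt.org/sdshare}"

def parse_fragment_feed(feed_xml, mime_type):
    """Return (ServerSrcLocatorPrefix, [(TopicSI, fragment URL), ...])
    for entries that offer the requested syntax."""
    root = ET.fromstring(feed_xml)
    prefix = root.find(SDSHARE + "ServerSrcLocatorPrefix").text
    fragments = []
    for entry in root.findall(ATOM + "entry"):
        si = entry.find(SDSHARE + "TopicSI").text
        for link in entry.findall(ATOM + "link"):
            if link.get("type") == mime_type:  # pick the syntax we can handle
                fragments.append((si, link.get("href")))
                break
    return prefix, fragments
```

Note that the hrefs in the example feed are relative, so a real client would resolve them against the feed's own URL before fetching.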
13. What is a fragment?
- Essentially, a piece of a topic map
  - that is, a complete XTM file that contains only part of a bigger topic map
  - typically, most of the topic references will point to topics not in the XTM file
- Downloading more fragments yields a bigger subset of the topic map
  - the automatic merging in Topic Maps will cause the fragments to match up
- Exactly the same applies in RDF
14. An example fragment

<topicMap xmlns="http://www.topicmaps.org/xtm/1.0/"
          xmlns:xlink="http://www.w3.org/1999/xlink">
  <topic id="id4521">
    <instanceOf>
      <subjectIndicatorRef xlink:href="http://psi.garshol.priv.no/beer/pub"/>
    </instanceOf>
    <subjectIdentity>
      <subjectIndicatorRef xlink:href="http://psi.example.org/12"/>
      <topicRef xlink:href="file:/Users/larsga/data/topicmaps/beer.xtm#id2662"/>
    </subjectIdentity>
    <baseName>
      <baseNameString>Amundsen Bryggeri og Spiseri</baseNameString>
    </baseName>
    <occurrence>
      <instanceOf>
        <subjectIndicatorRef xlink:href="http://psi.ontopia.net/ontology/latitude"/>
      </instanceOf>
      <resourceData>59.913816</resourceData>
    </occurrence>
    ...
  </topic>
  ...
</topicMap>
15. Applying a fragment
- The feed contains a URI prefix
  - this is used to create item identifiers, tagging statements with their origin
- For each TopicSI:
  - find that topic, then for each of its statements, remove the matching item identifier
  - if a statement now has no item identifiers, delete it
- Merge in the received fragment
  - then tag all statements in it with the matching item identifier
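The steps above can be sketched over a toy statement store that maps each statement to the set of origins (item identifiers) vouching for it. The store layout and the `statements_about` helper are illustrative assumptions, not how a real Topic Maps engine represents this.

```python
def apply_fragment(store, prefix, topic_sis, fragment_statements,
                   statements_about):
    """Apply one SDshare fragment to a local statement store.

    store: dict mapping statement -> set of item identifiers (origins)
    prefix: the feed's ServerSrcLocatorPrefix, identifying this source
    topic_sis: subject identifiers of the changed topics
    fragment_statements: the statements in the downloaded fragment
    statements_about: function returning the store's statements about a topic
    """
    # 1. remove this source's earlier claims about the changed topics
    for si in topic_sis:
        for stmt in list(statements_about(store, si)):
            store[stmt].discard(prefix)
            if not store[stmt]:  # no source vouches for it any more
                del store[stmt]
    # 2. merge the fragment in, tagging every statement with its origin
    for stmt in fragment_statements:
        store.setdefault(stmt, set()).add(prefix)
```

Because step 1 removes only this source's tag, a statement also asserted by another source survives the deletion, which is what makes federation safe; and applying the same fragment twice leaves the store unchanged, which is the idempotency property described below.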
16. Properties of the protocol
- HATEOAS: uses hypertext principles
  - the only fixed endpoint is that of the overview feed
  - all other URLs are available via hypertext
- Applying a fragment is idempotent
  - i.e. the result is the same, no matter how many times you do it
- Loose binding
  - very loose coupling between server and client
- Supports federation of data
  - clients can safely merge data from different sources
17. SDshare push
- In normal SDshare, data receivers connect to the data source
  - basically, they poll the source with GET requests
- However, the receiver is not always allowed to make connections to the source
  - SDshare push is designed for this situation
- The solution is a slightly modified protocol
  - the source POSTs Atom feeds with inline fragments to the recipient
  - this flips the server/client relationship
- Not part of the spec; an unofficial Ontopia extension
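Since push is not in the spec, the wire format below is purely illustrative: one plausible shape is an Atom entry carrying the fragment inline in atom:content, which the source then POSTs to the recipient. Both the element layout and the POST endpoint are assumptions, not Ontopia's actual format.

```python
import urllib.request
from xml.etree import ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
SDSHARE = "{http://www.egovpt.org/sdshare}"

def build_push_entry(topic_si, fragment_xml, mime_type):
    """Build one Atom entry with the fragment inlined in atom:content
    (hypothetical layout)."""
    entry = ET.Element(ATOM + "entry")
    si = ET.SubElement(entry, SDSHARE + "TopicSI")
    si.text = topic_si
    content = ET.SubElement(entry, ATOM + "content")
    content.set("type", mime_type)
    content.text = fragment_xml  # ElementTree escapes the embedded markup
    return ET.tostring(entry, encoding="unicode")

def push(receiver_url, feed_xml):
    """POST the feed to the receiver: the data source acts as HTTP client."""
    req = urllib.request.Request(receiver_url,
                                 data=feed_xml.encode("utf-8"),
                                 headers={"Content-Type": "application/atom+xml"},
                                 method="POST")
    return urllib.request.urlopen(req).status
```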
20. Example use case #1
- [Architecture diagram connecting: Database, DB2TM, Ontopia, SDshare, Frontend, Service #1, Service #3, Portal, ESB]
21. NRK/Skole today
- [Diagram: the editorial server (MediaDB, DB2TM) exports nrk-grep.xtm, which is imported across a firewall into the production environment: Prod #1 and Prod #2, each connected to its own DB server via JDBC]
22. NRK/Skole with SDshare push
- [Diagram: the same setup, with the file export/import replaced by SDshare push from the editorial server across the firewall to Prod #1 and Prod #2]
24. Hafslund architecture
- The beauty of this architecture is that SDshare insulates the different systems from one another
- More input systems can be added without hassle
- Any component can be replaced without affecting the others
- Essentially a plug-and-play architecture
25. A Hafslund problem
- There are too many duplicates in the data
  - duplicates within each system
  - also duplication across systems
- How to get rid of the duplicates?
  - unrealistic to expect cleanup across the systems
- So we build a deduplicator and plug it in...
26. DuKe plugged in
- [Diagram: the Dupe Killer plugged into the flow between the source systems (ERP, GIS, CRM, ..., Archive) and UMIC / the search engine]
28. Current implementations
- Web3: both client and server
- Ontopia: ditto, plus SDshare push
- Isidorus: don't know
- Atomico: server framework only; no actual implementation
29. Ontopia SDshare server
- Event tracker
  - taps into the event API, where it listens for changes
  - maintains an in-memory list of changes
  - writes all changes to disk as well
  - removes duplicate changes and discards old changes
- Web application based on the tracker
  - JSP pages producing feeds and fragments
  - one fragment per changed topic, sorted by time
  - only a single snapshot, of the current state of the topic map
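The tracker's dedup-and-expiry behaviour can be mimicked with an ordered map from topic id to the time of its latest change. This is a sketch of the behaviour described above, not Ontopia's actual code; the 24-hour retention default is invented.

```python
class ChangeTracker:
    """In-memory change list: one entry per changed topic, the newest
    change wins, and entries older than max_age are discarded."""

    def __init__(self, max_age=86400):  # retention period: invented default
        self.max_age = max_age
        self.changes = {}  # topic id -> timestamp; dicts keep insertion order

    def record(self, topic_id, timestamp):
        # duplicate change: drop the earlier entry so only the newest remains
        self.changes.pop(topic_id, None)
        self.changes[topic_id] = timestamp

    def expire(self, now):
        # entries are ordered oldest-first, so stop at the first fresh one
        while self.changes:
            topic_id, ts = next(iter(self.changes.items()))
            if now - ts <= self.max_age:
                break
            del self.changes[topic_id]
```

A feed page is then just the entries of `changes` rendered as Atom, already sorted by time with duplicates removed.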
30. Ontopia SDshare client
- Web UI for management
- Pluggable frontends and backends; combine at will
- Frontends
  - Ontopia: event listener
  - SDshare: polls Atom feeds
- Backends
  - Ontopia: applies changes to Ontopia locally
  - SPARQL: writes changes to an RDF repository via SPARUL
  - push: pushes changes over SDshare push
- [Diagram: the frontends (Ontopia events, SDshare client) feed the core logic, which drives the backends (Ontopia, SPARQL Update, SDshare push), all under the web UI]
33. What if many fragments?
- The size of the fragment feed grows enormous
  - expensive if polled frequently
- Paging might be one solution
  - basically, the end of the feed contains a pointer to more
- A "since" parameter might be another
  - allows the client to say "only show me changes since ..."
- We probably need both in practice
- http://projects.topicmapslab.de/issues/3675
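With a "since" parameter, the client would remember the timestamp of its newest applied fragment and append it to the feed URL on each poll. A minimal sketch; the parameter name comes from the proposal above, not from the spec.

```python
from urllib.parse import urlencode

def fragment_feed_url(base_url, since=None):
    """Append a since=<timestamp> query parameter when the client already
    has part of the feed (proposed parameter, not yet in the spec)."""
    if since is None:
        return base_url
    sep = "&" if "?" in base_url else "?"
    return base_url + sep + urlencode({"since": since})
```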
34. Ordering of fragments
- Should the spec require that fragments be ordered?
- Not really necessary if all fragment URIs return the current state
  - (instead of the state at the time the fragment entry was created)
35. RDF fragment algorithm
- The one given in the spec makes no sense
  - it relies on Topic Maps constructs not found in RDF
  - there is really no way to make use of it
- http://projects.topicmapslab.de/issues/4013
36. Our interpretation
- The server prefix is the URI of an RDF named graph
- The fragment algorithm therefore becomes:
  - delete all statements about the changed resources
  - then add all statements in the fragment
- This means each source gets a different graph
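Under this interpretation, applying an RDF fragment to the source's named graph reduces to two set operations. Modelled below with triples as plain tuples; a sketch, not tied to any RDF library.

```python
def apply_rdf_fragment(graph, changed_resources, fragment_triples):
    """Apply a fragment to one named graph (one graph per source).
    Triples are (subject, predicate, object) tuples; returns a new set."""
    changed = set(changed_resources)
    # 1. delete all statements about the changed resources
    kept = {t for t in graph if t[0] not in changed}
    # 2. add all statements in the fragment
    return kept | set(fragment_triples)
```

Applying the same fragment twice yields the same graph, so the idempotency property carries over to the RDF side.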
37. TopicSL/TopicII
- Currently, topics can only be identified by subject identifier
  - but not all topics have one
- Solution: add elements for subject locators and item identifiers
- http://projects.topicmapslab.de/issues/3667
38. Paging of snapshots?
- What if the snapshot is vast?
  - clients probably won't be able to download and store the entire thing in one go
- Could we page the snapshot into fragments?
- Or is there some other solution?
- http://projects.topicmapslab.de/issues/4307
39. How to tell if the fragment feed is complete?
- When reading the fragment feed, how can we tell whether older fragments have been discarded?
  - and how can we tell which fragment was the newest one thrown away?
- Without this, there is no way to know for certain that you've lost fragments if the feed stops before the newest fragment you've got
  - and if you're using "since", it always will stop before the newest fragment...
- Make a new sdshare:foo element at feed level for this information?
- http://projects.topicmapslab.de/issues/4308
40. Blank nodes are not supported
- What to do?
- http://projects.topicmapslab.de/issues/4306
41. More information
- SDshare spec: http://www.egovpt.org/fg/CWA_Part_1b
- SDshare issue tracker: http://projects.topicmapslab.de/projects/sdshare
- SDshare use cases: http://www.garshol.priv.no/blog/215.html