Transformational Tricks for RDF/SPARQL
BY KURT CAGLE
EDITOR, THE CAGLE REPORT
Who is Kurt Cagle?
 Editor, The Cagle Report (https://thecaglereport.com)
 Email: kurt.cagle@thecaglereport.com
 LinkedIn: https://linkedin.com/in/kurtcagle
 Calendly: https://calendly.com/semantical. Open office hours.
 I like graphs, large language models, metadata, AI, future of work
 I consider myself a data therapist. Book a free hour if you want to talk.
Purpose of This Class
 The purpose of this class is to teach you transformation techniques for the
knowledge portal
 I’m assuming you have a basic working knowledge of RDF and SPARQL, and have
at least heard of SHACL.
 When done, you should have new tools for making your data sit up and beg, and
hopefully have a new way of thinking about the RDF Stack
Warning!!!
 It is likely that somewhere in this class, I will
skewer a sacred cow or five.
 There are many different ways of building
ontologies, many different best practices.
 Frankly, many of those best practices are no
longer relevant, or represent ways of thinking
that should be changed in the face of advances
in the technology.
 Hopefully, even if you disagree with the author,
there will be useful information contained
herein, but go in with an open mind and a
willingness to at least question your beliefs, if
not necessarily change them.
Transformations
A KNOWLEDGE PORTAL IS NOT A QUERY SYSTEM, BUT A TRANSFORMATIONAL ONE.
IT TRANSFORMS INFORMATION FROM ONE ONTOLOGY TO ANOTHER.
IT TRANSFORMS DATA FROM ONE FORMAT TO ANOTHER.
IT CREATES NEW KNOWLEDGE FROM OLD KNOWLEDGE.
UNLESS YOU UNDERSTAND KNOWLEDGE GRAPH TRANSFORMATIONS, YOU ARE NOT GETTING EVERYTHING OUT OF YOUR KNOWLEDGE PORTAL.
Namespace Tricks
THESE ARE SUGGESTIONS ON
SETTING UP NAMESPACES TO
MAKE THEM AS USEFUL AS
POSSIBLE
Namespaces
 Namespaces underlie a great deal of semantics, but often tend to be poorly
utilized. Especially with regard to ontological modeling, the following techniques
may be useful (and are used throughout this deck).
 Use namespaces that correspond to classes. For instance, a Character class might
have an associated namespace:
 Namespace: http://comicsdata.org/ns/Character#
 PREFIX Character: <http://comicsdata.org/ns/Character#>
 The class can then be specified with the prefix in Turtle:
 Character:Catwoman a Character: .
 All triple stores support this notation after around 2017.
Namespace Construction
 The use of class-based namespaces can simplify constructing and
deconstructing URIs in SPARQL:
 str(Character:) -> "http://comicsdata.org/ns/Character#"
 strafter(str(Character:Catwoman), str(Character:)) -> "Catwoman"
 iri(concat(str(Character:), "Catwoman")) ->
<http://comicsdata.org/ns/Character#Catwoman> ->
Character:Catwoman
 In some cases (depending on the triple store), you do not need to explicitly
convert URIs to strings:
 iri(concat(Character:, "Catwoman"))
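The string operations on this slide can be mirrored outside SPARQL as well. The following is a minimal Python sketch, not a triple store API: `make_iri` and `local_name` are hypothetical helper names, and the namespace is the example one from the slide.

```python
# Example namespace taken from the slide; the helper names are illustrative.
CHARACTER_NS = "http://comicsdata.org/ns/Character#"


def make_iri(namespace: str, local: str) -> str:
    """Mirror iri(concat(str(ns:), local)): join namespace and local name."""
    return namespace + local


def local_name(iri: str, namespace: str) -> str:
    """Mirror strafter(str(iri), str(ns:)).

    Returns the text after the first occurrence of the namespace,
    or an empty string when the namespace does not occur in the IRI.
    """
    if namespace == "":
        return iri
    _, found, after = iri.partition(namespace)
    return after if found else ""


catwoman = make_iri(CHARACTER_NS, "Catwoman")
print(catwoman)
print(local_name(catwoman, CHARACTER_NS))
```

Because the class and its instances share one namespace string, the same two helpers work for every class without any per-class parsing rules.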
Namespace Best Practices
 Store namespace and prefix strings in SHACL NodeShape
declarations.
 This makes it easier to construct contexts in SPARQL and JSON
 Filepaths can be stored as namespaces.
 E.g., PREFIX basePath: <file:///path/to/root/>
 basePath:foo/bar.ttl -> <file:///path/to/root/foo/bar.ttl>
 RDF/XML handles this notation just fine.
 This notation is friendlier to JSON-LD.
Namespace Anti-Patterns
 Stop trying to map URIs to URLs. It
makes URIs brittle.
 If you must, designate an ontology URI
that can be mapped.
 Instance URIs are not that important,
property and class URIs are.
 Do not use ontology import predicates –
use SPARQL Update LOAD instead. (Indeed,
do you really need ontologies?)
 The number of class namespaces has no
appreciable impact on performance.
 Use named graphs to avoid namespace
collision.
 If you use Camel Case, stick with it. If
you use underscores, stick with it.
Turtle Tricks
TURTLE IS NOT JUST COMPACT
– IT’S A POWERFUL LANGUAGE
FOR ORGANIZING
INFORMATION, BUT ONLY IF
IT’S USED
Blank Nodes Are Structure Pointers
BLANK NODES CAN BE CONFUSING UNTIL
YOU KNOW THAT A BLANK NODE IS A
POINTER TO A STRUCTURE.
TURTLE NOTATION DELIBERATELY HIDES
BLANK NODES, BUT YOU CAN USE THEM TO
CREATE ARRAYS, DICTIONARIES,
PARAMETERS LISTS, AND SIMILAR
STRUCTURES.
AT THE SAME TIME, USING SHACL PROVIDES
A WAY OF DEPRECATING THE USE OF BLANK
NODES WHERE THEY REALLY SHOULD BE
NAMED NODES.
Square Brackets = Dictionaries
 Turtle uses the square bracket to indicate a dictionary.
 Character:Catwoman a Character: ;
    Character:address [
        a Address: ; # The class is usually implied
        Address:street "1313 Mockingbird Lane" ;
        Address:city "Arkham" ;
        Address:type AddressType:MailingAddress ;
    ] .
 This is equivalent to
 Character:Catwoman a Character: ;
    Character:address _:CatwomanAddress ;
.
_:CatwomanAddress # This is the implicit blank node.
    a Address: ;
    Address:street "1313 Mockingbird Lane" ;
    Address:city "Arkham" ;
    Address:type AddressType:MailingAddress ;
.
 Note that containership is implied, but illusory. These are still triples.
Dictionaries and JSON Objects
Character:Catwoman a Character: ;
    Character:address [
        a Address: ;
        Address:street "1313 Mockingbird Lane" ;
        Address:city "Arkham" ;
        Address:type AddressType:MailingAddress ;
    ] .
{"Character:Catwoman": {
    "@type": "Character",
    "address": {
        "@type": "Address",
        "street": "1313 Mockingbird Lane",
        "city": "Arkham",
        "Address:type": "MailingAddress"
    }
}
}
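To make the "containership is illusory" point concrete, here is a small Python sketch that flattens a nested dictionary into plain triples, minting a blank-node label for each nested dictionary the way the square-bracket notation does implicitly. The function name `flatten` and the `_:b0` style labels are assumptions made for illustration only.

```python
from itertools import count


def flatten(subject, properties, triples=None, ids=None):
    """Flatten a nested dict into (subject, predicate, object) triples.

    A nested dict plays the role of Turtle's [ ... ]: it receives a fresh
    blank-node label, and the parent triple simply points at that label.
    """
    triples = [] if triples is None else triples
    ids = count() if ids is None else ids
    for predicate, value in properties.items():
        if isinstance(value, dict):
            bnode = f"_:b{next(ids)}"
            triples.append((subject, predicate, bnode))
            flatten(bnode, value, triples, ids)
        else:
            triples.append((subject, predicate, value))
    return triples


catwoman = {
    "rdf:type": "Character:",
    "Character:address": {
        "rdf:type": "Address:",
        "Address:street": "1313 Mockingbird Lane",
        "Address:city": "Arkham",
        "Address:type": "AddressType:MailingAddress",
    },
}

triples = flatten("Character:Catwoman", catwoman)
for triple in triples:
    print(triple)
```

The output is a flat list: nothing in it is "inside" anything else, and the address is reachable only through the blank-node pointer.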
Parentheses = Linked Lists
 Turtle uses parentheses to indicate a linked list.
 Character:HarleyQuinn Character:memberOfOrg (
Org:SinCitySirens Org:SuicideSquad
).
 This is equivalent to
 Character:HarleyQuinn Character:memberOfOrg _:OrgList .
_:OrgList rdf:first Org:SinCitySirens ;
    rdf:rest _:secondItem .
_:secondItem rdf:first Org:SuicideSquad ;
    rdf:rest rdf:nil .
 Linked lists are intrinsically ordered, regardless of whether
inferencing is enabled or not.
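As a rough illustration of why the order survives without inferencing, the sketch below expands a Python list into rdf:first / rdf:rest cells and then reads it back from a scrambled set of triples. The helper names and the `_:list0` style labels are hypothetical.

```python
def expand_list(items, prefix="_:list"):
    """Return (head, triples) for an RDF collection built from items.

    Each cell is a blank node with rdf:first (the item) and rdf:rest
    (the next cell, or rdf:nil for the last one).
    """
    if not items:
        return "rdf:nil", []
    cells = [f"{prefix}{i}" for i in range(len(items))]
    triples = []
    for i, item in enumerate(items):
        rest = cells[i + 1] if i + 1 < len(items) else "rdf:nil"
        triples.append((cells[i], "rdf:first", item))
        triples.append((cells[i], "rdf:rest", rest))
    return cells[0], triples


def read_list(head, triples):
    """Walk rdf:first / rdf:rest from head and return the items in order."""
    first = {s: o for s, p, o in triples if p == "rdf:first"}
    rest = {s: o for s, p, o in triples if p == "rdf:rest"}
    items = []
    while head != "rdf:nil":
        items.append(first[head])
        head = rest[head]
    return items


head, cells = expand_list(["Org:SinCitySirens", "Org:SuicideSquad"])
# The triples themselves are unordered; the order lives in the rdf:rest chain.
print(read_list(head, list(reversed(cells))))
```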
Annotations – Double Angle Brackets
 An annotation is metadata that applies to a single assertion, using the RDF-Star
mechanism:
 <<Character:Batman Character:description "World's greatest detective!">>
    Assertion:comment "Wait! What about Sherlock Holmes, or Hercule Poirot?!" .
 The triple in the double angle brackets is the subject of the annotation, and
again can be thought of as a blank node pointing to a data structure with the values
subject, predicate, and object:
 _:assertion rdf:subject Character:Batman ;
    rdf:predicate Character:description ;
    rdf:object "World's greatest detective!" ;
    Assertion:comment "Wait! What about Sherlock Holmes, or Hercule Poirot?!" .
 The <<>> notation comes from RDF-Star, a new addition to RDF that is
currently under discussion as a standard, though it is already being adopted by
most modern triple stores.
Annotation Sample
 <<Character:Batman Character:description "World's greatest detective!">>
    Assertion:annotation [
        Annotation:comment "Wait! What about Sherlock Holmes?!" ;
        Annotation:author Author:ArthurConanDoyle ;
        Annotation:date "1902-05-21"^^xsd:date ;
    ], [
        Annotation:comment "Maybe Hercule Poirot?" ;
        Annotation:author Author:AgathaChristie ;
        Annotation:date "1953-07-19"^^xsd:date ;
    ] .
Annotations – Best Practices
 Typically an annotation of an
assertion will consist of a dictionary
node that holds several properties,
rather than just one comment.
 An assertion has a unique identifier,
even if two such assertions have
the same subject, predicate, and
object. This means that multiple
annotations can be made about the same
assertion by different authorities.
 The Open Annotation Standard is a good
framework for annotating content, and
works especially well with RDF-Star.
 Annotations can be useful to indicate
version changes of individual
assertions, as well as a way to track
provenance.
Pointer Containers
 Occasionally, you’ll see pointer structures, such as
 [] a Character: .
 or
 [ a Character: ] rdfs:label "Joe" .
 An empty blank node is treated the same as an anonymous URI or pointer.
 In the second case, this should be read as “there exists a Character, whose label is Joe”.
Two separate blank nodes are assumed to have different identities.
 SPARQL notation emerged from the use of “named” blank nodes that were in fact
treated as variable names for nodes. Turtle structural notation consequently translates
directly into SPARQL Structural Notation.
Literals and Datatypes
 One of the ways we underutilize knowledge graphs is in not doing enough with
datatypes.
 Most people use the standard xsd: datatypes, without ever thinking about why
they shouldn’t.
 If you have a length measure, rather than putting type in properties, use
"25"^^qudt:Meters.
 If you have a full or partial population measure, use "8.01E9"^^quantity:People.
 If you have a markdown document use "# Cool Title\n## by Kurt
Cagle"^^textFormat:Markdown.
 Use your datatypes to indicate how literals are parsed, then add metadata. Nuff
said.
SPARQL Tricks
SPARQL IS MOSTLY USED FOR
CREATING TABLES, BUT IT IS
CAPABLE OF FAR MORE.
The Problem with CONSTRUCT
 The CONSTRUCT command in SPARQL is often used to produce graphs, but
because of this utility, it also hides much more useful capabilities. It is an
anachronism from OWL Inference rules where addition of new triples would create
virtual triples by these rules.
 Increasingly, organizations are abstracting access to knowledge portals to
GraphQL or JSON-LD, and as such SPARQL is hidden behind layers of security. One
benefit of this is that one underutilized feature of SPARQL Update – named
graphs – is beginning to come into its own, especially because it enables
workflows.
 This is what we’ll cover here.
The True Structure of “Triples”
Subject             Predicate  Object      Graph    AssertionID
Character:Catwoman  rdf:type   Character:  Default  uri:urn:12051AFCD…

Subject, Predicate, Object: the “true” triple
Graph: the graph container for the triple
AssertionID: the identifier for the triple, for RDF-Star
The modern “triple” is actually (minimally) a pentuple.
The assertion ID field is used for reification, and
identifies the pentuple as a unique object. The graph
field indicates that this particular tuple is part of a
specific set. If the triple is the same but the graph is
different, the Assertion ID will be different too.
Named Graphs
 If the graph field of a pentuple is set to a URI, that URI becomes the “name” of
the graph that it belongs to.
 All other pentuples with the same graph name are in the same graph.
 If a pentuple has the same triple values as another pentuple but has a different
graph name, then they are in different graphs.
 This means that the same triple can be contained in multiple graphs.
 The graph name is a URI, just like a node or assertion identifier.
Default Graph
 When a triple is inserted into the graph without specifying its graph, the triple will
be placed in the default graph.
 The default graph can be inclusive or exclusive. Where it is inclusive, a query
that does not specify a graph will retrieve matching triples from all graphs.
Where it is exclusive, the only triples retrieved will be those already in
the default graph.
 Check with your vendor whether your system is inclusive or exclusive. Most such
systems can go from the one mode to the other with a simple software switch in
the product.
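The difference between the two modes can be modeled in a few lines of Python. This is a toy sketch, not how any particular product implements it: quads are plain tuples, `None` stands in for the default graph, and `match_subjects` is a hypothetical helper.

```python
DEFAULT = None  # stand-in for "no graph name", i.e. the default graph

# The same triple may appear in several graphs, one quad per graph.
quads = {
    ("Character:Catwoman", "rdf:type", "Character:", DEFAULT),
    ("Character:Catwoman", "rdf:type", "Character:", "Graph:Graph1"),
    ("Character:Batman", "rdf:type", "Character:", "Graph:Graph2"),
}


def match_subjects(quads, predicate, obj, inclusive):
    """Subjects matching (?s predicate obj) when the query names no graph.

    inclusive=True  : the default graph behaves as the union of all graphs.
    inclusive=False : only quads stored in the default graph are visible.
    """
    return {
        s
        for s, p, o, g in quads
        if p == predicate and o == obj and (inclusive or g is DEFAULT)
    }


print(sorted(match_subjects(quads, "rdf:type", "Character:", inclusive=True)))
print(sorted(match_subjects(quads, "rdf:type", "Character:", inclusive=False)))
```

In the exclusive case Batman disappears from the result, because his only quad lives in a named graph.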
Named Graph Use Cases
 Landing graph of newly ingested data
 Graph containing ingestion graph data
converted to local ontology
 Graph holding all instances of a given class
 Graph containing reports generated from
analysis
 Graph holding draft vs. approved resources
(workflow)
 Graph containing SHACL constraints
 Graph holding transformed content for
output
 Graphs containing data catalogs
 Graphs holding intermediate calculations
 Graphs holding frequently requested query
results
 Graphs containing documentation
 Graphs containing document stores
 Graphs containing controlled vocabularies
for rapid lookup
 Unions, intersections, diffs
 The list goes on and on …
Named Graphs vs. Data Store Partitions
 Many data portals have distinct data stores that
are partitioned with certain configurations.
 Such stores are usually best for multi-tenant
operations, as these typically also have
authentication and security considerations.
 Named graphs are conceptually a level lower –
they exist within a single security perimeter
and are optimized for rapid clearing.
 Modern named graphs usually provide
secondary indexes so that adding or removing
a triple is as simple as adding or removing a
URI to an array.
Internal Arrangement of Named Graphs
Subject             Predicate  Object      Graph
Character:Catwoman  rdf:type   Character:  Graph:Graph1

Graphs        AssertionIDs
Graph:Graph1  uri:urn:12051AFCD…
Graph:Graph2  uri:urn:4792AE109…
Default       uri:urn:319AD1592…
The Pentuple arrangement at a deeper level
illustrates how graphs can be moved,
copied and deleted so quickly, as the graph
key is itself a part of an index. Garbage
collection only occurs when the last graph is
removed from the tuple.
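One way to picture this arrangement is an index from each triple to the set of graphs holding it. The Python class below is only a conceptual sketch under that assumption (real stores differ, and `GraphIndex` is a made-up name); it shows why dropping a graph is cheap and when a triple is actually garbage collected.

```python
from collections import defaultdict


class GraphIndex:
    """Toy index mapping each triple to the set of graphs that contain it."""

    def __init__(self):
        self.graphs_of = defaultdict(set)

    def add(self, triple, graph):
        """Adding a triple to a graph only adds a graph name to a set."""
        self.graphs_of[triple].add(graph)

    def drop_graph(self, graph):
        """Remove the graph name everywhere; forget triples left in no graph."""
        for triple in list(self.graphs_of):
            self.graphs_of[triple].discard(graph)
            if not self.graphs_of[triple]:
                del self.graphs_of[triple]  # the "garbage collection" step

    def triples_in(self, graph):
        return {t for t, graphs in self.graphs_of.items() if graph in graphs}


index = GraphIndex()
catwoman = ("Character:Catwoman", "rdf:type", "Character:")
index.add(catwoman, "Graph:Graph1")
index.add(catwoman, "Graph:Graph2")

index.drop_graph("Graph:Graph1")
print(index.triples_in("Graph:Graph2"))  # the triple survives in Graph2

index.drop_graph("Graph:Graph2")
print(len(index.graphs_of))  # no graph holds the triple any more
```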
INSERTING DATA THROUGH SCRIPTS
The following SPARQL UPDATE script will add explicit triples to a graph:
# Namespaces Declared Here
INSERT DATA {
    GRAPH Graph:Catwoman {
        ex:Catwoman rdf:type ex:Antihero ;
            rdfs:label "Catwoman" ;
            ex:alterEgo "Selina Kyle" ;
            ex:description "A skilled thief and occasional ally of Batman, who uses her athletic abilities, martial arts skills, and cunning to navigate the criminal underworld of Gotham City." ;
            ex:superPowers "Peak human strength, agility, and endurance; expert martial artist and hand-to-hand combatant; skilled thief and acrobat" ;
            ex:gender "Female" ;
            ex:universe "DCEU" .
    }
};
Digging Deeper Into Insert
 The INSERT DATA command uses the same syntax (I believe) as the TriG standard,
save that the namespaces use PREFIX instead of @prefix in the context
header.
 Multiple graphs can be populated this way in the same statement.
 This is often useful for spot or test data, or for configuration data.
 Unlike SPARQL QUERY, SPARQL UPDATE commands are transactional: multiple SPARQL
UPDATE statements can be run in the same script if separated by semicolons.
Using the DELETE / INSERT model
 The powerhouse of SPARQL UPDATE is the DELETE/INSERT/WHERE command
which can be thought of as the supercharged version of CONSTRUCT.
 The WHERE clause determines the graph (and the variables) that the DEL/INS will
be working on.
 The DELETE statement is a CONSTRUCT-like statement that removes from the main
graph the triples that its template generates.
 The INSERT statement is a CONSTRUCT-like statement that adds the triples its
template generates to the main graph if they do not already exist.
 Together these three keywords can be used to transform one graph into another.
Delete/Insert/Where
This identifies the working group in the WHERE clause, then DELETES and INSERTS
the triples with the corresponding variables.
# Namespaces Declared Here
DELETE {
    GRAPH ?gOld {
        ?sOld ?pOld ?oOld .
    }
}
INSERT {
    GRAPH ?gNew {
        ?sNew ?pNew ?oNew .
    }
}
WHERE {
    # Use SPARQL to determine old and new variables
};
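The mechanics of DELETE/INSERT/WHERE can be imitated over a set of quads in Python. This is a simplified model rather than a SPARQL engine: the WHERE step is replaced by a hand-built list of variable bindings, and the graph and predicate names (`Graph:Ingest`, `old:name`, and so on) are invented for the example.

```python
def delete_insert(quads, delete_templates, insert_templates, solutions):
    """Apply DELETE and INSERT templates once per WHERE solution.

    Templates are quads whose terms may be variables (strings starting
    with "?"). A template with an unbound variable is skipped for that
    solution. All deletions are applied before any insertion.
    """

    def fill(template, row):
        terms = []
        for term in template:
            if term.startswith("?"):
                if term not in row:
                    return None  # unbound variable: skip this template
                term = row[term]
            terms.append(term)
        return tuple(terms)

    def instantiate(templates):
        filled = (fill(t, row) for row in solutions for t in templates)
        return {quad for quad in filled if quad is not None}

    return (set(quads) - instantiate(delete_templates)) | instantiate(insert_templates)


quads = {
    ("ex:Catwoman", "old:name", "Catwoman", "Graph:Ingest"),
    ("ex:Batman", "old:name", "Batman", "Graph:Ingest"),
}

# Stand-in for the WHERE clause: one dict of bindings per match.
solutions = [
    {"?s": s, "?name": o}
    for s, p, o, g in quads
    if p == "old:name" and g == "Graph:Ingest"
]

result = delete_insert(
    quads,
    delete_templates=[("?s", "old:name", "?name", "Graph:Ingest")],
    insert_templates=[("?s", "rdfs:label", "?name", "Graph:Superheroes")],
    solutions=solutions,
)
for quad in sorted(result):
    print(quad)
```

The same bindings drive both templates, which is what lets one statement move data between graphs and rename predicates in a single pass.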
GRAPH Commands
Command       Example                          Comments
CREATE Graph  CREATE GRAPH graph:Foo           Creates an empty graph with the given name.
DROP Graph    DROP GRAPH graph:Foo             Drops (deletes) the graph from the system.
CLEAR Graph   CLEAR GRAPH graph:Foo            Clears the data from the graph but retains the graph itself.
COPY Graph    COPY graph:Foo TO graph:Bar      Replaces the contents of graph:Bar with the triples in graph:Foo, but leaves graph:Foo untouched.
MOVE Graph    MOVE graph:Foo TO graph:Bar      Replaces the contents of graph:Bar with the triples in graph:Foo, then eliminates graph:Foo.
ADD Graph     ADD graph:Foo TO graph:Bar       Copies the triples in graph:Foo to graph:Bar, but without removing the old graph:Bar content.
LOAD Graph    LOAD <uri> INTO GRAPH graph:Foo  Loads external RDF from a file system or the internet. <uri> must be a hard reference.
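The difference between COPY, MOVE and ADD is easy to mix up, so here is a small Python model of the three, treating the store as a dictionary from graph name to a set of triples. The function names are illustrative; this is a sketch of the semantics in the table, not a client for any triple store.

```python
def add_graph(store, src, dst):
    """ADD: union the source into the destination; the source is unchanged."""
    store[dst] = store.get(dst, set()) | store.get(src, set())


def copy_graph(store, src, dst):
    """COPY: the destination becomes an exact copy of the source."""
    if src != dst:
        store[dst] = set(store.get(src, set()))


def move_graph(store, src, dst):
    """MOVE: the destination takes the source's content; the source is removed."""
    if src != dst:
        store[dst] = set(store.get(src, set()))
        store.pop(src, None)


def fresh_store():
    return {
        "graph:Foo": {("ex:Catwoman", "rdf:type", "ex:Antihero")},
        "graph:Bar": {("ex:Batman", "rdf:type", "ex:Hero")},
    }


added = fresh_store()
add_graph(added, "graph:Foo", "graph:Bar")
print(len(added["graph:Bar"]))  # old and new content together

copied = fresh_store()
copy_graph(copied, "graph:Foo", "graph:Bar")
print(copied["graph:Bar"] == copied["graph:Foo"])

moved = fresh_store()
move_graph(moved, "graph:Foo", "graph:Bar")
print("graph:Foo" in moved)
```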
Workflows with SPARQL UPDATE
 SPARQL UPDATE is transactional, and can have multiple operations per script.
 If the transaction fails at any point, the results are rolled back to the previous state.
 Named graphs make it possible to create and populate a graph, then use that
graph to generate one or more additional graphs, which can then trigger other
actions.
 Conditions within graphs also mean that a DELETE/INSERT statement can be
short-circuited (or activated) only if the right graph conditions exist in the WHERE
clause, making for conditional logic.
 These are WORKFLOWS.
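The all-or-nothing behavior described above can be sketched in Python by running each step against a working copy and committing only if every step succeeds. Everything here is an assumption made for illustration: the store is a dictionary of graph name to triples, and `run_transaction`, `load_step`, `convert_step` and `failing_step` are hypothetical names.

```python
import copy


def run_transaction(store, steps):
    """Run steps against a working copy; commit only if all of them succeed.

    Returns (new_store, True) on commit, or (original_store, False) when
    any step raises, which models the rollback.
    """
    working = copy.deepcopy(store)
    try:
        for step in steps:
            step(working)
    except Exception:
        return store, False
    return working, True


def load_step(store):
    """Populate an ingestion graph."""
    store.setdefault("Graph:Ingest", set()).add(("ex:Catwoman", "old:name", "Catwoman"))


def convert_step(store):
    """Conditional step: only acts when the ingestion graph has content."""
    ingest = store.get("Graph:Ingest", set())
    if not ingest:
        return
    target = store.setdefault("Graph:Superheroes", set())
    for s, p, o in ingest:
        if p == "old:name":
            target.add((s, "rdfs:label", o))


def failing_step(store):
    raise RuntimeError("simulated failure")


committed, ok = run_transaction({}, [load_step, convert_step])
print(ok, sorted(committed))

rolled_back, failed_ok = run_transaction({}, [load_step, failing_step])
print(failed_ok, rolled_back)
```

In the second run the ingestion graph was populated in the working copy, but the caller never sees it because the failing step discards that copy.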
Superhero Example Workflow
1. Load an external Superheroes.ttl file into an ingestion graph.
2. Use DEL/INS to convert this file to an internal schema in superheroes graph.
3. From this converted schema, use DEL/INS to generate a SHACL file based upon the
superheroes graph, putting that into a SHACL graph.
4. Finally, use DEL/INS to create a message in a message queue graph indicating that an
update has been made to the superheroes and SHACL graphs.
And on to the demo!!!
(I will be posting the demo breakdown to a separate article called Workflows In
Sparql at https://thecaglereport.com.)
Workflow Thoughts
 Transactions lock graphs in use. This means that you can create temporary graphs
in your script, so long as they get dropped upon completion.
 Temporary graphs can also be used to save and alter working triples. This is a way
of storing variables between transactions. You cannot set global variables directly
in transactions otherwise.
 You cannot run a SELECT or CONSTRUCT statement at the transaction level.
However, you can run them from within a WHERE clause.
 LOAD, sadly, does not support a WHERE clause. To load from external resources,
you may need to use SERVICE invocations instead, which can be run from the
WHERE clause.
Passing Variables Between Transactions
# Create a temporary graph with variable content.
INSERT {
    GRAPH Graph:Temp {
        Temp:date1 Temp:hasValue ?now .
    }
}
WHERE {
    BIND(now() AS ?now)
};
# In a later transaction, retrieve the variable value.
# (Transaction:hasDate is a placeholder predicate.)
INSERT {
    Transaction:123 Transaction:hasDate ?date .
}
WHERE {
    GRAPH Graph:Temp { Temp:date1 Temp:hasValue ?date }
};
Ingest Thoughts
 There are a number of ways to get non-RDF data into a knowledge portal.
 Most commercial portals have connectors to JSON, XML, relational databases, YAML, message
queues, openTelepathy (COMING SOON!) …
 It is STILL worth the time to map these to an internal organizational ontology.
 Internal transformations can create maps to relevant controlled vocabularies and
taxonomies.
 To get a good start, use AutoGPT AI or similar to do the bulk of the mapping for you.
This is where having a way of identifying different ontologies comes in handy, and
it will usually get you 80% of the way there.
 That remaining 20% is often critical for your business, and deserves to have human
eyeballs on it.
Last Thoughts on Named Graphs
 Wrap your instances by associated classes in named graphs for that class, and
stuff that graph name into your SHACL metadata for that class
 The class graph will be much smaller than trying to search by class name, find the
associated graph, then retrieve the results.
 If you’re really ambitious, wrap each instance in a named graph tied into the
subject IRI, then use ?s (rdf:*)+ ?o to get the full transitive closure for ?s, to put
into that graph. This will get you a super DESCRIBE that will often get you info you
normally have to write a lot of ugly code to get, and it’s FAST.
 Most knowledge portals have named graph endpoints. Go wild.
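The "super DESCRIBE" idea is a transitive closure: start at the subject and keep following objects outward. A minimal Python sketch of that traversal over an in-memory list of triples follows; `describe` is a hypothetical helper and the sample data is invented.

```python
from collections import deque


def describe(subject, triples):
    """Collect every triple reachable from subject by following objects outward.

    Breadth-first traversal; each node is visited once, so cycles are safe.
    """
    by_subject = {}
    for triple in triples:
        by_subject.setdefault(triple[0], []).append(triple)

    seen = {subject}
    queue = deque([subject])
    result = set()
    while queue:
        node = queue.popleft()
        for triple in by_subject.get(node, []):
            result.add(triple)
            target = triple[2]
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return result


triples = [
    ("Character:Catwoman", "Character:address", "_:addr"),
    ("_:addr", "Address:city", "Arkham"),
    ("_:addr", "Address:resident", "Character:Catwoman"),  # a cycle back
    ("Character:Batman", "rdf:type", "Character:"),  # unrelated to Catwoman
]

closure = describe("Character:Catwoman", triples)
for triple in sorted(closure):
    print(triple)
```

Storing the resulting set in a graph named after the subject is what turns this traversal into a one-step lookup later.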
Transformational Tricks
WHILE TALKING ABOUT
TRANSFORMATIONS …
Turtle and JSON
 Not all systems support it, but a few extension functions can prove immensely
valuable.
 The function toJSON(listNode|graphNode) as string will convert either the root node
of a list or a named graph node into a serialized JSON string that can then be
persisted in a literal of type rdf:JSON. This can be used in SPARQL and SPARQL
Update.
 The function fromJSON(jsonStr, graphNode) will convert that string back into triples in
the given graphNode and would be available in SPARQL Update.
 This ability really comes in handy with SELECT statements serialized to JSON,
which then contains the serialized literals as sub-JSON fragments
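Since toJSON and fromJSON are extension functions that not every system offers, the following Python sketch only illustrates the round trip they imply. The function names `to_json` and `from_json`, and the subject / predicate / object-list JSON layout, are assumptions for this example and do not reflect any vendor's format. Objects are treated as plain strings.

```python
import json


def to_json(triples):
    """Serialize triples as a JSON string grouped by subject, then predicate."""
    doc = {}
    for s, p, o in sorted(triples):
        doc.setdefault(s, {}).setdefault(p, []).append(o)
    return json.dumps(doc, sort_keys=True)


def from_json(json_str):
    """Rebuild the set of triples from a string produced by to_json."""
    doc = json.loads(json_str)
    return {
        (s, p, o)
        for s, predicates in doc.items()
        for p, objects in predicates.items()
        for o in objects
    }


graph = {
    ("Character:Catwoman", "rdf:type", "Character:"),
    ("Character:Catwoman", "Character:memberOfOrg", "Org:SinCitySirens"),
    ("Character:Catwoman", "Character:memberOfOrg", "Org:SuicideSquad"),
}

serialized = to_json(graph)
print(serialized)
print(from_json(serialized) == graph)
```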
Presentation as Function
VIEW PRESENTATIONS AS
MODULES (LIKELY RUNNING IN
NODEJS) THAT CAN BE
SELECTED TO SHAPE OUTPUT.
PRESENTATION MODULES
WOULD LIKELY BE WRITTEN IN
NODEJS AND WOULD BE ABLE
TO ACCESS THE KNOWLEDGE
GRAPH VIA SPARQL CALLS.
PRESENTATION MODULES
COULD HANDLE DIFFERENT
VARIANTS OF JSON, XML,
MARKDOWN, CSV AND SO
FORTH, AS WELL AS PERFORM
OUTBOUND
TRANSFORMATIONS TO PIDGIN
ONTOLOGIES.
SIMILAR INBOUND MODULES
COULD HANDLE NATURAL
LANGUAGE QUERIES IN A
MANNER SIMILAR TO CHATGPT,
AS WELL AS SIMPLIFY GRAPHQL
DEPLOYMENT.
SHACL for Schema Metadata
 Regardless of whether you validate content or not, think about using SHACL
within your applications for schema metadata
 SHACL works well with RDFS, and can help to document your schemas
 SHACL is a good place to store metadata equivalencies
 SHACL can hold presentation metadata that can simplify UX dramatically.
 SHACL is often used in conjunction with GraphQL
 SHACL can support function definition and metadata.
You Can Do Worse Than Jena
 Big data is sexy. We want our databases to be huge and comprehensive, even if
99.9995% of that data is never, ever touched. It’s why we get so excited about
large language models in AI, even though they’re too complex to keep up to date.
 Perhaps it’s time to think small again. Jena’s an open-source knowledge portal
with a barely-there UI. But … slap a Nodejs front end running Express on its front,
create named services that handle workflows along with a pretty UX, and you
usually can get what you need up and running within days, rather than months.
Think about turning them into Solid Pods while you’re at it – a good idea that just
needs the right platform.
 Think not about ingestion, but Expression!
Don’t Sweat Ontologies
 An ontology is a glorified term for an organization’s language. Your organization is
likely to be different than mine, so its language will be different. There’s nothing wrong
with that.
 Think in terms of pidgins (no, not the birds). A pidgin is a trade language, simplified so
that people speaking it can get most of the ideas across, even if it involves a lot of
hand-waving.
 As you build out your language, add equivalent terms (or transformations) to your
classes and properties to map to those pidgins you use. It need not be perfect – we’re
getting pretty good at translation.
 When you need that final 20%, get on the phone and talk with your customers, your
vendors, your agents. Knowledge graphs are really good for storing pidgins.
 Don’t sweat the small stuff.
Big Trends
 GraphQL is becoming the mechanism to talk to
knowledge graphs. Make your GraphQL RDF compliant,
and you’re golden. Use SPARQL for the heavy graphy
stuff that shouldn’t be public anyway.
 SHACL is showing up as the way to universally define
schemas. Use SHACL to drive your GraphQL interfaces.
Graph doesn’t always have to be Turtle, but JSON that
can represent RDF is a win across the board.
 Markdown is deconstructing HTML. It’s driving code
repositories and is the language that LLMs are using.
The age of the intricate web app may be ending as
making data meaningful overrides making web pages
overinteractive.
 The buzzword for today is Generative. Knowledge
Portals are Generative Engines. Think about it.
Why I Like XSLT 3
 XML is dead.
 However, JSON by itself is difficult to traverse, because dictionaries and arrays are
two very different things. Recursion is hard on JSON.
 However, if you canonicalize JSON (a relatively easy and fast process) as tokens
that can be represented as XML, then you can use the same kind of deep recursive
processing that XML people were used to doing.
 Language is recursive.
 XSLT3 is a recursive pattern matching transformation engine that works with most
data formats, including JSON and RDF. It can denormalize relational data into
trees and vice versa. It’s a pretty decent non-LLM based text interpreter as well.
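To show what canonicalizing JSON as XML-style tokens can look like, here is a Python sketch using the standard library. The element names (map, array, string, number, boolean, null, with a key attribute) are loosely modeled on the ones XSLT 3.0's json-to-xml function produces, with the namespace left out for brevity; the `json_to_xml` function itself is an illustrative stand-in, not that function.

```python
import json
import xml.etree.ElementTree as ET


def json_to_xml(value, key=None):
    """Recursively convert a parsed JSON value into an element tree.

    Dictionaries and arrays become uniform elements, so one recursive
    traversal can handle both.
    """
    if isinstance(value, dict):
        elem = ET.Element("map")
        for child_key, child in value.items():
            elem.append(json_to_xml(child, child_key))
    elif isinstance(value, list):
        elem = ET.Element("array")
        for child in value:
            elem.append(json_to_xml(child))
    elif isinstance(value, bool):  # must be tested before int
        elem = ET.Element("boolean")
        elem.text = "true" if value else "false"
    elif value is None:
        elem = ET.Element("null")
    elif isinstance(value, (int, float)):
        elem = ET.Element("number")
        elem.text = str(value)
    else:
        elem = ET.Element("string")
        elem.text = str(value)
    if key is not None:
        elem.set("key", key)
    return elem


doc = json.loads('{"name": "Catwoman", "orgs": ["SinCitySirens"], "active": true}')
root = json_to_xml(doc)
print(ET.tostring(root, encoding="unicode"))
```

Once the data is in this shape, a recursive pattern matcher only ever sees elements with children, which is the property the slide is pointing at.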
The Dinosaur in the Living Room
 Are you tired of AI yet?
 What we’re discovering about large
language models is that the solution to AI is
not to suck up Wikipedia and Github.
 Instead, it is to create smaller, manageable,
composite models that can be merged
together when needed to build up
contextual engines.
 LORAs, which started out in the Diffusion
space, are now giving way to Chinchillas
that are beginning to look more and more
like … knowledge graphs.
 In simpler terms, you don’t need one super-
duper genius, but a few relatively smart
people working together.
Feeding Your LLM (and SLMs)
 AI is great at classifying, but poor at naming. It is surprisingly good at
summarizing, something people generally are not great at. It is getting better at
reasoning, but that is an expensive capability.
 Knowledge graphs can benefit from LLM capabilities, but more to the point,
knowledge graphs can also in turn provide provenance, evolution and higher
order reasoning to both large and small language models.
 While there are a number of different approaches, JSONL is becoming the
preferred mechanism for fine tuning such models. RDF is superfood to such
models, rich with connections.
Summary
 Knowledge Portals should be
transformation engines.
 Knowledge graphs can represent complex
structures in a more universal manner than
any other data representation (including AI).
 Knowledge graphs' primary weaknesses
stem from becoming too fixated on rigor
and protocol, even as other technologies
evolve around them.
 By beefing up the RDF stack to better allow
for map/reduce transformations especially,
we stand a better chance of remaining not
only relevant but vital.
 Generative AI (Machine Learning) and
Symbolic AI (Semantics) must work
together, as they represent collectively the
breadth of knowledge programming.

Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Recently uploaded (20)

Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Transformational Tricks for RDF.pptx

  • 1. Transformational Tricks for RDF/SPARQL BY KURT CAGLE EDITOR, THE CAGLE REPORT
  • 2. Who is Kurt Cagle?  Editor, The Cagle Report (https://thecaglereport.com)  Email: kurt.cagle@thecaglereport.com  Linked In: https://linkedin.com/in/kurtcagle  Calendly: https://calendly.com/semantical. Open office hours.  I like graphs, large language models, metadata, AI, future of work  I consider myself a data therapist. Book a free hour if you want to talk.
  • 3. Purpose of This Class  The purpose of this class is to teach you transformation techniques for the knowledge portal.  I’m assuming you have a basic working knowledge of RDF and SPARQL, and have at least heard of SHACL.  When done, you should have new tools for making your data sit up and beg, and hopefully have a new way of thinking about the RDF Stack.
  • 4. Warning!!!  It is likely that somewhere in this class, I will skewer a sacred cow or five.  There are many different ways of building ontologies, many different best practices.  Frankly, many of those best practices are no longer relevant, or represent ways of thinking that should be changed in the face of advances in the technology.  Hopefully, even if you disagree with the author, there will be useful information contained herein, but go in with an open mind and a willingness to at least question your beliefs, if not necessarily change them.
  • 5. Transformations A KNOWLEDGE PORTAL IS NOT A QUERY SYSTEM, BUT A TRANSFORMATIONAL ONE. IT TRANSFORMS INFORMATION FROM ONE ONTOLOGY TO ANOTHER IT TRANSFORMS DATA FROM ONE FORMAT TO ANOTHER IT CREATES NEW KNOWLEDGE FROM OLD KNOWLEDGE. UNLESS YOU UNDERSTAND KNOWLEDGE GRAPH TRANSFORMATIONS, YOU ARE NOT GETTING EVERYTHING OUT OF YOUR KNOWLEDGE PORTAL.
  • 6. Namespace Tricks THESE ARE SUGGESTIONS ON SETTING UP NAMESPACES TO MAKE THEM AS USEFUL AS POSSIBLE
  • 7. Namespaces  Namespaces underlie a great deal of semantics, but often tend to be poorly utilized. Especially with regard to ontological modeling, the following techniques may be useful (and are used throughout this deck).  Use namespaces that correspond to classes. For instance, a Character class might have an associated namespace:  Namespace: http://comicsdata.org/ns/Character#  PREFIX Character: <http://comicsdata.org/ns/Character#>  The class can then be specified with the prefix in Turtle:  Character:Catwoman a Character: .  All triple stores have supported this notation since around 2017.
  • 8. Namespace Construction
      The use of class-based namespaces can simplify constructing and deconstructing URIs in SPARQL:
        str(Character:) -> "http://comicsdata.org/ns/Character#"
        strafter(str(Character:Catwoman), str(Character:)) -> "Catwoman"
        iri(concat(str(Character:), "Catwoman")) -> <http://comicsdata.org/ns/Character#Catwoman> -> Character:Catwoman
      In some cases (this depends on the engine), you do not need to explicitly convert URIs to strings:
        iri(concat(Character:, "Catwoman"))
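The string handling on this slide can be tried outside a triple store. What follows is a minimal Python sketch, assuming only the Character namespace from the slides. The helper names ns_local_name and ns_iri are hypothetical; they approximate SPARQL's strafter() and iri(concat()) and are not part of any RDF library.

```python
# Hypothetical helpers that approximate the SPARQL calls on the slide.
CHARACTER_NS = "http://comicsdata.org/ns/Character#"  # plays the role of str(Character:)

def ns_local_name(iri: str, namespace: str) -> str:
    """Roughly strafter(str(?iri), str(ns:)), but only matching at the start.

    Returns the empty string when the IRI is not in the namespace.
    """
    return iri[len(namespace):] if iri.startswith(namespace) else ""

def ns_iri(namespace: str, local_name: str) -> str:
    """Roughly iri(concat(str(ns:), ?local)): plain string concatenation."""
    return namespace + local_name

catwoman = ns_iri(CHARACTER_NS, "Catwoman")
print(catwoman)
print(ns_local_name(catwoman, CHARACTER_NS))
```

Because the namespace string doubles as the class identifier, the same two helpers work for any class-based namespace without extra lookup tables.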
  • 9. Namespace Best Practices  Store namespace and prefix strings in SHACL NodeShape declarations.  This makes it easier to construct contexts in SPARQL and JSON.  File paths can be stored as namespaces.  E.g., PREFIX basePath: <file:///path/to/root/>  basePath:foo/bar.ttl -> <file:///path/to/root/foo/bar.ttl> (strict Turtle and SPARQL 1.1 parsers require the slash in a local name to be escaped: basePath:foo\/bar.ttl).  RDF-XML handles this notation just fine.  This notation is friendlier to JSON-LD.
  • 10. Namespace Anti-Patterns  Stop trying to map URIs to URLs. It makes URIs brittle.  If you must, designate an ontology URI that can be mapped.  Instance URIs are not that important; property and class URIs are.  Do not use ontology import predicates – use SPARQL Update LOAD instead. (Indeed, do you really need ontologies?)  The number of class namespaces has no appreciable impact on performance.  Use named graphs to avoid namespace collision.  If you use Camel Case, stick with it. If you use underscores, stick with it.
  • 11. Turtle Tricks TURTLE IS NOT JUST COMPACT – IT’S A POWERFUL LANGUAGE FOR ORGANIZING INFORMATION, BUT ONLY IF IT’S USED
  • 12. Blank Nodes Are Structure Pointers BLANK NODES CAN BE CONFUSING UNTIL YOU KNOW THAT A BLANK NODE IS A POINTER TO A STRUCTURE. TURTLE NOTATION DELIBERATELY HIDES BLANK NODES, BUT YOU CAN USE THEM TO CREATE ARRAYS, DICTIONARIES, PARAMETERS LISTS, AND SIMILAR STRUCTURES. AT THE SAME TIME, USING SHACL PROVIDES A WAY OF DEPRECATING THE USE OF BLANK NODES WHERE THEY REALLY SHOULD BE NAMED NODES.
  • 13. Square Brackets = Dictionaries
      Turtle uses the square bracket to indicate a dictionary.
        Character:Catwoman a Character: ;
            Character:address [
                a Address: ;   # The class is usually implied
                Address:street "1313 Mockingbird Lane" ;
                Address:city "Arkham" ;
                Address:type AddressType:MailingAddress ;
            ] .
      This is equivalent to
        Character:Catwoman a Character: ;
            Character:address _:CatwomanAddress ;
            .
        _:CatwomanAddress   # This is the implicit blank node.
            a Address: ;
            Address:street "1313 Mockingbird Lane" ;
            Address:city "Arkham" ;
            Address:type AddressType:MailingAddress ;
            .
      Note that containership is implied, but illusory. These are still triples.
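To make the point that containership is illusory concrete, here is a minimal Python sketch that flattens a nested dictionary into plain triples, minting a blank node label for each nested dictionary. The function name flatten and the _:b1 style labels are assumptions for illustration; a real Turtle parser chooses its own blank node identifiers.

```python
from itertools import count

def flatten(subject, properties, triples, ids):
    """Append (s, p, o) triples; each nested dict becomes a fresh blank node."""
    for predicate, value in properties.items():
        if isinstance(value, dict):
            bnode = f"_:b{next(ids)}"            # hypothetical label scheme
            triples.append((subject, predicate, bnode))
            flatten(bnode, value, triples, ids)  # the "contents" are ordinary triples
        else:
            triples.append((subject, predicate, value))
    return triples

catwoman = {
    "rdf:type": "Character:",
    "Character:address": {
        "rdf:type": "Address:",
        "Address:street": "1313 Mockingbird Lane",
        "Address:city": "Arkham",
        "Address:type": "AddressType:MailingAddress",
    },
}

triples = flatten("Character:Catwoman", catwoman, [], count(1))
for triple in triples:
    print(triple)
```

The nested address ends up as four triples whose subject is the blank node, plus one triple that points at it from Character:Catwoman.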
  • 14. Dictionaries and JSON Objects
        Character:Catwoman a Character: ;
            Character:address [
                a Address: ;
                Address:street "1313 Mockingbird Lane" ;
                Address:city "Arkham" ;
                Address:type AddressType:MailingAddress ;
            ] .

        {
          "Character:Catwoman": {
            "@type": "Character",
            "address": {
              "@type": "Address",
              "street": "1313 Mockingbird Lane",
              "city": "Arkham",
              "Address:type": "MailingAddress"
            }
          }
        }
  • 15. Parentheses = Linked Lists
      Turtle uses parentheses to indicate a linked list.
        Character:HarleyQuinn Character:memberOfOrg ( Org:SinCitySirens Org:SuicideSquad ) .
      This is equivalent to
        Character:HarleyQuinn Character:memberOfOrg _:OrgList .
        _:OrgList rdf:first Org:SinCitySirens ;
            rdf:rest _:secondItem .
        _:secondItem rdf:first Org:SuicideSquad ;
            rdf:rest rdf:nil .
      Linked lists are intrinsically ordered, regardless of whether inferencing is enabled or not.
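The expansion of a parenthesized list can be sketched in a few lines of Python, using the standard rdf:first / rdf:rest / rdf:nil vocabulary. The function name expand_list and the _:l1 style node labels are assumptions made for this illustration.

```python
def expand_list(items, prefix="_:l"):
    """Return (head, triples) for an RDF collection; an empty list is just rdf:nil."""
    if not items:
        return "rdf:nil", []
    nodes = [f"{prefix}{i}" for i in range(1, len(items) + 1)]
    triples = []
    for i, (node, item) in enumerate(zip(nodes, items)):
        # The last cell points at rdf:nil, which terminates the list.
        rest = nodes[i + 1] if i + 1 < len(nodes) else "rdf:nil"
        triples.append((node, "rdf:first", item))
        triples.append((node, "rdf:rest", rest))
    return nodes[0], triples

head, list_triples = expand_list(["Org:SinCitySirens", "Org:SuicideSquad"])
print(("Character:HarleyQuinn", "Character:memberOfOrg", head))
for triple in list_triples:
    print(triple)
```

Order is carried by the chain of rdf:rest links rather than by any index, which is why it survives without inferencing.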
  • 16. Annotations – Double Angle Brackets
      An annotation is metadata that applies to a single assertion, using the RDF-Star mechanism:
        <<Character:Batman Character:description "World’s greatest detective!">> Assertion:comment "Wait! What about Sherlock Holmes, or Hercule Poirot?!" .
      The triple in the double angle brackets is the subject of the annotation, and again can be thought of as a blank node pointing to a data structure with the values subject, predicate, and object:
        _:assertion rdf:subject Character:Batman ;
            rdf:predicate Character:description ;
            rdf:object "World’s greatest detective!" ;
            Assertion:comment "Wait! What about Sherlock Holmes, or Hercule Poirot?!" .
      The <<>> notation is a new addition to RDF, RDF-Star, that is currently undergoing discussion as a standard, though it is being adopted by most modern triple stores.
  • 17. Annotation Sample
        <<Character:Batman Character:description "World’s greatest detective!">>
            Assertion:annotation [
                Annotation:comment "Wait! What about Sherlock Holmes?!" ;
                Annotation:author Author:ArthurConanDoyle ;
                Annotation:date "1902-05-21"^^xsd:date ;
            ], [
                Annotation:comment "Maybe Hercule Poirot?" ;
                Annotation:author Author:AgathaChristie ;
                Annotation:date "1953-07-19"^^xsd:date ;
            ] .
  • 18. Annotations – Best Practices  Typically an annotation of an assertion will consist of a dictionary node that holds several properties, rather than just one comment.  An assertion has a unique identifier, even if two such assertions have the same subject, predicate, and object. This means that multiple annotations can be made about the same assertion by different authorities.  The Open Annotation Standard is a good framework for annotating content, and works especially well with RDF-Star.  Annotations can be useful to indicate version changes of individual assertions, as well as a way to track provenance.
  • 19. Pointer Containers  Occasionally, you’ll see pointer structures, such as  [] a Character: .  or  [ a Character: ] rdfs:label "Joe" .  An empty blank node is treated the same as an anonymous URI or pointer.  In the second case, this should be read as “there exists a Character, whose label is Joe”. Two separate blank nodes are assumed to have different identifiers.  SPARQL notation emerged from the use of “named” blank nodes that were in fact treated as variable names for nodes. Turtle structural notation consequently translates directly into SPARQL Structural Notation.
  • 20. Literals and Datatypes  One of the ways we underutilize knowledge graphs is in not doing enough with datatypes.  Most people use the standard xsd: datatypes, without ever thinking about why they shouldn’t.  If you have a length measure, rather than putting type in properties, use "25"^^qudt:Meters.  If you have a full or partial population measure, use "8.01E9"^^quantity:People.  If you have a markdown document use "# Cool Title\n## by Kurt Cagle"^^textFormat:Markdown.  Use your datatypes to indicate how literals are parsed, then add metadata. Nuff said.
  • 21. SPARQL Tricks SPARQL IS MOSTLY USED FOR CREATING TABLES, BUT IT IS CAPABLE OF FAR MORE.
  • 22. The Problem with CONSTRUCT  The CONSTRUCT command in SPARQL is often used to produce graphs, but because of this utility, it also hides much more useful capabilities. It is an anachronism from OWL Inference rules where addition of new triples would create virtual triples by these rules.  Increasingly, organizations are abstracting access to knowledge portals to GraphQL or JSON-LD, and as such SPARQL is hidden behind layers of security. One benefit of this is that one underutilized feature of SPARQL Update – named graphs – is beginning to come into its own, especially because it enables workflows.  This is what we’ll cover here.
  • 23. The True Structure of “Triples”
      Subject | Predicate | Object | Graph | AssertionID
      Character:Catwoman | rdf:type | Character: | Default | uri:urn:12051AFCD…
      Subject, Predicate, Object: the “true” triple. Graph: the graph container for the triple. AssertionID: the identifier for the triple, for RDF-Star.
      The modern “triple” is actually (minimally) a pentuple. The assertion ID field is used for reification, and identifies the pentuple as a unique object. The graph field indicates that this particular tuple is part of a specific set. If the triple is the same but the graph is different, the Assertion ID will be different too.
  • 24. Named Graphs  If the graph field of a pentuple is set to a URI, that URI becomes the “name” of the graph that it belongs to.  All other pentuples with the same graph name are in the same graph.  If a pentuple has the same triple values as another pentuple but has a different graph name, then they are in different graphs.  This means that the same triple can be contained in multiple graphs.  The graph name is a URI just like a node or assertion identifier.
  • 25. Default Graph  When a triple is inserted into the graph without specifying its graph, the triple will be placed in the default graph.  The default graph can be inclusive or exclusive. Where it is inclusive, a query against a triple without specifying a graph will retrieve all triples from all graphs. Where it is exclusive, the only triples that will be retrieved will be the ones already in the default graph.  Check with your vendor whether your system is inclusive or exclusive. Most such systems can go from one mode to the other with a simple software switch in the product.
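The difference between an inclusive and an exclusive default graph can be modelled with a small set of quads. This is a minimal Python sketch under stated assumptions: the DEFAULT label, the Graph:Heroes graph name, and the match_type function are all hypothetical, and real stores expose this behaviour through a configuration switch rather than a function argument.

```python
DEFAULT = "default"  # hypothetical label for the unnamed (default) graph

# Each quad is (subject, predicate, object, graph).
quads = {
    ("Character:Catwoman", "rdf:type", "Character:", DEFAULT),
    ("Character:Batman", "rdf:type", "Character:", "Graph:Heroes"),
}

def match_type(quads, cls, inclusive):
    """Subjects typed as cls for a query that names no graph.

    inclusive=True  -> the default graph behaves as the union of all graphs
    inclusive=False -> only quads stored in the default graph are visible
    """
    return sorted(
        s for (s, p, o, g) in quads
        if p == "rdf:type" and o == cls and (inclusive or g == DEFAULT)
    )

inclusive_hits = match_type(quads, "Character:", inclusive=True)
exclusive_hits = match_type(quads, "Character:", inclusive=False)
print(inclusive_hits)
print(exclusive_hits)
```

The same query text returns different results in the two modes, which is why it is worth confirming the mode before relying on queries that omit a GRAPH clause.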
  • 26. Named Graph Use Cases  Landing graph of newly ingested data  Graph containing ingestion graph data converted to local ontology  Graph holding all instances of a given class  Graph containing reports generated from analysis  Graph holding draft vs. approved resources (workflow)  Graph containing SHACL constraints  Graph holding transformed content for output  Graphs containing data catalogs  Graphs holding intermediate calculations  Graphs holding frequently requested query results  Graphs containing documentation  Graphs containing document stores  Graphs containing controlled vocabularies for rapid lookup  Unions, intersections, diffs  The list goes on and on …
  • 27. Named Graphs vs. Data Store Partitions  Many data portals have distinct data stores that are partitioned with certain configurations.  Such stores are usually best for multi-tenant operations, as these typically also have authentication and security considerations.  Named graphs are conceptually a level lower – they exist within a single security perimeter and are optimized for rapid clearing.  Modern named graphs usually provide secondary indexes so that adding or removing a triple is as simple as adding or removing a URI to an array.
  • 28. Internal Arrangement of Named Graphs
      Subject | Predicate | Object | Graph
      Character:Catwoman | rdf:type | Character: | Graph:Graph1
      Graphs | AssertionIDs
      Graph:Graph1 | uri:urn:12051AFCD…
      Graph:Graph2 | uri:urn:4792AE109…
      Default | uri:urn:319AD1592…
      The pentuple arrangement at a deeper level illustrates how graphs can be moved, copied and deleted so quickly, as the graph key is itself a part of an index. Garbage collection only occurs when the last graph is removed from the tuple.
  • 29. INSERTING DATA THROUGH SCRIPTS
      The following SPARQL UPDATE script will add explicit triples to a graph:
        # Namespaces Declared Here
        INSERT DATA {
          GRAPH Graph:Catwoman {
            ex:Catwoman rdf:type ex:Antihero ;
              rdfs:label "Catwoman" ;
              ex:alterEgo "Selina Kyle" ;
              ex:description "A skilled thief and occasional ally of Batman, who uses her athletic abilities, martial arts skills, and cunning to navigate the criminal underworld of Gotham City." ;
              ex:superPowers "Peak human strength, agility, and endurance; expert martial artist and hand-to-hand combatant; skilled thief and acrobat" ;
              ex:gender "Female" ;
              ex:universe "DCEU" ;
          }
        };
  • 30. Digging Deeper Into Insert  The INSERT DATA command uses the same syntax (I believe) as the TriG standard, save that the namespaces use PREFIX instead of @prefix in the context header.  Multiple graphs can be populated this way in the same statement.  This is often useful for spot or test data, or for configuration data.  Unlike SPARQL QUERY, SPARQL COMMANDS are transactional – multiple SPARQL UPDATE statements can be run in the same script if separated by semicolons.
  • 31. Using the DELETE / INSERT model  The powerhouse of SPARQL UPDATE is the DELETE/INSERT/WHERE command, which can be thought of as the supercharged version of CONSTRUCT.  The WHERE clause determines the graph (and the variables) that the DEL/INS will be working on.  The DELETE statement is a CONSTRUCT-like statement that eliminates the triples that are created from the CONSTRUCT graph from the main graph.  The INSERT statement is a CONSTRUCT-like statement that adds new triples to the main graph from the CONSTRUCT graph if they do not already exist.  Together these three keywords can be used to transform one graph into another.
  • 32. Delete/Insert/Where
      This identifies the working group in the WHERE clause, then DELETES and INSERTS the triples with the corresponding variables.
        # Namespaces Declared Here
        DELETE {
          GRAPH ?gOld { ?sOld ?pOld ?oOld . }
        }
        INSERT {
          GRAPH ?gNew { ?sNew ?pNew ?oNew . }
        }
        WHERE {
          # Use SPARQL to determine old and new variables
        };
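The order of evaluation in DELETE/INSERT/WHERE is easy to get wrong, so here is a minimal Python model of one pass over a single graph held as a set of triples. The function delete_insert_where and the ex:secretIdentity predicate are hypothetical names for illustration; a real engine matches graph patterns rather than calling Python functions.

```python
def delete_insert_where(graph, where, delete_tpl, insert_tpl):
    """Apply one DELETE/INSERT/WHERE pass to a set of (s, p, o) triples.

    where      -- function returning a list of binding dicts for the graph
    delete_tpl -- function mapping one binding to the triples to remove
    insert_tpl -- function mapping one binding to the triples to add
    """
    bindings = where(graph)  # WHERE is evaluated once, against the original graph
    to_delete = {t for b in bindings for t in delete_tpl(b)}
    to_insert = {t for b in bindings for t in insert_tpl(b)}
    return (graph - to_delete) | to_insert  # deletions are applied before insertions

graph = {
    ("ex:Catwoman", "ex:alterEgo", "Selina Kyle"),
    ("ex:Catwoman", "rdfs:label", "Catwoman"),
}

# Rename ex:alterEgo to the made-up predicate ex:secretIdentity.
renamed = delete_insert_where(
    graph,
    where=lambda g: [{"s": s, "o": o} for (s, p, o) in g if p == "ex:alterEgo"],
    delete_tpl=lambda b: [(b["s"], "ex:alterEgo", b["o"])],
    insert_tpl=lambda b: [(b["s"], "ex:secretIdentity", b["o"])],
)
for triple in sorted(renamed):
    print(triple)
```

Because the bindings are computed before anything is changed, the insert template can safely reuse values from triples that the delete template removes.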
  • 33. GRAPH Commands
      Command | Example | Comments
      CREATE Graph | CREATE GRAPH graph:Foo | Creates an empty graph with the given name.
      DROP Graph | DROP GRAPH graph:Foo | Drops (deletes) the graph from the system.
      CLEAR Graph | CLEAR GRAPH graph:Foo | Clears the data from the graph but retains the graph itself.
      COPY Graph | COPY graph:Foo TO graph:Bar | Replaces the triples in graph:Bar with those in graph:Foo, but leaves graph:Foo untouched.
      MOVE Graph | MOVE graph:Foo TO graph:Bar | Replaces the triples in graph:Bar with those in graph:Foo, then eliminates graph:Foo.
      ADD Graph | ADD graph:Foo TO graph:Bar | Copies the triples in graph:Foo to graph:Bar, but without removing the old graph:Bar content.
      LOAD Graph | LOAD <uri> INTO GRAPH graph:Foo | Loads external RDF from a file system or the internet. <uri> must be a hard reference.
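The differences between COPY, MOVE and ADD are easiest to see side by side. The following Python sketch models a store as a dictionary from graph name to a set of triples; every function name here (fresh_store, copy_graph, and so on) is hypothetical, and the sample triples are made up for the illustration.

```python
def fresh_store():
    """Two named graphs, each a set of (s, p, o) tuples."""
    return {
        "graph:Foo": {("ex:Catwoman", "rdf:type", "ex:Antihero")},
        "graph:Bar": {("ex:Batman", "rdf:type", "ex:Hero")},
    }

def copy_graph(store, src, dst):
    """COPY src TO dst: dst is replaced by src's triples; src is untouched."""
    store[dst] = set(store.get(src, set()))

def move_graph(store, src, dst):
    """MOVE src TO dst: like COPY, after which src is removed."""
    if src != dst:
        copy_graph(store, src, dst)
        store.pop(src, None)

def add_graph(store, src, dst):
    """ADD src TO dst: dst keeps its own triples and gains src's."""
    store.setdefault(dst, set()).update(store.get(src, set()))

def clear_graph(store, name):
    """CLEAR GRAPH name: empty the graph but keep its entry, as the slide describes."""
    store[name] = set()

def drop_graph(store, name):
    """DROP GRAPH name: remove the graph entirely."""
    store.pop(name, None)

added = fresh_store()
add_graph(added, "graph:Foo", "graph:Bar")

copied = fresh_store()
copy_graph(copied, "graph:Foo", "graph:Bar")

moved = fresh_store()
move_graph(moved, "graph:Foo", "graph:Bar")

print(sorted(added["graph:Bar"]))
print(sorted(copied["graph:Bar"]))
print(sorted(moved))
```

Only ADD preserves the old content of the target graph; COPY and MOVE both overwrite it.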
  • 34. Workflows with SPARQL UPDATE  SPARQL UPDATE is transactional, and can have multiple operations per script.  If the transaction fails at any point, the results are rolled back to the previous state.  Named graphs make it possible to create and populate a graph, then use that graph to generate one or more additional graphs, which can then trigger other actions.  Conditions within graphs also mean that a DELETE/INSERT statement can be short-circuited (or activated) only if the right graph conditions exist in the WHERE clause, making for conditional logic.  These are WORKFLOWS.
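The all-or-nothing behaviour described on this slide can be modelled with a snapshot that is restored on failure. This is a minimal Python sketch, not how any particular triple store implements transactions; run_transaction, load_heroes and failing_step are hypothetical names, and the store is again a dictionary from graph name to a set of triples.

```python
import copy

def run_transaction(store, operations):
    """Run operations in order; restore the snapshot if any of them raises."""
    snapshot = copy.deepcopy(store)
    try:
        for operation in operations:
            operation(store)
    except Exception:
        store.clear()
        store.update(snapshot)  # roll back to the state before the script ran
        return False
    return True

def load_heroes(store):
    """Stand-in for a LOAD or INSERT step."""
    store["graph:Ingest"].add(("ex:Catwoman", "rdf:type", "ex:Antihero"))

def failing_step(store):
    """Stand-in for a step that fails partway through a script."""
    raise ValueError("simulated failure")

store = {"graph:Ingest": set()}
committed = run_transaction(store, [load_heroes])
rolled_back = run_transaction(
    store, [lambda s: s["graph:Ingest"].clear(), failing_step]
)
print(committed, rolled_back, len(store["graph:Ingest"]))
```

In the second run the clearing step succeeds but the script as a whole fails, so the ingested triple is still present afterwards.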
  • 35. Superhero Example Workflow 1. Load an external Superheroes.ttl file into an ingestion graph. 2. Use DEL/INS to convert this file to an internal schema in superheroes graph. 3. From this converted schema, use DEL/INS to generate a SHACL file based upon the superheroes graph, putting that into a SHACL graph. 4. Finally, use DEL/INS to create a message in a message queue graph indicating that an update has been made to the superheroes and SHACL graphs. And on to the demo!!! (I will be posting the demo breakdown to a separate article called Workflows In Sparql at https://thecaglereport.com.)
  • 36. Workflow Thoughts  Transactions lock graphs in use. This means that you can create temporary graphs in your script, so long as they get dropped upon completion.  Temporary graphs can also be used to save and alter working triples. This is a way of storing variables between transactions. You cannot set global variables directly in transactions otherwise.  You cannot run a SELECT or CONSTRUCT statement at the transaction level. However, you can run them from within a WHERE clause.  LOAD, sadly, does not support a WHERE clause. To load from external resources, you may need to use SERVICE invocations instead, which can be run from the WHERE clause.
  • 37. Passing Variables Between Transactions
        # Create a temporary graph with variable content
        INSERT {
          GRAPH Graph:Temp { Temp:date1 Temp:hasValue ?now . }
        }
        WHERE { BIND(now() AS ?now) };
        # In a later transaction, retrieve the variable value.
        INSERT { Transaction:123 Transaction:has ?date . }
        WHERE {
          GRAPH Graph:Temp { Temp:date1 Temp:hasValue ?date }
        };
  • 38. Ingest Thoughts  There are a number of ways to get non-RDF data into a knowledge portal.  Most commercial portals have connectors to JSON, XML, relational databases, YAML, message queues, openTelepathy (COMING SOON!) …  It is STILL worth the time to map these to an internal organizational ontology.  Internal transformations can create maps to relevant controlled vocabularies and taxonomies.  To get a good start, use AutoGPT AI or similar to do the bulk of the mapping for you. This is where having a way of identifying different ontologies comes in handy, and will usually get you 80% of the way there.  That remaining 20% is often critical for your business, and deserves to have human eyeballs on it.
  • 39. Last Thoughts on Named Graphs  Wrap your instances by associated classes in named graphs for that class, and stuff that graph name into your SHACL metadata for that class  The class graph will be much smaller than trying to search by class name, find the associated graph, then retrieve the results.  If you’re really ambitious, wrap each instance in a named graph tied into the subject IRI, then use ?s (rdf:*)+ ?o to get the full transitive closure for ?s, to put into that graph. This will get you a super DESCRIBE that will often get you info you normally have to write a lot of ugly code to get, and it’s FAST.  Most knowledge portals have named graph endpoints. Go wild.
  • 41. Turtle and JSON  Not all systems support it, but a few extension functions can prove immensely valuable.  The function toJSON(listNode|graphNode) as string will convert either the root node of a list or a named graph node into a serialized JSON string that can then be persisted in a literal of type rdf:JSON. This can be used in SPARQL and SPARQL Update.  The function fromJSON(jsonStr, graphNode) will convert that string back into triples in the given graphNode and would be available in SPARQL Update.  This ability really comes in handy with SELECT statements serialized to JSON, which then contain the serialized literals as sub-JSON fragments.
  • 42. Presentation as Function VIEW PRESENTATIONS AS MODULES (LIKELY RUNNING IN NODEJS) THAT CAN BE SELECTED TO SHAPE OUTPUT. PRESENTATION MODULES WOULD LIKELY BE WRITTEN IN NODEJS AND WOULD BE ABLE TO ACCESS THE KNOWLEDGE GRAPH VIA SPARQL CALLS. PRESENTATION MODULES COULD HANDLE DIFFERENT VARIANTS OF JSON, XML, MARKDOWN, CSV AND SO FORTH, AS WELL AS PERFORM OUTBOUND TRANSFORMATIONS TO PIDGIN ONTOLOGIES. SIMILAR INBOUND MODULES COULD HANDLE NATURAL LANGUAGE QUERIES IN A MANNER SIMILAR TO CHATGPT, AS WELL AS SIMPLIFY GRAPHQL DEPLOYMENT.
  • 43. SHACL for Schema Metadata  Regardless of whether you validate content or not, think about using SHACL within your applications for schema metadata  SHACL works well with RDFS, and can help to document your schemas  SHACL is a good place to store metadata equivalencies  SHACL can hold presentation metadata that can simplify UX dramatically.  SHACL is often used in conjunction with GraphQL  SHACL can support function definition and metadata.
  • 44. You Can Do Worse Than Jena  Big data is sexy. We want our databases to be huge and comprehensive, even if 99.9995% of that data is never, ever touched. It’s why we get so excited about large language models in AI, even though they’re too complex to keep up to date.  Perhaps it’s time to think small again. Jena’s an open-source knowledge portal with a barely-there UI. But … slap a Nodejs front end running Express on its front, create named services that handle workflows along with a pretty UX, and you usually can get what you need up and running within days, rather than months. Think about turning them into Solid Pods while you’re at it – a good idea that just needs the right platform.  Think not about ingestion, but Expression!
  • 45. Don’t Sweat Ontologies  An ontology is a glorified term for an organization’s language. Your organization is likely to be different than mine, so its language will be different. There’s nothing wrong with that.  Think in terms of pidgins (no, not the birds). A pidgin is a trade language, simplified so that people speaking it can get most of the ideas across, even if it involves a lot of hand-waving.  As you build out your language, add equivalent terms (or transformations) to your classes and properties to map to those pidgins you use. It need not be perfect – we’re getting pretty good at translation.  When you need that final 20%, get on the phone and talk with your customers, your vendors, your agents. Knowledge graphs are really good for storing pidgins.  Don’t sweat the small stuff.
  • 46. Big Trends  GraphQL is becoming the mechanism to talk to knowledge graphs. Make your GraphQL RDF compliant, and you’re golden. Use SPARQL for the heavy graphy stuff that shouldn’t be public anyway.  SHACL is showing up as the way to universally define schemas. Use SHACL to drive your GraphQL interfaces. Graph doesn’t always have to be Turtle, but JSON that can represent RDF is a win across the board.  Markdown is deconstructing HTML. It’s driving code repositories and is the language that LLMs are using. The age of the intricate web app may be ending as making data meaningful overrides making web pages overinteractive.  The buzzword for today is Generative. Knowledge Portals are Generative Engines. Think about it.
  • 47. Why I Like XSLT 3  XML is dead.  However, JSON by itself is difficult to traverse, because dictionaries and arrays are two very different things. Recursion is hard on JSON.  However, if you canonicalize JSON (a relatively easy and fast process) as tokens that can be represented as XML, then you can use the same kind of deep recursive processing that XML people are used to doing.  Language is recursive.  XSLT 3 is a recursive pattern-matching transformation engine that works with most data formats, including JSON and RDF. It can denormalize relational data into trees and vice versa. It’s a pretty decent non-LLM-based text interpreter as well.
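The canonicalization step can be illustrated outside XSLT. The Python sketch below borrows the element names used by the XPath 3.1 json-to-xml function (map, array, string, number, boolean, null, with a key attribute) but omits the namespace for brevity; the to_element and collect_strings helpers are hypothetical names, and this stands in for, rather than reproduces, what an XSLT 3 processor does.

```python
import json
import xml.etree.ElementTree as ET


def to_element(value, key=None):
    """Turn a parsed JSON value into a uniform element tree."""
    if isinstance(value, dict):
        element = ET.Element("map")
        for child_key, child_value in value.items():
            element.append(to_element(child_value, child_key))
    elif isinstance(value, list):
        element = ET.Element("array")
        for item in value:
            element.append(to_element(item))
    elif isinstance(value, bool):
        # bool is checked before numbers because bool is a subclass of int.
        element = ET.Element("boolean")
        element.text = "true" if value else "false"
    elif value is None:
        element = ET.Element("null")
    elif isinstance(value, (int, float)):
        element = ET.Element("number")
        element.text = str(value)
    else:
        element = ET.Element("string")
        element.text = str(value)
    if key is not None:
        element.set("key", key)
    return element


def collect_strings(element):
    """One recursive rule now covers maps and arrays alike."""
    found = [element.text] if element.tag == "string" else []
    for child in element:
        found.extend(collect_strings(child))
    return found


document = json.loads(
    '{"title": "Graphs", "tags": ["rdf", "sparql"], "meta": {"draft": false, "pages": 50}}'
)
root = to_element(document)
strings = collect_strings(root)
```

Once everything is an element with children, the dictionary-versus-array distinction stops mattering to the traversal, which is the point of the slide.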
  • 48. The Dinosaur in the Living Room  Are you tired of AI yet?  What we’re discovering about large language models is that the solution to AI is not to suck up Wikipedia and GitHub.  Instead, it is to create smaller, manageable, composite models that can be merged together when needed to build up contextual engines.  LoRAs, which started out in the Diffusion space, are now giving way to Chinchillas that are beginning to look more and more like … knowledge graphs.  In simpler terms, you don’t need one super-duper genius, but a few relatively smart people working together.
  • 49. Feeding Your LLM (and SLMs)  AI is great at classifying, but poor at naming. It is surprisingly good at summarizing, something people generally are not great at. It is getting better at reasoning, but that is an expensive capability.  Knowledge graphs can benefit from LLM capabilities, but more to the point, knowledge graphs can in turn provide provenance, evolution and higher-order reasoning to both large and small language models.  While there are a number of different approaches, JSONL is becoming the preferred mechanism for fine-tuning such models. RDF is a superfood for such models, rich with connections.
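As a rough illustration of the RDF-to-JSONL idea, the Python sketch below groups triples by subject and writes one JSON object per line. Everything specific here is an assumption: the ex: triples, the triples_to_jsonl helper, and especially the prompt/completion field names, since each fine-tuning toolchain defines its own record layout.

```python
import json
from collections import defaultdict


def triples_to_jsonl(triples):
    """Emit one JSON record per subject, one record per line."""
    by_subject = defaultdict(list)
    for subject, predicate, obj in triples:
        by_subject[subject].append((predicate, obj))

    lines = []
    for subject, statements in by_subject.items():
        facts = "; ".join(f"{predicate} {obj}" for predicate, obj in statements)
        record = {
            # Field names are placeholders; match them to your tooling.
            "prompt": f"Describe {subject}.",
            "completion": f"{subject}: {facts}.",
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


# Hypothetical sample data, standing in for a SPARQL result set.
sample_triples = [
    ("ex:Jena", "rdf:type", "ex:TripleStore"),
    ("ex:Jena", "ex:license", "Apache-2.0"),
    ("ex:SHACL", "rdf:type", "ex:SchemaLanguage"),
]
jsonl_text = triples_to_jsonl(sample_triples)
```

Grouping by subject keeps each record's connections together, which is where the "rich with connections" benefit comes from.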
  • 50. Summary  Knowledge Portals should be transformation engines.  Knowledge graphs can represent complex structures in a more universal manner than any other data representation (including AI).  Knowledge graphs’ primary weaknesses stem from a fixation on rigor and protocol, even as other technologies evolve around them.  By beefing up the RDF stack, especially to better allow for map/reduce transformations, we stand a better chance of remaining not only relevant but vital.  Generative AI (Machine Learning) and Symbolic AI (Semantics) must work together, as collectively they represent the breadth of knowledge programming.