We argue that digital editions need to support annotations made by researchers unaffiliated with the edition project, as a contribution to the explanatory material already present on the site, for private study, or for publication in conjunction with a scholarly article. We demonstrate our annotation approach, which uses RDFa to embed edition-specific semantics and identifiers in the edition's HTML pages. We discuss a FRBRoo-based ontology of the editorial domain, capable of describing both the objects of editing (Text and Document) and their representation in the edition. We have a fully functional, open-source prototype of an annotation tool that will be actively developed over the coming years for use in multiple disciplines.
Facilitating reusable third-party annotations in the digital edition
1. Facilitating Reusable Third-Party Annotations in the Digital Edition
Marijn Koolen
(Royal Netherlands Academy of Arts and Sciences - Humanities Cluster)
Peter Boot
(Huygens ING)
Annotation in Scholarly Editions and Research, 22-02-2019, Wuppertal, Germany
2. ● Annotations in Digital Editions
○ Tend to be restricted to critical notes by creators of the edition
○ Users rarely have support from editions for making their own annotations
● Annotation is a scholarly primitive (Unsworth 2000, Palmer et al. 2009)
○ All scholars make annotations, use them to structure thoughts, gather data
○ Either visible only in private copies, or invisible in shared source materials
○ Add interpretations, explanations and perspectives
● Annotation is a broad but vaguely defined concept
○ “nearly every type of digital research activity in the Humanities today is referred to or connected to
annotating” (Niels-Oliver Walkowski on DARIAH Annotation WG survey, 2017)
Annotation as Scholarly Activity
4. ● Facilitate third-party annotations in the digital edition:
○ Annotations made by researchers unaffiliated with the edition project
○ Contributing to the explanatory material already present on the site
○ Purpose: private study or publication alongside a scholarly article
● Making annotations a more visible part of scholarly communication
○ “Visions of the scholarly web”
● Goal: approach with low threshold for participation
○ For resource providers: tool that is easy to integrate in existing edition
○ For scholars: tool that supports different annotation tasks, allows rich querying/analysing
○ Implementation:
■ W3C Web Annotation data model and protocol (interoperability)
■ JavaScript client talking with WA server
Third-Party Annotations
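As an illustration of the W3C Web Annotation data model and protocol named above, the following minimal Python sketch builds one annotation and prepares the request that would create it on a Web Annotation server. The edition URL, the container URL and the creator IRI are hypothetical placeholders, and the sketch is not taken from the client or server discussed in this talk; only the JSON-LD context, the `Annotation`, `TextualBody` and `TextQuoteSelector` types and the media type come from the W3C specifications.

```python
import json
from urllib.request import Request

ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"
ANNO_MEDIA_TYPE = 'application/ld+json; profile="http://www.w3.org/ns/anno.jsonld"'


def make_annotation(source_url, exact_text, comment, creator):
    """Build a Web Annotation that comments on a quoted text fragment."""
    return {
        "@context": ANNO_CONTEXT,
        "type": "Annotation",
        "creator": creator,
        "motivation": "commenting",
        "body": {"type": "TextualBody", "value": comment, "format": "text/plain"},
        "target": {
            "source": source_url,
            "selector": {"type": "TextQuoteSelector", "exact": exact_text},
        },
    }


def build_create_request(container_url, annotation):
    """Prepare (but do not send) the POST request that creates the annotation."""
    data = json.dumps(annotation).encode("utf-8")
    return Request(
        container_url,
        data=data,
        method="POST",
        headers={"Content-Type": ANNO_MEDIA_TYPE},
    )


annotation = make_annotation(
    "https://edition.example.org/letters/001",  # hypothetical edition page
    "Dear Theo",
    "Salutation of the letter.",
    "https://example.org/users/researcher1",  # hypothetical creator IRI
)
# Hypothetical annotation container; nothing is sent over the network here.
request = build_create_request("https://annotations.example.org/annotations/", annotation)
print(request.get_method(), request.full_url)
print(json.dumps(annotation, indent=2))
```

Because the annotation is plain JSON-LD, any client that follows the same model can read it, which is the interoperability argument made on this slide.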
5. Overview
1. Annotating Digital Editions on the Web
a. The problem of anchoring
b. The problem of semantics
2. Making Web Editions Annotatable
a. Anchors and semantics via RDFa
b. The problem of representation
3. Facilitating Third-Party Annotations
a. The consequences
b. Beyond Digital Editions
7. ● How to anchor an annotation to a specific location in the edition
● Ensure the annotation addresses a component in the logical information structure that defines the edition
○ and not a location in an HTML page, which is merely one representation of an edited text
Annotating Digital Editions on the Web
8. ● Many open, browser-based tools for social annotation tasks
○ Annotator.js
○ Hypothes.is
○ Dokie.li
○ Pund.it
○ Apache Annotator
● Advantages
○ Annotate online materials
○ Open formats: sharing, collaborating
● Disadvantages
○ Limited knowledge of the structure of the annotated object
○ Limited support for using/analysing annotations outside of annotated web page
○ Limited support for annotating multimedia objects
State of the Art in Web-based Annotation
11. ● We argue that an annotation tool should understand the structure of the object itself
a. Browser uses HTML representation
i. HTML is layout oriented, no meaningful connection with annotated object
ii. Annotation not robust against changes in HTML representation
b. Multiple websites may have (different) online versions/editions of the same object
i. Annotations all target the same object but at different URLs
c. Object may have multiple representations
i. Digital edition can have different transcriptions, translations, audio versions
ii. Annotations made on one representation should be accessible for others
d. Resource providers should be able to suggest suitable annotation types for different object components
Understanding Annotated Object
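The point about multiple websites hosting the same object can be sketched as follows: if each annotation records an identifier for the annotated object in addition to the URL of the page it was made on, annotations from different sites can be brought together. The `object_id` and `page_url` field names and the URN scheme are assumptions made for this illustration only.

```python
from collections import defaultdict


def group_by_object(annotations):
    """Group annotations by the identifier of the annotated object, not by page URL."""
    grouped = defaultdict(list)
    for annotation in annotations:
        grouped[annotation["object_id"]].append(annotation)
    return dict(grouped)


# Two hypothetical sites show the same paragraph of the same letter.
annotations = [
    {
        "page_url": "https://edition-a.example.org/letter/1",
        "object_id": "urn:example:letter:001:para:2",
        "comment": "Refers to an earlier letter.",
    },
    {
        "page_url": "https://edition-b.example.org/vg/001.html",
        "object_id": "urn:example:letter:001:para:2",
        "comment": "Unusual spelling.",
    },
    {
        "page_url": "https://edition-a.example.org/letter/1",
        "object_id": "urn:example:letter:001:para:3",
        "comment": "Mentions a painting.",
    },
]

by_object = group_by_object(annotations)
for object_id, items in by_object.items():
    print(object_id, [item["page_url"] for item in items])
```

Keyed on URLs alone, the first two annotations would look unrelated even though they address the same paragraph.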
12. ● Edition provider has:
○ Resources + metadata (e.g. as TEI/XML)
● Transformed to HTML presentation format for web browser
○ Browser (and annotation plugin) only sees presentation information
○ Compare rich semantics of TEI file with poor semantics of HTML representation
Annotating Digital Editions as Web Pages
20. ● Use RDFa to describe resources in web page
○ Enrich HTML presentation of resource with semantic info on resource
● Develop annotation client that understands RDFa
○ Parse RDFa information in web page to know annotatable components
○ Capture structural semantic information in annotation
Semantic Anchoring via RDFa
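A minimal sketch of what "parsing RDFa information to know annotatable components" could look like, written in Python with the standard library rather than in the JavaScript of the actual client. The vocabulary URL, the type names (`Letter`, `ParagraphInLetter`) and the URNs are hypothetical; only the RDFa attributes `vocab`, `typeof` and `resource` are standard. The parser collects every element that declares a resource and records its nearest enclosing resource, so the structural context can be stored with an annotation.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not be pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}

SAMPLE_HTML = """
<div vocab="http://example.org/edition-ontology#" typeof="Letter" resource="urn:example:letter:001">
  <h1>Letter 001</h1>
  <p typeof="ParagraphInLetter" resource="urn:example:letter:001:para:1">Dear Theo,</p>
  <p typeof="ParagraphInLetter" resource="urn:example:letter:001:para:2">Thanks for your letter.</p>
</div>
"""


class RDFaResourceParser(HTMLParser):
    """Collect elements that carry both `typeof` and `resource` attributes."""

    def __init__(self):
        super().__init__()
        self.resources = []  # annotatable components found so far
        self._stack = []     # one entry per open element: resource id or None

    def _current_parent(self):
        for resource in reversed(self._stack):
            if resource is not None:
                return resource
        return None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        resource = attributes.get("resource")
        if resource and "typeof" in attributes:
            self.resources.append(
                {"id": resource, "type": attributes["typeof"], "parent": self._current_parent()}
            )
        else:
            resource = None
        if tag not in VOID_TAGS:
            self._stack.append(resource)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()


parser = RDFaResourceParser()
parser.feed(SAMPLE_HTML)
for component in parser.resources:
    print(component)
```

The RDFa attributes do not affect how the page is rendered, so the semantic layer can be added without touching the layout.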
27. ● How to anchor an annotation to a specific representation in the edition
● Ensure the information structure is described in sufficient detail to distinguish
○ the edited text or document (the object of editing)
○ its (multiple) representation(s) in the edition
Annotating Digital Editions on the Web
28. Creative Works and Representation
● Digital Editions can have multiple representations of the same creative work
○ E.g. image scan, transcript, translation
○ Annotations may relate to a specific representation…
■ E.g. a correction or comment on a word in the transcript or translation
○ … or to the abstract creative work...
■ E.g. background information for something referenced in the text
■ Or a code to assign a phrase to a category of interest
○ … or to a combination of representations
■ E.g. linking a phrase in the transcript to a drawing in the page image
● Different structures may be leading in the HTML view
○ E.g. document-centric (pages) and text-centric (sections, paragraphs) structures
○ Annotations made on one structure should be translated to match alternative structure
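One way to translate an annotation between a text-centric and a document-centric structure is sketched below. It assumes, purely for illustration, that both structures segment the same transcript and that each component is known by its start and end character offsets in that transcript; the component identifiers and offsets are invented. A range expressed relative to a paragraph is first converted to absolute offsets and then split over the pages it touches.

```python
def to_absolute(components, component_id, start, end):
    """Convert offsets relative to one component into offsets in the full text."""
    component_start, component_end = components[component_id]
    if not 0 <= start <= end <= component_end - component_start:
        raise ValueError("range falls outside the component")
    return component_start + start, component_start + end


def to_components(components, absolute_start, absolute_end):
    """Split an absolute range into (component id, relative start, relative end) segments."""
    segments = []
    for component_id, (component_start, component_end) in components.items():
        low = max(absolute_start, component_start)
        high = min(absolute_end, component_end)
        if low < high:
            segments.append((component_id, low - component_start, high - component_start))
    return segments


# Hypothetical segmentations of the same 100-character transcript.
paragraphs = {"para:1": (0, 40), "para:2": (40, 100)}  # text-centric view
pages = {"page:1": (0, 60), "page:2": (60, 100)}       # document-centric view

# An annotation made on characters 10-30 of the second paragraph.
absolute_range = to_absolute(paragraphs, "para:2", 10, 30)
page_segments = to_components(pages, *absolute_range)
print(absolute_range)
print(page_segments)
```

In this example the annotated range crosses a page break, so one paragraph-level target becomes two page-level targets.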
29. Annotations on Different Levels
● How can we distinguish between abstract work and representation?
● How can we target annotations at these different levels?
● Which annotations should be shown in which context?
30. ● We created an FRBR-based ontology to distinguish between
○ Editable objects (creative works, parts of works)
○ Edition objects (representations, parts of representations)
● FRBR
○ Functional Requirements for Bibliographic Records
○ Distinguish Work - Expression - Manifestation - Item
○ Van Gogh’s letter is a creative Work
○ Diplomatic transcription is an expression of this work
■ (and a creative work in itself)
○ Translation is an expression of this work
■ (and a creative work in itself)
Editable and Edition Domains
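The distinction between a work and its expressions can be sketched with a small data structure. This is a deliberately simplified illustration, not the ontology itself: the class name, the `expression_of` field and the URNs are assumptions. Each resource may be an expression of another resource, and following that chain leads to the underlying work, which makes it possible to find annotations made on other representations of the same work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EditionResource:
    """A work, or an expression (representation) of another resource."""

    id: str
    label: str
    expression_of: Optional["EditionResource"] = None

    def work(self):
        """Follow the expression chain up to the underlying work."""
        node = self
        while node.expression_of is not None:
            node = node.expression_of
        return node


def annotations_on_same_work(annotations, resource):
    """Return annotations whose target represents the same work as `resource`."""
    work = resource.work()
    return [a for a in annotations if a["target"].work() == work]


letter = EditionResource("urn:example:letter:001", "Letter (work)")
transcript = EditionResource("urn:example:letter:001:transcript", "Diplomatic transcription", letter)
translation = EditionResource("urn:example:letter:001:translation", "Translation", letter)
other_letter = EditionResource("urn:example:letter:002", "Another letter (work)")

annotations = [
    {"target": transcript, "comment": "Correction of a transcribed word."},
    {"target": letter, "comment": "Background on a person mentioned."},
    {"target": other_letter, "comment": "Unrelated note."},
]

related = annotations_on_same_work(annotations, translation)
print([a["comment"] for a in related])
```

A reader viewing the translation could then be offered the background note made on the work, while the transcript correction could be shown or hidden depending on context.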
41. Private, Shared, Public
● Annotations have permissions
○ Private by default, can be shared (once implemented) or made public
○ Importance of private annotations (Bradley, 2012): the role of personal reflection
■ Also, McCarty’s point on the act of making an annotation (“knowing in doing”)
■ Annotations are mainly for structuring your thoughts?
● Annotations for writing vs. annotations for reading
○ Transition from ‘for writing’ (knowing in doing) to ‘for reading’ (knowing in using)
■ I.e. from private/shared to public
○ When does an annotator consider an annotation of interest to others?
■ E.g. when it is published alongside an article to support the arguments made
○ Edit annotations to make them comprehensible for others
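The three visibility levels can be expressed as a simple access check. This is a sketch under assumptions: the field names `access`, `creator` and `shared_with` are hypothetical and do not describe how the server stores permissions. Annotations without an explicit level are treated as private, matching the default stated above.

```python
def can_view(annotation, user):
    """Decide whether `user` may see `annotation` (private unless stated otherwise)."""
    if annotation["creator"] == user:
        return True
    access = annotation.get("access", "private")
    if access == "public":
        return True
    if access == "shared":
        return user in annotation.get("shared_with", ())
    return False


annotations = [
    {"id": 1, "creator": "anna", "comment": "Note to self."},
    {"id": 2, "creator": "anna", "access": "shared", "shared_with": ["ben"], "comment": "Draft reading."},
    {"id": 3, "creator": "anna", "access": "public", "comment": "Published with the article."},
]

for user in ("anna", "ben", "carl"):
    visible = [a["id"] for a in annotations if can_view(a, user)]
    print(user, visible)
```

Moving an annotation from "for writing" to "for reading" then amounts to changing its access level, ideally after editing it for an outside audience.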
42. Impact
● What are the consequences of third-party annotation for scholarship?
○ Publish annotations alongside scholarly arguments
○ Edition could become living document with ongoing visible communication
■ Esp. within a collaborative project
■ But also more publicly (how to avoid this becoming an impenetrable mess?)
● Feedback
○ Edition owners/maintainers may want to incorporate certain annotations into the edition
○ Third-party annotation to curated annotation/markup
● Editions of famous works or authors may attract much attention
○ Open model: anyone can share anything with everyone
○ Editorial model: public annotations need to be approved (by whom?)
○ Private/shared model: only share with specific collaborators, enable limited conversations,
can’t openly cite annotations
43. Low Threshold To Participate?
● We want our annotation approach to be easy to adopt by other editions
○ Semantics can be embedded via RDFa without changing the layout
○ The JavaScript client can be loaded in any RDFa-enriched web page
■ Configurable to suit editor’s/annotator’s needs
○ A Python REST server running Elasticsearch in the background for indexing and retrieval
■ With access permissions per annotation (private, shared, public)
■ Support for AnnotationCollections
● Both available on GitHub
○ Server: https://github.com/marijnkoolen/scholarly-web-annotation-server
○ Client: https://github.com/CLARIAH/scholarly-web-annotation-client
○ Documentation is minimal and somewhat out of date
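To show what AnnotationCollection support involves, here is a minimal sketch that packages a list of annotations as a W3C `AnnotationCollection` with `AnnotationPage` objects. The collection IRI and the paging scheme (`?page=N`) are hypothetical, and the code is not taken from the server linked above; the property names `total`, `first`, `last`, `partOf`, `startIndex`, `items` and `next` come from the Web Annotation data model.

```python
import json

ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"


def build_collection(collection_id, label, annotations, page_size=2):
    """Return an AnnotationCollection and the AnnotationPages that hold its items."""
    pages = []
    for index in range(0, len(annotations), page_size):
        pages.append(
            {
                "id": f"{collection_id}?page={index // page_size}",
                "type": "AnnotationPage",
                "partOf": collection_id,
                "startIndex": index,
                "items": annotations[index:index + page_size],
            }
        )
    # Link each page to the one that follows it.
    for page, following in zip(pages, pages[1:]):
        page["next"] = following["id"]

    collection = {
        "@context": ANNO_CONTEXT,
        "id": collection_id,
        "type": "AnnotationCollection",
        "label": label,
        "total": len(annotations),
    }
    if pages:
        collection["first"] = pages[0]["id"]
        collection["last"] = pages[-1]["id"]
    return collection, pages


# Three placeholder annotations; real ones would carry bodies and targets.
items = [{"id": f"urn:example:annotation:{n}", "type": "Annotation"} for n in range(3)]
collection, pages = build_collection(
    "https://annotations.example.org/collections/article-notes",  # hypothetical IRI
    "Annotations cited in an article",
    items,
)
print(json.dumps(collection, indent=2))
```

A collection like this gives a set of annotations a single citable identifier, which fits the use case of publishing annotations alongside an article.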
51. Wrap Up
● We think support for third-party annotation in digital editions is valuable
○ Several difficulties:
■ Changing objects, unstable identifiers
■ Openness comes at a price
○ Our approach has pros and cons
■ Pro: flexible, supports many tasks and multiple modalities, interoperable
■ Cons: complex structure, esp. when using FRBR layers, easy to make mistakes
○ Suggestions for improvement/simplification are welcome
● Plans
○ Set up across CLARIAH infrastructure (funded 2019-2023)
○ Experiment with pilots in different disciplines (historical science, media studies, literary studies,
linguistics, ...)
52. Anderson, S., Blanke, T., & Dunn, S. (2010). Methodological commons: arts and humanities e-Science fundamentals. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 368(1925), 3779-3796.
Boot, P., Haentjens Dekker, R., Koolen, M., & Melgar, L. (2017). Facilitating Fine-grained Open Annotations of Scholarly Sources. In: Conference abstracts Digital Humanities 2017, Montreal.
Boot, P., & Koolen, M. (2018). A FRBRoo-based annotation ontology for digital editing. In: Conference abstracts European Association for Digital Humanities 2018, Galway.
Bradley, J. (2012). Towards a richer sense of digital annotation: Moving beyond a “media” orientation of the annotation of digital objects. Digital Humanities Quarterly, 6(2).
Palmer, C. L., Teffeau, L. C., & Pirmann, C. M. (2009). Scholarly Information Practices in the Online Environment. Report commissioned by OCLC Research.
Unsworth, J. (2000). Scholarly Primitives: what methods do humanities researchers have in common, and how might our tools reflect this. In Humanities Computing: formal methods, experimental practice symposium, King’s College, London.
Walkowski, N.-O. (2016). The Landscape of Digital Annotation and Its Meaning. Conference on Language Technologies & Digital Humanities, Ljubljana.
References
54. EARMARK
● Extremely Annotational RDF Markup
● Goals:
○ Allow multiple annotators to annotate the same object (overlapping annotations)
○ Refer to external entities
● Solution
○ Java application
○ Works on XML/TEI files
○ Derives identifier from XML structure, uses XPath and character offsets and range to identify text elements
○ Allows both standoff annotation and embedding as markup
○ RDF for references to anything in the world
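The idea of identifying text through an XPath plus character offsets can be illustrated in a few lines of Python. This is not EARMARK's Java API; it is a generic sketch of the addressing technique using the standard library's limited XPath support, on a simplified, namespace-free XML snippet invented for the example.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for a TEI file (no namespaces, hypothetical content).
XML_SNIPPET = "<text><body><p>Dear Theo,</p><p>Thanks for your letter.</p></body></text>"


def resolve_pointer(root, xpath, start, end):
    """Return the characters [start, end) of the element selected by `xpath`."""
    element = root.find(xpath)
    if element is None:
        raise LookupError(f"no element matches {xpath!r}")
    text = "".join(element.itertext())
    if not 0 <= start <= end <= len(text):
        raise ValueError("offsets fall outside the element's text")
    return text[start:end]


root = ET.fromstring(XML_SNIPPET)
# Stand-off pointer: second paragraph of the body, characters 0 to 6.
fragment = resolve_pointer(root, "./body/p[2]", 0, 6)
print(fragment)
```

Because the pointer lives outside the XML, several annotators can address overlapping ranges of the same paragraph without modifying the source file.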
60. ● Approach to enable third-party annotation in digital editions
○ Technical approach is only first step!
● Annotation approach to support fluid nature of annotations
○ Support need for critical distinctions in targeting
● All code on GitHub
○ Server: https://github.com/marijnkoolen/scholarly-web-annotation-server
○ Client: https://github.com/CLARIAH/scholarly-web-annotation-client
○ Documentation is minimal and somewhat out of date
Conclusions