Slides for a presentation made at the Archives Association of British Columbia's 2016 Annual Conference, April 15, 2016, held in Vancouver, BC, Canada.
The slides aim to provide users with a basic introduction to some of the key considerations when implementing a digital preservation plan, describing the workflow with a series of cooking-related references.
1. Your Digital
Preservation
Cookbook
Sara Allain, Dan Gillean,
and Sarah Romkey,
Artefactual Systems
Archives Association of
B.C. Annual Conference,
April 15, 2016
https://www.pinterest.com/pin/455145106065308238/
2. Today’s offerings:
1. Ingredient preparation
(digital preservation actions)
2. Cooking (preservation
storage)
3. Serving (providing access
to digital content)
4. Kitchen management
(policies and procedures)
http://www.sampletemplates.com/menu-templates/blank-menu-template.html
4. Preparation: Digital preservation
actions
By taking on digital preservation prep, your
files are better understood for the future.
Like properly prepped ingredients,
prepared digital content is better cooked
(preserved).
Unlike ingredients in your favourite recipe,
prep activities actually increase their
authenticity rather than transforming them
into something new.
http://www.blogher.com/women-and-food-will-win-war-wwi
5. Preparation: Fixity
Are those ingredients what they say they are on the
box?
Fixity information, such as a checksum, is a
short fingerprint computed from a file's bits
so that the file can be re-checked in the
future.
Capturing fixity as early as possible in the
accessioning process makes sense - don’t
move the files several times before creating
a checksum.
Checksums pair nicely with other functions,
e.g. packaging (BagIt).
http://www.buzzfeed.com/leonoraepstein/16-fascinating-facts-about-jell-o#.uqrYYQqw7
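As a rough illustration of capturing fixity at accession time, here is a minimal Python sketch. It assumes SHA-256 as the algorithm and a simple path-to-checksum manifest; the function names and the manifest layout are illustrative, not taken from any particular preservation tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's relative path to its checksum (hypothetical layout)."""
    root = Path(folder)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
```

Running build_manifest on a transfer folder before the files are moved anywhere gives you a baseline to compare against later.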
6. Preparation: Virus scan
Keep pests out of the kitchen!
Scan for viruses so you don’t
ingest them into your preservation
environment!
Quarantine functionality in a
preservation system allows virus
definitions time to update.
City of Vancouver Archives, Deer in Malahat Lookout Kitchen
William Bros. Photographers Collection AM1545-S3-: CVA 586-497
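One way to script a scan is to call ClamAV's clamscan command-line tool and read its exit code. The sketch below assumes clamscan is installed and on the PATH; the helper names are hypothetical, and a real workflow would also log the scanner's output.

```python
import shutil
import subprocess

# clamscan exit codes: 0 = no virus found, 1 = virus found, 2 = error.
CLAMSCAN_RESULTS = {0: "clean", 1: "infected", 2: "error"}


def build_clamscan_command(path):
    """Build a recursive scan that only reports infected files."""
    return ["clamscan", "--recursive", "--infected", str(path)]


def interpret_exit_code(code):
    """Translate a clamscan exit code into a short status word."""
    return CLAMSCAN_RESULTS.get(code, "unknown")


def scan(path):
    """Scan a folder, or report that the scanner is unavailable."""
    if shutil.which("clamscan") is None:
        return "scanner not installed"
    completed = subprocess.run(
        build_clamscan_command(path), capture_output=True, text=True
    )
    return interpret_exit_code(completed.returncode)
```

A quarantine step could simply hold a transfer for a set period and call scan again once virus definitions have been updated.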
7. Preparation: File identification
Know your ingredients!
Know what you’re cooking with:
identify file formats, ideally using
file format signatures (byte patterns
inside the file) for increased precision.
Identification should capture not just
the format, but also the version.
Identifying the file formats accurately
increases likelihood of getting
more/better technical metadata.
City of Vancouver Archives, [Woman mixing ingredients at] Dale's [Roast Chicken]
kitchen on Granville Street
William Bros. Photographers Collection AM1545-S3-: CVA 586-4012
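To show the idea behind signature-based identification, here is a deliberately tiny Python sketch that checks a file's leading bytes against a handful of well-known signatures. The table is an illustrative assumption and far smaller than a real registry such as PRONOM; tools like DROID, Siegfried, or FIDO should be used in practice.

```python
# A few well-known leading byte patterns ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
]


def identify(data):
    """Return (format, version) based on the leading bytes of a file."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            version = None
            if name == "PDF":
                # A PDF header looks like b"%PDF-1.4"; the version follows the dash.
                version = data[5:8].decode("ascii", errors="replace")
            return name, version
    return "unknown", None


def identify_file(path):
    """Read just enough of a file to test the signatures above."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

Note that this ignores the file extension entirely, which is the point: an extension can be wrong or missing, while the bytes inside the file are harder to mislabel.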
8. Preparation: Validation, characterization, metadata
extraction
Are those noodles real?
Validation: is it a well-formed example
of that particular file format?
Characterization: what are the
particulars of this specific file? (e.g.
size, codec, bitrate, etc)
Extracting this technical metadata from
the files and storing in a standardized
way helps ensure their longevity.
http://travelwireasia.com/2013/08/fake-food-japanese-style-that-looks-good-enough-to-eat/
9. Preparation: PII and sensitive
information
As in the analogue world, you may have
a requirement to flag files that contain
personally identifying information and
restrict access to the originals.
Unlike the analogue world, there are tools
available that can help you scan
automatically for this information!
This task can be performed during
processing, or after access is requested.
http://www.amazon.com/White-Horse-Whisky-Blindfolded-Taste/dp/B0159EOIXQ
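As a sketch of what automated scanning can look like, the Python below searches extracted text for a few patterns that often indicate personal information. The patterns (email address, North American phone number, nine-digit number grouped in threes) are illustrative assumptions; they will produce false positives and miss plenty, so matches should be treated as flags for human review.

```python
import re

# Illustrative patterns only; real tools use far richer rule sets.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "sin_like": re.compile(r"\b\d{3}[- ]\d{3}[- ]\d{3}\b"),
}


def find_pii(text):
    """Return (label, matched text) pairs for every pattern hit."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group()))
    return hits
```

A file with any hits could be flagged for restricted access until an archivist has reviewed it.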
10. Preparation: Normalization, migration, emulation
Strategies for dealing with software
obsolescence:
Normalization converts files into a more
preservation-friendly format while
retaining the originals.
Migration converts the files over time as
new file formats emerge.
Emulation preserves the files and their
software/operating system.
UBC Archives, Two Students in Cooking Class in Home Economics, School of Family
and Nutritional Science fonds, UBC 101.1/15
11. Preparation: Putting it all together
If that all sounded like more kitchen
prep than Thanksgiving dinner, luckily
there’s an easier way!
Digital preservation systems can tie
much of the functionality together into
one workflow.
Some of these functions are also taken
care of in repository systems (coming
up next).
http://freshome.com/2013/03/22/what-you-can-learn-from-the-jetsons-about-home-automation/
13. Cooking: Preservation storage
Prep is critical, but it’s only the first step!
Like cooking a meal, preserving your content
for the long term requires specific tools and
methods.
As with food, the best way to preserve your
digital content is to use an appropriate storage
container to ensure that your content will be
safe and usable for the long term.
https://www.flickr.com/photos/29069717@N02/10111289655/
14. Cooking: Preservation storage
Preservatives and an airtight seal
Your storage container for digital content is a
repository. Repositories come in many flavours:
• Can have a public interface or be closed off
(“dark archive”)
• Can be a simple data store or something
really complex
• May come with built-in tools to help you
ensure that your data is valid for the long-term
https://www.flickr.com/photos/29069717@N02/10111289655/
15. Cooking: Fixity checking
Simpler - faster - better - surer!
Fixity checking ensures that your content is still viable.
By looking at the fixity record you created during
preparation and then re-running the tool you used to
create that fixity record in the first place, you can tell if
your content is still viable - all the bits are still present
and accounted for.
Your repository system should enable you to do this
automatically - no human intervention needed, unless
the fixity checks don’t match!
http://s.ecrater.com/stores/108769/55f584a6249cf_108769b.jpg
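A fixity check can be sketched in a few lines of Python: recompute each checksum and compare it with the stored value. This assumes a manifest that maps relative paths to SHA-256 digests; both the manifest layout and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(folder, manifest):
    """Compare stored checksums with fresh ones; return a dict of problems."""
    problems = {}
    root = Path(folder)
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            problems[relative_path] = "missing"
        elif sha256_of(target) != expected:
            problems[relative_path] = "checksum mismatch"
    return problems
```

An empty result means every file is present and unchanged; anything else is the cue for a human to step in.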
16. Cooking: Redundancy
Make sure there’s enough for seconds. And thirds!
Making many copies of your digital
content is critical to ensuring that you
have back-up if something goes wrong.
Two common kinds of redundancy are:
• Back-up copies of your database
preserved on different servers
• Geo-redundancy, usually provided
by a server hosting provider
https://c1.staticflickr.com/3/2096/5794109510_a4f966a812.jpg
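Making copies is only half of redundancy; each copy should also be confirmed identical to the original. The Python sketch below copies a file into several folders and checks each copy's checksum. The destination folders stand in for separate storage locations, which is an assumption for illustration; real geo-redundancy is normally handled by the storage service.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(source, destinations):
    """Copy source into each folder and record whether the copy matches."""
    source = Path(source)
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        target_dir = Path(folder)
        target_dir.mkdir(parents=True, exist_ok=True)
        copy_path = target_dir / source.name
        shutil.copy2(source, copy_path)
        results[str(copy_path)] = sha256_of(copy_path) == expected
    return results
```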
17. Cooking: Technical metadata
The recipe for your digipres casserole
Technical metadata tells you what comprises the
digital content as well as how it’s put together.
There are different standards depending on the type
of technical metadata that you’re recording. PREMIS
is widely used to capture metadata specifically
relating to preservation; there are many others as
well.
Following a standard means that your metadata will
be consistent both within your repository and over
time.
http://www.midcenturymenu.com/2010/06/the-mid-century-menu-ham-banana-casserole/
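To give a feel for what standardized technical metadata looks like, here is a Python sketch that writes a small XML fragment loosely modelled on a PREMIS object description. It is an assumption-laden simplification: it has not been validated against the PREMIS schema, it omits attributes a real record requires, and the "local" identifier type is a placeholder. Preservation systems normally generate this for you.

```python
import xml.etree.ElementTree as ET

PREMIS_NS = "http://www.loc.gov/premis/v3"


def premis_object(identifier, size_bytes, sha256, format_name, format_version):
    """Return a simplified, unvalidated PREMIS-style object as an XML string."""
    ET.register_namespace("premis", PREMIS_NS)

    def el(parent, tag, text=None):
        # Create a namespaced child element with optional text content.
        child = ET.SubElement(parent, f"{{{PREMIS_NS}}}{tag}")
        if text is not None:
            child.text = str(text)
        return child

    obj = ET.Element(f"{{{PREMIS_NS}}}object")
    ident = el(obj, "objectIdentifier")
    el(ident, "objectIdentifierType", "local")  # placeholder identifier type
    el(ident, "objectIdentifierValue", identifier)
    chars = el(obj, "objectCharacteristics")
    fixity = el(chars, "fixity")
    el(fixity, "messageDigestAlgorithm", "SHA-256")
    el(fixity, "messageDigest", sha256)
    el(chars, "size", size_bytes)
    fmt = el(chars, "format")
    designation = el(fmt, "formatDesignation")
    el(designation, "formatName", format_name)
    el(designation, "formatVersion", format_version)
    return ET.tostring(obj, encoding="unicode")
```

The fragment brings together results from earlier prep steps: the checksum from fixity, and the format name and version from file identification.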
18. Cooking: Audit and control
Don’t let strangers mess around in your kitchen!
Performing regular, holistic audits to
check on the integrity of your files is
the best way to ensure that they’re not
degenerating over time.
Only authorized users should have
access to your repository. Controlling
who can edit your digital content -
including metadata - is a crucial
component to ensure that it’s stored
safely and securely.
http://land.allears.net/blogs/jackspence/21%20Yak%20%26%20Yeti%2001.jpg
19. Cooking: Future proofing
If you start with the basics, you’ll be able to cook anything
Choosing the best repository system isn’t just
about your present needs - it’s also about the
future.
Ensuring that your repository is open and built
around standards and best practices means that,
if you need to, you can migrate to a new system.
Adhering to standards and best practices is like
learning to chop an onion - it’s the foundation on
which your collections rely.
http://ecx.images-amazon.com/images/I/81DGvz%2BcNZL.jpg
21. Serving: know your designated community!
Who’s coming to dinner?
The OAIS reference model defines a designated
community as:
“An identified group of potential Consumers who
should be able to understand a particular set of
information. The Designated Community may be
composed of multiple user communities. A
Designated Community is defined by the Archive
and this definition may change over time.”
This means understanding that your end users
might have different needs than the institutional
actors responsible for ongoing preservation.
http://hahasforhoohas.com/stories/ten-things-you-never-want-say-dinner-guests-arrive
22. Serving: Applying access restrictions
Knowing what not to serve is just as important as knowing what to
serve!
You will need to make sure that you are applying
appropriate access restrictions. These might be
based on copyright, local statutes, donor
restrictions, licenses, etc. You’ll need clear policies
on who can access what when.
PREMIS Rights:
http://www.loc.gov/standards/premis/
Coyle, Karen. “Rights in the PREMIS Data Model.” A report for the Library of
Congress, December 2006.
http://www.loc.gov/standards/premis/Rights-in-the-PREMIS-Data-Model.pdf
https://makeameme.org/meme/no-dinner-for-pczmhb
23. Serving: Creating access derivatives (DIPs)
Or, don’t serve a whole chicken on wing night!
Preservation masters ≠ access copies!
For access, you want:
Smaller file sizes
In common formats
Supported by many web browsers and OSes
TIFF → JPG
WAV → MP3
http://vancouverfoodster.com/2012/11/27/tasting-plates-chinatown-strathcona/
Dissemination Information Package (DIP): An
Information Package, derived from one or
more AIPs, and sent by Archives to the Consumer
in response to a request to the OAIS.
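The TIFF to JPG and WAV to MP3 conversions above can be scripted with ImageMagick and FFmpeg. The Python sketch below only builds and runs the commands; it assumes both tools are installed, and the quality settings and the choice to write the derivative beside the master are illustrative. In practice derivatives usually go to a separate access location.

```python
import shutil
import subprocess
from pathlib import Path


def derivative_command(master):
    """Return the command that would create an access copy of a master file."""
    master = Path(master)
    suffix = master.suffix.lower()
    if suffix in (".tif", ".tiff"):
        # ImageMagick 6 uses "convert"; ImageMagick 7 prefers "magick".
        return ["convert", str(master), "-quality", "85",
                str(master.with_suffix(".jpg"))]
    if suffix == ".wav":
        return ["ffmpeg", "-i", str(master), "-codec:a", "libmp3lame",
                "-qscale:a", "2", str(master.with_suffix(".mp3"))]
    raise ValueError(f"no derivative rule for {suffix!r}")


def make_derivative(master):
    """Run the conversion and return the path of the access copy."""
    command = derivative_command(master)
    if shutil.which(command[0]) is None:
        raise RuntimeError(f"{command[0]} is not installed")
    subprocess.run(command, check=True)
    return command[-1]
```

The master file is never modified; the derivative is a new file that can be regenerated at any time.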
24. Serving: adding descriptive metadata
Let your dinner guests know what’s on the menu
Use existing content standards: Dublin Core,
ISAD(G), RAD (Canada), MODS, etc.
This can be done in a database or content
management system (e.g. AtoM, ArchivesSpace,
CollectiveAccess; custom databases, etc), or in
locally created finding aids.
However you choose to do it, you will also need
to think about how users are eventually going to
access this information...
http://www.flavourbistro.co.nz/bistro-menu-g-173.html
25. Serving: indexing your content and making it discoverable
Send out the dinner invitations!
Your end users (or consumers) will need a
way to explore and understand the content
you are making available.
Some facility for searching and browsing will
greatly ease this.
If your resources are web-accessible, they
can be indexed by search engines and
become more broadly discoverable.
Indexing also includes adding access points -
give your users a way into the content!
Access Software: A type of software that presents
part of or all of the information content
of an Information Object in forms understandable to
humans or systems.
http://www.sandyloujohnson.com/974-2/
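Search engines such as Elasticsearch and Solr are built around an inverted index: a lookup from each term to the records that contain it. The Python sketch below shows the idea at toy scale; the record identifiers and the simple tokenizing rule are assumptions for illustration.

```python
import re
from collections import defaultdict


def build_index(records):
    """Build a term -> identifiers lookup from {identifier: description}."""
    index = defaultdict(set)
    for identifier, text in records.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(identifier)
    return index


def search(index, query):
    """Return identifiers whose description contains every query term."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results
```

Access points work the same way: adding a controlled subject term to a description gives the index one more reliable route to that record.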
26. Serving: Maintaining a relationship with the master
You need to know where your hors d'oeuvres came from if you want to be able to serve them again in the
future
Additional descriptive metadata created outside of the preservation workflow
should remain linked to the AIP / digital object master.
Links to your rights statements are crucial for monitoring compliance!
Mutts comic strip, by Patrick McDonnell.
http://farmtotablela.com/farm-table-humor/
Provenance: maintaining the digital
chain of custody
If you need to generate updated DIPs
in the future, you want to be able to
retrace that chain
27. Serving: Evaluating Access Systems for DigiPres
Channeling your inner food critic
If you are looking to implement an existing access
system as part of your digital preservation
environment, here’s a summary of some of the factors
to consider:
• Search and retrieval
• Digital object display
• Hierarchies and context
• Access restrictions / rights management
• Standards adherence
• Data exchange and interoperability
• Digital provenance (relationship to preservation masters)
https://www.pinterest.com/pin/73253931414036246/
29. Kitchen Management: The importance of policy
Digital preservation is not all about tools
and technology:
In standards like ISO 16363 (2012),
policies and organizational
infrastructure account for between ⅓
and ½ of the entire standard!
You need to ensure that your
organization has the will, the capacity,
and the vision to undertake digital
preservation over the long-term.
http://recruitloop.com/blog/who-really-needs-to-get-involved-in-the-recruitment-process/
30. Kitchen Management: The importance of policy
Example factors to consider:
• Does your organization’s mission statement
explicitly cover a commitment to digital
preservation?
• Do you have succession, contingency,
and/or escrow plans in place?
• Do you have training policies around digital
preservation?
• Are the duties of each staff member
associated with each link in the chain
documented?
• Do you have an internal auditing
mechanism?
• Do you have a long-term financial plan for
your preservation?
http://liaisoncollegeoakville.com/chef-diploma-programs/specialist-chef/
31. Kitchen Management: The value of collaboration
This ain’t Iron Chef!!!
• Digital preservation is hard - and ongoing
• Archives are underfunded - especially in
Canada
• There’s a lot to learn…
But we can learn together, and share
resources.
To be successful, we’ll need to
collaborate, not compete - like a
REAL professional kitchen!
http://www.popsugar.com/food/Interview-Next-Iron-Chef-Geoffrey-Zakarian-20967020
32. Shopping List
Tools and resources
http://www.middlevillemarketplace.com/shopping-list.php
33. Fixity
Tools to create checksums:
md5deep: http://md5deep.sourceforge.net/
md5summer: http://www.md5summer.org/
Built into various preservation systems/tools: Archivematica, Preservica, Bagger, DuraCloud, etc.
Tools to verify checksums:
Fixity: https://github.com/avpreserve/fixity
Built into various tools/systems as above
Tools to scan for viruses:
ClamAV: http://www.clamav.net/
34. Format identification
PRONOM database: http://www.nationalarchives.gov.uk/PRONOM/Default.aspx
Tools:
Format Identifier for Digital Objects (FIDO): https://github.com/openplanets/fido
Siegfried: https://github.com/richardlehane/siegfried
File Information Tool Set (FITS): http://projects.iq.harvard.edu/fit
DROID: https://github.com/digital-preservation/droid
35. Characterization, validation, and metadata extraction
File Information Tool Set (FITS): http://projects.iq.harvard.edu/fits
Metadata extraction tool: http://meta-extractor.sourceforge.net
ffprobe: https://ffmpeg.org/ffprobe.html
Exiftool: http://www.sno.phy.queensu.ca/~phil/exiftool/
MediaInfo: https://mediaarea.net/en/MediaInfo
JHOVE: https://github.com/openpreserve/jhove
veraPDF: http://verapdf.org/
36. Normalization, migration, and emulation
Imagemagick: http://www.imagemagick.org/script/index.php
Inkscape: http://www.inkscape.org/
FFMPEG: http://ffmpeg.org/ffmpeg.html
Ghostscript: http://www.ghostscript.com/
KEEP solutions Emulation Framework: http://emuframework.sourceforge.net/
bwFLA Emulation as a Service: http://bw-fla.uni-freiburg.de/
38. Technical metadata standards
PREMIS: http://www.loc.gov/standards/premis/
Coyle, Karen. “Rights in the PREMIS Data Model.” A report for the Library of Congress,
December 2006. http://www.loc.gov/standards/premis/Rights-in-the-PREMIS-Data-Model.pdf
METS: http://www.loc.gov/standards/mets/
PBCore: http://pbcore.org/schema/
NISO Metadata for Images in XML: http://www.loc.gov/standards/mix/
And many more, depending on the filetypes you’re working with!
39. Descriptive metadata standards
Dublin Core: http://dublincore.org/documents/dcmi-terms/
Rules for Archival Description (Canada):
http://www.cdncouncilarchives.ca/archdesrules.html
General International Standard for Archival Description - ISAD(G):
http://ica.org/en/isadg-general-international-standard-archival-description-second-edition
MODS: http://www.loc.gov/standards/mods/
40. Derivatives and indexing 1/2
For derivatives: see the resources above for normalization. The same tools that
are used for preservation normalization can also be used for creating access
derivatives!
For search indexing: This will depend on how you are making your resources
available. A search index generally needs to be one component of an application
stack. Here are a few resources to look into:
Elasticsearch: https://www.elastic.co/products/elasticsearch
Solr: https://lucene.apache.org/solr/
Blacklight: http://projectblacklight.org/
41. Derivatives and indexing 2/2
For adding indexing terms for discovery: Use existing controlled vocabularies
whenever possible!
• Library of Congress vocabularies: http://loc.gov/library/libarch-thesauri.html
• Getty Vocabularies: http://www.getty.edu/research/tools/vocabularies/index.html
• Library and Archives Canada controlled vocabularies:
http://www.bac-lac.gc.ca/eng/services/government-information-resources/controlled-vocabularies/Pages/controlled-vocabularies.aspx
• UNESCO thesaurus: http://databases.unesco.org/thesaurus/
• JISC Directory of Metadata Vocabularies:
http://www.jiscdigitalmedia.ac.uk/guide/controlling-your-language-links-to-metadata-vocabularies/
• RBMS Controlled Vocabularies for Use in Rare Book and Special Collections Cataloging:
http://rbms.info/vocabularies/
42. Description, repository, and access systems
• Access to Memory: https://www.accesstomemory.org
• ArchivesSpace: http://www.archivesspace.org/
• CollectiveAccess: http://collectiveaccess.org/
• Omeka: http://omeka.org/
• Islandora: http://islandora.ca/
• Hydra: https://projecthydra.org/
• Avalon: http://www.avalonmediasystem.org/
• ResCarta Toolkit: http://www.rescarta.org/
Note that many of these systems include the tools and standards described in
the previous slides.
43. General & policy resources
• CCSDS - Reference Model for an Open Archival Information System (OAIS):
http://public.ccsds.org/publications/archive/650x0m2.pdf
• CCSDS - Audit and Certification of Trustworthy Digital Repositories:
http://public.ccsds.org/publications/archive/652x0m1.pdf
• TRAC review tool, developed by MIT in a project led by Nancy
McGovern, Head of Curation and Preservation Services at MIT Libraries:
https://wiki.archivematica.org/Internal_audit_tool
• COPTR - Community Owned digital Preservation Tool Registry:
http://coptr.digipres.org/Main_Page
• Open Preservation Foundation: http://openpreservation.org/
44. General & policy resources
• POWRR Project - Preserving (Digital) Objects With Restricted Resources:
http://digitalpowrr.niu.edu/
• DigiPres Commons: http://www.digipres.org/
• Digital Preservation Q & A: http://qanda.digipres.org/
• National Digital Stewardship Alliance - Levels of Preservation:
http://ndsa.diglib.org/activities/levels-of-digital-preservation/
• NDSA Digital Preservation in a Box: http://dpoutreach.net/
• AVPreserve’s open source tools: https://www.avpreserve.com/avpsresources/tools/
• AVPreserve’s papers and presentations:
https://www.avpreserve.com/avpsresources/papers-and-presentations/
Editor's Notes
If you’re cooking at home, you might just try a single iteration of a recipe on a whim. A professional chef, though, can’t just make one dish - they have to make many identical copies. In a sort-of similar fashion (this is where the metaphor really starts to stretch), your repository is your professional kitchen, not the one that you have at home - making one copy won’t satisfy your need to make sure that your content is safe.