These slides accompany a 1.5 hour webinar sponsored by the Western New York Library Resources Council, presented by Dan Gillean of Artefactual Systems on February 15th, 2017.
The session was intended to introduce participants to some of the key standards, services, and tools available to support digital preservation planning and activities. Part 1 focused on DP101, and how to begin tackling digital preservation in your institution. Part 2 introduced the Archivematica project's history, philosophy, and aims, while Part 3 was a live demonstration of Archivematica in action.
Thank you to WNYLRC for sponsoring this event!
2. Today’s Agenda
1. A brief introduction to Digital Preservation
2. A brief introduction to the Archivematica project
3. A brief hands-on demo with Archivematica
https://pixabay.com/en/notebook-plan-dates-coffee-cup-1895164/
5. A reference model – not a systems architecture!
https://wiki.archivematica.org/Overview
ISO 14721
6. Designated community:
• An identified group of potential
Consumers who should be able to
understand the preserved information
“Since a key purpose of an OAIS is to
preserve information for a Designated
Community, the OAIS must understand the
Knowledge Base of its Designated
Community to understand the minimum
Representation Information that must be
maintained.” (p. 2-4)
8. Information Packages in OAIS
• Content information: the bits, plus what we need to interpret the bits
• Preservation Description Information: what we need to manage the bits
• What we need to know about how it’s all put together
• What we need to know to discover content in the package
10. Okay….
digital preservation?
City Light Substation located at 7th and Yesler. Item 3889, Engineering Department Photographic Negatives (Record Series 2613-07), Seattle Municipal Archives via
https://www.flickr.com/photos/seattlemunicipalarchives/3749710573
11. • Governance
• Organizational structure
• Staffing
• Procedural accountability
• Preservation policy framework
• Documentation
• Financial sustainability
• Security
ISO 16363 reminds us that much of digital
preservation readiness is not technical
– it’s organizational
12. ISO 16363
ISO 16363 is divided in 3 main
sections:
1. Organizational Infrastructure
2. Digital Object Management
3. Infrastructure and Security Risk
Management
13. Preservation Planning: monitoring, risk analysis, planning for obsolescence of preservation formats
Ingest: QA of SIPs, prep of AIPs
Data Management: managing the descriptive info allowing for management and retrieval of content
Archival Storage: storage, maintenance, and retrieval of AIPs
Administration: overall operation, configuration of hard/software, standards compliance, policy and
procedure
Access: managing DIP generation and supply to consumers, reports, etc
14. • Appraisal / selection
• Transfer / upload
• Virus scanning
• Checksum generation
• File identification
• Validation and characterization
• Metadata extraction
• SIP creation
• Ingest
• Normalization/migration or
emulation planning
• AIP preparation
• DIP preparation
• Preservation
• AIP completion and validation
• Storage and integrity checking
• Administration and Management
• Preservation planning
• Auditing
• Rights management
• Storage
• Geo-redundancy
• Fixity checking
• Arrangement / description / cataloguing
• Publishing / display / exhibition
• Discovery / retrieval
Your digital preservation activities might include…
Photograph of Women Working at a Bell System international Telephone Switchboard, https://catalog.archives.gov/id/1633445
23. ISO 16363 is the gold standard for auditing a
trustworthy digital repository… but don’t
be intimidated, or feel like you need
certification to be doing something
useful.
For a more accessible breakdown of
16363, see Kara Van Malssen’s slides
from PASIG NYC 2016:
https://figshare.com/articles/How_I_learned_to_stop_worrying_and_love_ISO_16363/4055661
24. NDSA Levels of Preservation
Levels: Level 1 (Protect), Level 2 (Know), Level 3 (Monitor), Level 4 (Repair)
Storage and Geographic Location
Level 1 (Protect):
• 2 complete copies not collocated
• Get media off diverse storage media and into a system
Level 2 (Know):
• At least 3 complete copies
• At least 1 in different geographic location
• Document storage system, media, and what’s needed to use them
Level 3 (Monitor):
• At least 1 copy in location w different disaster threat
• Obsolescence monitoring process for storage system and media
Level 4 (Repair):
• At least 3 copies in locations w different disaster threats
• Comprehensive plan to keep files and metadata on currently accessible media or systems
File Fixity and Data Integrity
Level 1 (Protect):
• Fixity check on ingest if checksum provided w content
• Create fixity info if not provided on transfer
Level 2 (Know):
• Check fixity on all ingests
• Use write-blockers w original media
• Virus check high-risk content
Level 3 (Monitor):
• Fixity checks at regular intervals
• Maintain fixity logs and supply audit on demand
• Virus check all content
• Ability to detect corrupt data
Level 4 (Repair):
• Check fixity in response to specific events/activities
• Ability to replace/repair corrupted data
• Ensure no one has write access to all copies
Information Security
Level 1 (Protect):
• Identify who has read, write, move, and delete authorizations
• Restrict who has those authorizations to individual files
Level 2 (Know):
• Document access restrictions for content
Level 3 (Monitor):
• Maintain logs of who performed what actions on files, incl. deletions and preservation actions
Level 4 (Repair):
• Perform audit of logs
Metadata
Level 1 (Protect):
• Inventory of content and its storage locations
• Ensure backup and non-collocation of inventory
Level 2 (Know):
• Store admin metadata
• Store transformative metadata and log events
Level 3 (Monitor):
• Store standard technical and descriptive metadata
Level 4 (Repair):
• Store standard preservation metadata
File Formats
Level 1 (Protect):
• Encourage creators to use open formats and codecs when possible
Level 2 (Know):
• Inventory of file formats in use
Level 3 (Monitor):
• Monitor file format obsolescence issues
Level 4 (Repair):
• Perform format migrations, emulation, etc. as needed
Adapted from: http://ndsa.org/activities/levels-of-digital-preservation/
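To make the File Fixity and Data Integrity criteria a little more concrete, here is a minimal sketch of creating fixity information and re-checking it later. It is an illustration only, not how Archivematica or any other tool does it: the function names are hypothetical, and the choice of SHA-256 is an assumption (any agreed checksum algorithm could be substituted).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_fixity(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root (create fixity info)."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, recorded: dict[str, str]) -> dict[str, str]:
    """Compare files on disk against previously recorded checksums.

    Returns relative path -> "missing" or "changed" for every file that no
    longer matches. An empty dict means every recorded file verified.
    """
    problems = {}
    for rel_path, expected in recorded.items():
        target = root / rel_path
        if not target.is_file():
            problems[rel_path] = "missing"
        elif sha256_of(target) != expected:
            problems[rel_path] = "changed"
    return problems
```

Running `check_fixity` on a schedule and keeping its results would correspond roughly to the "fixity checks at regular intervals" and "maintain fixity logs" criteria. Repairing a problem still requires a second, intact copy stored elsewhere.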
25. AVPreserve – 16363/NDSA mappings
NDSA Levels of Preservation category    Quantity of NDSA criteria    Quantity of related ISO 16363 criteria
Storage and Geographic Location         9                            34
File Fixity and Data Integrity          12                           29
Information Security                    5                            22
Metadata                                6                            50
File Formats                            4                            32
(Unmappable from ISO 16363)             -                            23
Blog post: https://www.avpreserve.com/papers-and-presentations/mapping-standards-for-richer-assessments-ndsa-levels-of-digital-preservation-and-iso-163632012/
Mappings: https://www.avpreserve.com/wp-content/uploads/2016/05/ISO-Requirements-by-NDSA-LoDP-Categories.xlsx
Slides: http://www.avpreserve.com/wp-content/uploads/2014/07/NDSA_ISO_Presentation_2014.pdf
43. What is Archivematica?
Archivematica is a web-
and standards-based,
open-source application
which allows your
institution to preserve
long-term access to
trustworthy, authentic
and reliable digital
content.
Standards based
Open source
Customizable
Integrated w 3rd
party systems
Active community
44. Archivematica’s development
[Timeline figure, 2008 to 2014]
2007: UNESCO report: Bradley, K., Lei, J., Blackall, C. Towards An Open Source Archival Repository and Preservation System (2007)
Planning and development begin. Initial funding via UNESCO MotW Subcommittee, IMF Archives, City of Vancouver Archives
Releases and milestones shown: 0.1-alpha, 0.6-alpha, 0.7, 0.8, 0.9, 0.10, 1.0 released, Storage Service 0.2; dashboard introduced; PREMIS in METS
Dates shown: February 2010, May 2010, February 2011, February 2012, August 2012, April 2013, January 2014
46. • It captures technical information about an object in order
to support the implementation of preservation strategies
such as normalization, migration or emulation (PREMIS
Object)
• It describes relationships between digital objects (PREMIS
Object)
• It provides an audit trail of actions taken by the digital
preservation repository to preserve the object (PREMIS
Event)
• It names the individuals, organizations and software tools
responsible for taking actions to preserve digital objects
(PREMIS Agent)
• It specifies the actions a repository is allowed to take to
preserve digital objects (PREMIS Rights)
PREMIS
PREMIS, or Preservation Metadata
Implementation Strategies, is the
recognized standard for metadata
about objects in a digital
preservation system.
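As a rough illustration of how the PREMIS entities relate to one another, the sketch below models Objects, Events, and Agents as plain Python classes. This is a simplified, hypothetical model: the class and field names are assumptions made for the example, not PREMIS semantic unit names or the PREMIS XML schema, and the Rights entity is left out for brevity.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Agent:
    """A person, organization, or software tool responsible for an action."""
    identifier: str
    name: str
    agent_type: str  # e.g. "software", "organization", "person"


@dataclass
class Event:
    """One preservation action, forming a single entry in the audit trail."""
    event_type: str  # e.g. "virus check", "fixity check", "normalization"
    outcome: str     # e.g. "pass" or "fail"
    agent_id: str    # identifier of the responsible Agent
    date_time: str   # ISO 8601 timestamp


@dataclass
class PreservedObject:
    """A digital object plus the technical facts and events recorded about it."""
    identifier: str
    file_format: str
    derived_from: Optional[str] = None  # relationship to another object, if any
    events: list[Event] = field(default_factory=list)

    def record(self, event_type: str, outcome: str, agent: Agent) -> Event:
        """Append an event to this object's audit trail and return it."""
        event = Event(
            event_type=event_type,
            outcome=outcome,
            agent_id=agent.identifier,
            date_time=datetime.now(timezone.utc).isoformat(),
        )
        self.events.append(event)
        return event


# Usage: a derived access copy points back to its source object, and the
# source object carries a record of the action taken on it.
scanner = Agent("agent-1", "Example virus scanner", "software")
original = PreservedObject("obj-1", "image/tiff")
original.record("virus check", "pass", scanner)
access_copy = PreservedObject("obj-2", "image/jpeg", derived_from=original.identifier)
```

The point of the separation is that the audit trail answers three different questions: what the object is, what was done to it, and who or what did it.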
47. • It provides a wrapper for other metadata, such
as PREMIS and Dublin Core.
• It defines relationships between digital objects
and other digital objects, and between digital
objects and their metadata.
• It can be used to provide technical metadata
about digital objects (although Archivematica
doesn’t implement it that way: we wrap PREMIS
in it instead)
METS
METS, or Metadata Encoding and
Transmission Standard, was designed to
support inter-repository data exchange.
48. • Originally developed for exchange between
California Digital Library and Library of
Congress; specifications written up by IETF in
2008
• System agnostic, interoperable format for
storage and exchange
• “Bag and tag” approach: mandatory tag file
contains a manifest listing every file in the
payload together with its corresponding
checksum
BagIt
BagIt is a hierarchical file packaging format
designed to support disk-based or network-
based storage and transfer of arbitrary digital
content.
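The "bag and tag" idea can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a complete BagIt implementation: the function names are hypothetical, the version number and SHA-256 algorithm are assumptions, and optional tag files such as bag-info.txt are omitted. In practice you would use an established BagIt tool rather than writing your own.

```python
import hashlib
from pathlib import Path


def make_minimal_bag(bag_dir: Path, payload: dict[str, bytes]) -> None:
    """Write payload files under data/ plus a bag declaration and a manifest."""
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel_path, content in sorted(payload.items()):
        target = data_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        checksum = hashlib.sha256(content).hexdigest()
        # One manifest line per payload file: checksum, then path within the bag.
        manifest_lines.append(f"{checksum}  data/{rel_path}\n")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "".join(manifest_lines), encoding="utf-8"
    )


def validate_bag(bag_dir: Path) -> list[str]:
    """Return the payload paths that are missing or fail their checksum."""
    failures = []
    manifest = (bag_dir / "manifest-sha256.txt").read_text(encoding="utf-8")
    for line in manifest.splitlines():
        if not line.strip():
            continue
        checksum, rel_path = line.split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            failures.append(rel_path)
        elif hashlib.sha256(target.read_bytes()).hexdigest() != checksum:
            failures.append(rel_path)
    return failures
```

Because the manifest travels with the payload, whoever receives the bag can re-run the validation step and confirm that nothing was lost or altered in transfer.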
50. PREMIS in METS XML
Archivematica AIP structure
Packaged according to BagIt specifications
Virus scan, normalization report, extraction log, etc
For browsing in Archivematica
Original + normalized
objects, submission docs,
original metadata
included at SIP creation
53. A program is free software if the program's users have
the four essential freedoms:
1. The freedom to run the program as you wish, for any purpose (freedom
0).
2. The freedom to study how the program works, and change it so it does
your computing as you wish (freedom 1). Access to the source code is a
precondition for this.
3. The freedom to redistribute copies so you can help your neighbor
(freedom 2).
4. The freedom to distribute copies of your modified versions to others
(freedom 3). By doing this you can give the whole community a chance
to benefit from your changes. Access to the source code is a
precondition for this.
Free Software Foundation
Free Software Definition
https://www.fsf.org/licensing/essays/free-sw.html
What is Free Software?
58. Development Philosophy
Community-based development Bounty model of business
• Standards-based
• Open source / Creative Commons
• Generalize specific use cases
• Include all features in public release
• Accept community improvements
• Iterative development via multiple
contributions over subsequent
releases
• Maintain: documentation, software,
wiki
• Produce additional resources (e.g.
videos, presentations, webinars)
• Participate in user forum
• Offer additional paid services
• Always include development in
public project
59. Do one thing well…
Micro-services Handshakes Partnerships
Gears – Joe DeSousa.
https://www.flickr.com/photos/mustangjoe/22711070429
Metal Handshake – Grey Geezer.
https://commons.wikimedia.org/wiki/File:Metal_Handshake.jpg
Hands Passing Baton - tableatny,
https://www.flickr.com/photos/53370644@N06/4976497160
66. archivesDIRECT
• Partnership with DuraSpace
• U.S. Based
• Launched August 2014
• Secure storage and
monitoring via DuraCloud
• Artefactual provides AM
technical support
http://archivesdirect.org/
67. Perpetua
• Partnership with Arkivum
• U.K. Based
• Launched July 2016
• Secure storage and
monitoring via Arkivum
• Artefactual provides AM
technical support
http://arkivum.com/perpetua/
68. ArchivesCANADA
Digital Preservation Service
• Partnership with The
Canadian Council of
Archives (CCA)
• Canada Based
• Launched September 2016
• Artefactual provides AM
technical support, storage,
monitoring
http://archivescanada.ca/ACDPS
It is important to note that digital preservation is not all tools and systems: much of it is organizational, covering internal policies and procedures, workflow documentation and accountability chains, mission statements, budgeting, staffing and succession planning. Regardless of your resources or the technical expertise you have in-house, considering and prioritizing these important aspects means that you can start working on digital preservation today.
6 functional areas in the OAIS Reference Model:
Preservation Planning
Ingest
Data management
Archival Storage
Administration
Access
List is highly selective – does not include all tools, services, and standards
Dates are approximations
Standards: does not include content standards, only a couple metadata exchange standards like EAD
Line between service and a tool is blurred – e.g. Dataverse, Preservica, LOCKSS
Does not cover major version changes of tools, or formalization of standards (e.g. TRAC becoming ISO 16363)
Many tools listed are open source, but a few aren’t (Preservica, Rosetta, etc). Means barrier isn’t financial.
421 tools listed as of February 2017
Understanding what materials you’re working with is critical to preparing a digital preservation strategy. If we’re aiming to be pragmatic, to do something now, then we need to forgo worrying about the edge case formats, and focus on what we are currently tasked with preserving.
This means building an inventory. It means understanding the types of file formats in your care – are they open formats, or proprietary? Common or rare? Do you have a large diversity of formats to contend with, or a smaller set of recurring ones? Once we know what we are working with, we can begin to build a strategy around what needs to be done – both minimally, and optimally.
The important thing is to identify key holdings and use cases - don’t worry about all possible edge cases for now.
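A first-pass inventory does not need special tooling. The sketch below, offered as a starting point only, counts files by extension under a folder so you can see which formats dominate your holdings. The function name is hypothetical, and file extensions are only a rough proxy for format: a dedicated file identification tool will give more reliable answers.

```python
from collections import Counter
from pathlib import Path


def inventory_by_extension(root: Path) -> Counter:
    """Count the files under root, grouped by lower-cased file extension."""
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts


def print_inventory(root: Path) -> None:
    """Print extensions from most to least common."""
    for extension, count in inventory_by_extension(root).most_common():
        print(f"{extension}\t{count}")
```

A handful of extensions accounting for most of the count suggests a small set of recurring formats to plan around first. A long tail of rare extensions shows where the edge cases are, which can wait.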
If we acknowledge that digital preservation efforts remain vastly under-resourced, then it makes sense for each of us to be contributing to solutions that will benefit all. DP best practice already embraces many of these principles – open source and open documentation will allow us to collaborate and share, while open formats and open standards ensure that our efforts will remain accessible and interpretable in the long-term.
It can be scary, but the best thing you can do, no matter where you’re at in your preservation efforts, is to use some kind of metric to get a clear sense of how you’re doing before you determine what’s next. Self-assessment will help you figure out what you have the capacity to improve immediately, versus what you will need to plan to address over time.
Using a metric or model also helps make the requirements behind a trustworthy digital preservation environment concrete: it provides clear benchmarks that can be used to identify concrete actions that will improve your preservation readiness.
Created in 2013 by the National Digital Stewardship Alliance
Provides you with 4 levels across 5 categories, with a total of 36 criteria
Can be less intimidating than ISO 16363 as a starting place
In 2014, Bertram Lyons of AVPreserve presented his analysis of the NDSA levels against ISO 16363. He has shared his work via the AVPreserve website, so you can use the levels as an entry point into the more granular requirements outlined in 16363.
There are many other maturity models and self-assessment tools out there. One more of note is the Digital Preservation Capability Maturity Model, developed by Charles Dollar and Lori Ashley in 2013. It provides 5 stages or levels that can be used for assessment in 15 categories: 8 related to Digital Preservation Infrastructure, and 7 related to Digital Preservation Services.
Dollar and Ashley even created a free online assessment tool that can be used with the model, which can be found at www.digitalok.org
All this to say – pick a tool or metric that makes sense to you as a starting place. Even if you haven’t started formally thinking about digital preservation in your institution, run through the model and save your results. Now do it again in a year, so you can see the progress you’ve made.
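Saving your results can be as simple as a small file per year. The sketch below assumes you are using the NDSA Levels and recording one level per category; the function names, the JSON layout, and the use of 0 to mean "not yet at Level 1" are all assumptions made for this example.

```python
import json
from pathlib import Path

# The five category names from the NDSA Levels of Preservation.
CATEGORIES = [
    "Storage and Geographic Location",
    "File Fixity and Data Integrity",
    "Information Security",
    "Metadata",
    "File Formats",
]


def save_assessment(path: Path, year: int, levels: dict[str, int]) -> None:
    """Store one year's self-assessment: a level from 0 to 4 per category."""
    missing = set(CATEGORIES) - set(levels)
    if missing:
        raise ValueError(f"No level recorded for: {sorted(missing)}")
    for category in CATEGORIES:
        if not 0 <= levels[category] <= 4:
            raise ValueError(f"Level for {category} must be between 0 and 4")
    path.write_text(
        json.dumps({"year": year, "levels": levels}, indent=2), encoding="utf-8"
    )


def load_assessment(path: Path) -> dict:
    """Read a saved self-assessment back from disk."""
    return json.loads(path.read_text(encoding="utf-8"))


def progress(earlier: dict, later: dict) -> dict[str, int]:
    """Return the change in level per category between two assessments."""
    return {
        category: later["levels"][category] - earlier["levels"][category]
        for category in CATEGORIES
    }
```

Comparing two saved assessments with `progress` gives a per-category picture of where you have moved forward, which is also useful evidence when making the case for further support.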
When it comes to relating to technology itself, there are 2 opposing but related mentalities we have encountered doing work around digital preservation. The first is the Black Box:
A suspicion of, and failure to understand what the tools do and don’t do, how they work, etc. This can lead to overly complex workflows, or a failure to get started altogether. There is the hope that the problem will be handed off to someone else, or that some new proof will emerge to either confirm or deny these suspicions that the technology is untrustworthy – and until then, no action should be taken.
The converse mindset to this is the Magic Wand – that is, magical thinking about the power of technology to “solve” digital preservation, and the expectation that the process can be fully automated at the push of a button. Set and forget. If a tool is not fully automated, it must be inferior and not worth considering. And yet, so much of digital preservation is about more than just tools.
Ask questions, and help answer them in community forums
Submit documentation (or request it if you can’t write it)
Attend or organize meetups, user groups, and skillshares
Watch or deliver webinars
Help translate resources into another language
Make conference presentations
Write a blog post
Fill out a wiki page or add a review
Contribute code or otherwise support new software development
Support new standards development
If we acknowledge that many of our challenges stem from a lack of resources, then we need to make clear to our stakeholders the value in investing in digital preservation, and the consequences of ignoring it.
There are resources out there that can help you do this. For example, the Digital Preservation Coalition has assembled a DP Business Case Toolkit, full of tips and resources on how to make your case.
They also have a page linking to dozens of other related resources. It’s important to consider how doing an internal self-audit with a recognized metric or model can help you build your case. It can clarify what the expectations are and where your organization is falling short, and later, it can also help you demonstrate the progress you’ve made and justify further support.
Finally, be public. We can learn from each other’s failures as much as successes.
Standards based: OAIS, PREMIS, METS, BagIt, Dublin Core
Open source: AGPLv3 license, free to study, use, modify, etc
Customizable: Add/change/remove FPR rules as needed
Integrated: DSpace, CONTENTdm, Islandora, LOCKSS, AtoM, DuraCloud, OpenStack, Archivists’ Toolkit, Arkivum, ArchivesSpace… etc
Active community:
To support the original and ongoing aims of the project, Archivematica has always been, and will continue to be, released as open source software - currently, we release it under a strong viral license (AGPLv3) to ensure that the application is not forked or incorporated by someone wishing to charge for access to its enhancements. In maintaining our commitment to the original project aims, we also seek in every way we can to lower or remove barriers to the project resources: to this end, Artefactual not only releases the code via our code repository, we also make available our documentation, our webinar recordings, our wiki resources, our presentation slides, and even as much support as we can offer via the Archivematica user forum, all free of charge.
With every major release, we also budget time to review and address many of the bugs reported to us by our user community, with the hope of seeing the project improve progressively in both large and small ways with each public release.
To sustain ourselves as a business and be able to continue maintaining and developing Archivematica, Artefactual also offers additional paid services - including application hosting, consultation, training, theming, data migrations, and of course, custom development. This business model is sometimes known as "Professional open source" - at Artefactual, we think of it as the Bounty model of open-source development. As a company, we use our resources from these additional services to continue supporting the ongoing maintenance work required to keep the Archivematica project sustainable and growing.
Every time we are contracted to develop a custom feature for an institution, we work with the client to ensure the feature respects established national and international standards, and we try to generalize its implementation so it can not only meet the use case of the institution in question, but also be of benefit to the entire Archivematica user community. We then include all of these enhancements in the next public release. Whenever possible, we also accept bug fixes and code contributions from our user community, and will handle the review and merging of this code into public releases, as well as its maintenance through subsequent releases, thereby reducing the burden on individual contributors over time. We have a number of development resources on our wiki to help users get started.
This is the community-based development heart of the Archivematica project. The growth and direction of Archivematica is determined by the individuals and institutions who recognize that open-source software requires maintenance to continue to be viable and relevant in the long-term, and who sponsor features, enhancements, and bug fixes that will benefit the project as a whole in addition to meeting their particular institutional or individual needs. This means that Archivematica, as an application, is truly what our community makes of it - the current version, like all versions before it, has been made possible thanks to contributions large and small from dozens of institutions and individuals. You can see this on the release announcements we maintain and on the Roadmap part of our wiki for the upcoming releases, where we try to acknowledge all the different institutions and individuals that have helped to make the new features possible. This is one of the joys of community-based development - seeing what we can accomplish as a community when we are all working towards common goals. It also means that institutions with more resources are able to invest in solutions that not only meet their needs, but also benefit the community at large and assist smaller, under-resourced institutions to have access to the same tools and applications. Everyone benefits from any single contribution - whether it is development or contributions to the project in other ways (documentation, user forum participation, papers and presentations, provision of services by other service providers, formation of user groups, and more).
From source systems
Hand-off to access and description systems
Hand-off for archival storage – repositories or other secure storage
Administrative hand-off