The NY Philharmonic is currently archiving its entire 200-year history, which includes historical documents, music scores, programs, marketing material and business records. The solution has so far ingested terabytes of data, and the final archive will contain petabytes, all stored and managed by Alfresco. The ingestion process imports assets, each of which is composed of multiple images taken with digital cameras. This session will describe how systems such as OpenMigrate, ImageMagick and SOLR were used in the process. We’ll discuss how the content model is built from custom aspects and custom associations, and how we keep the repository running quickly. archives.nyphil.org
CASE-1 NY Philharmonic Case Study
1. Digitizing the New York Philharmonic Archives
Mitch Brodsky, Digital Archives Project Manager
Alfresco DevCon 2011
@NYPhilArchive
2. About the New York Philharmonic
• Oldest Symphony Orchestra in the United States
• Began in 1842
• Formed as a co-operative organization
• A “save everything” culture
Alfresco Use Case - NYP Digital Archives Page 2
3. Project Scope
The International Era, 1943-1970
• 3,200 Programs
• 1,300 Marked Scores
• 72 Scrapbooks
• 8,000 Business Record Folders
• 8,500 Photographs
• 4,200 Glass Lantern Slides
4. By the end of 2012
1.3 million pages
10 million nodes
15 Terabytes
And we’re only getting started…
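As a rough sanity check, the per-page averages implied by these targets can be worked out directly. This is only a back-of-the-envelope sketch: it assumes decimal terabytes (10^12 bytes) and that the three totals describe the same body of content, neither of which the slide states.

```python
# Rough averages implied by the slide's end-of-2012 targets.
# Assumption: 1 TB = 10**12 bytes (the slide does not say decimal or binary).
PAGES = 1_300_000
NODES = 10_000_000
STORAGE_BYTES = 15 * 10**12


def per_page_averages(pages: int, nodes: int, storage_bytes: int) -> tuple[float, float]:
    """Return (megabytes per page, repository nodes per page)."""
    mb_per_page = storage_bytes / pages / 10**6
    nodes_per_page = nodes / pages
    return mb_per_page, nodes_per_page


mb, nodes = per_page_averages(PAGES, NODES, STORAGE_BYTES)
print(f"{mb:.1f} MB per page, {nodes:.1f} nodes per page")
```

Under those assumptions this comes to roughly 11.5 MB and between seven and eight nodes per page.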
5. Our Objectives
• Accurate representation of originals
• Comprehensive
• Easily/Freely Accessible
• A New, Sharable Model
• One repository: December 7, 1842 --> YESTERDAY
DO MORE, SEE MORE, USE MORE, SHARE MORE
GO
8. Software Architecture
Public User
Web Front End
APACHE SOLR
Additional SOLR layer:
- Adds security
- Adds control
- Relieves Dependencies
- Faceted Search
- Other advanced capabilities
Image capture, Storage, Image Conversion
Source Metadata
Metadata XML:
- Different content model per doctype
- Creates content node for storing metadata and associations
OM creates nodes in pre-defined hierarchy:
Doctype
Intermediate organization system (different per doctype)
“pages”
Asset ID 1
Asset ID 2
Page 001
Page 002
“metadata”
Content Item
- Properties populated with metadata
- Holds associations to images
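The node hierarchy on this slide can be sketched as a small data model. This is an illustration in plain Python, not Alfresco or OpenMigrate code. The exact nesting cannot be recovered from the slide text, so the sketch assumes one plausible reading: a "pages" branch holding each asset's page images and a parallel "metadata" branch holding one Content Item per asset. All class, function, and example names (`Asset`, `build_asset`, `node_paths`, "Programs", "1943-44") are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    name: str  # e.g. "Page 001"


@dataclass
class ContentItem:
    # Properties populated from the source metadata.
    properties: dict[str, str] = field(default_factory=dict)
    # Associations to the page images of the same asset.
    image_associations: list[Page] = field(default_factory=list)


@dataclass
class Asset:
    asset_id: str
    pages: list[Page] = field(default_factory=list)
    metadata: ContentItem = field(default_factory=ContentItem)


def build_asset(asset_id: str, page_count: int, source_metadata: dict[str, str]) -> Asset:
    """Create an asset whose content item is associated with every page image."""
    pages = [Page(f"Page {n:03d}") for n in range(1, page_count + 1)]
    item = ContentItem(properties=dict(source_metadata), image_associations=list(pages))
    return Asset(asset_id, pages, item)


def node_paths(doctype: str, intermediate: str, asset: Asset) -> list[str]:
    """List node paths under the assumed doctype/intermediate/branch layout."""
    base = f"{doctype}/{intermediate}"
    paths = [f"{base}/pages/{asset.asset_id}/{p.name}" for p in asset.pages]
    paths.append(f"{base}/metadata/{asset.asset_id}")
    return paths


asset = build_asset("ID 1", 2, {"title": "Example program"})
for path in node_paths("Programs", "1943-44", asset):
    print(path)
```

Keeping the images and the metadata item as separate nodes linked by associations mirrors the slide's description of a Content Item that "holds associations to images"; how the real repository lays this out per doctype is not shown here.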
9. What We Want
• Data Entry
• Form Designer
• Metadata Templating
• Predefined Validation lists
• Improve Client Capacity
• Workflows or Document Library
• JPEG2000 (integration with Djatoka server)
• Completely isolate our SOLR-driven front end from Alf DB
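To illustrate the last item, a front end that talks only to SOLR would issue search requests like the one sketched below and never query the Alfresco database directly. The `q`, `rows`, `wt`, `facet` and `facet.field` parameters are standard Solr query parameters; the host, the core name `archives`, and the field names `doctype` and `season` are assumptions made for this example, not details from the project.

```python
from urllib.parse import urlencode


def facet_query_url(solr_base: str, core: str, text: str,
                    facet_fields: list[str], rows: int = 10) -> str:
    """Build a Solr select URL that requests facet counts for the given fields."""
    params = [
        ("q", text),
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
    ]
    # facet.field may be repeated, once per field to facet on.
    params += [("facet.field", name) for name in facet_fields]
    return f"{solr_base.rstrip('/')}/{core}/select?{urlencode(params)}"


# Hypothetical host, core and field names.
url = facet_query_url("http://localhost:8983/solr", "archives",
                      "Bernstein", ["doctype", "season"])
print(url)
```

Because the URL is the only contract between the front end and the index, the repository behind it can change without the public site noticing, which is the kind of isolation the bullet asks for.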
10. Next Steps
• OCR
• Automate Workflows with Activiti
• Integrate Digital Preservation Tools
• Hook in Social Media apps
11. Visit Us
archives.nyphil.org
Mitch Brodsky, Digital Archives Project Manager
brodskym@nyphil.org
(212) 875-5933