SCAPE
Johan van der Knijff
Koninklijke Bibliotheek – National Library of the Netherlands
DPC, PDF/A-3 Briefing, Leeds, 13.3.2013
PDF/A-3 for preservation
Notes on embedded files and JPEG 2000
Part 1: Embedded files
PDF/A-3: embedding of any file (type)
Key point:
Use of “embedded files” really means “embedded file streams”: a specific data structure in PDF!
File specification dictionary
31 0 obj
<</Type /Filespec /F (mysvg.svg) /EF <</F 32 0 R>> >>
endobj
The EF key points to the embedded file stream (here, object 32).
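As a rough illustration of how the EF key links the two objects, the following minimal sketch pulls the indirect reference out of the raw dictionary text shown above. The helper name `embedded_stream_ref` and the regular expression are assumptions made for this example; a real PDF parser handles many more layouts than this simple pattern does.

```python
import re

# Raw text of the file specification dictionary from the slide.
FILESPEC = b"<</Type /Filespec /F (mysvg.svg) /EF <</F 32 0 R>> >>"

# Hypothetical helper pattern: matches "/EF << /F <obj> <gen> R >>" only.
EF_REF = re.compile(rb"/EF\s*<<\s*/F\s+(\d+)\s+(\d+)\s+R\s*>>")


def embedded_stream_ref(filespec: bytes):
    """Return (object number, generation number) of the embedded file stream, or None."""
    match = EF_REF.search(filespec)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(embedded_stream_ref(FILESPEC))  # (32, 0)
```

The result identifies object 32, generation 0, as the stream that holds the attached file's bytes.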
Embedded file stream
32 0 obj
<</Type /EmbeddedFile /Subtype /image#2Fsvg+xml /Length 72>>
stream
…SVG Data…
endstream
endobj
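The Subtype of the embedded file stream is a PDF name, and PDF names write characters such as "/" as a two-digit hex escape introduced by "#". A minimal sketch of decoding the name above into a MIME type (the function name `decode_pdf_name` is an assumption for illustration):

```python
import re


def decode_pdf_name(name: str) -> str:
    """Decode #xx hex escapes in a PDF name and drop the leading slash."""
    body = name[1:] if name.startswith("/") else name
    # Replace every "#" followed by two hex digits with the character it encodes.
    return re.sub(r"#([0-9A-Fa-f]{2})", lambda m: chr(int(m.group(1), 16)), body)


print(decode_pdf_name("/image#2Fsvg+xml"))  # image/svg+xml
```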
Uses of embedded file streams
File attachments not meant to be rendered by the viewer
File attachment annotation
EmbeddedFiles entry in name dictionary
PDF/A-3
Rendered in/by PDF viewer
Rendition actions
Screen annotations
PDF/A-3
What about inline images?
Not based on the “embedded file stream”, but on the “Image XObject” data structure (which allows a limited set of pre-defined formats)
Embedded files wrap-up:
No impact on content that is meant to be rendered by the PDF viewer
But PDF/A-3 files may contain a file of any possible format as an attachment
Part 2: JPEG 2000
Supported since PDF/A-2
Image XObject
1614 0 obj
<</Subtype/Image/Width 615/Height 978/ColorSpace/DeviceRGB
/BitsPerComponent 8/Interpolate true/Length 5278
/Filter/JPXDecode>>
stream
… Image data …
::
::
endstream
endobj
The /Filter/JPXDecode entry identifies the object as a JPEG 2000 image.
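A minimal sketch of that check, applied to the raw dictionary text of the Image XObject shown above. The function name `is_jpeg2000_image` and the regular expressions are assumptions for illustration and only cover simple dictionaries; they are not a substitute for a PDF parser.

```python
import re

# Raw dictionary of the Image XObject from the slide, joined onto one line.
IMAGE_DICT = (
    b"<</Subtype/Image/Width 615/Height 978/ColorSpace/DeviceRGB "
    b"/BitsPerComponent 8/Interpolate true/Length 5278 "
    b"/Filter/JPXDecode>>"
)


def is_jpeg2000_image(dictionary: bytes) -> bool:
    """Return True if the dictionary is an image that uses the JPXDecode filter."""
    is_image = re.search(rb"/Subtype\s*/Image\b", dictionary) is not None
    # /Filter may hold a single name or an array of names.
    filt = re.search(rb"/Filter\s*(\[[^\]]*\]|/[^\s/<>\[\]()]+)", dictionary)
    uses_jpx = filt is not None and b"/JPXDecode" in filt.group(1)
    return is_image and uses_jpx


print(is_jpeg2000_image(IMAGE_DICT))  # True
```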
ISO 19005-2 (PDF/A-2):
JPEG 2000 support is based on a subset of JPEG 2000 Part 2 (JPX baseline)
Only Part 1 of the standard (JP2) is commonly used for archival applications!
JP2 vs JPX
JP2 = JPEG 2000 Part 1: basic still image format
JPX = JPEG 2000 Part 2: JP2 + assorted advanced stuff …
Fragmented codestreams
Allowed in JPX Baseline!
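One way to see whether an image is plain JP2 or uses JPX features is to walk the top-level boxes of the file. The sketch below is an illustration under stated assumptions, not a validator: it reads the brand from the File Type box ("ftyp") and reports whether a box of type "ftbl" is present, which to the best of my understanding is the Fragment Table box that JPEG 2000 Part 2 uses for fragmented codestreams. The function names are hypothetical.

```python
import struct

# 12-byte signature box that opens every JP2-family file.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")


def top_level_boxes(data: bytes):
    """Yield (box type, payload) for each top-level box."""
    pos = 0
    while pos + 8 <= len(data):
        length, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if length == 1:  # a 64-bit extended length follows the type field
            if pos + 16 > len(data):
                break
            length = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif length == 0:  # box runs to the end of the file
            length = len(data) - pos
        if length < header:
            break  # malformed box, stop scanning
        yield box_type, data[pos + header:pos + length]
        pos += length


def describe(data: bytes) -> dict:
    """Report the file's brand and whether a fragment table box is present."""
    if not data.startswith(JP2_SIGNATURE):
        raise ValueError("not a JP2-family file")
    boxes = list(top_level_boxes(data))
    # The first four payload bytes of "ftyp" are the brand (b"jp2 " or b"jpx ").
    brand = next((payload[:4] for t, payload in boxes if t == b"ftyp"), None)
    # Assumption: "ftbl" marks a fragmented codestream.
    return {"brand": brand, "has_fragment_table": any(t == b"ftbl" for t, _ in boxes)}
```

Calling `describe(open(path, "rb").read())` on an image extracted from a PDF would return, for example, a brand of `b"jp2 "` with no fragment table for an ordinary JP2 file.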
Open-source PDF viewers: JPEG 2000 libraries
Ghostscript: OpenJPEG or JasPer
Evince: OpenJPEG
MuPDF: OpenJPEG
Firefox PDF viewer: built-in decoder
None of these libraries support fragmented codestreams!
Is it really a problem?
Fragmented codestreams are extremely rare
But why is this feature even allowed in a long-term archival format?
Open-source support of JPEG 2000 in general remains problematic
Funding
This work was partially supported by the SCAPE Project.
The SCAPE project is co-funded by the European Union under
FP7 ICT-2009.4.1 (Grant Agreement number 270137).
#SCAPEProject
http://www.scape-project.eu
