New NDSA Report: The Benefits and Risks of the PDF/A-3 File Format for Archival Institutions

We’re lucky in the digital stewardship community that our challenges tend not to be life-threatening. Still, when we get fired up about something, there is guaranteed to be spirited debate and passionate advocacy on all sides.

Such was the case with the release of the PDF/A-3 file format specification in October 2012. We wrote about it on the Signal shortly after and it was immediately a hot topic. To the barricades!

“Filing” from user TheBeachSaint on Flickr

The PDF/A family of international standards defines a file format based on the Portable Document Format which provides a mechanism for representing electronic documents in a manner that preserves their static visual appearance over time, independent of the tools and systems used for creating, storing or rendering the files. “Static visual appearance” ultimately means that conforming PDF/A files are complete in themselves and use no external references or non-PDF data.

The first version of the PDF/A specification (PDF/A-1) was published in September 2005 and has been updated at regular intervals since. The A-3 version of the specification was received with some concern in the stewardship community as it adds a single and highly significant feature to its predecessors. The PDF/A-2 specification permitted the embedding of other files as long as the embedded files were valid PDF/A files. A-3 permits the embedding of files of any format.
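The difference between the two parts can be written as a small rule. The sketch below is illustrative only: the function name `embedding_allowed` and its parameters are hypothetical, and it models just the A-2 and A-3 embedding rules described above, not the full specifications.

```python
def embedding_allowed(part: int, embedded_is_pdfa: bool) -> bool:
    """Return True if a file may be embedded under the given PDF/A part.

    Hypothetical helper that models only the rule described in the text:
    PDF/A-2 accepts embedded files that are themselves valid PDF/A,
    while PDF/A-3 accepts embedded files of any format.
    """
    if part == 2:
        return embedded_is_pdfa
    if part == 3:
        return True
    # Other parts are outside the scope of this sketch.
    raise ValueError(f"part {part} is not covered by this sketch")
```

Under this rule a spreadsheet embedded in a PDF/A-2 file is rejected, while the same spreadsheet embedded in a PDF/A-3 file is accepted, which is the change that concerned the stewardship community.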

While a PDF/A-3 file’s primary document is still intended to be robust against preservation risks over the very long term, PDF/A-3 does not require that the embedded files be considered archival content, creating a series of potential technical and policy challenges for preserving institutions.

The National Digital Stewardship Alliance Standards and Practices Working Group clearly recognized these challenges and felt the community would benefit from an examination of the format and what it means for collecting institutions.

Which leads to today’s release of the NDSA report on “The Benefits and Risks of the PDF/A-3 File Format for Archival Institutions” (pdf).

The report takes a measured look at the costs and benefits of the widespread use of the PDF/A-3 format, especially as it affects content arriving in collecting institutions. It provides background on the technical development of the specification, identifies specific scenarios under which the format might be used and suggests policy prescriptions for collecting institutions to consider.

For example, the report suggests that for memory institutions, the acceptance of embedded files in PDF/A documents would depend on very specific protocols between depositors and archival repositories that clarify acceptable embedded formats and define workflows that guarantee that the relationship between the PDF document and any embedded files is fully understood by the archival institution.
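To make the idea of such a protocol concrete, here is a minimal sketch of how a repository might check a deposit against an agreement with a depositor. Everything in it is an assumption made for illustration: the `EmbeddedFile` record, the `ACCEPTED_MEDIA_TYPES` set and the `review_deposit` function are hypothetical and do not come from the report, and the list of embedded files is assumed to have been extracted from the PDF by some other tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddedFile:
    """Hypothetical record describing one file embedded in a PDF/A-3 document."""
    name: str
    media_type: str    # for example "text/csv"
    relationship: str  # free-text label in this sketch; empty if undeclared


# Hypothetical agreement: formats this repository and depositor settled on.
ACCEPTED_MEDIA_TYPES = {"text/csv", "application/xml"}


def review_deposit(embedded_files):
    """Return a list of problems found; an empty list means the deposit passes."""
    problems = []
    for f in embedded_files:
        if f.media_type not in ACCEPTED_MEDIA_TYPES:
            problems.append(f"{f.name}: format {f.media_type} is not in the agreement")
        if not f.relationship:
            problems.append(f"{f.name}: relationship to the PDF document is not declared")
    return problems


if __name__ == "__main__":
    deposit = [
        EmbeddedFile("figures.csv", "text/csv", "source data for the tables"),
        EmbeddedFile("model.xls", "application/vnd.ms-excel", ""),
    ]
    for problem in review_deposit(deposit):
        print(problem)
```

The check covers the two conditions named above: the embedded format must be one the parties agreed on, and the relationship between the PDF document and each embedded file must be stated rather than left for the archive to guess.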

Additionally, the report notes that the complexity of the PDF format and the wide variance in PDF rendering implementations and creating applications suggest that PDF/A-3 may be appropriate for use in controlled workflows, but may not be an appropriate choice as a general-purpose bundling format.

Certainly, the introduction of such a problematic new feature in the latest version of the PDF/A family should press the community of memory institutions into a more strategic, active, and vocal role in the standards development process, and in the PDF/A process specifically.

This report is the latest in a series of NDSA publications and activities that provide insight on a range of digital stewardship issues. We welcome your comments on the PDF/A-3 report in addition to suggestions for areas where the NDSA can continue to provide benefit to the digital stewardship community.