Comparing Formats for Still Image Digitizing: Part Two

The following is a guest post by Carl Fleischhauer, a Digital Initiatives Project Manager in NDIIPP.

This is the second post (of two) on the recently posted comparison of selected digital file formats compiled by the Still Images Working Group within the Federal Agencies Digitization Guidelines Initiative.  In this post, I’ll offer some thoughts about JPEG 2000, since one motivation for the comparison project was to size up JPEG 2000 against tried-and-true TIFF.  There is also a bit here about PNG.  Meanwhile, the first post introduced the general topic and offered some notes about TIFF.

During the last four or five years, various specialists in our national and international circle have taken note of JPEG 2000.  The Library has made extensive use of JPEG 2000 in its online access applications for maps and scanned newspaper pages.  These are both large-raster content forms that benefit from JPEG 2000’s capability to tile images and to handle scaling (in this context, support for zooming).  For the maps and newspapers, the archival master files are uncompressed TIFFs.  Sets of derivative JPEG 2000 files provide an underlying raster dataset that a server-based application zooms and tiles to meet the end user’s request and then delivers to the browser as cropped-to-order “old” JPEG files.
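To make the zoom-and-tile idea concrete, here is a minimal sketch of the arithmetic a server might perform when a browser asks for part of a large image.  It is not the Library's commercial application, and it simplifies the coordinate rules in the JPEG 2000 standard.  The function name, the square `tile_size`, and the `reduce_level` parameter (the number of resolution levels discarded, each one halving width and height) are assumptions made for illustration.

```python
import math

def tiles_for_region(image_w, image_h, tile_size, region, reduce_level):
    """Return the tiles that a requested region touches and its delivered size.

    region is (x, y, w, h) in full-resolution pixels.
    reduce_level is a hypothetical zoom parameter: each level halves
    the width and height of what is sent to the browser.
    """
    x, y, w, h = region
    # Clamp the request to the image bounds.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(image_w, x + w), min(image_h, y + h)
    if x0 >= x1 or y0 >= y1:
        return [], (0, 0)
    # Tile columns and rows that intersect the clamped region.
    cols = range(x0 // tile_size, (x1 - 1) // tile_size + 1)
    rows = range(y0 // tile_size, (y1 - 1) // tile_size + 1)
    tiles = [(c, r) for r in rows for c in cols]
    # Size of the cropped-to-order image after discarding resolution levels.
    scale = 2 ** reduce_level
    out_size = (math.ceil((x1 - x0) / scale), math.ceil((y1 - y0) / scale))
    return tiles, out_size

# Example: a newspaper-page-sized raster with 1024-pixel tiles.
tiles, out_size = tiles_for_region(8000, 6000, 1024, (1000, 900, 1100, 300), 2)
print(tiles, out_size)
```

Only the tiles listed need to be decoded, and only down to the requested resolution level, which is why large maps and newspaper pages benefit from this structure.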

The Library has also used JPEG 2000 encoding wrapped in MXF format (a SMPTE standard) as the archival master target format when reformatting videotapes.  The video content is for the most part protected by copyright and access is limited to the Library’s premises, where end-user delivery is provided by MPEG files derived from the MXF masters.  Regarding the JPEG 2000 component, this application uses single-tile imagery and (thus far) has not taken advantage of scalability features.

There is a lot to recommend JPEG 2000.  Both the wrapper and the encodings are proper capital-S standards from the International Organization for Standardization and the International Electrotechnical Commission.  The family of JPEG 2000 standards includes three encodings, with the main core encoding understood to be free of patent issues.  One key JPEG 2000 compression process employs wavelet transforms to provide a very clean image, even in a lossy mode.  (JPEG 2000 can also be employed in a lossless mode.)  The encoding includes a number of “resiliency” features that add a bit of error-protection absent in most other encodings.  The JPEG 2000 wrapper provides a bit more help with color documentation than TIFF, and it has a “box” that can carry XML-encoded metadata.
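For readers curious about what a wavelet transform does in the lossless case, the following is a simplified sketch of one level of the reversible 5/3 integer lifting transform associated with JPEG 2000's lossless mode.  It works on a single row of samples rather than a two-dimensional image, assumes an even number of samples, and mirrors values at the edges.  The function names are made up for this illustration, and a real codec does considerably more (multiple levels, two dimensions, entropy coding).

```python
def forward_53(x):
    """One level of reversible 5/3 lifting on an even-length list of ints."""
    n = len(x)
    if n < 2 or n % 2:
        raise ValueError("expects an even number of samples (at least 2)")
    half = n // 2
    # Predict step: each odd sample becomes a detail (high-pass) value,
    # the difference from the average of its two even neighbours.
    d = []
    for i in range(half):
        left = x[2 * i]
        right = x[2 * i + 2] if 2 * i + 2 < n else x[2 * i]  # mirror at right edge
        d.append(x[2 * i + 1] - (left + right) // 2)
    # Update step: each even sample becomes a smoothed (low-pass) value.
    s = []
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]  # mirror at left edge
        s.append(x[2 * i] + (prev + d[i] + 2) // 4)
    return s, d

def inverse_53(s, d):
    """Undo forward_53 exactly; integer arithmetic means nothing is lost."""
    half = len(s)
    even = []
    for i in range(half):
        prev = d[i - 1] if i > 0 else d[0]
        even.append(s[i] - (prev + d[i] + 2) // 4)
    x = []
    for i in range(half):
        right = even[i + 1] if i + 1 < half else even[i]
        x.extend([even[i], d[i] + (even[i] + right) // 2])
    return x

row = [10, 12, 14, 16, 18, 20, 22, 24]
low, high = forward_53(row)
print(low, high)                      # smooth half and detail half
print(inverse_53(low, high) == row)   # the round trip restores the input
```

On smooth image content most detail values come out at or near zero, which is what makes the data compress well; the low-pass half is also a half-size version of the row, which hints at where the format's scaling capability comes from.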

Allesio Damato's illustration of a wavelet transform for JPEG 2000. From:


The encoding can be structured to accommodate user-defined tiling and scalability.  At the Library we depend upon a commercial server application to do the work but other organizations take advantage of the JPEG 2000 Interactive Protocol (JPIP, a separate ISO/IEC standard).  The Wikipedia article about JPIP reports use in medical imaging applications (“zoom in on the xray”).

In 2011, the Library and FADGI organized a JPEG 2000 Summit; the papers are available online.  The speakers included several Europeans who were enthusiastic adopters of JPEG 2000 as a master format.

So what’s not to like?  Since the 2011 summit, we have participated in a number of discussions of the topic, including the aforementioned exchange in the Digital Curation group, a January 2013 blog post by my colleague Chris Adams with the provocative title “Is JPEG-2000 a Preservation Risk?”, and a discussion at the March 2014 meeting of the FADGI Still Image Working Group.  Our online format comparison was informed by what we learned in these exchanges.  Here are a few selected statements about JPEG 2000 from our comparison matrix:

  • Sustainability Factors: Adoption: Moderate-to-Wide Adoption (moderate adoption in cultural heritage community, but widely adopted in communities such as moving images. Negligible support in browsers and still cameras)
  • Sustainability Factors: Transparency: Acceptable. Compression is compensated for by resiliency elements, intended to mitigate low levels of transparency. However, the format offers many options (tiling, quality layers, progression order, more), and some users have found that “legal” variations may not interoperate from one application to another.
  • Cost Factors: Implementation Cost: Medium-High (for reference, other formats including TIFF come in as low)
  • Cost Factors: Cost of Software Tools: Medium-High (best toolsets available currently are proprietary tools. Open source tools are not yet mature.) (for reference, other formats including TIFF come in as low)
  • Cost Factors: Storage Cost: Low (for reference, some other formats including TIFF come in as high)
  • System Implementation Factors: Level of difficulty/complexity: Medium-high (for reference, other formats including TIFF come in as low)
  • System Implementation Factors: Availability of tools: Limited to Moderate Availability (not all tools support all features)
  • Settings and Capabilities: Support for Color Maintenance: Good (good but not perfect documentation of color space. Standards group working on these) (for reference, the TIFF statement for this factor is Good, caveat: to insert an ICC profile or declare certain color spaces, you must use an extended tag set.)

Is it time to switch from TIFF to JPEG 2000?  As a trial lawyer might say, it’s not an open-and-shut case.  At the March FADGI meeting, the trend of discussion was toward lining up some pilot projects, perhaps in both the Library of Congress and the Government Printing Office, where extensive image holdings add weight to the storage management factor.  It is also possible that some of our scanning projects–think old catalog cards–will provide good candidates for lossy compression, at which JPEG 2000 excels.  (JPEG 2000 can also be used in a lossless mode.)

Meanwhile, at the FADGI meeting, we also heard from other agencies that have received significant numbers of JPEG 2000 image files from digitization partners.  These need to be managed for the long term and no one suggested that the agencies transform them from their svelte JPEG 2000 selves to chubby TIFFs.  Perhaps we can work up some pilot projects before long and, respecting Chris Adams’s push, budget in some actions that will improve the available tools.

I’ll close by looping back to our varying levels of confidence about some of the formats, especially PNG, aka Portable Network Graphics.  As I wrote in the Digital Curation thread, several years of inattentiveness on my part led me to relegate PNG to the use case of access-via-browser.  I did so because the format was initially created as a reaction to the threats of licensing fees for GIF (another good-for-browsers format) in the 1990s.

Last month, however, I re-read the W3C specification for PNG and found lots of nifty elements, on paper at least.  For example, the standard includes features that support color management.  These include a group of metadata tags under the heading Colour Space Information that could document an image’s primary chromaticities and white point, image gamma, and carry an embedded ICC profile.  In addition, PNG offers lossless compression with excellent results.  Do any libraries or archives use PNG as a mastering file?  Have people found that tools support some of the features that caught my eye, like the ones that support color management?  Inquiring minds seek to know!
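One rough way to check whether a given PNG actually carries the color information described above is to walk its chunks and look for the ones the specification groups under Colour Space Information: cHRM (primary chromaticities and white point), gAMA (image gamma), iCCP (embedded ICC profile), and sRGB.  The sketch below uses only the Python standard library.  The function names are my own, the stream does not verify checksums, and the sample at the end is a header-only test stream rather than a viewable image.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def iter_chunks(data):
    """Yield (type, body) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG stream")
    pos = 8
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        yield ctype, data[pos + 8:pos + 8 + length]
        pos += 12 + length  # length + type + body + CRC
        if ctype == b"IEND":
            break

def colour_info(data):
    """Collect whatever colour-space chunks the PNG contains."""
    info = {}
    for ctype, body in iter_chunks(data):
        if ctype == b"gAMA":
            # Stored as gamma times 100000.
            info["gamma"] = struct.unpack(">I", body)[0] / 100000
        elif ctype == b"cHRM":
            # Eight values, each stored times 100000.
            v = [n / 100000 for n in struct.unpack(">8I", body)]
            info["white_point"] = (v[0], v[1])
            info["primaries"] = {"red": (v[2], v[3]),
                                 "green": (v[4], v[5]),
                                 "blue": (v[6], v[7])}
        elif ctype == b"sRGB":
            info["srgb_rendering_intent"] = body[0]
        elif ctype == b"iCCP":
            # Profile name, a null byte, a compression-method byte,
            # then the zlib-compressed ICC profile.
            name, _, rest = body.partition(b"\x00")
            info["icc_profile_name"] = name.decode("latin-1")
            info["icc_profile"] = zlib.decompress(rest[1:])
    return info

def make_chunk(ctype, body):
    """Helper for building test streams; not needed to read real files."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)

sample = (PNG_SIGNATURE
          + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
          + make_chunk(b"gAMA", struct.pack(">I", 45455))
          + make_chunk(b"sRGB", b"\x00")
          + make_chunk(b"IEND", b""))
print(colour_info(sample))
```

To try it on a real file, read the bytes with `open(path, "rb").read()` and pass them to `colour_info`; an empty result means the file declares no color-space information at all.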

Part One of this series appeared on Wednesday, May 14, 2014.