At the Museum is an interview series highlighting the variety of digital collections in museums and the interesting people working to create and preserve these collections. For this installment I interviewed Ellice Engdahl, Digital Collections & Content Manager, and Brian Wilson, Digital Access and Preservation Archivist, at The Henry Ford in Dearborn, Michigan.
Sue: Tell us about your background and how you ended up in this role.
Ellice: My professional experience prior to The Henry Ford has been in the for-profit publishing industry, in a number of different roles. I started in that field in 1998, working on print books, CD-ROMs and a then brand-new web product providing book recommendations to readers based on other books they’d read. I moved on to work on my firm’s new eBooks program, converting print books (ours as well as other publishers’) into what was then Open eBook XML (now ePub). I also was a project manager leading Agile software development teams (I got my PMP project management certification in 2007), and my last role there was as a technical content implementation manager. I started at The Henry Ford in mid-2011.
Sue: Give us some background on your current position at The Henry Ford.
Ellice: I’m part of our Digital and Emerging Media Department, and act as a program manager for our digitization efforts, bringing together a cross-functional team to prioritize and manage ongoing collections digitization work. I also act as collections content representative on new and existing digital products with internal staff and external agencies and vendors. My role has expanded slightly recently, so I’m starting to get involved with how we aggregate digitized collections content to tell stories on the web.
Sue: Tell us a bit about your online collections area. Could you shed some light on how you are approaching the presentation of special, individual items within the context of large-scale digitization efforts?
Ellice: The Henry Ford has a large collection and an even larger archive–we estimate we have about one million objects, plus 25 million items in our archives. Though we have pockets of “digitized” collections content dating back to the 1990s (I’ve heard Günter Waibel of the Smithsonian call these “random acts of digitization”), digitization as a consistent, standardized process and as a part of our everyday work here began around 2011, with 300 objects online. Now we’re at about 26,000 online, all imaged and cataloged with at least minimal metadata (generally to CCO standards).
I think the challenge for an institution with massive amounts of material, ranging in size from buttons to planes, trains and automobiles, is to find a balance between making our collections accessible at a high level, with less detail, and finding ways to highlight the gems of the collection and the major stories we as an institution tell. We haven’t quite solved this problem yet. One of our curators calls our collection “the bottomless pit of wonderfulness,” which really sums up the blessing and the curse of having so much material to work with. Even deciding what we digitize first, with the amount of material so greatly outweighing the number of digitization staff, can be a challenge, let alone deciding how much time to spend on each individual item.
Sue: What advice would you have to offer others at similar institutions?
Ellice: Don’t get overwhelmed. It is possible, perhaps just possible, that the work of digitizing museum collections tends to attract Type A personalities who really want everything to be perfect. The nice thing to remember is that even getting some information about your collections online, particularly for objects that are in storage or otherwise publicly inaccessible, makes information available to scholars and the general public in a way that wouldn’t have been possible 25 years ago.
Also, apply the 80-20 rule, along with cost-benefit analysis, constantly–80% of our impact clearly comes from 20% of our effort. We run into this all the time when attempting to digitize a particularly large or complicated object or group of objects. We try to see if there are creative ways to get the work done, or if it could become the basis for a grant application, or if we have existing assets (for example, old black and white photos) that we could use instead of new photography. Sometimes we defer specific objects or parts of the collection because tackling them at that moment would be so difficult. Fortunately for us, we always have a new part of the bottomless pit of wonderfulness to tackle.
Sue: What do you see as the biggest challenge in presenting and preserving these digital items?
Ellice: I think our biggest challenge in presentation right now is how we go beyond simply a searchable database of objects connected only by metadata to tell integrated stories, similar to those a presenter here might tell you if you come to visit, and yet still allow the user to pursue threads from the story they are particularly interested in. We are working on this, but haven’t yet licked it.
Preservation and new technologies are also major issues. For example, we have some 360-degree files, which allow users to navigate around the inside of vehicles, that are in Flash format. We’re currently considering whether our master file format for images should remain TIFF or should switch to JPEG2000 or something else. It’s easy to create a backlog of data in a format that becomes obsolete, unless you’re really paying attention.
Sue: Are you currently working with any “born digital” materials in your collections, and what are your future plans for these materials?
Ellice: We do have a major collection of born-digital material that we have been adding to our collections website and to Flickr: the Dave Friedman Collection. These are automobile racing negatives that were scanned at high resolution by the original photographer and delivered to us as digital files. Since these came to us as well-organized digital image files, we are able to make these accessible much more quickly than their physical counterparts. We expect to see much more born-digital material in the future, however, and anticipate the level of organization and preservation when something reaches us will vary, so this will bring additional challenges. For example, how do you make an Excel or PowerPoint file accessible when the original software that created it is obsolete? How do you retrieve material on a 3.5″ disk from a circa 1990 Smith-Corona word processor?
Sue: Would you say your institution has adopted a system for preservation of digital objects or records?
Ellice: We have two big categories to consider on this front: 1) what we do with the material we’re newly digitizing going forward, and 2) how we clean up the backlog of those “random acts of digitization.”
For the first category, our newly created images and metadata, we’re creating master TIFF files for all images and ingesting those into a backed-up preservation server. Access to this server is fairly limited, and we’re moving to even more restrictive permissions. We’re also in the planning stages of creating checksum data for our master image files. For most public uses, we use a JPG derivative of the original TIFF file. Our collections object metadata is stored in EMu and is also backed up locally nightly and to tape weekly.
The second category is much more difficult. We have pockets of institutional data (including digital collections data) on non-backed up physical media of varying age and obsolescence. We don’t really have the people, server space, or organizational plan to simply collect and dump these into top tier storage, but we are trying to move these pockets off removable media to space where they are at least backed up. Figuring out what they are, where they came from, whether any description exists for these digital files, and if so, where, is a huge effort. Right now, we’re picking through these as we find them and prioritizing our efforts based on institutional strategic goals.
Brian: Our digital preservation efforts over the last 3 years or so at The Henry Ford have been fairly basic and focused primarily on improving the storage of output from our large-scale digitization effort, which Ellice describes above. We’ve also quietly, and at times not so quietly, worked to raise awareness of the need for, and opportunities afforded by, digital preservation.
Sue: How are you incorporating the storage of these large-scale digital images into your workflow?
Brian: We have replaced distributed storage devices and removable media with backed-up, networked spinning-disk storage for capturing the 1 to 1.5 TB of TIFF master files we’re creating each year. To use the new storage space effectively we’ve had to create workflows and procedures that describe data locations and storage structure, file transfer, naming conventions and so forth. Our storage forecasting horizon has also been extended from less than a week to about six months, which helps ITS plan for and obtain additional space. By the end of this summer we plan to fully implement a new electronic staging area, which will provide a place to deposit material intended for preservation as well as workspace for archival processing. The staging area will then allow for stricter permissions to be placed on our preservation storage and more effective use of checksums.
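To make the checksum step Brian mentions concrete, here is a minimal sketch in Python. It is not The Henry Ford's actual tooling: the choice of SHA-256, the function names, the `staging` and `preservation` folder names, and the file name `object_0001.tif` are all assumptions chosen for illustration. The idea is to record a checksum for every file while it sits in the staging area, then recompute the checksums on the preservation copy to confirm nothing was lost or altered.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large TIFF master is never read into memory whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_mismatches(manifest: dict[str, str], root: Path) -> list[str]:
    """Return relative paths that are missing under root or whose checksum differs."""
    bad = []
    for rel, expected in manifest.items():
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad


def demo() -> tuple[list[str], list[str]]:
    """Stage a stand-in master, copy it to 'preservation', then damage the copy."""
    with tempfile.TemporaryDirectory() as tmp:
        staging = Path(tmp) / "staging"            # hypothetical folder name
        preservation = Path(tmp) / "preservation"  # hypothetical folder name
        staging.mkdir()
        (staging / "object_0001.tif").write_bytes(b"stand-in for TIFF bytes")

        manifest = build_manifest(staging)
        shutil.copytree(staging, preservation)
        clean = find_mismatches(manifest, preservation)

        # Simulate silent corruption of the preservation copy.
        (preservation / "object_0001.tif").write_bytes(b"bit rot")
        damaged = find_mismatches(manifest, preservation)
        return clean, damaged


if __name__ == "__main__":
    print(demo())
```

In practice the manifest would be saved alongside the files and re-checked on a schedule, which is what lets a corrupted master be noticed before someone needs it.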
Sue: What thoughts do you each have on the need for digital preservation in general?
Ellice: Digital preservation can be a tough sell to folks not intimate with digital content. The front-end of your digital collections is exciting and vibrant and beautiful, but many people don’t think about the back-end until they need a file and it’s missing or corrupted. It has taken some time to get momentum behind digital preservation at The Henry Ford, a sea change caused in large part by Brian’s efforts; he is passionate about preservation and works closely with our IT staff to move us forward.
Brian: Along with the technology and infrastructure work we have also been making efforts to raise awareness of the need for digital preservation. Two years ago we drafted a digital preservation policy that, while still not formally approved, has been used to guide decisions and to provide support in grant applications. Working with faculty at both Wayne State University School of Information and Library Science and the University of Michigan School of Information, we have hosted several student interns who have made great contributions to our preservation policy, to our efforts at dealing with moving image and audio materials, and to implementation of the staging area I mentioned previously. And there’s been a good amount of just speaking up during meetings to say, “Hey, don’t forget about preserving this electronic data you’re talking about!” The capture of old micro-websites before they’re taken offline, and the collection and storage of facilities images from staff, are a couple of examples of these “oh-by-the-way” types of issues.
For what’s next, we want to complete implementation of the staging area and then take a hard look at transferring to network storage the legacy digital moving image and audio files that we’ve produced over the years and that still reside on portable hard drives and removable disk media. And of course, continue our awareness and education efforts. To paraphrase one of my grad school mentors, “It’s not a battle, it’s a campaign.”