The NDSA Levels of Digital Preservation provide a high-level, at-a-glance overview of tiered guidance for planning digital preservation. One of the most common requests received by the NDSA group working on the levels is for more in-depth information on the issues discussed in each cell.
To that end, we are excited to start a new series of posts to help you and your organization think through how to work your way through the cells on each level.
There are 20 cells across the four levels and five rows, so there is much to discuss. We intend to work our way through each cell while expounding on the issues inherent in that level. We will define some terms, identify key considerations and point to some secondary resources. If you want an overall explanation of the levels, take a look at The NDSA Levels of Digital Preservation: An Explanation and Uses.
Let’s start with row one, column one, Protect Your Data: Storage and Geographic Location.
The Two Requirements of Row One Column One
There are only two requirements in the first cell, but there is actually a good bit of practical logic tucked away inside the reasoning for those two requirements.
Two complete copies that are not collocated
For starters, you want to have more than one copy, and you want to keep those two copies in different places. The difference between having a single point of failure and two points of failure is huge. For someone working at a small house museum that has a set of digital recordings of oral history interviews, this might be as simple as making a second copy of all of the recordings on an external hard drive, taking that drive home and tucking it away somewhere. If you only have one copy, you are one spilt cup of coffee, one dropped drive, or one massive power surge or fire away from having no copies. You could meet this requirement literally by making any type of copy of your data and taking it home, but it will become clear that this alone is not a tenable way to move further up the levels in the long run. The point of the levels is to start somewhere and make progress.
With this said, it’s important to note that not all storage media are created equal. The difference in error rates between something like a flash drive on your key chain and an enterprise hard disk or tape is gigantic. So gigantic, in fact, that on error rate alone you would likely be better off having one copy on a far better quality piece of media than having two copies on something like two cheap flash drives. Remember, though, that the hard error rate of the storage devices is not the only factor you should be worried about. In many cases, human error is likely to be the biggest factor in data loss, particularly when you have a small system (or no system) in place.
“Complete” copies are an important factor here. Defining “completeness” is something worth thinking through. For example, a “complete copy” may be defined in terms of the integrity of the digital file or files that make up your source and your target. At the most basic level, when you make copies you want to do a quick check to make sure that the file size or sizes in the copy are the same as the size of the original files. Ideally, you would run a fixity check, comparing, for instance, the MD5 hash value of each file in the first copy with the MD5 hash value of the corresponding file in the second copy. The important point here is that “trying” to make a copy is not the same thing as actually having succeeded in making a copy. You are going to want to do at least a spot check to make sure that you really have created an accurate copy.
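The size check and MD5 comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than a recommended tool: the function names `md5_of` and `compare_copies` are made up for this example, and it assumes both copies are ordinary directories reachable from the same machine.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(source_dir, copy_dir):
    """Return (relative_path, problem) pairs for files that do not match.

    Hypothetical helper: checks presence first, then file size, and only
    then the MD5 hash, so the cheap checks run before the expensive one.
    """
    source_dir, copy_dir = Path(source_dir), Path(copy_dir)
    problems = []
    for source_file in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source_file.relative_to(source_dir)
        copy_file = copy_dir / relative
        if not copy_file.is_file():
            problems.append((relative.as_posix(), "missing from copy"))
        elif source_file.stat().st_size != copy_file.stat().st_size:
            problems.append((relative.as_posix(), "size differs"))
        elif md5_of(source_file) != md5_of(copy_file):
            problems.append((relative.as_posix(), "MD5 differs"))
    return problems
```

An empty list from `compare_copies` means every file in the source was found in the copy with the same size and the same MD5 value. It does not flag extra files that exist only in the copy.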
For data on heterogeneous media (optical discs, hard drives, etc.) get the content off the media and into your storage system
A recording artist ships a box full of CDs and hard disks to their label for production of their next release. A famous writer offers an archive her personal papers and includes two of her old laptops, a handful of 5.25 inch floppies, and a few keychain quality flash drives. An organization’s records management division is given a crate full of rewritable CDs from the accounting department. In each of these cases, a set of heterogeneous digital media has ended up on the doorstep of a steward, often with little or no preliminary communication. Getting the bits off that media is a critical first step. None of these methods of storage is intended for the long term; in many cases things like flash drives and rewritable CDs are not intended to function, even in optimal conditions, for more than a few years.
So, get the bits off their original media. But where exactly are you supposed to put them? The requirement in this cell suggests you should put them in your “storage system.” But what exactly is that supposed to mean? It’s intentionally vague in this chart in order to account for different types of organizations, resource levels and overall departmental goals. With that said, the general idea is that you want to focus on good quality media (designed for longer rather than shorter life), for example “enterprise quality” spinning disk or magnetic tape (or some combination of the two), and a way of managing what you have. For the first cell here, the focus is on the quality of the media. However, as requirements move further along, it is going to become increasingly important to be able to check and validate your data. Thus easy ways to manage the data on all of your copies become a critical component of your storage strategy. For example, a library of “good” quality CDs could serve as a kind of storage system. However, managing all of those individual pieces of media would itself become a threat to maintaining access to that content. In addition, when you inevitably need to migrate forward to future media, the need to individually transfer everything off of that collection of CDs would become a significant bottleneck. In short, the design and architecture of your storage system is a whole other problem space, one not really directly covered by the NDSA Levels of Digital Preservation.
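One simple “way of managing what you have” is an inventory of everything in the storage location that you can later check your copies against. The sketch below is only an illustration under assumptions not found in the chart: it treats the storage system as a single directory tree, and the function name `write_manifest` and the CSV column names are invented for this example.

```python
import csv
import hashlib
from pathlib import Path


def write_manifest(storage_dir, manifest_path):
    """Write a CSV listing path, size and MD5 for each file under storage_dir.

    Hypothetical helper: returns the rows it wrote so they can be inspected.
    """
    storage_dir = Path(storage_dir)
    manifest_path = Path(manifest_path)
    rows = []
    for item in sorted(p for p in storage_dir.rglob("*") if p.is_file()):
        # Skip an earlier manifest if it lives inside the storage directory.
        if item.resolve() == manifest_path.resolve():
            continue
        digest = hashlib.md5()
        with open(item, "rb") as handle:
            # Read in 1 MiB chunks so large recordings do not fill memory.
            for chunk in iter(lambda: handle.read(1024 * 1024), b""):
                digest.update(chunk)
        relative = item.relative_to(storage_dir).as_posix()
        rows.append((relative, item.stat().st_size, digest.hexdigest()))
    with open(manifest_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "size_bytes", "md5"])
        writer.writerows(rows)
    return rows
```

Keeping a manifest like this alongside each copy gives you something concrete to compare against when you later check, validate or migrate the content.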
You’ve Got to Walk Before You Can Run: First Steps for Managing Born-Digital Content, Ricky Erway, 2012
The NDSA Levels of Digital Preservation: An Explanation and Uses, Megan Phillips, Jefferson Bailey, Andrea Goethals, Trevor Owens
How Long Will Digital Storage Media Last?, Personal Digital Archiving Series from The Library of Congress