Digital Preservation at CDL: Where We Are Headed

At CDL, our digital preservation strategy hinges on offering trusted, reliable, low-cost preservation services to the University of California. Over the past year, we’ve been busy moving forward with these values in mind. 

Along the way, the Merritt team has achieved CoreTrustSeal certification for our preservation repository, evaluated two new cloud storage solutions, established a more thorough documentation portal, and embarked on a major data migration. At the same time, two of our colleagues have moved on to new endeavors, and two new ones, myself and our new lead developer, have been welcomed to the UC3 team.

Over the past months, we have also been rethinking and evaluating key aspects of our approach to digital preservation. Some of these aspects include:

  • As an organization, we would like to introduce increased geographic separation across digital object copies, to ensure that our copies reside in locations with different disaster threats.
  • Along the same lines, we would like to introduce additional storage diversification across object copies, also in order to mitigate risk.
  • Though cloud storage is an integral part of Merritt, its use has kept our team from effectively lowering the cost of preservation for UC campuses and affiliated organizations. We would like to find ways to lower our costs dramatically.
  • We have not been satisfied with some of the technical limitations of one of our storage solutions, particularly with regard to fixity checking, which confirms that data in the repository remains unchanged and unaffected by potential bit rot.

The first two aspects in particular relate directly to the National Digital Stewardship Alliance’s Levels of Digital Preservation, a set of guidelines that has been well and widely received by the larger digital preservation community.

By the end of January 2020, when our team plans to complete the final steps of its data migration, we will have relocated the primary copy of the majority of Merritt’s objects and collections from OpenStack Swift storage at the San Diego Supercomputer Center (SDSC) to a new storage technology offered by SDSC. Known as Qumulo object storage, it costs roughly one quarter as much as Swift and will allow us to substantially reduce the per-TB dollar amount that we pass on to campuses as a recharge.

For any individual digital object, use of Qumulo is a significant first step toward realizing our new preservation approach. Our approach, as we’ve agreed to pursue at UC3, also entails another object copy in a geographically separate location. To this end, we’re in the process of entering into an agreement with Wasabi Hot Cloud Storage. Wasabi’s US-East-2 data center, part of the Iron Mountain facility in Manassas, Virginia, will serve as the location for this copy. As with Qumulo, Wasabi storage is online and allows for fixity checking without additional request or retrieval costs, which in turn gives us the ability to run checks on two of three copies at any time.
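
To make the idea of a fixity check concrete, here is a minimal Python sketch. It is an illustration only, not Merritt's implementation: the function names are hypothetical, SHA-256 is an assumed digest algorithm, and a local file stands in for an object that would in practice be read from online storage.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large objects need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_digest: str) -> bool:
    """Return True when the file's current digest matches the one recorded at ingest."""
    return sha256_of(path) == recorded_digest.lower()
```

A repository would record the digest when an object is first stored and rerun `fixity_ok` on a schedule. A mismatch signals that the copy has changed and should be repaired from another copy.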

The total cost for these two auditable copies, plus a third, nearline copy stored in Amazon Glacier, will still amount to less than a quarter of our current per-TB recharge. This new combination allows us to implement a preservation approach where there are three individual copies of each object, two of which are in geographically separate regions, and one of which is stored in a less volatile, nearline service. And with such a significant cost reduction for campuses and affiliated organizations, we hope to break down the monetary barrier to digital preservation for libraries, cultural heritage institutions, and other organizations across the University of California.
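
The copy arrangement described above can be written down as a small check. The Python sketch below is illustrative only: the `Copy` class, the `meets_policy` function, and the region labels are hypothetical names rather than part of Merritt. This post does not name a region for the Glacier copy, so it is left as `None`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Copy:
    provider: str           # storage service holding the copy
    region: Optional[str]   # coarse location label; None when not known
    online: bool            # True if fixity checks run without a retrieval request


def meets_policy(copies: Sequence[Copy]) -> bool:
    """Three copies, two online (auditable), in at least two known regions."""
    known_regions = {c.region for c in copies if c.region is not None}
    online_count = sum(1 for c in copies if c.online)
    return len(copies) == 3 and online_count == 2 and len(known_regions) >= 2


# The planned configuration; region labels are assumed shorthand.
PLAN = [
    Copy("Qumulo at SDSC", "san-diego-ca", online=True),
    Copy("Wasabi US-East-2", "manassas-va", online=True),
    Copy("Amazon Glacier", None, online=False),  # nearline copy
]
```

With this plan, `meets_policy(PLAN)` returns `True`. Dropping any one copy makes it return `False`.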

By midsummer 2020, our plan is to have completed the implementation of our new storage configuration, allowing us to introduce campuses to a much less expensive option for digital preservation.

We’re excited by this team goal, and look forward to engaging with libraries and organizations across UC to help them realize their own goals surrounding digital preservation.