Additional preservation assurance with DPN

CDL is a founding member of the Digital Preservation Network (DPN), a coalition of over 50 academic libraries, foundations, and non-profit memory institutions dedicated to the long-term preservation of the scholarly and cultural record.  UCLA and UCSD are also DPN members.  DPN supports a high level of preservation assurance through widespread replication of digital assets across a geographically dispersed network of five technically and administratively heterogeneous repositories.  DPN membership agreements also incorporate language (a “quitclaim”) that ensures continuity of preservation management if a member organization cannot, or chooses not to, continue exercising stewardship responsibility for material it previously contributed to the network.  As a benefit of membership, CDL can contribute up to 5 TB of content to DPN annually at no additional cost.

In late 2015 the UC Libraries Advisory Structure (UCLAS) Direction and Oversight Committee (DOC) formed a DPN allocation project team (DAPT) to investigate how UC members could best take advantage of this DPN capacity.  The DAPT recommended that CDL’s 5 TB allotment be used as “a common resource for systemwide benefit.”  CDL determined that the following collection groups, drawn from content managed in UC3’s Merritt repository, meet that criterion:

  • Dash – Open datasets from UCB, UCD, UCI, UCM, UCR, UCSC, UCSF, and UCOP: 140 datasets, 11,192 files, 266 GB
  • DataONE/ONEShare – Open datasets from outside UC: 242 datasets, 10,048 files, 12 GB
  • Digital Special Collections
    • California census data: 30 datasets, 6,100 files, 4 GB
    • LSTA collections – archival assets from 94 California public libraries, archives, historical societies, and other local memory institutions: 31,469 archival assets, 943,711 files, 935 GB
    • Online Archive of California (OAC): 306,273 archival assets, 4.8 million files, 144 GB
    • McGraw-Hill eBooks: 289 eBooks, 6,069 files, 7 GB
    • eScholarship Editions: 1,833 publications, 197,790 files, 9 GB
    • Mark Twain Editions: 2,946 publications, 38,508 files, 223 MB
  • eScholarship – UC open access publications: 138,562 publications, 6.7 million files, 698 GB
  • ETDs – Electronic theses and dissertations from UCB, UCI, UCLA, UCM, UCR, UCSB, UCSC, UCSD, and UCSF: 30,207 ETDs, 380,751 files, 225 GB
  • UC3
    • iPRES 2009 Conference proceedings: 46 papers, 1,602 files, 14 GB
    • UCLA modular digital courses: 1 course, 2,652 files, 234 MB
    • WAS – Archived copy of UC3’s deprecated Web Archiving Service: 4 objects, 59 files, 376 GB

All told, over 519,000 digital resources, 13.6 million files, and 3.3 TB have been successfully transferred to DPN, which maintains three independent external replicas, hosted across the Academic Preservation Trust (APT), HathiTrust, Texas Digital Library (TDL), and UCSD.  These are in addition to the replication internal to the Merritt repository at the San Diego Supercomputer Center (SDSC) and in the Amazon Web Services (AWS) S3 and Glacier storage clouds.  (As impressive as these numbers sound, the DPN subset constitutes only about 19% by number and 4% by size of the full Merritt corpus.)

Due to a flurry of deposits by DPN members at the end of 2017, submission processing took longer than expected, extending into February 2018.  To avoid a similar rush this year, the deposit of the 2018 Merritt material will begin earlier, with planning starting in September.