While many kinds of analog media age with a kind of dignity, taking on a patina and earning the term “vintage,” the same is generally not true of digital media. Of the range of digital media, the humble recordable CD is likely one of the least loved and most rapidly degrading. Even so, recordable CDs and DVDs have been making their way into the collections of libraries, archives and museums for decades, and understanding how to triage them matters. To that end, as a continuation of the National Digital Stewardship Alliance innovation working group’s insights interview series, I am excited to talk with John Passmore, Archives Manager at WNYC. John recently gave a talk about work he is doing at WNYC to deal with old discs degrading on the shelves.
Trevor: Could you tell us a bit about what WNYC has on recordable CDs? I think it makes this more real for people if we get a sense of what’s on these discs.
John: There was a period of time, from approximately 2000 to 2007, when the station was recording master copies of broadcast recordings onto CD-Rs only. This was after most programs stopped using ¼-inch reel-to-reel tape and PCM F-1 Beta for backup, but before the station implemented its current born-digital production workflow.
Trevor: How did these end up on the shelves? I’m curious to hear a bit about the lifecycle of data inside WNYC. What role in the production process did these discs play?
John: These discs were created at the time of broadcast specifically for archival and access purposes. For live broadcasts, the master control room engineer would insert a blank disc into the stand-alone disc recorder and record the show. After the disc was finalized, the archives would receive it, label it, catalog it and put it in the vault. These CD-Rs became the master and only recording for many shows at the station during this period of time.
Trevor: In your talk (pdf) you showed an image of the complex lineage of various CD formats. Could you talk us through that a bit? In particular, what is it about that lineage that is relevant to archivists and data managers?
John: The main point with that image was to show that for every kind of disc, there are different preservation considerations. That image is from Principles of Digital Audio, by Ken Pohlmann. In the book, Pohlmann describes what he calls the “complex interrelationships between members of the CD family.” Because the CD has such a remarkable range of applications, we as archivists need to be cognizant of those applications and to what degree those applications affect our preservation planning.
For example, are your discs commercially pressed CDs or are they CD-Rs? Do your CDs contain data or do they contain audio files? Is the reflective layer on your discs gold or silver? What kind of dye did the manufacturer use in the recording layer? Does your institution have a mixture of data and file formats, such as CD-ROMs, CD-RWs, Super Audio CDs or CD-is?
Depending on the answers to these questions, your preservation plan may change. For example, DVDs are generally considered less stable than CDs, and certain dye layers have been found to be more stable than others. A sound collection assessment helps in planning and prioritization.
Trevor: Could you tell us a bit about the project you have taken on to migrate and transfer this information? What is your workflow like? What equipment are you using? What kind of staff experience and time is it taking?
John: We made a rough calculation that there were about 20,000 untransferred CD-Rs in the collection. And we wanted to find a cost-effective way to make the transfers happen that was fast, automated, and practical, while keeping in mind preservation best practices. We found an affordable option in the MFDigital Ripstation. It comes with some out-of-the-box software that allows the user to assign file names to ripped tracks (our CD-Rs have no embedded metadata), to concatenate audio on multi-track discs, and to set a threshold for rip-speed. We can also track rip duration and number of failed CDs per batch with some basic logging features.
There are some limitations with this solution, however. For example, we cannot manually set rip speeds. The Ripstation software also lacks the kind of robust error reporting and analytics/forensics tools that we’d like to see.
Luckily for us, the descriptive metadata for most of the discs (titles, date, description, etc.) had previously been captured in our database. So we run a few automated scripts to extract technical metadata (MediaInfo XMLs) from the newly created digital files and import new instantiation metadata (PBCore XMLs) into our database. We have another series of scripts that transfer the ripped audio from the local drive on the Ripstation CPU to the station’s DAM/NAS/LTO solution.
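To give a sense of what the technical-metadata step can involve, here is a minimal sketch. It is not WNYC's actual script. It assumes the MediaInfo command-line tool is installed and on the PATH, that the ripped files are WAVs, and that each report is saved as a sidecar file next to the audio. The function names and the sidecar naming rule are hypothetical, and the PBCore import into the database is not shown.

```python
import subprocess
from pathlib import Path


def mediainfo_command(audio_path: Path) -> list[str]:
    """Build the MediaInfo CLI call that prints an XML report for one file."""
    return ["mediainfo", "--Output=XML", str(audio_path)]


def sidecar_path(audio_path: Path) -> Path:
    """Hypothetical naming rule: show.wav -> show.mediainfo.xml, same folder."""
    return audio_path.with_suffix(".mediainfo.xml")


def write_sidecar(audio_path: Path) -> Path:
    """Run MediaInfo on one file and save its XML report beside the audio."""
    result = subprocess.run(
        mediainfo_command(audio_path),
        capture_output=True,
        text=True,
        check=True,  # raise if MediaInfo exits with an error
    )
    out = sidecar_path(audio_path)
    out.write_text(result.stdout, encoding="utf-8")
    return out


def process_directory(rip_dir: Path) -> list[Path]:
    """Write a sidecar for every .wav file directly inside rip_dir (assumed format)."""
    return [write_sidecar(p) for p in sorted(rip_dir.glob("*.wav"))]
```

Keeping the command construction separate from the subprocess call makes the script easy to check without the tool installed. A later step could read the sidecar files and map selected fields into PBCore instantiation records.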
For discs the Ripstation rejects, we use Plextor CD drives and Plextools Software to check correctable and uncorrectable errors. We then log the errors in a spreadsheet. We also attempt to re-rip the discs using Plextools, after an assessment of what caused the failure in the initial rip.
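An error log like the one described above could be kept as a simple CSV file. The following is a rough sketch only: the column names, the disc identifier format, and the "suspected cause" field are assumptions made for illustration, not the actual layout of the WNYC spreadsheet, and the counts would be copied from whatever the error-checking tool reports.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class DiscErrorRecord:
    """One row of a hypothetical error log for a rejected disc."""
    disc_id: str                # assumed catalog identifier
    correctable_errors: int     # count reported by the error-checking tool
    uncorrectable_errors: int   # count reported by the error-checking tool
    suspected_cause: str        # free-text assessment, e.g. "fingerprint"
    rerip_succeeded: bool       # outcome of the second rip attempt


def records_to_csv(records: list[DiscErrorRecord]) -> str:
    """Serialize records as CSV text with a header row."""
    columns = [f.name for f in fields(DiscErrorRecord)]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

Recording the re-rip outcome next to the error counts keeps the information needed to later compare failed and recovered discs in one place.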
Time-wise, I think we can catalog and transfer a couple hundred discs a week. But that estimate doesn’t include troubleshooting problematic discs.
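The figures in this interview allow a back-of-the-envelope estimate of how long the project might run. The sketch below reads "a couple hundred" as 200 discs per week, which is an assumption, and like the estimate above it ignores time spent on problematic discs.

```python
import math


def weeks_to_finish(total_discs: int, discs_per_week: int) -> int:
    """Whole weeks needed to transfer total_discs at a steady weekly rate."""
    if discs_per_week <= 0:
        raise ValueError("discs_per_week must be positive")
    return math.ceil(total_discs / discs_per_week)


# About 20,000 discs at an assumed 200 discs per week.
weeks = weeks_to_finish(20_000, 200)
print(f"{weeks} weeks, about {weeks / 52:.1f} years")
```

At that assumed rate the backlog works out to 100 weeks, or roughly two years of steady transfer work.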
Skill-wise, it’s not incredibly complex. We have graduate school interns help run the machine, do some scripting, and document error rates.
Trevor: In migrating this data what have you learned thus far about these vintage CD-Rs? I would be particularly curious to know about their relative failure/success rates and any indicators you have come up with for assessing the health of a CD-R.
John: This is a tricky question. We haven’t gathered enough data to make any strong conclusions. But I would say that CD-Rs are something institutions should definitely be worrying about.
I’ve seen all sorts of issues with these discs, and I’m finding it hard to pin down exactly what we mean by ‘disc failure’. For example, a disc could successfully rip yet still have a pretty bad audio signal. In this case, the audio may not be suitable for rebroadcast, but it may still be useful for documentation purposes.
It is also hard to tell sometimes when/how/why the error occurred. It is something we are still trying to figure out. Did the error occur at the time of creation (hardware or human issues, a fingerprint on the discs, or Sharpie ink in the wrong place)? Or are errors the result of the dye layer fading away over time? Or is there some other factor that goes into disc failure (particular brands, particular dyes, storage conditions)? Obviously, our hope is that the tools we’ve chosen will eventually yield some important knowledge about these discs, but I would say that we are not there yet!
Trevor: It strikes me that it could be beneficial for the community if there were some standardized ways to share data like what you are collecting here. That is, as archives increasingly do this work to migrate data, they are also collecting a good bit of data on the kinds of discs that are and aren’t working. Do you have any thoughts on this? Is there a kind of technical metadata for these discs that you think could be useful for analysis in aggregate? If so, what kinds of data would you want people to start collecting and sharing?
John: Yes! That’s one of the goals. What kind of data can we collect that would help other institutions assess their collections? The plan for now is to collect error rates on the discs that seem to have failed. Then run the same error checks on successfully ripped discs and see what kind of pattern – if any – emerges.
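One simple way to look for such a pattern is to summarize the error counts for each group side by side. The sketch below is illustrative only: the function names are hypothetical, and the numbers in the usage example are invented placeholders, not measurements from the WNYC collection.

```python
from statistics import mean, median


def summarize(error_counts: list[int]) -> dict[str, float]:
    """Basic descriptive statistics for one group of discs (must be non-empty)."""
    return {
        "n": len(error_counts),
        "mean": mean(error_counts),
        "median": median(error_counts),
        "max": max(error_counts),
    }


def compare_groups(failed: list[int], ripped: list[int]) -> dict[str, dict[str, float]]:
    """Summaries for discs that failed to rip and discs that ripped cleanly."""
    return {"failed": summarize(failed), "ripped": summarize(ripped)}


# Placeholder error counts, made up purely to show the call.
report = compare_groups(failed=[10, 20, 60], ripped=[1, 2, 3])
for group, stats in report.items():
    print(group, stats)
```

If the two groups show clearly different distributions, error counts might serve as an early indicator of disc health. If they overlap heavily, other factors such as brand, dye or storage history would need to be recorded as well.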
Again, it’s really hard to tell right now what this kind of data collection will accomplish. But it’s possible that, as we transfer more CD-Rs, a story about their behavior will emerge. We’re lucky in one sense, because unlike many accelerated aging tests, our discs have ACTUALLY aged, so perhaps there’s a chance to get some interesting and uniquely useful information.
Also, the 20,000+ CD-Rs seem to be a pretty consistent sample. By and large, the station used the same brand CD-Rs for the period they were being created. The discs were created using more or less the same kind of professional hardware with the same workflows. And all of them have been stored in the same (theoretically excellent) environmental conditions, boxes, and jewel cases.
We are looking forward to the project progressing, and seeing what unfolds!