The following is a guest post by the entire cohort of the NDSR Boston class of 2014-15.
The first ever Boston cohort of the National Digital Stewardship Residency kicked off in September, and the five residents have been busy drinking from the digital preservation firehose at our respective institutions. You can look forward to individual blog posts from each resident as this 9-month residency goes on, but we decided to start with a group post to outline each of our projects as they’ve developed so far. (To keep up with us on a more regular basis, keep an eye on our digital preservation test kitchen blog.)
Sam DeWitt – Tufts University
I will be at Tufts’ Tisch Library during my residency, looking at ways that the university might better understand the research data it produces. The National Science Foundation has required data management plans from grant-seekers for several years now, and some scholarly journals have followed suit by mandating that researchers submit their data sets along with accepted work. These mandates have played a significant role in the broader movement toward open research data.
Data sharing, as a concept, is particularly trendy right now (try adding ‘big data’ to the term ‘data sharing’ in a Google search), but the practice is open to debate. Its advantages and disadvantages are articulated quite nicely here. As someone who works in the realm of information science, I generally believe research is meant to be shared and that concerns can be mitigated by policy. But that is easier said than done, as Christine Borgman so succinctly argues in “The Conundrum of Sharing Research Data”: “The challenges are to understand which data might be shared with whom, under what conditions, why, and to what effects. Answers to these questions will inform data policy and practice.”
I hope that in these few months I can gain a broader understanding of the data Tufts produces while I continue to examine the policies, practices and procedures that aid in their curation and dissemination.
Rebecca Fraimow – WGBH
My project is designed a little differently from the ones that my NDSR peers are undertaking; instead of tackling a workflow from the top down, I’m starting with the individual building blocks and working up. Over the course of my residency, my job is to embed myself into the different aspects of daily operations within the WGBH Media, Library and Archives department. Everything that I find myself banging my head into as I go along, I document and make part of the process for redesigning the overall workflow.
Since WGBH MLA is currently in the process of shifting over to a Fedora-based Hydra repository — a major shift from the previous combination of Filemaker databases and proprietary Artesia DAM — it’s the perfect time for the archives to take a serious look at reworking some legacy practices, as well as designing new processes and procedures for securing the longevity of a growing ingest stream that is still shifting from primarily object-based to almost entirely file-based.
At the end of the residency, I’ll be creating a webinar in order to share some best practices (or, at least, working practices) with the rest of the public broadcasting world. Many broadcasting organizations are struggling through archival workflow problems without having the benefit of WGBH’s strong archiving department. It’s exciting to know that the work I’m doing is going to have a wider outward-facing impact — after all, sharing knowledge is kind of what public broadcasting is all about.
Joey Heinen – Harvard University
As has been famously outlined by the Library of Congress, digital formats are just as susceptible to obsolescence as analog formats, due to any number of factors. At Harvard Library, my host for the NDSR, we are grappling with format migration frameworks at a broad level while working to implement a plan for three specific, now-obsolete formats — Kodak PhotoCD, RealAudio and SMIL playlists. So far my work has involved an examination of the biggest challenges for each format.
For example, Kodak PhotoCD stores images in a chroma-subsampled color encoding (PhotoYCC) based on the Rec. 709 standard for digital video rather than the RGB or CIE-based profiles more typical for still images. PhotoYCC can encode color information well beyond the confines of common RGB color spaces — an example of the format attributes that must drive the migration process so as not to lose fundamental content and information from the original.
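To make the gamut issue concrete, here is a minimal sketch of the PhotoYCC-to-RGB transform, using the conversion constants commonly cited from Kodak's published PhotoCD documentation (the exact pipeline in any given migration tool may differ):

```python
# Hedged sketch: mapping one 8-bit PhotoYCC triplet to nonlinear
# Rec. 709-primaries R'G'B', using the constants commonly cited from
# Kodak's published PhotoCD documentation. Values that land outside
# 0-255 after the transform are colors PhotoYCC can encode but 8-bit
# RGB cannot -- exactly what makes a naive migration lossy.

def photoycc_to_rgb(y: int, c1: int, c2: int) -> tuple[int, int, int]:
    """Convert a PhotoYCC pixel to 8-bit R'G'B', clipping out-of-gamut values."""
    luma = 1.3584 * y                     # scale luma to full range
    chroma1 = 2.2179 * (c1 - 156)         # remove chroma offset, rescale
    chroma2 = 1.8215 * (c2 - 137)
    r = luma + chroma2
    g = luma - 0.194 * chroma1 - 0.509 * chroma2
    b = luma + chroma1
    clip = lambda v: max(0, min(255, round(v)))  # out-of-gamut codes get clipped
    return clip(r), clip(g), clip(b)

# A neutral pixel (chroma codes at their encoding offsets) stays neutral:
print(photoycc_to_rgb(100, 156, 137))  # -> (136, 136, 136)
```

Note that a luma code near the top of the PhotoYCC range maps above 255 and must be clipped, illustrating why the choice of target color space is a preservation decision, not just a technical detail.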
Other challenges that impact a project such as this are managing the human components (stakeholder roles and arriving at shared conclusions about the format’s most noteworthy characteristics) as well as ensuring that existing tools for converting, validating and characterizing are correctly managing and reporting on the format (I explored some of these issues here). A bibliography (PDF) that I compiled is guiding this process, the contents of which have allowed me to approach the systems at Harvard in order to find the right partners and technological avenues for developing a framework. Look for more updates on the NDSR-Boston website (as well as my more substantive project update on “The Signal” in April 2015).
Jen LaBarbera – Northeastern University
My residency is at Northeastern University’s Archives and Special Collections, though as with a lot of digital preservation projects and/or programs, my work spans a number of other departments — library technology services, IT, Digital Scholarship Group and metadata management.
My project at Northeastern relies heavily on the new iteration of Northeastern’s Fedora-based digital repository (DRS), which is currently in its soft-launch phase and is set to roll out in a more public way in early 2015. My projects at Northeastern are best summed up by the following three goals: 1) create a workflow for ingesting recently born-digital content to the new DRS, 2) create a workflow for ingesting legacy born-digital (obsolete format) content to the new DRS, and 3) help Northeastern Libraries develop a digital preservation plan.
I’m starting with the first goal, ingesting recently born-digital content. As a test case to help us create a more general workflow, we’re working on ingesting the content of the Our Marathon archive. Our Marathon is a digital archive created as a digital humanities project following the bombing at the 2013 Boston Marathon. The goal is to transfer all materials (in a wide variety of formats) from their current states/platforms (Omeka, external hard drives, Google Drive, local server) to the new DRS. I’ve spent the first part of this residency drinking in all the information I can about the DRS and digital humanities projects (in general and at Northeastern) and wrapping my brain around these projects; now, the real fun begins!
Tricia Patterson – MIT Libraries
My residency is within MIT’s Lewis Music Library, a subject-specific library at MIT that is much-loved by students, faculty, and alumni. They are currently looking at digitizing and facilitating access to some of their analog audio special collections of MIT music performances, which has also catalyzed a need to think about their digital preservation. The “Music at MIT” digital audio project was developed in order to inventory, digitize, preserve, and facilitate access to audio content in their collections. And since audio content is prevalent throughout MIT collections, the “Making Music Last” initiative was designed to extend the work of the “Music at MIT” digital audio project and develop an optimal, detailed digital preservation workflow – which is where I came in!
Through a gap analysis of the existing workflow, a broad review of other fields’ workflow methodologies, and collaborations with stakeholders across the board, our team is working on creating high- and low-level life cycle workflows, calling out a digital audio use case, and evaluating suitable options for an access platform. This comprehensive workflow will contribute to overall institutional knowledge, rather than confining important information to a single stakeholder, and will clarify roles between individuals throughout the process, improving engagement and communication. Finally, mapping out the work process enhances our understanding of the requirements for tools – such as Archivematica or BitCurator – so that they can be adopted and incorporated with a high degree of confidence. As the process moves from design to implementation and testing, the detailed workflow also ensures reliability and repeatable quality in our processes. It’s been a highly collaborative and educational process so far – stay tuned for how it pans out!