From time to time, co-chairs of the National Digital Stewardship Alliance Arts and Humanities Content Working Group will bring you guest posts addressing the future of research and development for digital cultural heritage as a follow-up to a dynamic forum held at the 2014 Digital Preservation Conference.
The following is a guest post from Meg Phillips, External Affairs Liaison, National Archives and Records Administration. Opinions expressed are those of the author and do not necessarily represent positions of the National Archives and Records Administration.
Digital humanists and digital historians are employing research methods that most of us did not anticipate when we were learning to be archivists. Do new types of research mean archivists should re-examine the way we learned to do appraisal?
The new types of researchers are experimenting with methods beyond the scholarly tradition of “close reading.” When paper archives were the only game in town, close reading was all a researcher could do – it’s what we generally mean by “reading.” Researchers studied individual records, extracting meaning and context from the information contained in each document. Now, however, digital humanists are using born-digital or digitized collections to explore the benefits of computational analysis techniques, or “distant reading.” They are using computer programs to analyze patterns and find meaning in entire corpora of records without a human ever reading any individual record at all.
I have been interested in digital scholarship and its implications for archives for a while, but I hadn’t heard the phrase “distant reading” until seeing Franco Moretti’s book “Distant Reading” reviewed earlier this year. (See “What is Distant Reading?” in the New York Times and “In Praise of Overstating the Case: A review of Franco Moretti, Distant Reading” in Digital Humanities Quarterly for a taste of the debate over the book.) The phrase stuck with me as provocative shorthand for a new way of using records, and I started thinking about what distant reading might mean for archival appraisal.
Our traditions of archival appraisal are based on locating records that reward close reading. A series appraised as permanent contains individual records that contain historically valuable information. Both appraisal itself and the culling that happens during transfer or processing focus on removing records that do not contain permanently valuable information.
Now, however, it is possible to ask and answer entirely new kinds of questions with born-digital or digitized records. What did the network of influence in an organization look like? How did communication flow? Was the chief executive interacting with a particular vendor unusually often? When did a new concept or term first appear and how quickly did use of the new term spread? How did a disease spread through a community? Not only is it possible, but early adopters are now teaching these research methods to a new generation of students. For example, Professor Matthew Connelly is teaching a seminar at Columbia University called Hacking the Archives. The course challenges students of international history to explore the new kinds of questions computational research allows. These are questions whose answers emerge not from deep reading of individual records but from analysis of patterns in large bodies of records.
The interesting thing about these questions is that the answers may rely on the presence of records that would clearly be temporary if judged on their individual merits. Consider email messages like “Really sick today – not coming in” or a message from the executive of a regulated company saying “Want to meet for lunch?” to a government policymaker. In the aggregate, the patterns of these messages may paint a picture of disease spread or the inner workings of access and influence in government. Those are exactly the kinds of messages traditional archival practice would try to cull. In these cases, appraising an entire corpus of records as permanent would support distant reading much better. The informational value of the whole corpus cannot be captured by selecting just the records with individual value.
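To make the point concrete, here is a minimal sketch in Python of the kind of aggregate question described above. Everything in it is an assumption made for illustration: the messages are invented, the function names `sick_notes_per_week` and `cull_routine` are hypothetical, and a simple keyword match stands in for whatever methods a real researcher would use. It counts "sick" messages by week, then shows that the weekly pattern disappears once the individually trivial messages are culled.

```python
import re
from collections import Counter
from datetime import date

# Hypothetical corpus of (date sent, subject line) pairs, invented for illustration.
MESSAGES = [
    (date(2014, 1, 6), "Budget memo for review"),
    (date(2014, 1, 7), "Really sick today - not coming in"),
    (date(2014, 1, 13), "Out sick"),
    (date(2014, 1, 14), "Sick again, working from home"),
    (date(2014, 1, 15), "Really sick today - not coming in"),
    (date(2014, 1, 16), "Policy decision on records schedule"),
    (date(2014, 1, 20), "Home sick"),
]

# Assumption: a plain keyword match is enough to flag a sick note.
SICK = re.compile(r"\bsick\b", re.IGNORECASE)


def sick_notes_per_week(messages):
    """Count messages mentioning 'sick', keyed by (ISO year, ISO week)."""
    counts = Counter()
    for sent, subject in messages:
        if SICK.search(subject):
            iso = sent.isocalendar()
            counts[(iso[0], iso[1])] += 1
    return counts


def cull_routine(messages):
    """Crude stand-in for record-level culling: drop every sick note."""
    return [(sent, subject) for sent, subject in messages if not SICK.search(subject)]


if __name__ == "__main__":
    print("Whole corpus:", dict(sick_notes_per_week(MESSAGES)))
    print("After culling:", dict(sick_notes_per_week(cull_routine(MESSAGES))))
```

In this toy corpus the full set shows a spike in the third ISO week of 2014, while the culled set shows nothing at all. No single message matters, but the signal exists only when they are all kept.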
If we adjusted practice to support more distant reading, archivists would still do appraisal, deciding what is worth permanent preservation. We would just be doing it at a different level of granularity – appraising the research value of an entire email system, SharePoint site or social media account, for example.
Incidentally, on a practical level, appraising at this coarser granularity might also lead to disposition instructions that are easier for creating offices to carry out.
Figuring out how to do appraisal to support both distant reading and close reading would be an excellent project for the archival and digital preservation fields. What questions would we want to answer? We could start with some questions like these:
- How many researchers are actually engaged in distant reading? What fields do they work in? Are their numbers increasing?
- Do they want to apply computational techniques to archival materials, for example Federal records in the National Archives, or in any other environment? Perhaps they are getting their source material somewhere else, bypassing archives.
- To what extent do their research methods rely on having a complete set of the records created rather than a subset of the most permanently valuable records?
- Do current definitions of a record and current recordkeeping regulations support a change to appraisal of entire corpora of records?
- How would we know which corpora of records were most useful to researchers?
- Is the benefit of distant reading worth the cost and risk of retaining more material that could have personal privacy or other protected content?
- Is there a meaningful difference between trying to support computational research and actually just keeping everything? (Perhaps this whole discussion is just the modern version of the old tension between historians who want to save everything and archivists who are trying to put their resources toward the most important materials.)
Staff at the National Archives and other institutions are starting to create opportunities for archivists to discuss questions like these. For example, Josh Sternfeld of NEH, Jordan Steele of Johns Hopkins and Paul Wester and I from NARA will be holding a panel discussion of these issues at the Fall 2014 Mid-Atlantic Regional Archives Conference meeting in Baltimore. Paul and I will also be speaking with Matthew Connelly and others on an American Historical Association panel at the 2015 annual meeting in New York City, “Are We Losing History? Capturing Archival Records for a New ERA of Research.”
However, we need to create even more opportunities for archivists to explore these issues with digital humanists. A forum that pulled together digital researchers, archivists, librarians and technologists could be a great opportunity for us all to learn from each other. Such an event could also spread the word about the exciting new things that can be done with digital primary sources and the rich collections of digital resources that are now available in archives and libraries.
Of course, we can also blog about the issues and hope that the community leaps into the fray!
In that spirit, do you think archival appraisal needs to change, and if so, how?