I’ve talked about Matthew Kirschenbaum’s work in a range of posts on digital objects here on The Signal. It seemed like it would be valuable to delve deeper into some of those discussions here in an interview.
If you are unfamiliar, Matthew G. Kirschenbaum is Associate Professor in the Department of English at the University of Maryland and Associate Director of the Maryland Institute for Technology in the Humanities. Much of his work now focuses on the critical and scholarly implications of the shift to born-digital textual and cultural production. He is the author of Mechanisms: New Media and the Forensic Imagination (MIT Press 2008). He was also a co-investigator on the NDIIPP-funded Preserving Virtual Worlds project, a co-author of Digital Forensics and Born-Digital Content in Cultural Heritage Collections, and oversees work on the Deena Larsen Collection at MITH, a personal archive of hardware and software furnishing a cross-section of the electronic literature community during its key formative years, roughly 1985-1995. Currently he is a co-investigator of the BitCurator project and a member of the faculty at the University of Virginia’s Rare Book School, where he co-teaches an annual course on born-digital materials.
Trevor: You have a Ph.D. in English literature and work in an English department. What are you doing so heavily involved in the cultural heritage digital archives and digital forensics community?
Matthew: When I teach at Rare Book School every summer, I introduce myself to our class by saying that I’m the English professor who instructs archivists and librarians about computer media. On the one hand, that makes me look very confused. On the other, though, it’s really just a linear and direct outgrowth of my scholarly training at the University of Virginia, an institution renowned for its tradition of attentiveness to texts in all their physical and material incarnations. That perspective was foundational to my first book, Mechanisms. So behind the glib throwaway line, I consider myself a textual scholar who is working with (very) recent cultural materials—primarily literary—many of which, by nature of their historical situation, are born-digital since so many writers are composing with computers just like the rest of us. Digital forensics, specifically, seems to me the natural companion of the rigorous evidence-based methodologies that have emerged in descriptive and analytical bibliography. My current book project, called Track Changes, is a literary history of word processing, and to write it I’m relying on both my training as a literary scholar and the knowledge I’ve gained from working with legacy media and file formats. So again, I simply see myself as following those disciplines using the tools and methods appropriate to the contemporary moment. That I’ve had the opportunity to teach and learn from so many practicing archivists is one of the great professional joys and privileges of my career.
Trevor: Within the digital preservation space there are some strong proponents of normalization of file formats and some equally strong proponents who eschew normalization in favor of staying true to the files one is given. When asked by a librarian or archivist for your perspective on this, how would you respond? From your perspective, what are the trade-offs?
Matthew: The most obvious trade-off is of course resources. Normalization is attractive because it lends itself to batch processing and creates the foundation for a long-term preservation strategy. Institutions always have limitations on their resources and capabilities, and so normalization, which I take to mean migrating data away from legacy formats and media, is going to form the basis of the preservation strategy in many instances. Yet as a scholar who is committed to what we term the “materiality” of all artifacts and media, even supposedly virtual ones, I want to see as much of the original context as possible. This is an easier argument to make in some domains than in others. Games are an obvious example of where “normalization” would defeat the purpose of preservation, thus the widespread use of emulation in those situations. Sometimes we think that documents like word processing files or email offer fewer trade-offs with regard to normalization, since what people really want to see there is presumably the content, not the context. But you can never really predict what your users are going to want. To take an example from my current work on the literary history of word processing: Terrence McNally, in his “papers” at the Ransom Center, has a WordPerfect file wherein he comments about his discomfiture watching his text scroll off the edge of the screen into digital limbo. That’s an instance where a researcher wants to know what the original software was, how many lines and characters it permitted on the screen, what the size of the screen was, and so forth. In fact, I can tell you that the writers who were early adopters often obsessed over such details. The difference between a 40- and an 80-character display could be decisive in a decision to purchase.
The most dramatic example of what’s achievable in this regard is likely still the remarkable emulation work done at Emory for Salman Rushdie’s personal computers. Not only can users look at his wallpaper and other incidentals of the system, they can see which files were stored together in which folders, how software was configured, and so on—all details analogous to the sorts of things researchers find compelling in the physical world. Yet one does not have to go to such lengths to preserve material context. Obtaining and then retaining a bitstream image of the original media will allow future users to reconstruct the original context in as much detail as they like. Such a measure is logically prior to normalization, and relatively easy to implement.
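As a rough illustration of what retaining a bitstream image involves in practice, the sketch below computes a checksum for an image file at acquisition and checks it again later. It is a minimal example under stated assumptions, not a description of any repository's actual workflow: the function names `image_digest` and `verify_image`, the choice of SHA-256, and the chunk size are all hypothetical.

```python
import hashlib


def image_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a disk image in chunks so that large images need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(path, expected_digest, algorithm="sha256"):
    """Return True if the image still matches the digest recorded at acquisition."""
    return image_digest(path, algorithm) == expected_digest
```

Recording the digest alongside the image when it is first captured, and re-running the check on a schedule, is one simple way to show that the bits a future researcher receives are the bits that came off the original media.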
Trevor: The cultural context and physical and digital technologies of computing have evolved and continue to evolve so fast. To what extent do you think we can develop practices and principles that are going to work with materials over the long term?
Matthew: Certainly with regard to legacy media, specifically disk-based magnetic media, I consider myself an optimist. I don’t, for example, think floppy disks are deteriorating at quite the rate claimed by some of our colleagues. I also think that in one hundred years we will know what ASCII and HTML and C++ are, along with Word and Excel, if for no other reason than those things are well documented on acid-free paper (walk into any Barnes and Noble and browse the computer section). And I’ve often said that “love will find a way”: meaning that when committed people care intensely, sometimes even irrationally, about a particular object or artifact, they are often—very often—able to find ways to recover and conserve it. My own best example here is the work around William Gibson’s poem “Agrippa,” an electronic artifact famously designed to disappear, in which I was heavily involved. Ben Fino-Radin has demonstrated the same principle with his work on The Thing BBS. Jason Scott demonstrates it seemingly on a daily basis, but see, for example, his collaboration with Jordan Mechner on recovering the original source code for Prince of Persia. Of course each of these situations which I cite as exemplary was fraught with perils and contingencies which could have easily rendered them fruitless. But I tend not to like the analogy to, say, early cinema (80% of the films made before 1930 are lost) because we are all so exquisitely aware of both the perils and importance of our born-digital heritage. NDIIPP and the NDSA certainly testify to that awareness and commitment in the US.
By the same token, the rise of the so-called cloud presents obstacles that are not primarily technical—for the cloud, as we all know, is merely a hard drive in some other computer—but rather legal and contractual. Likewise, the increasing tendency towards preemptive data encryption—practices which will surely become even more commonplace in the wake of recent revelations—threatens to make archival preservation of personal digital content all but unthinkable for entities who lack the resources of the militarized surveillance state. I know of very little that archivists can do in either of these instances other than to educate and advocate (and agitate). They are societal issues and will be addressed through collective action, not technical innovation.
Trevor: How has your thinking about the role of digital forensics tools developed since the publication of Digital Forensics and Born-Digital Content in Cultural Heritage Collections? Are there any areas where your thinking has evolved or changed? If so, please describe.
Matthew: I admit that when I first began learning about digital forensics I was drawn to the sexy CSI scenarios: recovering deleted files and fragments of files, restoring lost manuscripts, and so on. I still think there’s going to be some gee-whiz stuff in that area, something akin to the wizardry displayed by the team who worked on the 1,000-year-old Archimedes Palimpsest for example, but I’ve come to appreciate as well the far less glamorous but wholly indispensable archival functions digital forensics can assist with, such as appraisal, arrangement and description, and the ongoing maintenance of fixity and integrity. I still enjoy quoting the historian R. J. Morris, who back in the 1990s opined: “Within the next ten years, a small and elite band of e-paleographers will emerge who will recover data signal by signal.” (And how could I not quote that for this venue!) But it’s also true that we have yet to see any really compelling examples of revisions, variants, or alternate readings recovered in this way. The best demonstrations that I know come from my former Maryland colleague, Doug Reside, in his work on Jonathan Larson’s lyrics and compositions for RENT, originally done on a Mac. By contrast, I was disappointed to learn that for both Ralph Ellison’s Three Days Before the Shooting and David Foster Wallace’s The Pale King, two examples of the very highest literary significance where authors left behind relevant born-digital materials, the scholars who prepared the posthumous editions worked from hard copy transcripts of the digital files, not the original disks or bitstream disk images.
Trevor: What do you see as the biggest hurdles archives face in making born digital materials part of their primary operations? Is this largely about a need for tools, frameworks, education and training, examples of how scholars are using born digital materials, the need for new ways to think about materials or other factors?
Matthew: The single biggest hurdle archives face for these materials is users. For the other things you name, I think the field is increasingly healthy. Oh, certainly work remains to be done, but just look at how far we’ve come in just the last five years: there are instructional opportunities available through SAA, RBS and others. There’s a growing pool of expertise amongst both professional practitioners and iSchool faculty. The technical landscape has taken shape through funded projects, meetings, social networks, and a growing journal literature. But even allowing for the relatively small number of digital collections that have been processed and opened to end users, interest in the scholarly community seems slight to non-existent. Emory’s work on Salman Rushdie’s computers, which I praised earlier, has, to my knowledge anyway, produced no great uptick of interest in his digital manuscripts in literary studies. This will doubtless change over time, but it will be slow: you need scholars working on the right topics, they need to be aware of the existence and import of relevant born-digital materials, they need to have or to be motivated to acquire the training to work with them, and finally the materials themselves must turn out to bear fruit. In the meantime I fear that lack of users will be one more reason resource-poor institutions choose to defer the processing of born-digital collections over other material. So I think we need users, or to put it more colloquially we need some big wins to point to in order to justify the expense and expertise processing these collections requires. Otherwise we may simply go the way of media conversion, outsourcing collections in bulk without regard for the material context of the data.
Trevor: The collection of computers at MITH bears some similarities to Lori Emerson’s Media Archaeology Lab and Wolfgang Ernst’s Media Archaeological Fundus. To what extent do you think your approach is similar to, and different from, theirs?
Matthew: That’s a great question. I think of places like MITH, which is a working digital humanities center, as well as the MAL, the MA Fundus, and Nick Montfort’s Trope Tank at MIT as inhabiting a kind of “third space” between manuscript repositories processing born-digital collections on the one hand, and computer history museums on the other. They’re really much more akin to fan sites and grassroots initiatives, like the Musée Mécanique penny arcade in San Francisco. Above all, these are entities whose commitment to the materiality of computer history is absolute. They adopt the media archaeological precept that not only does the materiality matter, but that the machines ought to be switched on. At MITH, you can fire up an Apple II or Amiga or Vectrex. You can also take a look at “Mysty,” a 200 lb. IBM MT/ST word processor (1964!) that I have high hopes of one day restoring. We began collecting vintage computers when Doug Reside was still there, and over time the collection grew. They have been useful to us in several different funded projects over the years, and help distinguish us as a working digital humanities center. But what sets us apart is that we also have two individual author collections, Deena Larsen and Bill Bly, both early electronic literature pioneers, and we have worked to build a relationship to Library Special Collections to ensure their long-term safekeeping.
That last point is worth some further elaboration. I know that when MITH acquired first the Deena Larsen materials and then, more recently, Bill Bly, there were maybe a few eyebrows raised in the manuscripts world. Clearly here was digital humanities looking to usurp yet another role. But that wasn’t the motive at all. Rather, both Deena and Bill were attracted to the idea that we would be working with the collections, using them as research testbeds and sharing them with students. They saw them very much as teaching collections, not unlike the holdings at RBS where students are encouraged to handle the materials, sometimes even to take them apart (and put them back together again). Because MITH does not have other collections to process we were able to work at our own pace, experimenting and testing. But we’re also sensitive to the need for long-term stewardship, and so to that end have forged what may be a unique model of joint custody for these collections between MITH and University Special Collections. In an arrangement concretized by an MOU we are jointly developing procedures for processing these materials and eventually other born-digital collections at Maryland. MITH and the University Libraries are also fortunate enough to be hosting an NDSR fellow this coming year, and we have high hopes that our resident, Margo Padilla, will be able to help us think through the access portion of the model, by far (in my view) the toughest component. So while we align completely with the sensibilities of the MAL and other such places, we also have a rapidly maturing relationship with our institution’s special collections staff, and we hope that others may be able to benefit from that model.
Trevor: What projects or initiatives related to born digital materials are you most excited about now and why?
Matthew: Well, I would be remiss if I did not promote BitCurator, the open source digital forensics environment we’re developing along with a team at UNC SILS. BitCurator is not a tool or a set of tools, it’s an environment, specifically a custom Linux distribution that comes pre-configured with a range of open source tools, enhanced by additional scripting from us to link them together in a workflow and generate reports. We’re beginning a new phase of the project with a dedicated Community Lead in the coming year, and this will be critical for BitCurator in terms of its uptake. To that end we’re also developing an important relationship with Archivematica, where some of our BC scripts will be available as services.
Where I’ve really noticed the impact from BitCurator, though, is in my teaching. Permit me an anecdote. My first year at RBS, I attempted a SleuthKit installfest. I described the experience to Michael Suarez afterwards, and if you ever doubted that a distinguished bibliographer and Jesuit was capable of some salty language, his characterization of my description of the process would have disabused you. Lesson learned, the next two years I relied on screenshots and canned demos from the safety of the controlled environment on my laptop at the front of the room. Much safer, but not nearly as satisfying for the students. BitCurator, at least on my end, was born directly from those frustrations. Thus when this past year we were able to bring the students full circle—from analyzing a disk image, performing basic data triage like hashing and format identification, to searching the image for PII, generating DFXML metadata, and exporting it all as a series of human and machine-readable reports, it was hugely gratifying.
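To give a flavor of the triage steps mentioned above, here is a small Python sketch of signature-based format identification and a search for PII-like strings. It is a stand-in for illustration only and does not reproduce BitCurator's own tools or output; the signature table, the two regular expressions, and the function names `identify_format` and `find_pii_candidates` are all assumptions made for this example.

```python
import re

# Hypothetical, deliberately tiny signature table. Real identification tools
# draw on far larger registries of format signatures.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"PK\x03\x04", "ZIP container"),
    (b"\xffWPC", "WordPerfect document"),
]

# Illustrative patterns only; they will produce false positives and misses.
SSN_LIKE = re.compile(rb"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_LIKE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def identify_format(data):
    """Guess a format from the leading bytes of a file."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def find_pii_candidates(data):
    """Return (offset, kind, text) tuples for strings that merely look like PII."""
    hits = []
    for kind, pattern in (("ssn-like", SSN_LIKE), ("email-like", EMAIL_LIKE)):
        for match in pattern.finditer(data):
            text = match.group().decode("ascii", "replace")
            hits.append((match.start(), kind, text))
    return sorted(hits)
```

Because the search runs over raw bytes, the same functions could be pointed at an extracted file or at a slice of a disk image; every hit is a candidate for human review, not a finding.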
Trevor: You are active in the digital humanities community. That said, I don’t necessarily see many folks in the digital humanities working so extensively with born-digital materials. What role do you think born-digital materials have to play in the digital humanities, and how do you think more digital humanists might be engaged in the research issues surrounding born-digital primary sources?
Matthew: I think the potential here is huge, and it dumbfounds me that there isn’t already more cross-over and collaboration. Most DH folks, though, tend to work on older materials, if for no other reason than the obvious one of copyright. There are some exceptions: Lev Manovich and his idea of cultural analytics, for example. Matt Jockers is going to begin working on more 20th century material (and has the corpus to do it with, an amazing feat) and Ed Finn has been working on contemporary material for a while. Still, they’re the exception. Part of it may be what’s always struck me as a pernicious and pointless division between new media studies and digital humanities, with the former trending towards contemporary digital cultural studies and the latter towards more established ventures in literary criticism and historical studies. But the digital humanities projects of today are the born-digital collections of tomorrow, and the vernacular culture of the Web is no less suitable for analytics from DH than the vernacular culture of two hundred years ago, which we now seek to apprehend through techniques such as distant reading and data mining. DH, it seems to me, is the natural ally for the digital archivist in the scholarly world. (DHers, for their part, can perhaps better learn that there is such a thing as an archives profession and that their own free use of the term does not necessarily endear them to its practitioners, who have their own benchmarks for professionalism.) I’ve written down some additional thoughts about this, and perhaps the best thing to do is to point folks to this article here, particularly the concluding 5th section.
Thanks, Trevor, for this opportunity: The Signal is a terrific platform for the community, and I’m honored to be included here alongside so many friends and digital preservation pioneers!