The following is a guest post by Jefferson Bailey, Strategic Initiatives Manager at the Metropolitan New York Library Council, co-chair of the National Digital Stewardship Alliance Innovation Working Group, and a former Fellow in the Library of Congress’s Office of Strategic Initiatives.
Jason Scott will no doubt be familiar to many readers of this blog, having been interviewed previously by Leslie Johnston in her post “Jason Scott, Rogue Archivist.” In the intervening year, however, Jason has undertaken a number of new projects and initiatives dedicated to preserving digital information and the history of digital technologies, while continuing his work with the Internet Archive. In this interview, Jason talks about some of his recent work preserving digital culture.
Jefferson: Thanks for talking with us again. Since your previous interview here described your work on textfiles.com and with the Archive Team, I wanted to focus on some of your more recent projects. First, you declared November 2012 to be Just Solve the File Format Problem month. Tell us about how that project came about, how it went, and its future plans.
Jason: Like any industry involving a lot of public money and a lot of complicated projects, the library and archives worlds have a number of supposedly insurmountable problems, whose projects show very little return on investment beyond a general good feeling and a gradual improvement as each project matures. In speaking to groups and individuals in the library and archives world, I kept hearing a common theme: it was very difficult to discern the characteristics of digital files and data well enough to tell what format they were in. Several projects had been mounted to approach this problem, but they tended to focus on particular families of data, or were locked into semi-proprietary arrangements and databases that could not easily be shared. To work that hard and then give away the results of that work would be insane. And since open-source projects are in many ways insane, I thought it might be a good project to tackle.
Understanding file formats, collecting their documentation, and providing code related to them is one of the greatest and most difficult problems, simply because of the sheer wealth of sources and the fleeting nature of so much technical documentation, especially once the underlying technology becomes obsolete. Again, people were working on this, but they were running into hard limits left and right.
What I proposed with this project was to create a general common space, not under the purview of any specific group, and to allow many people both within the industry and outside of it to track down and classify file formats. In this way, the information could be absorbed into other registries, and anybody with expertise could easily expand an entry. In other words, it would have many of the properties of a wiki.
Taking this further, I thought that the file format problem was one of many “insurmountable” problems in the world that are desperately in need of focused volunteer effort and a rule that the results be made available to all. I also knew that a volunteer project can really slow down and lose energy if it has no specific time frame.
Combining the two, I decided a really great idea would be “Just Solve the Problem” months, giving 30 days of the focused collaboration that energizes any wiki and providing a good foundation for continued improvement.
The first Just Solve the Problem month was in November 2012. I am still deciding whether there’ll be another one. The File Formats wiki can be found here.
Jefferson: What were some of the challenges, successes, and unexpected surprises of launching a user-driven, almost freeform approach to documenting file format types?
Jason: Creating the first installment of an ongoing series is always a challenge, because people don’t understand what the series is, or how the first installment fits into it. It’s kind of like a television pilot. Viewers might not understand that there’s a large story arc, or that the pilot merely exists to introduce everyone and that the second and later episodes might not work the same way. Some people thought we would attack the same problem every year; others understood that we would attack a different problem each year, leaving a string of projects in our wake. Looking back, I certainly would’ve called this a one-off event, and then had endless sequels.
One major surprise truly caught me off guard. I considered this project to be a freeform, open opportunity for archivists, librarians and others to work without being inhibited by the structure of the organizations they normally collaborate with or work for. But in the complaints, the demands for clarification, and the distaste for the open, freeform approach, I discovered that a percentage of volunteers clearly enjoyed and depended on the structure of their positions. Not everyone was like that, of course. But enough were that it really, really surprised me. Some folks walked away from the project immediately upon discovering that there wasn’t a standards body or reference document, for instance. Others were majorly turned off by some of the ad hoc classifications that had been added, saying they were a division of energy and not needed. Many, of course, were excited to break new ground, and did amazing work.
I intentionally put crazy file formats in the collection, including DNA, piano rolls, human language, and looms. I wanted people to understand that we weren’t simply doing one type of file, and that putting an expansive definition on what a file meant and what a format meant would give us more leeway for contributions from around the world. That said, I also knew that we would be dedicated to better and better classification, so that somebody who did not care about mechanical or organic formats could go immediately to the computer-based or application-based formats and find the information they needed.
As with many wiki projects, a number of people stepped forward who were a major source of energy during the month. I don’t want to call out their names, in case I miss one or classify one wrong. But the editing history of the wiki shows the handful of folks who tirelessly added formats and links to information, some of whom continue to add new information every single day. These people are angels, and the world is better for them.
Ultimately, I consider the project a success, and its continued growth and modification will make it a classic reference work.
Jefferson: While The Signal focuses largely on the preservation of digital materials, we are often reminded how much of the history of digital technologies and digital culture exists in print form. I’ve been impressed by your efforts to preserve both the populist, consumer-level cultural materials around computers and much of, as you say, the “manuals, notes, booklets, ephemera” related to hardware and software. Tell us about The Computer Magazine Collection and The Bitsavers Collection.
Jason: Computer hardware and software are nothing without their documentation. You might be able to stumble around, get some things running, and make guesses, even educated guesses, as to how it functions, but without the documentation you will always suffer from not knowing how everything works or why. You certainly won’t know how the original creators intended the machine to be used or the software to be run, and the urge to just give up on a project because you don’t have information on it will only grow. By making manuals and documentation available, the world wins, even if we don’t have the original hardware or software at the moment.
Similarly, a lot of other technical and historical information is buried in the pages of computer magazines, newsletters, and flyers written during the hardware’s glory days. Besides articles, there were also advertisements, type-in programs, and reviews that provide critical information for understanding what role these computers and software played. While it’s possible to get by with just the hardware, software or documentation, nothing beats finding out what writers of the time considered to be the important points and what was driving the industry.
One of the outside groups that has been working tirelessly for over a decade to digitize documentation, pamphlets and other written materials is Bitsavers. The group is at bitsavers.org, and besides documentation, they have also captured the original bits off magnetic tapes and disks. I wrote a mechanism that automatically mirrors their contents on archive.org for easier reading and sorting. But the credit definitely goes to that group for their tireless efforts in bringing once-lost material online.
Jefferson: Preserving that documentary evidence of how computers worked their way into our homes and our lives seems vitally important to understanding our social attitudes towards how we create, interact with, and ultimately preserve digital artifacts. In working with these collections, what novel insights into, or new understanding of, our relationship with digital technologies did you gain?
Jason: Technology industries are often quicker to adapt to changing needs or requests than other industries. In these pages, you can sometimes see reactions to strange new features that later became absolute requirements, or that found analogues in the mobile world. Certainly in the 1970s, magazines and journals approached technology from all sorts of perspectives, treating it as a general idea: puzzle articles sat next to electronics projects, which sat next to software overviews, all considered part of owning a computer. Through the 1980s and later, these general-purpose computing magazines split off into highly specialized periodicals, offering much more in-depth coverage of those subjects but losing the sense that human beings just thought of computers as computers. We lost something in that, but we have gained a lot of other things.
Jefferson: Much of this work is part of your self-described Charge of the Scan Brigade and your ongoing work with the Internet Archive. But, rogue archivist that you are, you have other non-IA projects focused on documenting digital history. Tell us about those.
Jason: I often take possession of computer artifacts, such as magazines, machines, and software, some of which I arrange to have transported to full professional archives elsewhere. So acting as a clearinghouse definitely takes up some of my time.
I also speak professionally on the subject occasionally, which is always a fun time.
Jefferson: In your previous interview, you ended with some advice for individuals facing data loss and for institutions looking to collaborate with projects like yours. I was hoping this time you could give advice specifically to archivists, curators, historians, and the preservation-minded who are hoping to preserve and make accessible both digital content and the physical collections of computer history. How would you advise these sorts of professionals to be more rogue?
Jason: Computer and technology history still appears to be a strangely fringe subject for many archives, yet much of the correspondence and other information related to every subject is moving to computers. Getting archives and libraries to realize that is an important first step. Along with that realization will hopefully come funding for efforts to preserve computer data and to keep multiple digital backups across organizations. I would like to think of a world where libraries mirror each other’s data in return for having a secondary off-site backup. As for the preservation projects themselves, the fact is, it will not be as easy as walking into a space and pulling away a pile of books or artwork or letters. It’ll be a case of being handed a laptop, or being given access to a series of Internet services like Gmail or Dropbox.
Much as happened with home computers in the early 1980s, I hope that individual archivists are taking the initiative within their groups and becoming knowledgeable enough to pass on what they’ve learned to others.
It’s simply a fact that web-based materials are becoming the dominant form of many types of archives in libraries, and getting ahead of the curve, or catching up with it, should be the top priority.