Continuing the NDSA Insights interview series, I am thrilled to talk about the new Library of Congress Recommended Format Specifications with Ted Westervelt, head of acquisitions and cataloging for U.S. Serials – Arts, Humanities & Sciences at the Library of Congress. Ted has been overseeing the development of the Recommended Format Specifications. While the specifications cover analog as well as digital materials, I’m particularly interested in exploring the implications of the digital side of this work here on The Signal.
Trevor: For starters, could you give us a sense of the range of digital materials covered in the format specifications? How were the digital formats covered decided on?
Ted: The Library’s mission is to collect the creative output of the nation and the world and only a few subject areas fall outside our scope. This is irrespective of the format in which content is created and in this day and age, content is being created digitally at least as much as it is being created in traditional analog formats. So we tried to be as comprehensive as possible and the specifications attempt to cover as broad a range of digital materials as possible. In identifying the digital materials in the specifications, we were helped by the fact that much digital material has clear analog counterparts – e-books and books, digital and print photographs, etc. – though there are areas in which content can be created digitally in ways which have no analog counterpart and we tried to capture that as fully as we could as well.
Trevor: Could you tell us a bit about who the audience is for this both inside and outside the Library of Congress? Who is this intended to help and how does it help them?
Ted: There are two key audiences for the specifications. We intend them for use by our own staff, in helping to provide them guidance in identifying material for addition to the Library’s collection. We have staff who can identify works covering the subject matter we want and who can identify the best ways to acquire it. But we are looking at building a collection for the long-term and we need our staff – and the sources from which we acquire these materials – to be aware of the attributes which will make content more preservable and more accessible in the long-term, as this will be crucial in enabling us to make the correct decisions in acquiring content for the collection. This is not to say that we shall begin acquiring digital content simply because we are issuing the specifications. There is much work that the Library must undertake in order to make that a feasible task. But the specifications do allow us to begin planning for that work so that we can acquire digital content on the scale which we want.
However, the specifications are not merely for our own internal needs and benefit. The specifications are written from the perspective of an institution committed to the acquisition and preservation of creative content for the long-term. But, the results of our work in identifying what we at this time judge to be the characteristics of creative works which best enable them to be preserved and made accessible in the future have potential benefits to all stakeholders in the creative process. No matter what part of the process or business in which people or organizations are involved – creation, sales, acquisitions, management – ensuring that we are working with content which can last and be useful in the long-term is something from which each of us can and would benefit.
Trevor: Encryption and digital rights management appear a lot in the specifications. The phrase “Files must contain no measures that control access to or use of the digital work (such as digital rights management or encryption)” appears 19 times in them. Could you give a little background on why this is such a critical issue for libraries?
Ted: Digital rights management is an important issue for the Library. Our mission is to further the creative output of the nation. This cannot happen without a system in which there are enough incentives for individuals, groups and organizations to create content and to invest in its distribution. If we cannot ensure that this happens, we threaten the free flow of ideas and we are all the poorer. The Library understands and supports the rights of content creators as it does the needs of content consumers and the interests of both are always in our minds. The current use of digital rights management is an understandable attempt to ensure that those who have invested their time, energy and money in these works are justly rewarded for it.
However, we have to be aware that encryption and digital rights management make it more difficult to ensure that we can preserve creative works and make them accessible in the long-term. This is something we need to take into account when trying to balance the rights and needs of all parties involved. We need to make the drawbacks of the current use of digital rights management apparent if we are to help all parties reach a business model which rewards creativity, encourages the sharing of that creativity and ensures the output of that creativity will last.
Trevor: One could imagine a different version of the specifications that had analog formats up front and digital in the back. Instead, we find treatment of digital formats continuously situated alongside their analog counterparts. Could you tell us a bit about how this decision was made and explain some of the logic behind it? Is it correct to assume that the implication is that digital now pervades every individual area the Library of Congress collects in and that this represents a desire to maintain continuity with those areas of collecting? Or is there something else at play here?
Ted: There is no point pretending that the Library is collecting digital content on the scale and scope with which it is collecting analog content. We would like to and the specifications are one step to help get us there, but we are not there yet and it will take some time and effort. However, the specifications are meant to engage with the world outside the Library. And, inside the Library and outside it, no one is under any illusion that digital content and analog content are two separate and unrelated spheres and never the twain shall meet.
In most fields of the creation of works which the Library would seek to acquire, if the content is being created in an analog format, it is also being created in a digital format. It makes less and less sense to think differently about two instances of the same intellectual content, simply because in one instance it is a print book and in another it is an e-book. An individual might prefer to purchase one or the other for personal reasons, but they do not think of them as particularly different items. If an individual or a group or an organization wishes to create or acquire or preserve a textual object, they are more likely than not going to want to think about it as both an analog and a digital object. Separating analog from digital would fail to engage with the world in which we increasingly live.
Trevor: There are two categories of works in the document that have no mention of analog or print materials: software and data sets. Both of these are formats which libraries are not particularly well known for collecting. What does putting these on par with text, image, audio and video suggest about the future of collecting? Are there any things in these two format sections that you would want to particularly direct our attention to?
Ted: The Library is building a collection which is and should be agnostic to the question of whether content is analog or digital. We collect content which represents the subject areas we desire and which represents the creative output of the nation and the world. Much of that is now being created in both analog and digital formats; some is being created in digital formats only; and, with regard to software and datasets, some can only be created digitally. But all of it is content which the Library will want to acquire if it is to continue to build its collection with the same scope and scale it has had in the past.
The real challenge with software and datasets is that not only are we still figuring out what we can do with them now, but we are also trying to figure out what we can do to ensure they can be used in the future. The flexibility and the functionalities which make them so useful and appealing raise greater challenges to preserving them or to developing best practices which will aid that preservation. These two sections of the specifications reflect the ambiguities of the current situation and lack some of the precision of the other sections. We do believe that what we have in the specifications for these formats reflects accurately the current situation and hopefully will be useful in furthering the discussions which can lead to greater ease of preservation for these formats in the future.
Trevor: Several of the formats make reference to things like “Final production /release version of content rather than pre-production version.” Given the relationship between the Library of Congress and the Copyright Office, it makes a lot of sense to focus on this kind of final, published version. With that said, many libraries and archives also collect whole archives of work that might document the workflow and creative process of individuals and/or organizations. I’m curious about your reaction to this specific point: could you tell us a bit about the preference for “final versions”? With this instance in mind, could you suggest similar cases where the focus and approach to formats in this document would and wouldn’t be compatible with different collecting objectives?
Ted: I think you have struck on a couple of key questions which have been in our minds throughout the whole process of creating the specifications and will remain with us for some time to come. The specifications are definitely written from the perspective of a particular institution, the Library of Congress. To develop them, we worked closely and collaboratively with our colleagues in the Copyright Office. They were able to provide us with their unique expertise and knowledge and a template for the specifications in the form of the Copyright Best Edition Statement (pdf). We know that there will be decisions we have made in them which might not fit perfectly with the specific thrusts of other institutions. And we hope to hear from our colleagues and stakeholders when this is the case. The Library values that sort of input and it will help us make the specifications more useful in the future, either by adapting them or by making it clear where certain activities fall outside their scope.
Likewise, one of the major questions, now and in the future, is at what point or points in the creative process does an institution want to collect and preserve a work. This is, at least for the time being, a question which can only be answered by the institution concerned, based on their collecting scope and their resources and technical capabilities. The scope of the Library of Congress is so broad that focusing on the final version in general is a practical necessity, at least for now.
But this is a fascinating question. If someone writes an e-book, posts it through her blog, gets comments on the blog, revises and then publishes it as an e-book through a major distributor, what would we consider the work that should be preserved? What would she consider the complete work to be? We can do so much more in terms of collaboration and creation and in the dissemination of this output and it all raises so many questions. However, this is not the first time we’ve had a great leap forward in the technology of creating and sharing information. We just need to remember that not everything from these previous great leaps was able to survive. What we want to do with the specifications is to help people increase the likelihood that, if they want their works to survive, they can.