This is a guest post by Charlotte Kostelic, National Digital Stewardship Resident with the Library of Congress and Royal Collection Trust for the Georgian Papers Programme. Her project explores ways to optimize access to and use of related digital collections held at separate institutions. This work has included a comparative analysis of international metadata standards and a series of user interviews to determine how well current practices meet user needs. A final report on this project will be released at the end of December.
On November 13th the Library of Congress hosted Computation in Conversation: Fostering New Fluencies in Collections as Data, a lecture and workshop led by Shawn Averkamp, Manager of Metadata Services at New York Public Library. I hosted the event as an enrichment session for the National Digital Stewardship Residency (NDSR) program, and I wanted to invite someone who could highlight the ways in which collections as data can be accessible to all library users, regardless of their level of technical expertise. Within NDSR, I am working on a project that focuses on user needs for digital collections used in research. In the context of my project, research could mean the work an elementary school student does for a class project or the work an academic does in preparing a journal article. What these and all other users have in common is a need for data made accessible in formats they can use and understand.
Averkamp’s presentation focused on how librarians in every department of a library, from public services and outreach to cataloging and systems, can contribute their particular expertise to the development of collections as data. The presentation also explored how librarians can present collections as data in ways that lower the barrier to use. In preparing for the workshop and presentation, Averkamp met with librarians across NYPL to ask them what types of data users request and how users work with the data sets NYPL has made public. Institutions such as the Library of Congress and NYPL have made significant amounts of data accessible to users. However, Averkamp emphasized that users may not know what to do with the data, or may assume that computational work is not for them.
While the term computational use might suggest that one needs to be able to write code, it could also mean working with data in a spreadsheet. The barriers to using computational methods with collections as data can be even higher when the labor that goes into transforming data sets into visualizations or other digital projects is not made visible in the end product. Averkamp’s workshop aimed to lower this barrier by presenting simple, free tools that can be used to make a data set ready for computational use. Using just Google Sheets and Timeline.js, workshop attendees were able to standardize dates and load the data into a template so that individual objects from collections held by the Library of Congress and NYPL could be presented in a timeline. Anyone can view Averkamp’s slides or try the workshop on their own by following her guide here: https://github.com/saverkamp/loc-talk-2017.
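For readers who prefer a script to a spreadsheet, the date-standardizing step can be sketched in Python. This is a minimal illustration, not Averkamp's workshop method, which used Google Sheets. It assumes hypothetical sample records and a subset of the column names from the TimelineJS spreadsheet template ("Year", "Month", "Day", "Display Date", "Headline", "Text", "Media"); check those against the current template before importing the output.

```python
import csv
import io
import re
from datetime import datetime

# Hypothetical sample records; these are NOT actual LOC or NYPL data.
RECORDS = [
    {"title": "Map of Washington", "date": "ca. 1850", "url": "https://example.org/item/1"},
    {"title": "Street scene", "date": "July 4, 1913", "url": "https://example.org/item/2"},
    {"title": "Portrait", "date": "1776-07-04", "url": "https://example.org/item/3"},
    {"title": "Untitled", "date": "undated", "url": "https://example.org/item/4"},
]

# Assumed subset of the TimelineJS spreadsheet columns; verify against the template.
COLUMNS = ["Year", "Month", "Day", "Display Date", "Headline", "Text", "Media"]


def standardize_date(raw):
    """Return (year, month, day) strings, with month/day possibly empty; None if no year."""
    raw = raw.strip()
    # Try full dates first, in two common formats.
    for fmt in ("%Y-%m-%d", "%B %d, %Y"):
        try:
            d = datetime.strptime(raw, fmt)
            return str(d.year), str(d.month), str(d.day)
        except ValueError:
            pass
    # Fall back to any standalone four-digit year, e.g. "ca. 1850".
    match = re.search(r"\b(\d{4})\b", raw)
    if match:
        return match.group(1), "", ""
    return None


def to_timeline_csv(records):
    """Write timeline rows as CSV text; return (csv_text, titles that had no usable date)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    skipped = []
    for rec in records:
        parts = standardize_date(rec["date"])
        if parts is None:
            skipped.append(rec["title"])
            continue
        year, month, day = parts
        writer.writerow({
            "Year": year,
            "Month": month,
            "Day": day,
            # Keep the original date string so its context is not lost.
            "Display Date": rec["date"],
            "Headline": rec["title"],
            "Text": "",
            "Media": rec["url"],
        })
    return out.getvalue(), skipped


if __name__ == "__main__":
    csv_text, skipped = to_timeline_csv(RECORDS)
    print(csv_text)
    print("No usable date:", skipped)
```

The sketch keeps the original date string in "Display Date" and reports the records it could not date, so those records are flagged rather than silently dropped. The resulting CSV could then be imported into a copy of the template by hand.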
By walking through each step it takes to transform a data set into a timeline, Averkamp also showed how context can be lost with every change made to the data. She discussed this loss of context in relation to Caroline Sinders’ concept of the data ethnographer. Sinders stresses the need for data ethnographers who can describe the social and cultural contexts in which a data set was created. For library collections as data, this could mean publishing the library’s cataloging guidelines, the date the data was created, and the transformations made to the data before it was released publicly. Averkamp’s workshop demonstrated that by documenting how their data sets were created and by offering simple tools for working with them, librarians and libraries can make collections as data easier for everyone to use.
You can follow Charlotte Kostelic and Shawn Averkamp on Twitter and find Averkamp’s workshop notes on GitHub.