The following is a guest post from Library of Congress Labs Innovation Intern, Charlie Moffett. In the course of crafting data-driven narratives with digital collections, he created @govislandbot and an open-source mapping tutorial. Below he shares his process, some of the challenges he encountered, and the code.
I started my remote internship with LC Labs expecting to build a Twitterbot to support the Library of Congress Baseball Americana engagement with MLB All-Star Week 2018. Around the same time, I was ramping up an unrelated school project to build a chatbot prototype that would connect visitors on Governors Island with stories about its historic district. My aim in that undertaking was to use the physical context of the island to seamlessly engage New Yorkers with interesting, localized digital content using an AI tool I hadn’t much explored up to that point. What I soon realized, however, was that if my aim was to connect Governors Island history with its context in space, I first needed to connect more meaningfully with the digital collections I was channeling that history from. I didn’t feel right about asserting that this ‘place-based’ approach to storytelling would be worth exploring without first establishing an appreciation for the digital collections I’d be serving up. It dawned on me that my internship with the Library could be the perfect opportunity to meet with and learn from the staff behind these collections, as well as other members of LC Labs who were leveraging the newly launched loc.gov JSON API to provide machine-readable access to the collections for a variety of apps and purposes.
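The loc.gov JSON API works by appending `fo=json` to an ordinary loc.gov search or collection URL, which switches the response from HTML to machine-readable JSON. Here is a minimal sketch of that pattern in Python; the helper names and example query are my own, and only the `fo=json` parameter and the `results` key come from the API's documentation:

```python
import json
import urllib.parse
import urllib.request

BASE = "https://www.loc.gov"

def build_search_url(query, collection=None):
    """Build a loc.gov URL that returns JSON instead of HTML."""
    path = f"/collections/{collection}/" if collection else "/search/"
    params = {"q": query, "fo": "json"}  # fo=json requests the JSON response format
    return BASE + path + "?" + urllib.parse.urlencode(params)

def search(query):
    """Fetch one page of search results and return the item list."""
    with urllib.request.urlopen(build_search_url(query)) as resp:
        data = json.load(resp)
    return data.get("results", [])

if __name__ == "__main__":
    # Print the titles of the first few matching items.
    for item in search("governors island")[:5]:
        print(item.get("title"))
```

The same `fo=json` trick applies to individual item pages, which is what makes it practical to drive an app like a chatbot directly from the collections.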
As I pivoted with my primary objective for the internship, one of the first decisions I made for the chatbot prototype was to focus on just a select few primary sources as the basis for the stories I’d be crafting for the bot to tell. Early in my internship, during an in-person visit, I met with key folks from Chronicling America (ChronAm), of the Serial and Government Publications Division, and the Veterans History Project (VHP), of the American Folklife Center.
In meeting with Robin Butterhof of ChronAm, I picked up on intricacies about how historical newspapers are collected and stored that later became critical for the prototype as I figured out how to programmatically access, manipulate, and deliver newspaper content within the Facebook Messenger environment. Chris Ehrman (also ChronAm) was kind enough to drill into the data extraction process with me through a number of his own Python scripts for a Beyond Words bot he was building: retrieving relevant attributes, downloading content in multiple formats, and visualizing summary statistics about the publications I was pulling from.
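I can't reproduce Chris's scripts here, but the kind of programmatic access described above is possible because ChronAm exposes a public search API. The following is a hedged sketch of retrieving relevant attributes from it, assuming the documented `andtext`, `state`, `rows`, and `format=json` parameters of the page-search endpoint; the function names are my own:

```python
import json
import urllib.parse
import urllib.request

CHRONAM = "https://chroniclingamerica.loc.gov/search/pages/results/"

def build_query(term, state=None, rows=20):
    """Build a ChronAm page-search URL that returns JSON."""
    params = {"andtext": term, "rows": rows, "format": "json"}
    if state:
        params["state"] = state  # limit results to papers from one state
    return CHRONAM + "?" + urllib.parse.urlencode(params)

def fetch_pages(term, **kwargs):
    """Fetch one page of search results and return the matching newspaper pages."""
    with urllib.request.urlopen(build_query(term, **kwargs)) as resp:
        return json.load(resp).get("items", [])

if __name__ == "__main__":
    # Print the date and paper title for a few matching pages.
    for page in fetch_pages("Governors Island", state="New York")[:3]:
        print(page.get("date"), page.get("title"))
```

Each returned item carries attributes like the newspaper title, issue date, and (in the `ocr_eng` field) the page's OCR text, which is what makes it possible to download content in multiple formats and summarize the publications you are pulling from.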
Connecting with Megan Harris and Jeanine Nault (Reference Specialist and Digital Assets Specialist with VHP, respectively) helped me imagine what it might look like to include multimedia stories in the chatbot experience, namely the sounds and transcripts of interviews with veterans who had served on Governors Island. During the same visit, I also met with Robert Brammer, a Legal Reference Librarian with the Law Library of Congress, to learn about their process of building the Law Library chatbot, including challenges and successes the chatbot had seen thus far.
These meetings and the follow-up research I performed as a result added entirely new and invaluable elements to my prototyping process. Laura Wrubel, Software Development Librarian at George Washington University Libraries, helped me to synthesize a lot of what I had picked up by then about the possibilities of the Library’s APIs and data. Listening to her talks, reading her documentation and meeting to review my own progress greatly helped my team back in New York to get going with digital collections. The work I was doing there with my peers from school and industry mentors from NYC Media Lab on the chatbot project centered primarily around user experience design and how we planned to pitch our stakeholders on the value of the project. My internship with LC Labs, on the other hand, allowed me to wade into the richness of the collections and brainstorm directly with the stewards of the material. That experience helped me to not only build a better product but, more importantly, understand and appreciate a much larger portion of the data and software lifecycle.
Because we were making a prototype, we moved fast and made content decisions to prove out UX concepts. The intent behind the bot was never to construct an authoritative history of the island in any case, but rather to present historical materials in their own voices with the added context of physical proximity to the subject at hand.
Instead of just showing images of the newspaper pages, we copied OCR data from the PDF versions of the pages into text bubbles for the viewer to digest within Messenger. This introduced yet another moment of curation in our process, not only because we were selectively choosing which chunks of each article to include in our stories, but also through spell-checking and correcting the OCR data before cementing it into the bot. I imagined various configurations in these moments – using raw OCR, including articles in their entirety for users to sift through, using only imagery to preserve nuances of the material – but ultimately our impetus to craft the right user experience for our perceived audience reigned supreme, and we took more than a few quick-and-dirty shortcuts to get to a presentable prototype. But while we may not have necessarily engaged in “deep” storytelling, my team and I still had to contend with the “value” of the story for the audience and convey the right elements of the material as determined by the various groups we consulted throughout the project.
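To give a sense of what that pre-correction pass can involve: raw newspaper OCR arrives with column line breaks and words hyphenated across lines. Here is a small, hypothetical cleanup helper of the kind one might run before hand-correcting the text; this is my own sketch, not the team's actual code:

```python
import re

def tidy_ocr(text):
    """Lightly clean raw OCR text before hand-correcting it.

    Re-joins words hyphenated across line breaks, then unwraps the
    column line breaks and stray whitespace typical of scanned
    newspaper pages.
    """
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)  # join "hyphen-\nated" words
    text = re.sub(r"\s*\n\s*", " ", text)               # unwrap remaining line breaks
    return re.sub(r"[ \t]{2,}", " ", text).strip()      # collapse runs of spaces
```

A pass like this only handles the mechanical layout noise; misrecognized characters still need the human spell-checking described above before the text goes into a chat bubble.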
In the end, I found that I was able to strike a balance between partially overlapping but distinct agendas: demoing our proof of concept to the stakeholders in the fellowship, while at the same time creating a space where one could reflect on this process of data collection, curation, and stewardship to inform current and future product designs. I’m particularly grateful to have had the opportunity in this project to flesh out an “emerging technology” prototyping experiment with a different sort of context than might otherwise have been obvious. It’s clear to me that these tools, platforms and workflows can enable greater access to stores of wonderful data, but the environments that those data are embedded in and depend on for sustenance are perhaps even more worthwhile to examine. As a next step, I hope to share what I’ve learned with my peers and encourage other budding data scientists to spend time with the collections and imagine their own new and exciting applications of our Library.
My documentation for the chatbot prototype includes additional notes, screenshots, and open-source code, all of which can be accessed on the Open Science Framework.