The Library of Congress has a big announcement for any past or potential users of the loc.gov application programming interfaces (APIs): we heard your feedback, we noted your pain points, and we improved our documentation!
I had the opportunity to collaborate with Patrick Rourke, software engineer in the Library’s IT Design and Development Directorate, and many other internal and external partners to publish a series of updates to loc.gov/apis. This website is the home of technical documentation for the various Library of Congress APIs.
Some of our changes include:
- Clearly describing the structure of common JSON response objects. The updated site outlines the various parts of a JSON response returned from a search results endpoint (/search, /collections, /format) and from an item or resource endpoint.
- Foregrounding that there is no such thing as a uniform response from the loc.gov JSON/YAML API. The heterogeneity of the data structures derives from differences in digitization and metadata practices. The SOLR schema that underlies the loc.gov search engine and the JSON/YAML API provides some stability for certain data attributes, but it can be extended to serve the needs of the content and the web presentation.
- Expanding our documentation to include other Library of Congress APIs and data services. The Library of Congress has several!
- Adding detailed instructions about how to use faceting to limit result set sizes.
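For readers who want to see what faceting and a non-uniform JSON response look like in practice, here is a minimal Python sketch. It rests on a few assumptions that should be checked against loc.gov/apis: that `fo=json` requests JSON, that `fa` takes `name:value` facet filters joined with `|`, that `c` and `sp` set page size and page number, and that a search response carries `results` and `pagination` keys. The function names `build_search_url` and `summarize_response` are hypothetical helpers written for this illustration, not part of any Library tooling.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.loc.gov"


def build_search_url(endpoint, query=None, facets=None, per_page=25, page=1):
    """Build a request URL that asks for JSON and narrows results with facets.

    Parameter names (fo, fa, c, sp, q) are assumptions; confirm them in the docs.
    """
    params = {"fo": "json", "c": per_page, "sp": page}
    if query:
        params["q"] = query
    if facets:
        # Assumed syntax: name:value pairs joined with "|"
        params["fa"] = "|".join(f"{name}:{value}" for name, value in facets.items())
    return f"{BASE_URL}/{endpoint.strip('/')}/?{urlencode(params)}"


def summarize_response(payload):
    """Read a search-style response defensively, since fields vary by collection."""
    results = payload.get("results", [])
    pagination = payload.get("pagination", {})
    return {
        # Not every record is guaranteed to carry every attribute
        "titles": [item.get("title", "(untitled)") for item in results],
        # "next" is assumed to hold the next page's URL, if there is one
        "next_page": pagination.get("next"),
    }


if __name__ == "__main__":
    # Illustrative facet name and value; real facet names come from the response itself
    print(build_search_url("search", query="baseball", facets={"online-format": "image"}))
```

Using `.get()` with defaults rather than direct key access is a deliberate choice here: because there is no uniform response, code that assumes every item has the same attributes is likely to break on some collections.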
To celebrate this milestone, and all the people who make this work possible, we are taking a walk down memory lane showcasing the development of the Library’s websites and APIs over time. We’re also highlighting several examples of how people have used LC APIs to further their own inquiry.
1990s – 2000s
The Library registered the loc.gov domain in 1990. Concurrently, the American Memory website and the Prints and Photographs Online Catalog (PPOC) were developed for sharing LC collections online. At the ALA conference on June 22, 1994, the Library debuted these websites to the public.
In these early years, the Library reached several important milestones in using technology to provide public services. In 1994, the National Digital Library program was launched to provide online access to selected digitized collections. In 1995, the Newspaper and Current Periodical Reading Room launched the first Library Reading Room World Wide Web home page with links to periodicals available online. It was visited half a million times in its first year.
Then, in 2000, the Library released its children and family website “America’s Library” on April 24. It logged 30 million visits before the end of the year. That same year, Congress established the National Digital Information Infrastructure and Preservation Program (NDIIPP), which marks the start of the Library’s Web Archiving Program.
In 2007, the longstanding National Digital Newspaper Program culminated in the creation of the Chronicling America website, starting with over 500,000 digitized newspapers from 1900-1910 and a directory of 150,000 US newspapers published from 1690 to 2007. The Library also launched its first blog at loc.gov/blog in 2007. It was one of the first official federal government blogs available to the public.
2010 – Present
In the 2010s and 2020s, the world began creating data at unprecedented speeds, with people spending more time online than ever before. These decades marked a period of rapid evolution for the Library’s digital and mobile technology. In 2010, the Law Library, Music Division, and Science, Business and Technology Division all launched their own dedicated blogs.
2011: loc.gov gets a refresh
In 2011, work began on an enterprise-wide effort to manage the Library’s existing website content and provide a foundation on which to develop new capabilities. This Library-wide project encompassed all of the Library’s web presences, including today’s loc.gov presentation, under a more holistic, managed umbrella. The loc.gov JSON application programming interface was created around this time to serve digital content to the website, and power search and discovery. Originally, the API served data only in the JSON format; the YAML format was added in 2017.
2017: LC for Robots
In September 2017, LC Labs was established to design experimental technology initiatives and invite the public to engage with the digital collections. Our team’s inaugural launch included LC for Robots, a hub for resources on machine-readable access to the collections that included prototype documentation for the JSON/YAML API. In December 2017, the Library added a YAML representation for the loc.gov API output.
2018: Prototyping with the API
In 2018, digital library software developer Laura Wrubel spent several months on research leave at the Library. In partnership with LC Labs, Laura built various applications that pulled data from the API to visualize digital collections, and she ran a “Library Carpentry” workshop for Library staff that heavily emphasized how to use the loc.gov API.
Library of Congress fellows and interns created supporting documents and demonstrations around how to work with collections as data via various APIs. They built Jupyter Notebooks for creating location-based narratives, leveraged the Chronicling America API to create a newspaper bot, and visualized collections information as a Southern Mosaic. They also used the JSON/YAML API as part of a digital scholarship workshop at the Kluge Center and to build a Chrome browser extension that populates new tabs with free-to-use-and-reuse Library images.
Meanwhile, Labs staff members managed the Beyond Words crowdsourcing pilot and began planning for the By the People platform, which leverages the API to gather image assets and item and page level metadata for volunteer transcription.
2019: API as data source for machine learning
By 2019, the Library began to see patterns in the gaps between what users wanted to do with our data and what data was available. To bridge those gaps and explore how data workflows could be enhanced with emerging technologies, LC Labs began a “season of machine learning.” This investigation, which continues to this day, examines how machine learning techniques could be applied to increase the discoverability of digital archival materials. All six demonstration experiments with the Project Aida team at the University of Nebraska-Lincoln leveraged the API to test computational processes like Document Segmentation; Graphic Element Classification and Text Extraction; Digitization Type Differentiation; and more.
2020: Innovations leveraging the API
2020 featured both big and small innovations with the API. The Library launched the LOC Collections App for iOS, built on the API, allowing the public to discover and share digital collections content on their mobile devices. An Android version of the app was released the following year.
The Library also hosted two Innovators in Residence, both of whom used the API to work with digital collections at scale. Brian Foo, creator of Citizen DJ, used the API as the starting point to query the Library’s moving image and audio collections for rights-free materials. Ben Lee, inventor of the Newspaper Navigator dataset and search application, leveraged the Chronicling America API to build a pipeline for extracting visual content from millions of digitized historic newspapers.
As more work moved online due to the COVID-19 pandemic, Library staff began developing resources to make it easier for people to engage with digital materials using code. Rachel Trent, digital projects coordinator in the Geography and Maps Division, created two Jupyter Notebooks for accessing maps programmatically. And reference specialist Betty Brown used the resources available on LC for Robots to support a researcher who sought to bulk download the playbills in the Federal Theatre Project.
2021: Computational readiness
By 2021, the API was foundational to many projects exploring Library collections. LC Labs built on its first phase of machine learning research to design and test two humans-in-the-loop workflows using the API as a data source. Simultaneously, the Experimental Access experiment leveraged the real-time nature of the API to spin up an environment for visualizing collections information in new and engaging ways that were responsive to user needs.
The API also undergirded the design of the 2021 Innovator in Residence project Speculative Annotation by Courtney McLellan. This dynamic website leveraged the Library’s implementation of the International Image Interoperability Framework (IIIF) to create an image annotation experience for K-12 teachers and students.
On the other end of the spectrum, the Library of Congress also published its Guide to Digital Scholarship, which includes tutorials for working with the API in a browser and on the command line.
2022: Engaging the public with Computing Cultural Heritage in the Cloud & Congress.gov API
To deepen understanding of the computational research experience, LC Labs onboarded three scholar-practitioner teams under the Computing Cultural Heritage in the Cloud (CCHC) grant. Their projects ranged in subject matter and technical approach but all three relied heavily on the API to build a corpus for research.
In an October CCHC Data Jam, Labs moved data to the cloud for retrieval but still relied on the API as a source of metadata. As a result, our data jam participants provided lots of feedback on the JSON response, which we integrated into our latest revisions to the documentation.
2023: Experimental data sandbox launches
The LC Labs team continues to iterate on data.labs.loc.gov, an experimental space to deepen understanding of the technical and social dimensions of providing access to Library data. These data packages will not replace the loc.gov JSON API as the primary access point for datasets. Rather, the two will continue to inform one another about the benefits and drawbacks of each method.
Thanks for coming on this trip down memory lane. If you’re an API and digital library enthusiast, watch this space!