Today’s guest post is from Abbie Grotke, Assistant Head of the Digital Content Management section at the Library of Congress.
Rick Fitzgerald contributed to the Library’s web archiving program for over fifteen years as a cataloger and metadata expert. He recently moved on to a new position in the Collections Management Division as a Cataloging Specialist/Problem Resolution Officer and is no longer “doing web archiving,” as I say. So I sat down with him to chat and reminisce about some of the work he accomplished with our team and the experiences he had collaborating with the web archiving program during its formative years.
While working in the Library’s Acquisitions and Bibliographic Access Directorate as an Acquisitions and Cataloging Librarian, Rick was instrumental in helping us develop some groundbreaking approaches to describing and making available the Library’s web archives. These included a new way to structure our internal data and a process for automatically releasing minimal records as soon as web archives come out of a one-year embargo, then adding more robust descriptive data as resources allowed. These changes enabled us to clear a massive backlog, reduce the delay in making content accessible, and provide better descriptive records.
We wish you the best in your new endeavors, Rick – you are already missed by the Web Archiving Team!
What is your background?
I grew up in New Jersey and moved to Arizona for college. I was an art major as an undergrad. I also hung out in the university libraries a lot, learning their collections by browsing stacks, LCSH red books and text-interface catalogs, and slowly I realized I was spending more time in the library than in the studio. So I switched paths and went to library school in 2001. I thought I’d end up working in an art library or with visual materials, and my student work reflected that. At the same time, I had an interest in technology and taught myself some web skills, rented some server space, and played around building websites. I started at the Library of Congress in 2003 cataloging serials and doing acquisitions work, and then I moved to a group focused on the Library’s e-resources in early 2007. From that team, I got my hands into a lot of other digital projects at the Library.
When did you start working with the web archiving team and what brought you to this work?
While working on the e-resources team, I helped catalog several web archiving collections, including the Iraq War 2003 Web Archive, and eventually moved to another detail in 2009 in order to learn the entire process of making a collection available for research use once it had been collected. Many early pioneers at the Library had already built out much of the infrastructure of the web archiving program starting from its early history, and I was brought in to help manage certain elements of the program related to cataloging and public access.
How would you describe your role in web archiving over the years?
I began as a cataloger, then started taking more of a metadata management role, working with developers to design various metadata flows between systems. It ended as more of an acquisitions role, working to build collections in collaboration with recommending officers and the web archiving team.
You were integral to coming up with the concept of the “entity model.” Can you explain what that is and how it changed things for the web archiving program, including how we describe and make our collections accessible?
In 2011, we began the gradual migration of the web archives metadata from a stand-alone application to one that integrated more with the Library’s main web presence and other types of digital collections. For many years, we had created a record for each URL nominated for archiving every time it was added to a new collection, which resulted in a lot of redundancies and conflicts after years of doing it this way. The same URL in three election archives, for instance, would have three different catalog records (one in each collection), sometimes with differing access controls. It was confusing! So, we set out to develop a way of describing a website with a single catalog record that showed the entire history of our attempts to archive it – all the URLs, all the collections, and all of the information that documented changes over time. This was called the “entity model” – a name that may not mean much to general users, but was critically important to our program and how we structure our data. This also allowed us to more effectively reuse work and consolidate many of the things we were already doing, which had a real impact, not just within cataloging, but also on the permissions and technical side. It was a pretty daunting task, as it involved redesigning around a very active program, but with many hands we were able to accomplish that, as well as clear some sizable backlogs that had accumulated in the interim period and before.
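For readers who think in code, here is a minimal sketch of the consolidation idea Rick describes, written in Python. It is only an illustration under stated assumptions and is not the Library's actual schema or software: the class names, the field names, the `site_id` key, the example URLs, and the collection labels are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NominationRecord:
    """Old style: one record per URL per collection (hypothetical fields)."""
    site_id: str      # hypothetical identifier for the website being described
    url: str
    collection: str
    access: str       # hypothetical access-control label


@dataclass
class WebsiteEntity:
    """Entity style: one record per website, holding its archiving history."""
    urls: set = field(default_factory=set)
    collections: list = field(default_factory=list)
    access_by_collection: dict = field(default_factory=dict)


def consolidate(records):
    """Merge per-collection records into a single entity per website."""
    entities = {}
    for rec in records:
        entity = entities.setdefault(rec.site_id, WebsiteEntity())
        entity.urls.add(rec.url)
        if rec.collection not in entity.collections:
            entity.collections.append(rec.collection)
        entity.access_by_collection[rec.collection] = rec.access
    return entities


def has_access_conflict(entity):
    """True when collections disagree about access for the same website."""
    return len(set(entity.access_by_collection.values())) > 1


# Three made-up records for one website nominated to three collections.
records = [
    NominationRecord("site-1", "http://example.org/", "Collection A", "public"),
    NominationRecord("site-1", "http://example.org/", "Collection B", "onsite-only"),
    NominationRecord("site-1", "http://www.example.org/", "Collection C", "public"),
]

entities = consolidate(records)
entity = entities["site-1"]
print(len(entities), sorted(entity.urls), entity.collections)
print("access conflict:", has_access_conflict(entity))
```

In this sketch, the three separate records collapse into one entity that lists every URL and every collection, and the differing access labels become visible in one place instead of being scattered across duplicate records.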
During your time with us, you’ve witnessed and been involved in major changes in how we work and manage data at scale, and in particular how we create records for massive amounts of web archives.
I’ve learned that it is good to challenge your own assumptions about things, and trust the people around you, as they’re the ones who are going to get you through the most difficult of times. When I think of the sheer number of people who have worked on this effort over time, all working at capacity and often beyond it, it’s overwhelming. This includes lots of people who have since moved on. It really has been a fun and crazy ride.
You’ve worked on some pretty amazing collections while here. Do you have a favorite collection or item from the web archives?
When I first started, I spent a lot of time looking at election candidate sites from the 2000-2008 period. I’m pretty partial to those, as they were often built by the candidates themselves, reflecting an era of both web design and political messaging on the web.
You’ve cleaned up endless amounts of titles and descriptive records from the Library’s web archives. Do you have any good stories to tell about the content we are archiving, from this unique perspective you’ve had?
I guess the story I tell most often is that I learned to translate “Home,” “Index,” and “Welcome to our website” in about 50 different languages.
Tell us a bit about how you expanded your work into acquiring content and doing QA for a few collections spearheaded by the Acquisitions and Bibliographic Access directorate.
The potential has always been there for web harvesting to be used as a method for collecting digital publications that are made available through websites. There had been several promising attempts in that area, but the first large-scale effort to do so was the State Government Websites of the United States Web Archive, which collects works published by state agencies, some of which we had stopped receiving in print or had stopped being issued in print altogether. The success of that effort led to similar efforts for foreign government publications and openly available serials.
Through these efforts we learned a lot – what harvesting can and can’t do in this area, how to perform quality assurance on web crawls at a more granular level, how to iteratively improve crawling to get the content when first attempts don’t work. It also exposed the Acquisitions and Bibliographic Access Directorate to this kind of work. There’s a great nucleus of people working to help this along.
What is your proudest accomplishment related to your work with the web archives?
Clearing the backlog, for sure. And helping the team move towards a system of monthly releases of newly collected content. I think when I started, I was already coming into a two- to three-year backlog, and it took a long time to dig out of it (and a lot of help). It feels good to know that when something is collected, it is made accessible to users as soon as possible. We now have over 30,000 web archives from almost 100 collections available on loc.gov. I also assisted the team in developing a system of enhancing minimal records using name and subject headings from the Library’s id.loc.gov service and using it to better enable faceted search and browse.
What’s your favorite web archiving team memory or story you like to tell?
About five years ago, I was given the opportunity to work one day a week in the Digital Content Management Section space with the Web Archiving Team, so most of my best memories come from that time period. I enjoyed the small things like passing around ideas with the team, having long conversations, experimenting, and listening in as others did the same. The whiteboards were always full, and there were bits of paper everywhere with random notes scrawled on them. Many of the ideas generated became successful efforts related to all facets of web archiving – anything from solving huge technical architecture challenges to writing how-to guides that helped the team and other staff do their work.
While a lot of that energy transitioned to online collaboration tools during the pandemic, it was also instructive to figure out novel ways to stay creative while looking out for ourselves and each other.
I’ve talked too long! Thanks for the opportunity to interview.