Welcome to the Wild World of Web Archiving

The following is a guest post by Nicholas Woodward, an Information Technology Specialist and the newest member of the Library’s Web Archiving team.

The path that led me to the Library of Congress was long and circuitous, and it includes everything from a tiny web startup to teaching economics in Nicaragua to rediscovering a passion for developing software in Austin, Texas. Like many folks who develop software in the academic and library world, I have a deep interest in the social sciences and humanities, in addition to technology.

But unlike others who began in these fields and subsequently developed technological knowledge and skills to do new and exciting things, I did the opposite. I spent years in the technology industry only to find that it had little value for me without serious contemplation of what effect it has on other people’s lives. Only later did I discover that software development in library and academic environments allows one to bring such considerations into the process of writing code: the practical applications for research, for example, or how different forces in society influence technological development and vice versa.

But I’m jumping ahead. Let’s get the events out of the way. In 2003 I graduated from the University of Nebraska-Lincoln with a BS in computer science and started working full-time at a very small web development company. After deciding there must be more to life than making websites for a salary, I joined the Peace Corps in 2005 and worked as a high school teacher in Nicaragua for roughly 2.5 years. After a brief stint observing elections in Guatemala, I returned to the U.S. in hopes of going back to school to study the social sciences with a focus on Latin America. My dream scenario took shape when I was accepted to an MA program in the Teresa Lozano Long Institute of Latin American Studies at the University of Texas at Austin. I earned my MA in 2011 and subsequently earned an MS in Library and Information Science in 2013, also at UT.

While I was an MA student, a graduate research assistantship changed my career path for good. As a dual research assistant for the Latin American Network Information Center and the Texas Advanced Computing Center I had the incredible opportunity to conduct research on a large web archive in a high-performance computing environment. In the process I learned about things such as the Hadoop architecture and natural language processing and Bayesian classifiers and distributed computing and…

But the real value, as far as I was concerned, was that I could see directly how software development could be more than just putting together code to do “cool stuff.” I realized that developing software to facilitate research and discovery of massive amounts of data in an open and collaborative fashion not only increases the opportunities for alternative types of knowledge production but also influences how that knowledge gets created in a very profound way. And being a part of this process, however small, was the ideal place for me.

Which brings us to today. I am thrilled to be starting my new role as an Information Technology Specialist with the web archiving team of the Library’s Office of Strategic Initiatives. It is an incredible opportunity to learn new skills, incorporate knowledge I’ve acquired in the past and contribute in whatever ways I can to an outstanding team that is at the forefront of Internet archiving.

As the newest member of the web archiving team, my focus will be to continue the ongoing development of Digiboard 4.0 (pdf), the next version of our web application for managing the web archiving process at the Library of Congress. Digiboard 4.0 will build on previous software that enables Library staff to create collections of web-archived content, nominate new websites and review crawls of the Internet for quality assurance, while also making the process more efficient and expanding opportunities for cataloging archived websites. Additionally, part of my time will include exploratory efforts to expand the infrastructure and capacity of the web archiving team for in-house Internet crawling.

I look forward to the challenges and opportunities that lie ahead as we contribute to the greater web archiving community through establishing best practices; improving organizational workflows for curation, quality review and presentation of web-archived content; and generally expanding the boundaries of preserving the Internet for current and future generations.