Recently, the world of web archiving has been a busy one. Here are some quick updates:
- The National Library of Estonia released the Estonian Web Archive to the public. This is of particular note because the Legal Deposit Law in Estonia allows the archive to be publicly accessible online. If you read Estonian, you can browse the 1003 records that make up the 1.6 TB of data in the archive. A broad crawl of the entire Estonian domain is planned for 2014.
- Ed Summers from the Library of Congress gave the keynote address, titled “The Web as Preservation Medium”, at the National Digital Forum in New Zealand. Ed is a software developer and offers a great perspective on some technical aspects of preserving the Web. He covers the durability of HTML, the fragility of links, how preservation is interlaced with access, the importance of community action and the value of “small data”.
- The International Internet Preservation Consortium 2014 General Assembly will be held at the Bibliothèque nationale de France in Paris, May 19-23, 2014. There is still a little time to submit a proposal to speak at the public event on May 19th, titled “Building Modern Research Corpora: the Evolution of Web Archiving and Analytics”.
Call for Proposals announcement from the IIPC:
Libraries, archives and other heritage or scientific organizations have been systematically collecting web archives for over 15 years. Early stages of web archiving projects were mainly focused on tackling the challenges of harvesting web content, trying to capture an interlinked set of documents, and to rebuild its different layers through time. Institutions, especially those on a national level, were also defining their legal and institutional mandates. Meanwhile, approaches to web studies developed and influenced researchers’ and academics’ use of web archives. New requirements have emerged. While the objective of building generic collections remains valid, web archiving institutions and researchers also need to collaborate in order to build specific corpora – from the live web or from web archives.
At the same time, “surfing the web the way it was” is no longer the only way of accessing archived web content. Methods developed to analyse large data sets – such as data or link mining – are applicable to web archives. Web archive collections can thus be a component of major humanities and social sciences projects and infrastructures. With relevant protocols and tools for analysis, they will provide invaluable knowledge of modern societies.
This conference aims to propose a forum where researchers, librarians, archivists and other digital humanists will exchange ideas, requirements, methods and tools that can be used to collaboratively build and exploit web archive corpora and data sets. Contributions are sought that will present:
- models of collaboration between archiving institutions and researchers,
- methods and tools to perform data analytics on web archives,
- examples of studies performed on web archives,
- alternative ways of archiving web content.
Abstracts (no longer than one page) should be sent to Peter Stirling (peter dot stirling at bnf dot fr) by Friday December 9, 2013. Full details are available at the IIPC website.