Category: datasets

Finding By the People Transcriptions in the Library’s Digital Collections

Today’s guest post is from Dr. Victoria Van Hyning, who served as a By the People Community Manager at the Library from 2018 to 2020. Starting in Fall 2020, she will be an Assistant Professor of Library Innovation at the University of Maryland iSchool, where she will continue her research on crowdsourcing, outreach, and inclusion. The […]

In the Library’s Web Archives: Dig If You Will the Pictures

The Digital Content Management section has been working on a project to extract and make available sets of files from the Library’s significant Web Archives holdings. This is another step to explore the Web Archives and make them more widely accessible and usable. Our aim in creating these sets is to identify reusable, “real world” […]

In the Library’s Web Archives: Totally Tabular Data

This is a guest post by Pedro Gonzalez-Fernandez, a Digital Collections Specialist in the Digital Content Management Section at the Library of Congress. He holds both an MLS and an MA in Music History and Literature from the University of Maryland, as well as a BA in Music from Shepherd University. The Digital Content Management section has […]

Excel is threatening the quality of research data — Data Packages are here to help

This week the Frictionless Data team at Open Knowledge International will be speaking about making research data quality visible at the International Digital Curation Conference (#idcc17). Dan Fowler looks at why the popular file format Excel is problematic for research and what steps can be taken to ensure data quality is maintained throughout the research process. Our Frictionless Data project aims […]

Publicly available data from Twitter is public evidence and does not necessarily constitute an “ethical dilemma”.

An article in Scientific American suggests further ethical considerations should be made for research derived from Twitter data. Ernesto Priego first questions the extent to which Twitter will actually release all of its valuable data, and also argues that archiving and disseminating information from Twitter and other public archives does not have to be cause for an “ethical dilemma” so long as […]