This post is by Natsuko Nicholls and John Kratz. Natsuko is a CLIR/DLF Postdoctoral Fellow in Data Curation for the Sciences and Social Sciences at the University of Michigan.
The problem: finding a repository
Everyone tells researchers not to abandon their data on a departmental server, hard drive, USB stick , CD-ROM, stack of Zip disks, or quipu– put it in a repository! But, most researchers don’t know what repository might be appropriate for their data. If your organization has an Institutional Repository (IR), that’s one good home for the data. However, not everyone has access to an IR, and data in IRs can be difficult for others to discover, so it’s important to consider the other major (and not mutually exclusive!) option: deposit in a Disciplinary Repository (DR).
Many disciplinary repositories exist to handle data from a particular field or of a particular type (e.g. WormBase cares about nematode biology, while GenBank takes only DNA sequences). Some may be asking if the co-existence of IRs and DRs means competition or is mutually beneficial to both universities and research communities, some may be wondering how many repositories are out there for archiving digital assets, but most librarians and researchers just want to find an appropriate repository in a sea of choices.
For those involved in assisting researchers with data management, helping to find the right place to put data for sharing and preservation has become a crucial part of data services. This is certainly true at the University of Michigan—during a recent data management workshop for faculty, faculty members expressed their interest in receiving more guidance on disciplinary repositories from librarians.
The help: directories of data repositories
Fortunately, there is help to be found in the form of repository directories. The Open Access Directory maintains a subdirectory of data repositories. In the Life Sciences, BioSharing collects data policies, standards, and repositories. Here, we’ll be looking at two large directories that list repositories from any discipline: DataBib and the REgistry of REsearch data REpositories (re3data.org).
DataBib originated in a partnership between Purdue and Penn State University, and it’s hosted by Purdue. The 600 repositories in DataBib are each placed in a single discipline-level category and tagged with more detailed descriptors of the contents.
re3data.org, which is sponsored by the German Research Foundation, started indexing relatively recently, in 2012, but it already lists 628 repositories. Unlike DataBib, repositories aren’t assigned to a single category, but instead tagged with subjects, content types, and keywords. Last November, re3data and BioSharing agreed to share records. re3data is more completely described in this paper.
Given the similar number of repositories listed in DataBib and re3data, one might expect that their contents would be roughly similar and conclude that there are something around 600 operating DRs. To test this possibility and get a better sense of the DR landscape, we examined the contents of both directories.
The question: how different are DataBib and re3data?
Contrary to expectation, there is little overlap between the databases. At least 1,037 disciplinary data repositories currently exist, and only 18% (191) are listed in both databases. That’s a lot to look for one right place to put data, because except for a few exceptions, most IRs are not listed in re3data and Databib (you can find a long list of academic open access repositories). Of the repositories in both databases, a majority (72%) are categorized into STEM fields. Below is a breakdown of the overlap by discipline (as assigned by DataBib).
Another way of characterizing repository collections by re3data and Databib is by the repository’s host country. In re3data, the top three contributing countries (US 36%, Germany 15%, UK 12%) form the majority, whereas in Databib 58% of repositories are hosted by the US, followed by UK (12%) and Canada (7%). This finding may not be too surprising, since re3data is based in Germany and Databib is in the US. If you are a researcher looking for the right disciplinary data repository, the host country may matter, depending on your (national-international/private-public) funding agencies and the scale of collaboration.
The full list of repositories is available here .
The conclusion: check both
Going forward, help with disciplinary repository selection will be increasingly be a part of data management workflows; the Data Management Planing Tool (DMPTool) plans to incorporate repository recommendations through DataBib, and DataCite may integrate with re3data. Further simplifying matters, DataBib and re3data plan to merge their services in some as-yet-undefined way. But, for now, it’s safe to say that anyone looking for a disciplinary repository should check both DataBib and re3data.