Institutional Repositories: Part 1

If you aren’t a member of the library and archiving world, you probably aren’t aware of the phrase institutional repository (IR for short). I certainly wasn’t aware of IRs prior to joining the CDL, and I’m guessing most researchers are similarly ignorant. In the next two blog posts, I plan to first explain IRs, then lay out the case for their importance – nay, necessity – as part of the academic ecosphere. I should mention up front that although IRs were originally conceived for archiving researchers’ traditional publications, I am speaking about them here as a potential means of preserving all scholarship, including data.

Academic libraries have a mission to archive scholarly work, including theses. These are at The Hive in Worcester, England. From Flickr by israelcsus.

If you read this blog, I’m sure you are aware that there is increased awareness about the importance of open science, open access to publications, data sharing, and reproducibility. Most of these concepts were easily accomplished in the olden days of pen-and-paper: you simply took great notes in your notebook, and shared that notebook as necessary with colleagues (this assumes, of course, geographic proximity and/or excellent mail systems). These days, that landscape has changed dramatically due to the increasingly computationally complex nature of research. Digital inputs and outputs of research might include software, spreadsheets, databases, images, websites, text-based corpora, and more. But these “digital assets”, as the archival world might call them, are more difficult to store than a lab notebook. What does a virtual filing cabinet or file storage box look like that can house all of these different bits? In my opinion, it looks like an IR.

So what’s an IR?

An IR is a data repository run by an institution. Many of the large research universities have IRs. To name a few, Harvard has DASH, the University of California system has eScholarship and Merritt, Purdue has PURR, and MIT has DSpace@MIT. Many of these systems have been set up in the last 10 years or so to serve as archives for publications. For a great overview and history of IRs, check out this eHow article (which is surprisingly better than the relevant Wikipedia article).

So why haven’t more people heard of IRs? Mostly this is because there have never been any mandates or requirements for researchers to deposit their works in IRs. Some libraries take on this task themselves; for example, I found out a few years ago that the MBL-WHOI Library graciously stored open access copies of all of my publications for me in their IR. But more and more, these “works” include digital assets that are not publications, and collecting all of the digital scholarship produced by an institution is a near-insurmountable task for a small group of librarians; there has to be either buy-in from researchers or mandates from the top.

The Case for IRs

I’m not the first one to recognize the importance of IRs. Back in 2002 the Scholarly Publishing and Academic Resources Coalition (SPARC) put out a position paper titled “The Case for Institutional Repositories” (see their website for more information). They defined an IR as having four major qualities:

  1. Institutionally defined,
  2. Scholarly,
  3. Cumulative and perpetual, and
  4. Open and interoperable.

Taking the point of view of the academic institution (rather than the researcher), the paper cited two roles that institutional repositories play for academic institutions:

  1. Reform scholarly communication – Reassert control over scholarship; reduce monopoly power of journals; bring relevance to libraries.
  2. Promote the university – Serve as an indicator of the university’s quality; showcase the university’s research; demonstrate public value and increase status.

In general, IRs are run by information professionals (e.g., librarians), who are experts at documenting, archiving, preserving, and generally curating information. All of those digital assets that we produce as researchers fit the bill perfectly.

As a researcher, you might not be convinced of the importance of IRs given the arguments above. Part of the indifference researchers may feel about IRs might have something to do with the existence of disciplinary repositories.

Disciplinary Repositories

There are many, many, many repositories out there for storing digital assets. To get a sense, check out re3data.org or databib.org and start browsing. Both of these websites are searchable databases for research data repositories. If you are a researcher, you probably know of at least one or two repositories for datasets in your field. For example, geneticists have GenBank, evolutionary biologists have TreeBase, ecologists have the KNB, and marine biologists have BCO-DMO. These are all examples of disciplinary repositories (DRs) for data. As any researcher who’s aware of these sites knows, you can both deposit and download data from these repositories, which makes them indispensable resources for their respective fields.

So where should a researcher put data?

The short answer is both an IR and a DR. I’ll expand on this and make the case for IRs to researchers in the next blog post.