Data Privacy/Security Is Not An Afterthought

I’m currently doing a lot of reading for an upcoming presentation at the American Library Association annual meeting on learning analytics, library patron privacy, and data management. This panel presentation is being given in response to the trend in academic libraries to mine existing data, especially patron-level data, to justify the library’s value on campus. I’m particularly interested in this topic because I recognize many practices from data management that can inform the design of such projects. Because of this, I often have a hard time reading literature in this area as I’m finding data-handling practices that I would advise against as a data management expert.

One report that has especially struck me as problematic is the ACRL’s “The Value of Academic Libraries” report from 2010. This report gives lots of details on how libraries can benefit from an expansion of their assessment programs by collecting new types of data for analysis. The problem that I have with this report is that it gives only a token nod to privacy as something to be accounted for (note: privacy is a core value in librarianship).

The report states that privacy considerations need to be worked out but fails to contextualize its major recommended assessment strategies within the scope of actual privacy and security practices. It says “account for privacy” without telling one how to do so and fails to acknowledge privacy in all of the parts of the research process where it belongs (such as in planning, data collection, training, etc.). In other words, the report does not do what all human subject research should do: bake privacy and security considerations into all aspects of a project from the very beginning.

Here’s where I come back around to general research best practices, as this is a data management blog and not a library blog. All research projects involving human subjects or personally identifiable information need to account for – from the very beginning of a project – participant privacy and how to keep personal information safe. This should happen before a researcher collects her first data point and continue through the end of the project.

Baking privacy and security considerations into a project from the very beginning affects things in several ways.

The first way is administrative. Participant privacy is a significant part of the Institutional Review Board (IRB) process for getting approval for human subject research. If a researcher cannot describe his security practices – from secure storage to anonymization strategies – he does not get approval for that research. In other words, it’s a requirement for human subject research conducted at a university.

Second is design. Focusing on privacy affects the design decisions of a study in terms of what information a researcher should/shouldn’t collect and how that information is stored. It’s possible that in taking this privacy lens, a researcher will have to limit an avenue of inquiry, but such avenues would pose a risk to research participants. An important part of doing such research is in the balancing of risk and reward – this cannot be an afterthought.
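As a rough illustration of what such a design decision can look like in practice, here is a minimal Python sketch that keeps only the fields a study has planned to analyze and replaces the raw identifier with a keyed hash before anything is stored. The field names (`patron_id`, `visit_count`, `course_level`) and the function names are hypothetical, chosen only for this example. Note that a keyed hash is pseudonymization rather than full anonymization: anyone who holds the key can re-link records, so the key would need to be stored separately from the data.

```python
import hashlib
import hmac

# Hypothetical field names: only what the analysis plan actually needs.
FIELDS_TO_KEEP = ("visit_count", "course_level")


def pseudonymize(patron_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256) as a hex string."""
    return hmac.new(secret_key, patron_id.encode("utf-8"), hashlib.sha256).hexdigest()


def minimize_record(record: dict, secret_key: bytes) -> dict:
    """Keep only the planned fields and swap the raw ID for a pseudonym."""
    reduced = {field: record[field] for field in FIELDS_TO_KEEP if field in record}
    reduced["pseudonym"] = pseudonymize(record["patron_id"], secret_key)
    return reduced
```

Calling `minimize_record` on a record that also contains, say, a `name` field returns a dictionary without the name or the raw ID. Deciding which fields belong in `FIELDS_TO_KEEP` is exactly the kind of choice that has to be made before data collection starts.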

Third, a focus on participant privacy is a focus on ethical research. In the words of Zook, et al. (from a big data paper I rather like), researchers should “acknowledge that data are people and can do harm”. Only by incorporating privacy and security considerations upfront and throughout a project can we truly make such an acknowledgement.

Data librarians tend to talk about data management plans a lot, in that they should be created at the start of every research project. This recommendation goes from “should” to “must” when personal information is involved. Researchers conducting projects involving people need to make data handling decisions upfront and keep making privacy-based decisions throughout a project. Privacy is a feature, not a bug. We do poorly by our study participants by acting in any other way.

If you are interested in library learning analytics, privacy, and data management and will be in Chicago on June 25th, I encourage you to attend my ALA panel. It’s sure to be an interesting discussion!