Open data initiatives may hold much promise and value, but more attention is needed on how these projects are developing as complex socio-technical systems. Rob Kitchin elaborates on four specific areas that have yet to be fully interrogated. These critiques affirm that open data initiatives need to be much more mindful of the positive and negative implications of how open data functions within such complexity.
I’ve been a long-time supporter of open data and of providing analytic tools to citizens to enable evidence-informed participation in public debate. Since 2006, when it was initially established as the Cross-Border Regional Research Observatory, I have been PI on the All-Island Research Observatory, a project that provides access to various government datasets in the Republic of Ireland, Northern Ireland and Europe, along with interactive mapping and graphing tools. The core project team of Justin Gleeson, Aoife Dowling and Eoghan McCarthy have worked hard to leverage datasets out of various agencies and negotiate more favourable licensing terms, add value and insight to these datasets, promote data journalism through collaboration with the Irish Times and Irish Examiner, and provide open access to a couple of thousand datasets through the AIRO datastore.
The arguments concerning the benefits of open data are now reasonably well established and include contentions that open data leads to increased transparency and accountability with respect to public bodies and services; increases the efficiency and productivity of agencies and enhances their governance; promotes public participation in decision making and social innovation; and fosters economic innovation and job and wealth creation (Pollock 2006; Huijboom and Van der Broek 2011; Janssen 2012; Yiu 2012).
What is less well examined are the potential problems affecting, and negative consequences of, open data initiatives. Consequently, as a provocation for Wednesday’s (Nov 13th, 4-6pm) Programmable City open data event I thought it might be useful to outline four critiques of open data, each of which deserves and demands critical attention: open data lacks a sustainable financial model; promotes a politics of the benign and empowers the empowered; lacks utility and usability; and facilitates the neoliberalisation and marketisation of public services. These critiques do not suggest abandoning the move towards opening data, but contend that open data initiatives need to be much more mindful of what data are being made open, how data are made available, how they are being used, and how they are being funded.
Funding and sustainability
Because, to date, attention has been largely focused on the supply-side of accessing data and creating open data initiatives, insufficient attention has been paid to the economics of creating sustainably funded initiatives. Data might be non-rivalrous in nature, meaning that it can be distributed at marginal cost, but the initial copy needs to be paid for, along with on-going data management and customer service (Pollock 2006). As such, open data might well be a free resource for end-users, but its production and curation is certainly not without significant cost (especially with respect to appropriate technologies and skilled staffing). In many cases, the data being opened has to date been a major source of revenue for organisations, and in the case of companies, competitive advantage. A key question, therefore, is how open data projects can be funded sustainably in the absence of a direct revenue stream.
A number of different models have been suggested (see Ferro and Osella 2013), but it is generally acknowledged that securing a stable financial base is best achieved by direct government subvention. Here, it is argued that such a subvention will be offset by two factors. First, open data will produce diverse consumer surplus value, generating significant public goods which are worth the investment of public expenditure. Second, open data will lead to new innovative products that will create new markets, which in turn will produce additional corporate revenue and tax receipts (Pollock 2009). These tax receipts will be in excess of additional government costs of opening the data. This may well be the case with high value datasets such as mapping and transport data, but much less likely with most other datasets.
de Vries et al. (2011) reported that the average apps developer made only $3,000 per year from apps sales, with 80 percent of paid Android apps being downloaded fewer than 100 times. In addition, they noted that even successful apps, such as MyCityWay which had been downloaded 40 million times, were not yet generating profits. Instead, venture capitalists are investing in projects with potential whilst a sustainable business model is sought. Given austerity and cutbacks across governments, finding the necessary funds to open data is a challenge. And yet, the consequences of reductions or fluctuations in the financial base of open data services are likely to be a decline in data quality, responsiveness, innovation, and general performance (Pollock 2008). At present, the jury is still out on whether opening up all public sector data is economically viable and sustainable, especially in the short term.
Politics of the benign and empowering the empowered
Another consequence of focusing on gaining access to the data is to ignore the politics of the data themselves: what the data reveal, how they are used, and for whose interests (Shah 2013). The open data movement largely seeks to present an image of being politically benign and commonsensical, promoting a belief that opening up data is inherently a good thing in and of itself by democratising data. For others, making data accessible is just one element with respect to the notion of openness. Just as important are what the data consist of, how they can be used, and how they can create a more just and equitable society. If open data merely serves the interests of capital by opening public data for commercial re-use and further empowers those who are already empowered and disenfranchises others, then it has failed to make society more democratic and open (Gurstein 2011; Shah 2013).
Implicit in most discussions on open data is that the data is neutral and objective in nature and that everyone has the potential to access and use such data (Gurstein 2011; Johnson 2013). However, neither is the case. With respect to open data themselves, as Johnson (2013) contends, a high degree of social privilege and social values are embedded in public sector data with respect to what data are generated relating to whom and what (especially within domains that function as disciplinary systems, such as social welfare and law enforcement), and whose interests are represented within the data set and whose are excluded. As such, value structures are inherent in data sets, and these subsequently shape analysis and interpretation and work to propagate injustices and reinforce dominant interests.
Citizens have differential access to the hardware and software required to download and process open data sets, as well as varying levels of the skills required to analyze, contextualize and interpret the data (Gurstein 2011). And even if some groups have the ability to make compelling sense of the data, they do not necessarily have the contacts needed to gain a public voice and influence a debate, or the political skill to take on a well-resourced and savvy opponent. As such, claims about the democratic potential of open data have proven overly optimistic, with most users being those with high degrees of technical knowledge and an established political profile (McClean 2011). Indeed, open data can work to further empower the empowered and to reproduce and deepen power imbalances (Gurstein 2011). An oft-cited example of the latter is the digitization of land records in Karnataka, India, where an open data project, promoted as a ‘pro-poor’ initiative, worked to actively disenfranchise the poor by enabling those with financial resources and skills to access previously restricted data and to re-appropriate their lands (Gurstein 2011; Slee 2012; Donovan 2012). Far from aiding all citizens, in this case open data facilitated a change in land rights and a transfer of wealth from poor to rich. In other words, opening data does not mean an inherent democratization of data. Indeed, open data can function as a tool of disciplinary power (Johnson 2013).
Utility and usability
In a study of a number of different open data projects, Helbig et al. (2012) reported that many are too technically focused, amounting to “little more than websites linked to miscellaneous data files, with no attention to the usability, quality of the content, or consequences of its use.” The result is a set of open data sites that operate more as data holdings or data dumps, lacking the qualities expected of a well organised and run data infrastructure: clean, high quality, validated and interoperable data that comply with data standards and have appropriate metadata and full record sets (associated documentation); preservation, backup and auditing policies; re-use, privacy and ethics policies; administrative arrangements, management organisation and governance mechanisms; and financial stability and a long term plan of development and sustainability. Many sites also lack appropriate tools and contextual materials to support data analysis. Moreover, the data sets released are often low-hanging fruit, consisting of those that are easy to release and contain non-sensitive data of relatively low utility. In contrast, data that might be more difficult and demanding to make open, due to issues of sensitivity or because they require more management work to comply with data protection laws, often remain closed (Chignard 2013).
Part of the issue is that many open data sites have been rough and ready responses to an emerging phenomenon. They have been built by enthusiasts and organisations who have little experience of data archiving or of the contextual use of the data being opened. They have been supported and promoted by hackathons and data dives, which reproduce many of these issues. As McKeon (2013) and Porway (2013) contend, these events, which invite coders and other interested parties to build apps using open data, can do as much harm as good. Whilst they do focus attention on the data and are good for networking, those doing the coding often have little deep contextual knowledge of what the data refer to, belong to a particular demographic that is not reflective of wider society (e.g., young, educated and tech-orientated), and believe that deep structural problems can be resolved by technological solutions. In other words, they are “built by a micro-community of casual volunteers, not by people with a deep stake in seeing the project succeed” (McKeon 2013). Further, hackathon-created solutions often remain at version 1.0, with little post-event follow-up, maintenance or development.
Because of these various teething issues, rather than creating a virtuous cycle, where the release of more and more data sets, in more formats, produces growing use, and therefore the release of more data, as assumed by the open data movement, Helbig et al. (2012) note that many sites have low and declining traffic as they do not encourage use or facilitate users, and are limited by other factors such as data management practices, agency effort and internal politics. After an initial spark of interest, data use drops quite markedly as the limitations of the data are revealed and users struggle to work out how the data might be profitably analyzed and used. McClean (2011), for example, notes that analysis arising from open data has had limited impact on political debates, and concludes with respect to COINS (government financial data in the UK), that after “a brief flurry of media interest in mid-2010, in the immediate aftermath of the release, … reports explicitly mentioning COINS are now extremely rare and those members of the press who were most interested obtaining access to it report that it has not proved particularly useful as a driver of journalism.”
Where data are released periodically (e.g., quarterly or annually), usage tends to be cyclical and often tied to specific projects (such as consultancy reports) rather than following a more consistent pattern of use. In such cases, Helbig et al. (2012) observed that a set of negative or balancing feedback loops slowed both the supply of data and its use, thus further decreasing usage. After some initial ‘quick wins’, then, the danger is that the cycle turns from virtuous to vicious, undermining the rationale for central government funding of such initiatives, which is in due course cut.
Neoliberalisation and marketisation of public services
Jo Bates (2012) argues, “open initiatives such as OGD [open government data] emerge into a historical process, not a neutral terrain.” As with all political initiatives, the politics of open data are not simply commonsensical or neutral, but rather are underpinned by political and economic ideology. The open data movement is diverse and made up of a range of constituencies with different agendas and aims, and is not driven by any one party. However, Bates makes the case that the open data movement, in the UK at least, had little political traction until big business started to actively campaign for open data, and open government initiatives started to fit into programmes of forced austerity and the marketisation of public services. For her, political parties and business have appropriated the open data movement on “behalf of dominant capitalist interests under the guise of a ‘Transparency Agenda’” (Bates 2012).
In other words, the real agenda of business interested in open data is to get access to expensively produced data at no cost, and thus to gain heavily subsidised infrastructural support from which it can leverage profit, whilst at the same time removing the public sector from the marketplace and weakening its position as the producer of such data. Indeed, because the income from data/data services disappears by opening data (which is especially acute in trading funds, where data production and management was largely being funded by fees with some public subsidy), public sector bodies are more likely to be forced to outsource such services to the private sector on a competitive basis, or to cede data production to the private sector, which they then have to procure (Gurstein 2013). Here, data services and data derived from public data have to be purchased back by the data creator. At the same time, the data literacy of the organisation is hollowed out. Moreover, because open data often concerns a body’s own activities, especially when supplemented by key performance indicators, they facilitate public sector reform and reorganisation that promotes a neoliberal, New Public Management ethos and private sector interests (McClean 2011; Longo 2011).
Such processes, Bates (2013) argues, are part of a deliberate political strategy to open up the “provision of almost all public services to competition from private and third sector providers”, with open data about public services enabling “service users to make informed choices within a market for public services based on data-driven applications produced by a range of commercial and non-commercial developers” (original emphasis). In such cases, the transparency agenda promoted by politicians and businesses is merely a rhetorical discursive device. If either party were genuinely interested in transparency, then it would be equally supportive of the right to information movement (freedom of information) and the work of whistleblowers (Janssen 2012), and also of loosening the shackles of intellectual property rights more broadly (Shah 2013). Instead, governments and businesses are generally resistant to both.
Open data initiatives hold much promise and value. They are radically altering access to publicly produced data and making new kinds of analysis possible. They are creating new forms of transparency and accountability, fostering new forms of social participation and evidence-informed modes of governance, and promoting innovation and wealth generation. At the same time, much more critical attention needs to be paid to how open data projects are developing as complex socio-technical systems with diverse stakeholders and agendas. To date, efforts have concentrated on the political and technical work of establishing open data projects, and not enough on studying these discursive and material moves and their consequences. As a result, we lack detailed case studies of open data projects in action, of the assemblages surrounding and shaping them, and of the messy, contingent and relational ways in which they unfold. It is only through such studies that a more complete picture of open data will emerge, one that reveals both the positives and negatives of such projects, and which will provide answers to more normative questions concerning how they should be implemented and to what ends.
This post is a modified extract from a forthcoming book by Rob Kitchin, The Data Revolution: Big Data, Open Data, Data Infrastructures and Their Consequences (Sage, London).
This originally appeared on the Programmable City blog and is reposted with the author’s permission.
Note: This article gives the views of the author, and not the position of the Impact of Social Science blog, nor of the London School of Economics. Please review our Comments Policy if you have any concerns on posting a comment below.
Prof. Rob Kitchin is an ERC Advanced Investigator on The Programmable City project at the National Institute for Regional and Spatial Analysis at the National University of Ireland Maynooth. He’s PI for two data infrastructures – the All-Island Research Observatory and the Digital Repository of Ireland – and is author or editor of 22 books. His webpage can be found here.