Category: Data science

Philosophy of Data Science series – Noortje Marres: Technology and culture are becoming more and more entangled.

Mark Carrigan continues his investigation of data science with this latest interview with Noortje Marres on Digital Sociology. Growing digital awareness means lots of opportunities for collaboration between sociology and related fields, and there is also a chance for sociologists to challenge the deeply rooted narrative of a clash between technology and democracy. This interview is part of an ongoing series on the Philosophy of […]

Stand Up and Be Counted: Why social science should stop using the qualitative/quantitative dichotomy

Qualitative and quantitative research methods have long been asserted as distinctly separate, but to what end? Howard Aldrich argues the simple dichotomy fails to account for the breadth of collection and analysis techniques currently in use. But institutional norms and practices keep alive the implicit message that non-statistical approaches are somehow less rigorous than statistical ones. Over the past year, I’ve met with many doctoral students […]

Five Minutes with Marieke Guy: “By opening up data, citizens can be more directly informed and involved in decision-making.”

What exactly is open data and how does it relate to education? Marieke Guy from the Open Knowledge Foundation will be speaking at the LSE this Wednesday 26 November 5-7pm as part of the Learning Technology and Innovation NetworkED series (booking still open). Ahead of her talk she answers a few questions on the opportunities and vulnerabilities involved in providing greater access to […]

Standards for scientific graphic presentation: Interactive figures could significantly improve understanding of data.

Over the past hundred years, a lot of work has gone into standardizing the way scientific data is presented. Yet this knowledge has been largely forgotten. Jure Triglav wants us to bring the past back to life. Drawing on lessons learned from the New York City subway system and the graphic standards of 1914, he argues for the modernization […]

Data Descriptors: Providing the necessary information to make data open, discoverable and reusable.

Data need to be more than just available; they need to be discoverable and understandable. Iain Hrynaszkiewicz introduces Nature’s new published data paper format, a Data Descriptor. Peer-review and curation of these data papers will facilitate open access to knowledge and interdisciplinary research, pushing the boundaries of discovery. Some of the most tangible benefits of open data stem from social and interdisciplinary […]

How is data science different to mainstream statistics? Communication and visualization are key features of analysis.

Hadley Wickham argues statistics is a part of data science, but not the whole thing. Data science is addressing many of the areas ignored by mainstream academic statistics. For example, statistics has a lot to say about collecting data, but little to say about refining questions crucial for good analysis. The end product of an analysis is not a model: it […]

Book Review: Visual Insights: A Practical Guide to Making Sense of Data by Katy Börner and David E. Polley

This book, developed for use in an information visualisation MOOC, covers data analysis algorithms that enable extraction of patterns and trends in data, with chapters devoted to “when” (temporal data), “where” (geospatial data), “what” (topical data), and “with whom” (networks and trees); and to systems that drive research and development. Jamie Cross finds that the book’s hands-on sections demand time and effort, and […]

Data carpentry is a skilled, hands-on craft which will form a major part of data science in the future.

As data science becomes all the more relevant and, indeed, profitable, attention has been placed on the value of cleaning a data set. David Mimno unpicks the term and the process and suggests that data carpentry may be a more suitable description. There is no such thing as pure or clean data buried in a thin layer of non-clean data. In reality, […]