Category: Data science

Standards for scientific graphic presentation: Interactive figures could significantly improve understanding of data.

Over the past hundred years, a lot of work has gone into standardizing the way scientific data is presented, yet much of this knowledge has since been forgotten. Jure Triglav wants us to bring the past back to life. Drawing on lessons learned from the New York City subway system and the graphic standards of 1914, he argues for the modernization […]

Data Descriptors: Providing the necessary information to make data open, discoverable and reusable.

Data need to be more than just available; they need to be discoverable and understandable. Iain Hrynaszkiewicz introduces Nature’s new published data paper format, the Data Descriptor. Peer review and curation of these data papers will facilitate open access to knowledge and interdisciplinary research, pushing the boundaries of discovery. Some of the most tangible benefits of open data stem from social and interdisciplinary […]

How is data science different to mainstream statistics? Communication and visualization are key features of analysis.

Hadley Wickham argues that statistics is part of data science, but not the whole of it. Data science is addressing many of the areas ignored by mainstream academic statistics. For example, statistics has a lot to say about collecting data but little to say about refining the questions that are crucial for good analysis. The end product of an analysis is not a model: it […]

Book Review: Visual Insights: A Practical Guide to Making Sense of Data by Katy Börner and David E. Polley

This book, developed for use in an information visualisation MOOC, covers data analysis algorithms that enable the extraction of patterns and trends in data, with chapters devoted to “when” (temporal data), “where” (geospatial data), “what” (topical data), and “with whom” (networks and trees), as well as to systems that drive research and development. Jamie Cross finds that the book’s hands-on sections demand time and effort, and […]

Data carpentry is a skilled, hands-on craft which will form a major part of data science in the future.

As data science becomes ever more relevant and, indeed, profitable, attention has turned to the value of cleaning a data set. David Mimno unpicks the term and the process and suggests that data carpentry may be a more suitable description. There is no such thing as pure or clean data buried in a thin layer of non-clean data. In reality, […]