A Data Visualization Work in Progress

Data Visualization for Social Science: A Practical Introduction with R and ggplot2

I’m writing a book on data visualization, provisionally titled Data Visualization for Social Science: A practical introduction with R and ggplot2. As part of that process, largely because I’ve benefited so much myself from the availability of open and widely shared tools for software development, I’m making the draft version of the book available as its own website. It can be found at http://socviz.co.

The pitch for the book, more or less, is that it tries to cover best practices in data visualization, for common social science tasks, grounded in good empirical work on the perception of graphics, in a way that is clear, friendly, and provides the code you need to actually make the graphs. Here’s an excerpt from the Preface:

The main goal of this book is to introduce you to both the ideas and the methods of data visualization in a comprehensible and reproducible way. Some classic works on visualizing data, such as Tufte (1983), present numerous examples together with some general taste-based rules of thumb for constructing and assessing plots. In what has now become a large and thriving field of research, more recent books provide excellent discussions of the cognitive underpinnings of successful and unsuccessful graphics, with many compelling and illuminating examples (Ware 2008). Others provide consistent, thorough, and sensible advice about how to graph data under different circumstances (Cairo 2013, Few 2009, Munzner 2014), but choose not to introduce the reader to the tools used to produce the graphics they show. This may be because the software used is an (often proprietary or costly) point-and-click application that requires a fully visual introduction of its own, such as Tableau, Microsoft Excel, or SPSS. Or perhaps the necessary software is freely available, but showing how to use it is not what the book is about (Cleveland 1994). Conversely, we have excellent cookbooks that provide code “recipes” for many kinds of plot, but for that reason do not take the time to introduce the beginner to the principles behind the output they produce (Chang 2013). Finally, thorough introductions to particular software tools and libraries also exist, but can be hard for beginners to digest, as they sometimes presuppose a background in either statistical methods or software concepts that the reader may not have (Wickham 2016).

Each of the texts I just cited is well worth your time. When teaching people how to make graphics with data, however, I have repeatedly found the need for an introduction that takes the time to explain why you are doing something, but without skipping the necessary details of how to produce the images you see on the page. So this book has two main aims. First, I want to get you to the point where you can reproduce for yourself almost every figure in the text, while understanding why the code is written the way it is. And second, I want you to be able to look at some data of your own, and feel confident about your ability to get from a rough picture in your head to the code that produces a high-quality graphic on your screen or page.

This book is a hands-on introduction to the principles and practice of looking at and presenting data using R and ggplot. R is a powerful, widely used, and freely available programming language for data analysis. You may be interested in exploring ggplot after having used R before, or be entirely new to both R and ggplot and just want to graph your data. I do not assume you have any prior knowledge of R.

I hope people will find it useful, and welcome feedback on the manuscript.