Data Visualization, Second Edition

I’ve written a second edition of Data Visualization: A Practical Introduction, which ideally should come out with Princeton University Press later this year. As with the first edition, a full draft of the book is available at https://socviz.co. The production process is just getting started so there’s no new cover yet, and there isn’t a link to pre-order. But (also like last time) I’ve put up a link to a form that lets you add your email if you’d like to be notified when it’s available to buy. You’ll only get one email (from me personally, not a marketing department) if you do; no spam or anything.

The revised edition is a pretty thorough rewrite. Naturally all the code is brought up to date for ggplot 4 and R version 4.5 and higher. The code from the first edition still runs, but you’ll get warnings and so on; those are all now gone. The back half of the book has been pretty thoroughly redone to reflect big changes in the availability of software for maps (the sf package) and extracting results from models (the marginaleffects package). Meanwhile, several years of teaching this material (and getting feedback from others) have resulted in shifts of emphasis here and there to introduce just a little bit more on data wrangling. As the book goes on I also shift from an “object-based” approach to writing plots to a more “pipeline-based” one.

The recent rise of LLMs and coding agents gets some discussion, too. There the question is “Why can’t I just have a robot write all the code for me?” I don’t dismiss this question out of hand, and I don’t pretend that agents aren’t very powerful. My feeling about this is summed up in the Preface:

Perhaps you have a robot to help you write your code now. Large Language Models (LLMs) and coding agents are now part of the workflow of code generation and evaluation. They can do a great deal; so much so that it might seem superfluous to spend any time with the iterative, write-try-redo approach to visualization that this book presents. Can’t the robot write all the code instead? Not quite. It’s not that I believe repeatedly doing repetitive and error-prone tasks yourself is a virtue. To the contrary, that’s what computers are for. This book is full of examples where we end up automating something in order not to worry about it. But I also want you, the reader, to learn how to do good graphical work in a reproducible way. That means having a keen eye for quality and a good nose for error. Cultivating those senses requires practice and a vocabulary to express them. It seems faintly absurd to have to say explicitly but, whatever tools you use, your work will be better if you know what you are doing and understand why you are doing it. This book teaches you ggplot specifically, but it is not trying to lock you in to a particular framework. It’s just that the way you acquire a general skill or a wide-ranging taste is by first learning some more specific version of those things, and then practicing them. Automation can come later. In the words of the author Ann Leckie, you don’t learn how to do something by not doing it. For that reason, this book remains a hands-on introduction.

Or to put it another way, the book is an introduction to how to do something. One feature of books like it is that they tend to have two audiences: people who don’t know anything about the topic, and who’d like to learn something about it, and people who know a lot, at least in relative terms, and who have forgotten what it’s like not to know it. When the first edition came out, one of the early Amazon reviews was a complaint that the book seemed “pretty introductory” in its content. I mean, my Brother in Christ, that is right there in the title.

As with any corner of the vast division of labor that is human society, not everyone has to know about any specific thing in great detail. We’re all taking huge amounts of stuff for granted at any moment. But if you want to be proficient in some piece of that enormous web, it’s better that you know rather than not know what’s what. There’s nothing wrong with using tools that give you tremendous leverage. You do it every time you use a stand mixer in the kitchen, or a sander in the garage. You do it every time you turn your computer on, in fact. But you still need to develop the capacity to tell good work from bad, or correct from incorrect output, or safe uses from dangerous ones. That way you can take advantage of the power tools without being at risk of slicing your own or anyone else’s arm off.