Yesterday I came across Aaron Penne’s collection of very nice data visualizations, one of which was of monthly births in the United States since 1933. He made a tiled heatmap of the data, taking care when calculating the average rate to correct for the varying number of days in different months. Aaron works in Python, so I took the opportunity to play around with the data and redo the plots in R.
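The month-length correction mentioned above can be sketched in a few lines. This is only an illustration in Python, not Aaron's code or my R version: the function name `average_daily_births` and the counts are made up for the example. The one real dependency is the standard library's `calendar.monthrange`, which returns the number of days in a given month and accounts for leap years.

```python
import calendar


def average_daily_births(year, month, total_births):
    """Return births per day for one month, correcting for month length."""
    # monthrange returns (weekday of the first day, number of days in month).
    days_in_month = calendar.monthrange(year, month)[1]
    return total_births / days_in_month


# Hypothetical monthly totals, for illustration only (not real data).
monthly_totals = {
    (1999, 1): 310_000,
    (1999, 2): 280_000,
    (2000, 2): 290_000,  # leap-year February has 29 days
}

for (year, month), total in sorted(monthly_totals.items()):
    rate = average_daily_births(year, month, total)
    print(f"{year}-{month:02d}: {rate:.1f} births per day")
```

Dividing by the actual number of days keeps short months like February from looking like dips in the birth rate when they are only dips in the raw count.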
On Twitter the other day, Philip Cohen put up some data on changes in Bachelor’s degrees awarded between 1995 and 2015. The data come from the National Center for Education Statistics. It seemed like a good candidate for drawing as a figure, so I had a go at it:
Changes in the number of Bachelor’s degrees awarded over the past twenty years.
Afterwards, I was messing around with the data and wanted to draw some time-series plots for the various subject areas the NCES tracks.
Data Visualization: A Practical Introduction will be published later this year by Princeton University Press. You can read a near-complete draft of the book at socviz.co. If you would like to receive one (1) email when the book is available for pre-order, please fill out this very short form. The goal of the book is to introduce readers to the principles and practice of data visualization in a humane, reproducible, and up-to-date way.
Data Visualization for Social Science: A Practical Introduction with R and ggplot2
I’m writing a book on data visualization, provisionally titled Data Visualization for Social Science: A practical introduction with R and ggplot2. As part of that process, largely because I’ve benefited so much myself from the availability of open and widely shared tools for software development, I’m making the draft version of the book available as its own website.
Here are two small sites I made recently, and which I may continue to tweak and expand. The first, plain-text.co, presents “The Plain Person’s Guide to Plain-Text Social Science”. It is designed to address some questions about managing research and writing projects in the social sciences using plain-text and free or mostly-free tools like Emacs (or other text editors), R, pandoc, and make. The second, vissoc.co, which I’ve mentioned before, compiles notes from a short course in data visualization I taught last semester.
The Gravitational Waves paper that was in the news yesterday has almost a thousand authors. (Actually there’s more than one paper—there’s the “discovery” paper and the “implications” paper.) Out of interest, I fed the list of authors in the “implications” paper into R and constructed an affiliation network with ties based on the university or research institute listed. Then I colored the nodes by the country of the primary institutional affiliation.
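One way to read "ties based on the university or research institute listed" is that two authors are linked whenever they list the same institution. The sketch below takes that reading. It is in Python rather than the R I actually used, and the records, the function names `affiliation_edges` and `node_colors`, and the color palette are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (author, institution, country) records, for illustration only.
authors = [
    ("Author A", "Institute X", "US"),
    ("Author B", "Institute X", "US"),
    ("Author C", "University Y", "DE"),
    ("Author D", "University Y", "DE"),
    ("Author E", "University Y", "DE"),
]


def affiliation_edges(records):
    """Link every pair of authors who list the same institution."""
    by_institution = defaultdict(list)
    for name, institution, _country in records:
        by_institution[institution].append(name)
    edges = set()
    for members in by_institution.values():
        # Sorting first gives each undirected tie one canonical orientation.
        for a, b in combinations(sorted(members), 2):
            edges.add((a, b))
    return sorted(edges)


def node_colors(records, palette):
    """Map each author to the color assigned to their country."""
    return {name: palette[country] for name, _inst, country in records}


edges = affiliation_edges(authors)
colors = node_colors(authors, {"US": "blue", "DE": "orange"})
print(edges)
print(colors)
```

The edge list and color mapping could then be handed to whatever graph-drawing library is at hand. With close to a thousand authors, the large collaborations show up as dense clusters.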
ASA Section Membership and Revenues.
I taught a half-sized introductory seminar on data visualization last semester. It’s an introduction to some principles of data visualization for working social scientists, and is focused mostly on teaching people how to use ggplot effectively. I’ve made the (slightly rough-and-ready) course notes available as a website. The notes include numerous code samples and .Rmd files for every week, and there’s a GitHub repository containing all the material to build the site, including the datasets used to make the plots.
A few days ago, Matt Yglesias shared this tweet from Liz Ann Sonders, Chief Investment Strategist with Charles Schwab, Inc:
DailyShot: Here is a comparison of the monetary base with the S&P500 ... Coincidence? pic.twitter.com/QsdNhJdbRP (Liz Ann Sonders, @LizAnnSonders, January 15, 2016)
Matt remarked that “Friends don’t let friends use two y-axes”. It’s a good rule. The topic came up a couple of times during the data visualization short course I taught last semester.
I’m teaching a short graduate seminar on Data Visualization with R this semester. Following Matt Salganik, I wanted students to be able to submit homework or other assignments as R Markdown files, but to have a way to make sure their R code passed some basic stylistic checks provided by lintr before they submitted it to me. Students write .Rmd files containing discussion or notes interspersed with chunks of R code.
The United Kingdom’s election results are being digested by the chattering classes. So, yesterday afternoon I thought I’d see if I could grab the election data to make some pictures. Because the ever-civilized BBC has election web pages with a sane HTML structure, this proved a lot more straightforward than I feared. (Thanks also in no small part to statistician Hadley Wickham’s rvest scraping library, alongside many other tools he has contributed to the community of social scientists who use R to do data analysis.)