Data Visualization for Social Science: A Practical Introduction with R and ggplot2
I’m writing a book on data visualization, provisionally titled Data Visualization for Social Science: A practical introduction with R and ggplot2. As part of that process, largely because I’ve benefited so much myself from the availability of open and widely shared tools for software development, I’m making the draft version of the book available as its own website.
I saw this pie chart via Beth Popp Berman on Twitter yesterday:
[Figure: Pie charts of student debts by percent of all borrowers and percent of all debt.]
As you probably know, the perceptual qualities of pie charts are not great. In a single pie chart, it is usually harder than it should be to estimate and compare the values shown, especially when there are more than a few wedges and when there are a number of wedges reasonably close in size.
The Congressional Budget Office released its cost estimate report for the American Health Care Act yesterday. There are a few tables at the back summarizing the various budgetary and coverage effects of the proposed law. Of these, Table 4 is pretty interesting. The CBO “projected the average national premiums for a 21-year-old in the nongroup health insurance market in 2026 both under current law and under the AHCA. On the basis of those amounts, CBO calculated premiums for a 40-year-old and a 64-year-old, assuming that the person lives in a state that uses the federal default age-rating methodology”.
Update: Since writing this post, I’ve repeatedly tried to delete the offending review from my profile, but Google Scholar keeps re-inserting it as part of its automated trawl through its corpus of articles. So it seems that the robots are determined to grant me these citations whether I want them or not.
Google Scholar is one of the most visible and widely used examples of the rise of “impact measurement” in academia.
I was playing with some county-level data from the U.S. general election, partly out of a spirit of honest inquiry and partly out of a feeling of morbid curiosity. Because I had some county-level census data to hand, I took a look at the results using some extremely basic demographic information—the two variables that structure America’s ur-choropleths, namely population density and percent black. I focused on the counties that flipped from their vote in the 2012 general election.
Yesterday I had a conversation on Twitter with Josh Zumbrun that followed on from this tweet:
This is one of the most horrifying graphics I've ever seen: https://t.co/wM0VJZn0Wg pic.twitter.com/qaUaNFtRPl
Josh Zumbrun (@JoshZumbrun), September 28, 2016
The striking maps he linked to tracked the rise in deaths due to drug-related overdoses over the past 15 years, caused in large part by the surge in use of heroin and synthetic opiates. The details are in the WSJ report on the problem.
Last year I wrote about vaccination exemptions in California kindergartens, drawing on school-level data provided by the state of California about the number of kindergarteners with “personal belief exemptions” (or PBEs) that allow them not to be vaccinated. Today I came across a ggplot package called ggbeeswarm that’s designed to create a “beeswarm plot”, or a 1-D scatterplot with a bit of information about the density of the distribution. I had used geom_jitter to do something like this for one of my plots last year, but the geoms in ggbeeswarm are better.
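As a rough illustration of the difference between jittering and a beeswarm layout, here is a minimal sketch. It assumes the ggplot2 and ggbeeswarm packages are installed, and it uses ggplot2's built-in mpg dataset as a stand-in, not the California PBE data discussed in the post; the object names are placeholders.

```r
library(ggplot2)
library(ggbeeswarm)

# Stand-in data: ggplot2's built-in mpg dataset (an assumption for
# illustration only; this is not the school-level PBE data).
p <- ggplot(mpg, aes(x = class, y = hwy))

# Random horizontal jitter: reduces overplotting, but the spread of
# the points carries no information about density.
p_jitter <- p + geom_jitter(width = 0.2, height = 0) +
  labs(title = "geom_jitter")

# Quasirandom offsets: points spread out more where observations
# are denser, so the shape hints at the distribution.
p_quasi <- p + geom_quasirandom() +
  labs(title = "geom_quasirandom")

# Classic beeswarm: points are packed side by side without overlap.
p_swarm <- p + geom_beeswarm() +
  labs(title = "geom_beeswarm")

print(p_jitter)
print(p_quasi)
print(p_swarm)
```

With many observations per category the beeswarm geoms can run out of horizontal room, so point size and spacing may need adjusting for a dataset as large as the school-level one.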
Here are two small sites I made recently, and which I may continue to tweak and expand. The first, plain-text.co, presents “The Plain Person’s Guide to Plain-Text Social Science”. It is designed to address some questions about managing research and writing projects in the social sciences using plain text and free or mostly-free tools like Emacs (or other text editors), R, pandoc, and make. The second, vissoc.co, which I’ve mentioned before, compiles notes from a short course in data visualization I taught last semester.
In the next week or two I’ll be talking to some social science students about tools for doing research and writing up results. Over the years I’ve accumulated various things on the topic, ranging from bits of advice to templates or things I use myself. My focus is on managing the various pieces of the work process in plain text, especially when it comes to writing code you can read later, and keeping track of the work you’ve done.
[Figure: ASA Section Membership and Revenues.]
I taught a half-sized introductory seminar on data visualization last semester. It’s an introduction to some principles of data visualization for working social scientists, and is focused mostly on teaching people how to use ggplot effectively. I’ve made the (slightly rough-and-ready) course notes available as a website. The notes include numerous code samples and .Rmd files for every week, and there’s a GitHub repository containing all the material to build the site, including the datasets used to make the plots.