Back in April, in Ireland, my nephew Luke made his first communion alongside his school classmates. I did much the same thing myself in much the same place about forty years ago. My brother tells me that the preparation nowadays is a little more humane than the version we enjoyed. But there is as much anticipation beforehand, and no less excitement on the day. Luke’s little suit lacked the stylish navy-blue velvet panels mine sported in 1980, but in essence the event was the same in its purpose, its form, and in most of its details.
PhDs awarded in selected disciplines, 2006-2016.
Thierry Rossier asked me for the code to produce plots like the one above. The data come from the Survey of Earned Doctorates, a very useful resource for tracking trends in PhDs awarded in the United States. The plot is made with geom_line() and geom_label_repel(). The trick, if it can be dignified with that term, is to use geom_label_repel() on a subset of the data that contains the last year of observations only.
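A minimal sketch of that trick in R, assuming the ggplot2 and ggrepel packages are installed. The `phds` data frame and its `year`, `field`, and `count` columns are hypothetical placeholders filled with made-up numbers, not the Survey of Earned Doctorates figures. What matters is the `data = last_year` argument: the lines use every year, while the labels are drawn only from a one-year subset, so each label sits at the right-hand end of its line.

```r
library(ggplot2)
library(ggrepel)

# Hypothetical toy data with made-up counts, NOT the Survey of Earned Doctorates.
phds <- expand.grid(year = 2006:2016,
                    field = c("Field A", "Field B", "Field C"),
                    stringsAsFactors = FALSE)
phds$count <- 100 + (phds$year - 2006) * 5 *
  match(phds$field, c("Field A", "Field B", "Field C"))

# Keep only the rows for the final year; these become the label positions.
last_year <- subset(phds, year == max(year))

p <- ggplot(phds, aes(x = year, y = count, color = field)) +
  geom_line() +
  # Labels come from the one-year subset, nudged right and spread vertically.
  geom_label_repel(data = last_year,
                   aes(label = field),
                   nudge_x = 1,
                   direction = "y",
                   show.legend = FALSE) +
  # Extend the x-axis past the last year to leave room for the labels.
  scale_x_continuous(limits = c(2006, 2018)) +
  labs(x = "Year", y = "PhDs awarded") +
  theme(legend.position = "none")
```

Because each line is labeled directly, the color legend becomes redundant, which is why it is switched off here.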
I was playing around with the gganimate package this morning and thought I’d make a little animation showing a favorite finding about the distribution of baby names in the United States. This is the fact—I think first noticed by Laura Wattenberg, of the Baby Name Voyager—that there has been a sharp, relatively recent rise in boys’ names ending in the letter ‘n’, at the expense of names with ‘e’, ‘l’, and ‘y’ endings.
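The calculation underneath a chart like this can be sketched in base R. This assumes a data frame with `year`, `sex`, `name`, and `n` (births) columns, which is the layout used by the `babynames` package on CRAN; the toy rows below are invented for illustration. It is only a sketch of the share-by-final-letter computation, not necessarily how the animation was built, and the gganimate step is left out.

```r
# Share of boys' births in each year, grouped by the final letter of the name.
# Assumes columns year, sex ("M"/"F"), name, and n (number of births).
end_letter_share <- function(df) {
  boys <- df[df$sex == "M", ]
  # Last character of each name, lower-cased for safety.
  boys$last_letter <- tolower(substr(boys$name, nchar(boys$name), nchar(boys$name)))
  counts <- aggregate(n ~ year + last_letter, data = boys, FUN = sum)
  totals <- aggregate(n ~ year, data = boys, FUN = sum)
  names(totals)[names(totals) == "n"] <- "total"
  out <- merge(counts, totals, by = "year")
  out$share <- out$n / out$total
  out[order(out$year, -out$share), ]
}

# Invented toy rows for illustration only, not real Social Security counts.
toy <- data.frame(
  year = c(1960, 1960, 1960, 2010, 2010, 2010, 2010),
  sex  = c("M", "M", "F", "M", "M", "M", "F"),
  name = c("Gary", "Aidan", "Mary", "Aidan", "Jayden", "Gary", "Emma"),
  n    = c(30, 10, 50, 25, 25, 10, 40),
  stringsAsFactors = FALSE
)

res <- end_letter_share(toy)
```

In these toy rows the share of boys' births going to names ending in "n" rises from 0.25 in 1960 to about 0.83 in 2010, purely because of how the example was constructed.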
The data from the 2018 wave of the General Social Survey was released during the week, leading to a flurry of graphs showing various trends. The GSS is one of the most important sources of information on many aspects of U.S. society. One of the best things about it is that the data is freely available for more than forty years’ worth of surveys. Here I’ll walk through my own quick look at the data, in order to show how R can tidily manage data from a complex survey.
A few years ago I wrote a post about the stickiness of college and university rankings in the United States. It’s been doing the rounds again, so I thought I’d revisit it and redraw a few of the graphs I made then.
In 1911, Kendric Babcock made an effort to rank US universities and colleges. In his report, Babcock divided schools into four classes, beginning with Class I:
The better sort of school.
I was asked for some examples of posters I’ve made using R and ggplot. Here are four. Some of these are done from start to finish in R, others involved some post-processing in Illustrator, usually to adjust some typographical elements or add text in a sidebar. I’ve linked to a PDF of each one, along with a pointer to the original post about the graphic.
If you’re interested in learning more about how to make graphs and charts using R and ggplot, then by a staggering coincidence there’s a new visualization book out that can help you with that.
Based on the heatmaps I drew earlier this month, I made a poster of two centuries of data on mortality rates in France for males and females. It turned out reasonably well, I think. I will probably get it blown up to a nice large size and put it up on the wall. I’ve had very good results with PhD Posters for work like this over the years, by the way.
Data Visualization: A Practical Introduction will begin shipping next week. I’ve written an R package that contains datasets, functions, and a course packet to go along with the book. The socviz package contains about twenty-five datasets and a number of utility and convenience functions. The datasets range in size from things with just a few rows (used for purely illustrative purposes) to datasets with over 120,000 observations, for practicing with and exploring.
As part of the run-up to the release of Data Visualization (out in about ten days! Currently 30% off on Amazon!), I’ve been playing with graphing different kinds of data. One great source of rich time-series data is mortality.org, which hosts a collection of standardized demographic data for a large number of countries. Mortality rates are often interesting to look at as a heatmap, as we get data for a series of ages (e.
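A heatmap of this kind can be sketched with ggplot2’s `geom_raster()`, putting year on the x-axis, age on the y-axis, and the logged rate as the fill. The `mort` data frame, its columns, and the rates are hypothetical, made-up stand-ins for data downloaded from mortality.org.

```r
library(ggplot2)

# Hypothetical toy data: made-up rates, NOT mortality.org figures.
mort <- expand.grid(year = 1900:1950, age = 0:100)
# Invented rate that rises with age and falls slowly over time, for illustration only.
mort$rate <- 0.0005 * exp(0.08 * mort$age) * exp(-0.01 * (mort$year - 1900))

# One tile per year-age cell, colored by the log of the rate.
p <- ggplot(mort, aes(x = year, y = age, fill = log10(rate))) +
  geom_raster() +
  scale_fill_viridis_c(name = "log10 rate") +
  labs(x = "Year", y = "Age")
```

Taking the log matters because rates span several orders of magnitude across ages; on a linear fill scale most of the grid would collapse into a single color.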
Since the U.S. midterm elections I’ve been playing around with some Congressional Quarterly data about the composition of the House and Senate since 1945. Unfortunately I’m not allowed to share the data, but here are two or three things I had to do with it that you might find useful.
The data comes as a set of CSV files, one for each congressional session. You download the data by repeatedly querying CQ’s main database by year.
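Stacking the per-session files into one table is a natural first step. Here is a minimal sketch in base R, assuming the CSVs sit in a single directory and share the same columns. The directory, the file names, and the `member` and `party` columns are hypothetical stand-ins, since the CQ data itself can’t be shared.

```r
# Read every CSV in `dir` and stack them, recording which file each row came from.
read_sessions <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  pieces <- lapply(files, function(f) {
    d <- read.csv(f, stringsAsFactors = FALSE)
    d$source_file <- basename(f)  # keep track of the originating session file
    d
  })
  do.call(rbind, pieces)
}

# Demo with two made-up files written to a temporary directory.
demo_dir <- file.path(tempdir(), "cq_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(member = c("A", "B"), party = c("D", "R")),
          file.path(demo_dir, "session_79.csv"), row.names = FALSE)
write.csv(data.frame(member = "C", party = "D"),
          file.path(demo_dir, "session_80.csv"), row.names = FALSE)

all_sessions <- read_sessions(demo_dir)
```

If the columns differ from one session to the next, `dplyr::bind_rows()` will fill the missing columns with `NA` where `rbind()` would stop with an error.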