Mon, Nov 19, 2018

Zero Counts in dplyr

Here’s a feature of dplyr that occasionally bites me (most recently while making these graphs). It’s about to change mostly for the better, but is also likely to bite me again in the future. If you want to follow along there’s a GitHub repo with the necessary code and data. Say we have a data frame or tibble and we want to get a frequency table or set of counts out of it.

Sat, Nov 17, 2018

Congress Over Time

Since the U.S. midterm elections I’ve been playing around with some Congressional Quarterly data about the composition of the House and Senate since 1945. Unfortunately I’m not allowed to share the data, but here are two or three things I had to do with it that you might find useful. The data comes as a set of CSV files, one for each congressional session. You download the data by repeatedly querying CQ’s main database by year.

Tue, Nov 6, 2018

Spreading Multiple Values

Earlier this year my colleague Steve Vaisey was converting code in some course notes from Stata to R. He asked me a question about tidily converting from long to wide format when you have multiple value columns. This is a little more awkward than it should be, and I’ve run into the issue several times since then. I’m writing down the answer (or, an answer) here so that I can find it again myself.

Wed, Sep 12, 2018

ASA Section Demographics

The American Sociological Association released some data on its special-interest sections, including some demographic breakdowns. Dan Hirschman wrote a post on Scatterplot looking at some of the breakdowns. Here are some more. I was interested in two things: first, the relative prevalence of Student and Retired members across sections, and second the distribution of women across sections. About 53% of all ASA members are women, substantially higher than some other social sciences and many other academic disciplines.

Wed, Aug 1, 2018

I Can't Believe It's Not Butter

Yesterday, Vox ran a story about changes in food consumption patterns in the United States over the past few decades. It featured this graph (a Vox time series). When I saw it, one of those little bells went off in my head: as a rule, when you see a sharp change in a long-running time series, you should always check to see if some aspect of the data-generating process changed, such as the measurement device or the criteria for inclusion in the dataset, before coming up with any substantive stories about what happened and why.

Thu, May 31, 2018

Conversational Disciplines

Recently Tyler Cowen asked whether there has been progress in Philosophy. Agnes Callard wrote a thoughtful reply, saying amongst other things: “We don’t demand progress in the fields of fashion or literature, because these things please us. Philosophy, by contrast, is bitter, and we want to know what good it will do us, and when, finally, it will be over. It is not pleasant to be told that maybe you don’t know who you are, or how to treat your friends, or how to be happy.”

Tue, Apr 10, 2018

Visualizing the Baby Boom

To close out what has become demography week, I combined the US monthly birth data with data for England and Wales (from the same ONS source as before), so that I could look at the trends together. The monthly England and Wales data I have to hand runs from 1938 to 1991. I thought combining the monthly tiled heatmap and the LOESS decomposition would work well as a poster, so I made one.

Sun, Apr 8, 2018

Animated Population Pyramids in R

Amateur demography week continues around here. Today we are looking at the population of England and Wales since 1961, courtesy of some data from the UK Office for National Statistics. We have data on population counts by age (in nice, detailed, yearly increments) broken down by sex. We’re going to tidy the data, make a pyramid for a year, and then make an animated gif that shows the changing age distribution of the population over more than fifty years.

Sat, Apr 7, 2018

US Monthly Births

Yesterday I came across Aaron Penne’s collection of very nice data visualizations, one of which was of monthly births in the United States since 1933. He made a tiled heatmap of the data, taking care when calculating the average rate to correct for the varying number of days in different months. Aaron works in Python, so I took the opportunity to play around with the data and redo the plots in R.

Sociology and other distractions, since 2002. View an index of posts by category. R-related posts also appear on R-Bloggers.


