Cleaning the Table

10 November 2019

While I’m talking about getting data into R this weekend, here’s another quick example that came up in class this week. The mortality data in the previous example were nice and clean coming in the door. That’s usually not the case. Data can be and usually is messy in all kinds of ways. One of the most common, particularly in the case of summary tables obtained from some source or other, is that the values aren’t directly usable.

Reading in Data

9 November 2019

Here’s a common situation: you have a folder full of similarly-formatted CSV or otherwise structured text files that you want to get into R quickly and easily. Reading data into R is one of those tasks that can be a real source of frustration for beginners, so I like collecting real-life examples of the many ways it’s become much easier. This week in class I was working with country-level historical mortality rate estimates.

Dogs of New York

28 October 2019

The other week I took a few publicly-available datasets that I use for teaching data visualization and bundled them up into an R package called nycdogs. The package has datasets on various aspects of dog ownership in New York City, and amongst other things you can draw maps with it at the zip code level. The package homepage has installation instructions and an example. Using this data, I made a poster called Dogs of New York.

A decade or more ago I read a nice worked example from the political scientist Simon Jackman demonstrating how to do Principal Components Analysis. PCA is one of the basic techniques for reducing data with multiple dimensions to some much smaller subset that nevertheless represents or condenses the information we have in a useful way. In a PCA approach, we transform the data in order to find the “best” set of underlying components.

Last year I wrote about the slightly tedious business of spreading (or widening) multiple value columns in Tidyverse-flavored R. Recent updates to the tidyr package, particularly the introduction of the pivot_wider() and pivot_longer() functions, have made this rather more straightforward to do than before. Here I recapitulate the earlier example with the new tools. The motivating case is something that happens all the time when working with social science data. We’ll load the tidyverse, and then quickly make up some sample data to work with.

Recent Work

  • Data Visualization: A Practical Introduction. Princeton University Press.
  • “Transformative Treatments.” Noûs 52: 320–335.
  • “Visualizing the Baby Boom.” Socius 4: 1–2.
  • “The Plain Person’s Guide to Plain Text Social Science.”
  • “By the Numbers.” European Journal of Sociology (2017), 58: 512–519.

about

I am Professor of Sociology at Duke University. I’m also affiliated with the Kenan Institute for Ethics. Read a brief overview of my work or my Curriculum Vitae.

subscribe

To be notified of updates, you can subscribe to the RSS feed for the site.