October 10, 2019

· Sociology · R

The General Social Survey, or GSS, is one of the cornerstones of American social science and one of the most-analyzed datasets in Sociology. It is routinely used in research, in teaching, and as a reference point in discussions about changes in American society since the early 1970s. It is also a model of open, public data. The National Opinion Research Center already provides many excellent tools for working with the data, and has long made it freely available to researchers. Casual users can explore the survey through the GSS Data Explorer, and social scientists can download complete datasets directly.

At present, the GSS is provided to researchers in a choice of two commercial formats, Stata (.dta) and SPSS (.sav). It's not too difficult to get the data into R (especially now that the haven package is pretty reliable), but it can be a little annoying to have to do it repeatedly. After doing it one too many times, I got tired of it and made a package instead.

The gssr package provides the GSS Cumulative Data File (1972-2018) and the GSS Three Wave Panel Data File (2006-2010), together with their codebooks, in a format that makes it straightforward to get started working with them in R. That makes the GSS a little more accessible to users of R, the free software environment for statistical computing, and so helps in a small way to make the GSS even more open than it already is. The package is still in development and presently lives at http://kjhealy.github.io/gssr/. There is a vignette providing an overview of what's included, and you can see the source code on GitHub.
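To make the import chore concrete, here is a minimal sketch of the haven route. It uses a tiny made-up stand-in for a GSS extract (the `toy` data and its `sex` labels are invented for illustration, not real GSS records), written to a temporary Stata file so the example runs without downloading anything. The final guarded step assumes the gssr package is installed and that it exposes data objects named `gss_all` and `gss_doc`; check the vignette for the names in the version you have.

```r
library(haven)

# Toy stand-in for a GSS extract (invented values, for illustration only).
# Stata files carry value labels, which haven represents as labelled vectors.
toy <- tibble::tibble(
  year = c(1972, 1972, 2018),
  sex  = labelled(c(1, 2, 2), labels = c(MALE = 1, FEMALE = 2))
)

# Write a temporary .dta file so there is something to import.
path <- tempfile(fileext = ".dta")
write_dta(toy, path)

# The step that gets repeated: read the Stata file, then convert
# labelled columns to ordinary R factors before analysis.
gss_toy <- read_dta(path)
gss_toy$sex <- as_factor(gss_toy$sex)
str(gss_toy)

# With gssr installed, the data ships with the package instead.
# Assumption: the object names below match your installed version.
if (requireNamespace("gssr", quietly = TRUE)) {
  data("gss_all", package = "gssr")  # cumulative data file
  data("gss_doc", package = "gssr")  # codebook
}
```

With the real cumulative file, the `read_dta()` call would point at the .dta file downloaded from NORC, and the same labelled-to-factor conversion applies to each categorical variable you want to use.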



I am Professor of Sociology at Duke University. I’m also affiliated with the Kenan Institute for Ethics. Read a brief overview of my work or my Curriculum Vitae.


