gssr Update
Update (April 15th 2024)
gssr is now two packages: gssr and gssrdoc. They’re also available as binary packages via R-Universe, which means they install much faster. See this post for details.
NORC released version 2a of the 1972-2022 General Social Survey cumulative file. I’ve updated {gssr}, an R package that makes it more convenient for R users to work with GSS data. One handy feature of {gssr} is that it lets you see documentation for individual GSS variables as R help pages.

Details on every GSS variable are available in the R help system.
gssr is a data package, bundling several datasets into a convenient format. The relatively large size of the data in the package means it is not suitable for hosting on CRAN, the core R package repository.
Install direct from GitHub
You can install gssr from GitHub using the remotes package.
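A minimal sketch of the installation step. It assumes the remotes package and the kjhealy/gssr repository path; the R-Universe repository URL in the commented-out alternative is likewise an assumption, so check it against the package README before relying on it.

```r
# Option 1: install from GitHub.
# Assumes the 'remotes' package and the kjhealy/gssr repository.
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("kjhealy/gssr")

# Option 2 (alternative; repository URL is an assumption):
# binary packages from R-Universe, which should install faster.
# install.packages(
#   c("gssr", "gssrdoc"),
#   repos = c("https://kjhealy.r-universe.dev", "https://cloud.r-project.org")
# )
```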
Load the package with library(gssr).
Single GSS years
You can quickly get the data for any single GSS year by using gss_get_yr() to download the data file from NORC and put it directly into a tibble.
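For instance, fetching a single year might look like the following sketch. The year 2018 and the object name gss18 are chosen for illustration, and the call needs an internet connection because the file is downloaded from NORC.

```r
library(gssr)

# Download the 2018 GSS file from NORC and read it into a tibble.
# The year (2018) and the object name (gss18) are illustrative choices.
gss18 <- gss_get_yr(2018)

# Inspect the result.
gss18
dim(gss18)
```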
The GSS data comes in a labelled format, mirroring the way it is encoded for Stata and SPSS platforms. The numeric codes are the content of the column cells. The labeling information is stored as an attribute of the column.
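To make the labelled format concrete, here is a small toy example built with the haven package rather than real GSS data. The values, labels, and variable name are invented for illustration; the point is only that the cells hold numeric codes while the labels live in an attribute.

```r
library(haven)

# A toy labelled vector (invented values, not real GSS data).
sex <- labelled(
  c(1, 2, 2, 1),
  labels = c(male = 1, female = 2),
  label = "Respondent's sex"
)

# The cell contents are the numeric codes.
as.vector(zap_labels(sex))

# The value labels are stored as an attribute of the column.
attr(sex, "labels")

# Convert the codes to a factor that uses the labels as levels.
sex_factor <- as_factor(sex)
levels(sex_factor)
```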
Here’s a typical workflow for getting the data ready:
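One way such a workflow might look is sketched below. It assumes the dplyr and haven packages, a single-year file fetched with gss_get_yr(), and a handful of variable names (year, id, age, race, sex, degree) picked for illustration. Treat it as one plausible approach rather than the canonical recipe.

```r
library(gssr)
library(dplyr)
library(haven)

# Illustrative choice of year; requires an internet connection.
gss18 <- gss_get_yr(2018)

# Variable names below are assumptions chosen for the example.
cont_vars <- c("year", "id", "age")
cat_vars <- c("race", "sex", "degree")

gss18_small <- gss18 |>
  select(all_of(c(cont_vars, cat_vars))) |>
  mutate(
    # Turn Stata-style tagged missing values into ordinary NA.
    across(everything(), zap_missing),
    # Continuous variables: drop the value labels, keep the numbers.
    across(all_of(cont_vars), zap_labels),
    # Categorical variables: use the value labels as factor levels.
    across(all_of(cat_vars), as_factor)
  )

gss18_small
```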
The Cumulative Data File
The GSS cumulative data file is large. It is not loaded by default when you invoke the package. (That is, gssr does not use R’s “lazy loading” facility. The data file is too big to do this without error.) To load one of the datasets, first load the library and then use data() to make the data available. For example, data(gss_all) loads the cumulative GSS file.
This will take a moment. Once it is ready, the gss_all object is available to use in the usual way.
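Put together, loading and inspecting the cumulative file can be sketched like this, assuming gssr is already installed:

```r
library(gssr)

# Explicitly load the cumulative file; this can take a moment.
data(gss_all)

# Once loaded, gss_all behaves like any other tibble.
gss_all
dim(gss_all)
```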
In addition to the integrated help, information about the variables is also contained in the gss_dict object.
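A short sketch of looking at the dictionary. It assumes that gss_dict, like the other bundled datasets, has to be made available with data() first; the column names are not assumed here, so the example lists them instead.

```r
library(gssr)

# Assumption: gss_dict is loaded with data(), like the other datasets.
data(gss_dict)

# Print the dictionary and list its columns.
gss_dict
names(gss_dict)
```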
There are also a few convenience functions. For example, to see which years some questions were asked, use gss_which_years().
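As an illustration, the call below checks a single question. The variable fefam is an arbitrary choice for the example, and the sketch assumes the function takes the cumulative data as its first argument and an unquoted variable name as its second.

```r
library(gssr)
data(gss_all)

# Which years was the (illustratively chosen) fefam question asked?
fefam_years <- gss_which_years(gss_all, fefam)

# Show every row rather than only the first few.
print(fefam_years, n = Inf)
```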