Widening Multiple Columns Redux
Last year I wrote about the slightly tedious business of spreading (or widening) multiple value columns in Tidyverse-flavored R. Recent updates to the tidyr package, particularly the introduction of the pivot_wider() and pivot_longer() functions, have made this rather more straightforward to do than before. Here I recapitulate the earlier example with the new tools.
The motivating case is something that happens all the time when working with social science data. We’ll load the tidyverse, and then quickly make up some sample data to work with.
What we have are measures of sex, race, stratum (from a survey, say), education, and income. Of these, everything is categorical except income. Here’s what it looks like:
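As a minimal sketch, data of this shape could be generated along the following lines. The seed, the sample size, the helper name gen_cats, the category labels for sex and race, the number of strata, and the income distribution are all assumptions made for illustration; only the variable names and the HS and BA levels of educ come from the text.

```r
library(tidyverse)

set.seed(101)   # assumed seed, only so the sketch is reproducible
n_obs <- 1000   # assumed sample size

# Hypothetical helper: draw n values at random from a vector of categories
gen_cats <- function(x, n = n_obs) {
  sample(x, n, replace = TRUE)
}

df <- tibble(
  sex     = gen_cats(c("M", "F")),               # assumed labels
  race    = gen_cats(c("B", "W")),               # assumed labels
  stratum = gen_cats(1:8),                       # assumed number of strata
  educ    = gen_cats(c("HS", "BA")),
  income  = rnorm(n_obs, mean = 100, sd = 50)    # assumed distribution
)

# Print the tibble to see the first few rows and the column types
df
```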
Let’s say we want to transform this to a wider format, specifically by widening the educ column, so we end up with columns for both the HS and BA categories, and as we do so we want to calculate both the mean of income and the total n within each category of educ.
For comparison, one could do this with data.table in the following way:
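A sketch of the data.table route, using dcast() with a list of aggregation functions. The simulated data repeats the assumed setup (seed, sample size, labels, income distribution), so treat it as illustrative only; the result name df_wide_dt matches the name used later in the text.

```r
library(data.table)

set.seed(101)   # assumed seed
n_obs <- 1000   # assumed sample size

# Assumed sample data with the same columns described in the text
dt <- data.table(
  sex     = sample(c("M", "F"), n_obs, replace = TRUE),
  race    = sample(c("B", "W"), n_obs, replace = TRUE),
  stratum = sample(1:8, n_obs, replace = TRUE),
  educ    = sample(c("HS", "BA"), n_obs, replace = TRUE),
  income  = rnorm(n_obs, mean = 100, sd = 50)
)

# One row per sex/race/stratum combination; educ is spread across columns,
# with one income column per aggregation function and educ level
df_wide_dt <- dcast(
  dt,
  sex + race + stratum ~ educ,
  fun.aggregate = list(mean, length),
  value.var = "income"
)

df_wide_dt
```

Here length() plays the role of the count, since it returns the number of income values that fall in each cell.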
Until recently, widening or spreading on multiple values like this was kind of a pain when working in the tidyverse. You can see how I approached it before in the earlier post. (The code there still works fine.) Previously, you had to put spread() and gather() through a slightly tedious series of steps, best wrapped in a function you’d have to write yourself. No more! Since the release of tidyr v1.0.0, the new function pivot_wider() (and its complement, pivot_longer()) makes this common operation more accessible.
Here’s how to do it now. Remember that in the tidyverse approach, we’ll first do the summary calculations, mean and length, though we’ll use dplyr’s n() for the latter. Then we widen the long result.
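A sketch of that two-step approach: summarize within groups, then pivot. The simulated data again uses assumed values, and the names mean_inc, n, and df_wide_tv are placeholders chosen for this example.

```r
library(tidyverse)

set.seed(101)   # assumed seed
n_obs <- 1000   # assumed sample size

# Assumed sample data with the same columns described in the text
df <- tibble(
  sex     = sample(c("M", "F"), n_obs, replace = TRUE),
  race    = sample(c("B", "W"), n_obs, replace = TRUE),
  stratum = sample(1:8, n_obs, replace = TRUE),
  educ    = sample(c("HS", "BA"), n_obs, replace = TRUE),
  income  = rnorm(n_obs, mean = 100, sd = 50)
)

df_wide_tv <- df %>%
  group_by(sex, race, stratum, educ) %>%
  # Step 1: the summary calculations, in long form
  summarize(mean_inc = mean(income), n = n(), .groups = "drop") %>%
  # Step 2: widen educ, spreading both summary columns at once
  pivot_wider(names_from = educ, values_from = c(mean_inc, n))

df_wide_tv
```

Passing more than one column to values_from is what does the work here: each value column is combined with each level of educ, giving mean_inc_HS, mean_inc_BA, n_HS, and n_BA.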
This gives us an object that’s equivalent to the df_wide_dt object created by data.table.
And there you have it. Be sure to check out pivot_longer(), the complement of pivot_wider(), as well.
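To give a flavor of the reverse operation, here is a small sketch that takes a wide table back to long form. The two-row table and its numbers are made up purely for illustration, and the column names mean_inc and n are the same placeholders as in the widening sketch.

```r
library(tidyverse)

# Made-up wide table, for illustration only
df_wide <- tribble(
  ~sex, ~stratum, ~mean_inc_HS, ~mean_inc_BA, ~n_HS, ~n_BA,
  "F",  1,        95.2,         110.4,        12,    9,
  "M",  1,        101.7,        98.3,         15,    11
)

# ".value" says the first captured piece of each column name (mean_inc or n)
# becomes an output column; the second piece (HS or BA) goes into educ
df_long <- pivot_longer(
  df_wide,
  cols = -c(sex, stratum),
  names_to = c(".value", "educ"),
  names_pattern = "(.+)_(.+)"
)

df_long
```

A regular expression is used rather than names_sep because mean_inc itself contains an underscore; the greedy first group keeps it intact.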