Afterwards, I was messing around with the data and wanted to draw some time-series plots for the various subject areas the NCES tracks. After cleaning up the data, we end up with a tidy table that looks like this:
```
> data_l
# A tibble: 594 x 5
# Groups:   yr
   field_of_study                                     yr   count  yr_pct cutoff
   <chr>                                           <int>   <dbl>   <dbl> <lgl>
 1 Agriculture and natural resources                1970 1.27e4   1.51   FALSE
 2 Architecture and related services                1970 5.57e3   0.663  FALSE
 3 Area; ethnic; cultural; gender; and group stud…  1970 2.58e3   0.307  FALSE
 4 Biological and biomedical sciences               1970 3.57e4   4.25   TRUE
 5 Business                                         1970 1.15e5  13.7    TRUE
 6 Communication; journalism; and related programs  1970 1.03e4   1.23   TRUE
 7 Communications technologies                      1970 4.78e2   0.0569 FALSE
 8 Computer and information sciences                1970 2.39e3   0.284  TRUE
 9 Education                                        1970 1.76e5  21.0    TRUE
10 Engineering                                      1970 4.50e4   5.36   TRUE
# ... with 584 more rows
```
The data and code for everything here (including the figure above) are available on GitHub, by the way.
What we want is a small-multiple plot of the trends for each subject area that averages more than two percent of all degrees conferred. That's what the cutoff variable is for. When we create small multiples, we almost always want to order them in some sensible way, and that is almost never the default, which is alphabetical by category. Instead, we will reorder the panels (the facets, in ggplot's terms) by some statistic of interest, most often the mean value of the variable we're showing. We set up some labels (because we'll be reusing them) and draw the plot. The key bit is the ~ reorder(field_of_study, -yr_pct) instruction: it sorts the facets by the mean of -yr_pct, so they run from the highest average share to the lowest.
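To see what reorder() does to the factor levels, here is a tiny sketch on made-up data (the fields "a", "b", "c" and their percentages are invented for illustration). By default reorder() sorts levels by the mean of its second argument, so negating the values puts the largest average first.

```r
# Toy data: three hypothetical fields, each observed twice
field <- c("a", "a", "b", "b", "c", "c")
pct   <- c(1, 3, 10, 12, 5, 7)

# Levels are sorted by mean(-pct) within each field, ascending,
# i.e. by mean(pct) descending
f <- reorder(field, -pct)
levels(f)
#> [1] "b" "c" "a"
```

Without the minus sign the same call would give "a", "c", "b", which is smallest-first.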
```r
library(ggplot2)

## Labels, set up once because we'll reuse them
my_xlab = "Year"
my_ylab = "Percent of all BAs conferred"
my_caption = "Data from NCES Digest 2017, Table 322.10"
my_subtitle = "Observations are every 5 years from 1970-1995, and annually thereafter"
my_title_1 = "US Trends in Bachelor's Degrees Conferred, 1970-2015,\nfor Areas averaging more than 2% of all degrees"
my_title_2 = "US Trends in Bachelor's Degrees Conferred, 1970-2015,\nfor Areas averaging less than 2% of all degrees"

## Only the fields above the cutoff
p <- ggplot(subset(data_l, cutoff == TRUE),
            aes(x = yr, y = yr_pct, group = field_of_study))

## Facets ordered from highest to lowest mean yr_pct
p + geom_line() +
    facet_wrap(~ reorder(field_of_study, -yr_pct),
               labeller = label_wrap_gen(width = 35),
               ncol = 5) +
    labs(x = my_xlab, y = my_ylab,
         caption = my_caption,
         title = my_title_1,
         subtitle = my_subtitle) +
    theme_minimal() +
    theme(strip.text.x = element_text(size = 6))
```
The result is a nice graph. R and ggplot have taken care of the layout for us. As is often the case, the number of categories isn't a multiple of the number of columns, so there's a space left over in the bottom row. By default, ggplot adds x-axis labels to the panel in the row above the gap ("English Language and Literature/Letters").
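The size of that gap is just arithmetic on the panel count and the number of columns. A minimal sketch, using a hypothetical helper empty_slots() and illustrative counts rather than the actual number of fields in this data:

```r
# Hypothetical helper: number of empty cells when n panels are wrapped
# into ncol columns (0 when n is an exact multiple of ncol)
empty_slots <- function(n, ncol) {
  (ncol - n %% ncol) %% ncol
}

empty_slots(14, 5)  # 14 panels in 5 columns
#> [1] 1
empty_slots(15, 5)  # a full grid
#> [1] 0
```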
Again on Twitter, DrDrang asked if there was a way, in effect, to force the bottom row of the plot to be filled in. The small multiples in ggplot intelligently minimize redundancy in x- and y-axis labeling, but maybe we don't like having that gap at the bottom and the associated need for another labeled axis in the row above. The facet_wrap() function has an as.table argument that's set to TRUE by default. The help says:
If TRUE, the default, the facets are laid out like a table with highest values at the bottom-right. If FALSE, the facets are laid out like a plot with the highest value at the top-right.
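To make the help text concrete, here is a hypothetical helper, panel_position() (not a ggplot2 function), that returns the grid row and column of the id-th panel. It encodes my reading of the documented behavior: with as.table = TRUE panels fill from the top row down, and with as.table = FALSE the first panel sits at the bottom-left and rows fill upward, left to right.

```r
# Hypothetical helper, not part of ggplot2: grid position of the
# id-th of n panels wrapped into ncol columns
panel_position <- function(id, n, ncol, as_table = TRUE) {
  nrow <- ceiling(n / ncol)
  row_from_top <- (id - 1) %/% ncol + 1  # row if filling from the top
  col <- (id - 1) %% ncol + 1            # columns always fill left to right
  # as_table = FALSE flips the rows, so filling starts at the bottom
  row <- if (as_table) row_from_top else nrow - row_from_top + 1
  c(row = row, col = col)
}

panel_position(1, n = 14, ncol = 5, as_table = FALSE)
panel_position(14, n = 14, ncol = 5, as_table = FALSE)
```

With 14 panels in five columns (an illustrative count), as.table = FALSE puts panel 1 at row 3, column 1 and panel 14 at row 1, column 4, so the empty cell moves to the top-right.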
Setting as.table = FALSE fills the bottom row, but it breaks the high-to-low ordering that we're trying to set with reorder(). We can get it back manually. First we create vars, which summarizes the areas of study by their mean number of degrees awarded over the years. Separately, we create a vector, o, the same length as the subset of categories we're going to display.
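Here, instead of using reorder(), we recode the field_of_study variable on the fly, reordering its factor levels to reflect the desired panel order, and we keep as.table = FALSE. Because that setting fills the grid from the bottom row upward, the levels have to be arranged so that the panels still read from highest to lowest starting at the top-left. The field_of_study categories then appear in the order we want.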
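The code for this step isn't reproduced here, so what follows is only a sketch under stated assumptions: a small toy data frame stands in for data_l, base R's aggregate() builds vars, and a hypothetical helper, bottom_up_order(), computes o for a given number of panels and columns. It relies on facet_wrap(as.table = FALSE) placing the first factor level at the bottom-left and filling each row left to right before moving up.

```r
library(ggplot2)

# Hypothetical helper: for n ranked panels in ncol columns, return the
# rank indices in the order as.table = FALSE will place them
# (bottom row first, then each row above it)
bottom_up_order <- function(n, ncol) {
  stopifnot(n >= 1, ncol >= 1)
  nrow <- ceiling(n / ncol)
  k <- n - (nrow - 1) * ncol  # panels in the (possibly partial) top row
  # Label each rank with its row, counting from the top
  row_id <- c(rep(1, k), rep(seq_len(nrow - 1) + 1, each = ncol))
  rows <- split(seq_len(n), row_id)
  # Reverse the rows so the bottom row's ranks come first
  unlist(rev(rows), use.names = FALSE)
}

# Toy stand-in for data_l: seven invented fields, two years each
toy <- data.frame(
  field_of_study = rep(c("A", "B", "C", "D", "E", "F", "G"), each = 2),
  yr = rep(c(1970, 1975), times = 7),
  count = c(70, 72, 10, 12, 50, 52, 30, 32, 60, 62, 20, 22, 40, 42),
  stringsAsFactors = FALSE
)
toy$yr_pct <- 100 * toy$count / ave(toy$count, toy$yr, FUN = sum)

# vars: fields ranked by mean count, highest first
vars <- aggregate(count ~ field_of_study, data = toy, FUN = mean)
vars <- vars[order(vars$count, decreasing = TRUE), ]

o <- bottom_up_order(nrow(vars), ncol = 3)
panel_levels <- vars$field_of_study[o]

# Recode field_of_study on the fly and fill from the bottom
p <- ggplot(toy, aes(x = yr, y = yr_pct, group = field_of_study)) +
  geom_line() +
  facet_wrap(~ factor(field_of_study, levels = panel_levels),
             ncol = 3, as.table = FALSE)
```

With seven panels in three columns, o comes out as 5, 6, 7, 2, 3, 4, 1: the three lowest-ranked fields fill the bottom row and the top-ranked field sits alone at the top-left. The number of columns still has to be fixed in advance, and the gap moves to the top row rather than disappearing.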
We can do the same again for the fields with less than two percent of all degrees on average:
Because we have a different number of categories, we need to manually reorder the variable again. This isn’t an ideal solution. What we really want is a way to automatically figure out how many facets we have, and then fill them from the bottom in the order we desire. I’m not sure this is easily doable.