2020

March 7

This Be the Kirsch

They pluck your plums, your mum and dad
They eat them for their supper, too
They gobble all the fruit you had
And leave some bullshit note for you

But they were robbed blind in their day
Of damsons, prunes, and blackthorn sloes
Their breakfast treats were poached away
And justified with old-style prose

“Forgive us” both your parents moan
“They were delicious, sweet, and cold”
They wonder why I never phone
And from them my own kids withhold

March 5

Spanish Flu

I was teaching some dplyr and ggplot today. Because Coronavirus is in the, uh, air, I decided to work with the mortality data from http://mortality.org and have the students practice getting a bunch of data files into R and then plotting the resulting data quickly and informatively. We took a look at the years around the 1918 Influenza Epidemic and, after poking at the data for a little while, came to realize why it was called the Spanish Flu. Here’s some code you can run if you download the (freely available) 1x1 mortality files from <mortality.org>.

library(here)
library(janitor)
library(tidyverse)

## Where the data is locally
path <- "data/Mx_1x1/"

## Colors for later
my_colors <- c("#0072B2", "#E69F00")

## Some utility functions for cleaning

## Get the country name from the first line of a mortality file
get_country_name <- function(x){
  read_lines(x, n_max = 1) %>%
    str_extract(".+?,") %>%
    str_remove(",")
}

## Shorten long country names and convert them to snake case
shorten_name <- function(x){
  str_replace_all(x, " -- ", " ") %>%
    str_replace("The United States of America", "USA") %>%
    snakecase::to_any_case()
}

## Pull out the country code: the run of capital letters
## immediately before a period in the file name
make_ccode <- function(x){
  str_extract(x, "[:upper:]+((?=\\.))")
}

First we’re going to make a little tibble of country codes, names, and associated file paths.

filenames <- dir(path = here(path),
                 pattern = "*.txt",
                 full.names = TRUE)

countries <- tibble(country = map_chr(filenames, get_country_name),
                    cname = map_chr(country, shorten_name),
                    ccode = map_chr(filenames, make_ccode),
                    path = filenames)

countries

# A tibble: 49 x 4
   country    cname     ccode path                                    
   <chr>      <chr>     <chr> <chr>                                   
 1 Australia  australia AUS   /Users/kjhealy/Documents/data/misc/lexi…
 2 Austria    austria   AUT   /Users/kjhealy/Documents/data/misc/lexi…
 3 Belgium    belgium   BEL   /Users/kjhealy/Documents/data/misc/lexi…
 4 Bulgaria   bulgaria  BGR   /Users/kjhealy/Documents/data/misc/lexi…
 5 Belarus    belarus   BLR   /Users/kjhealy/Documents/data/misc/lexi…
 6 Canada     canada    CAN   /Users/kjhealy/Documents/data/misc/lexi…
 7 Switzerla… switzerl… CHE   /Users/kjhealy/Documents/data/misc/lexi…
 8 Chile      chile     CHL   /Users/kjhealy/Documents/data/misc/lexi…
 9 Czechia    czechia   CZE   /Users/kjhealy/Documents/data/misc/lexi…
10 East Germ… east_ger… DEUTE /Users/kjhealy/Documents/data/misc/lexi…
# … with 39 more rows

Next we ingest the data as a nested column, clean it a little, and subset it to those countries that we actually have mortality data for from the relevant time period.

mortality <- countries %>%
  mutate(data = map(path,
                    ~ read_table(., skip = 2, na = "."))) %>%
  unnest(cols = c(data)) %>%
  clean_names() %>%
  mutate(age = as.integer(recode(age, "110+" = "110"))) %>%
  select(-path) %>%
  nest(data = c(year:total))

## Subset to flu years / countries
flu <- mortality %>% 
  unnest(cols = c(data)) %>%
  group_by(country) %>%
  filter(min(year) < 1918)

flu

# A tibble: 298,923 x 8
# Groups:   country [14]
   country cname   ccode  year   age  female    male   total
   <chr>   <chr>   <chr> <dbl> <int>   <dbl>   <dbl>   <dbl>
 1 Belgium belgium BEL    1841     0 0.152   0.187   0.169  
 2 Belgium belgium BEL    1841     1 0.0749  0.0741  0.0745 
 3 Belgium belgium BEL    1841     2 0.0417  0.0398  0.0408 
 4 Belgium belgium BEL    1841     3 0.0255  0.0233  0.0244 
 5 Belgium belgium BEL    1841     4 0.0185  0.0171  0.0178 
 6 Belgium belgium BEL    1841     5 0.0139  0.0124  0.0132 
 7 Belgium belgium BEL    1841     6 0.0128  0.0102  0.0115 
 8 Belgium belgium BEL    1841     7 0.0109  0.00800 0.00944
 9 Belgium belgium BEL    1841     8 0.00881 0.00701 0.00789
10 Belgium belgium BEL    1841     9 0.00814 0.00696 0.00754
# … with 298,913 more rows

For the purposes of labeling an upcoming plot, we’re going to make a little dummy dataset.

## Labels for the plot: only the first facet panel gets the "1918"
## annotation; the NA rows leave the other panels unlabeled
dat_text <- data.frame(
  label = c("1918", rep(NA, 5)),
  agegrp = factor(paste("Age", seq(10, 60, 10))),
  year = c(1920, rep(NA, 5)),
  female = c(0.04, rep(NA, 5)),
  flag = rep(NA, 6)
)

dat_text

label agegrp year female flag
1  1918 Age 10 1920   0.04   NA
2  <NA> Age 20   NA     NA   NA
3  <NA> Age 30   NA     NA   NA
4  <NA> Age 40   NA     NA   NA
5  <NA> Age 50   NA     NA   NA
6  <NA> Age 60   NA     NA   NA


And now we filter the data to look only at female mortality between 1900 and 1929 for a series of specific ages: every decade from 10 years old to 60 years old. We’ll use that dummy dataset to label the first (but only the first) panel in the faceted plot we’re going to draw.

p0 <- flu %>%
  group_by(country, year) %>%
  filter(year > 1899 & year < 1930, age %in% seq(10, 60, by = 10)) %>%
  mutate(flag = country %in% "Spain", 
         agegrp = paste("Age", age)) %>%
  ggplot(mapping = aes(x = year, y = female, color = flag)) + 
  geom_vline(xintercept = 1918, color = "gray80") + 
  geom_line(mapping = aes(group = country)) 

p1 <- p0 +  geom_text(data = dat_text, 
                mapping = aes(x = year, y = female, label = label), 
                color = "black", 
                show.legend = FALSE, 
                group = 1, 
                size = 3) + 
  scale_color_manual(values = my_colors, 
                     labels = c("Other Countries", "Spain")) + 
  scale_y_continuous(labels = scales::percent) + 
  labs(title = "Female Mortality, Selected Ages and Countries 1900-1929", 
       x = "Year", y = "Female Mortality Rate", color = NULL,
       caption = "@kjhealy / Data: mortality.org") + 
  facet_wrap(~ agegrp, ncol = 1) + 
  theme(legend.position = "top")
  
p1

And thus, Spanish Flu. Though it looks like it was no joke to be an older woman in Spain during any part of the early 20th century.

February 26

A New Baby Boom Poster

I wanted to work through a few examples of more polished graphics done mostly but perhaps not entirely in R. So, I revisited the Baby Boom visualizations I made a while ago and made a new poster with them. This allowed me to play around with a few packages that I either hadn’t made use of or that weren’t available the first time around. The most notable additions are Rob Hyndman’s suite of tidy tools for time series analysis and Thomas Lin Pedersen’s packages ggforce and patchwork. These are all fantastic resources. The time series decomposition was done with the tsibble family of tools. Meanwhile ggforce and patchwork allow for a tremendous degree of flexibility in laying out multiple plots while still being very straightforward to use. Here’s a preview of the result:

OK Boomer

For now, the annotations were done in post-production (as they say in the movie biz) rather than in R, but I think I’ll be looking to see whether it’s worth taking advantage of some other packages to do those in R as well.

The time series decomposition takes the births series and separates it into trend, seasonal, and remainder components. (It’s an STL decomposition; there are a bunch of other alternatives.) Often, the seasonal and remainder components will end up on quite different scales from the trend. The default plotting methods for decompositions will often show variably-sized vertical bars to the left of each panel, to remind the viewer that the scales are different. But ggforce has a facet_col() function that allows the space taken up by a facet to vary in the same way that one can allow the scales on an ordinary facet’s axes to vary. Usually, variable scaling isn’t desirable in a small-multiple, because the point is to make comparisons across panels. But in this case the combination of free scales and free spacing is very helpful.
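The decomposition step itself isn't shown here, but as a rough sketch of the idea (with hypothetical object and column names; the real code is in the GitHub repo linked below), it might look something like this with the tsibble/feasts tools:

library(tidyverse)
library(tsibble)  # tidy temporal data frames
library(feasts)   # STL(); attaches fabletools, which provides model()

## Hypothetical input: `births_ts` is a tsibble with a monthly `date`
## index and an `n_births` count column
data_lon <- births_ts %>%
  model(STL(n_births ~ trend() + season(window = "periodic"))) %>%
  components() %>%                  # trend, season_year, remainder, etc.
  as_tibble() %>%
  select(date, n_births, trend, season_year, remainder) %>%
  pivot_longer(-date) %>%           # long format for the faceted plot
  mutate(date = as.Date(date))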

Here’s the snippet of code that makes the time series line graphs:

p_trends <- ggplot(data_lon, aes(x = date, y = value)) + 
    geom_line(color = "gray20") + 
    scale_x_date(breaks = break_vec, labels = break_labs, expand = c(0,0)) + 
    facet_col(~ name, space = "free", scales = "free_y") + 
    theme(  strip.background = element_blank(),
            strip.text.x = element_blank()) + 
    labs(y = NULL, x = "Year")

Meanwhile combining the trends plot with the tiled heatmap (called p_tile) is a piece of cake with patchwork:

(p_tile / p_trends) + plot_layout(heights = c(30, 70)) 

The / convention means stack the plot objects, and plot_layout() proportionally divides the available space.
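The same algebra works horizontally, too. As a quick sketch with the same plot objects, | places plots side by side and plot_layout() can divide the width instead:

## Hedged sketch: the two plots next to each other, 40/60 split
(p_tile | p_trends) + plot_layout(widths = c(40, 60))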

Chances are that I’ll make some posters of these and other recent visualizations. Because people often ask, I’ve been looking into options for making them available for sale in various formats … hopefully that’ll be sorted out soon and I can join e.g. Waterlilies, The Kiss, and John Belushi on dorm room walls everywhere.

The code for the decomposition and the core plots is on GitHub.

February 18

Dataviz Workshop at RStudio::conf

Workshop materials are available here: https://rstd.io/conf20-dataviz
Consider buying the book; it’s good: Data Visualization: A Practical Introduction (Buy on Amazon)

I was delighted to have the opportunity to teach a two-day workshop on Data Visualization using ggplot2 at this year’s rstudio::conf(2020) in January. It was my first time attending the conference and it was a terrific experience. I particularly appreciated the friendly and constructive atmosphere that RStudio so clearly goes out of its way to encourage and sustain.

The workshop focused on learning how to think about good data visualization in principle, and how to do it in practice. After many years of trying and often failing to learn how to make good visualizations myself, I became convinced of two things. First, there is a real need for an approach that effectively combines the why of visualization with the how. A lot of otherwise excellent introductions to data visualization will teach you why some visualizations work better than others, and will present a series of mouth-watering examples of fabulous graphics. Then you sit down in front of an empty .Rmarkdown file and … now what? How do I do that?

Meanwhile, many very good, detailed introductions to writing ggplot2 code may be a little out of reach for beginners or—perhaps more often—will tend to get picked up by users in a recipe-like or piecemeal way. People cast around to find out how to more or less solve a particular problem they are having. But they leave without really having a good grasp on why the code they are using looks the way it does. The result is that even people who are pretty used to working in R and who regularly make graphs from data end up with a hazy idea of what they’re doing when they use ggplot.

The second thing I became convinced of as I developed this material was that data visualization is a fantastic way to introduce people to the world of data analysis with R generally. When visualizing data with R and ggplot, it’s possible to produce satisfying results almost right away. That makes it easier to introduce other tidyverse principles and tools in an organic fashion.

For both of those reasons, I ended up writing a book that approached things in just the way I wanted: a practical introduction to data visualization using ggplot2 that kept both the ideas and the code in view, and tried to do so in an engaging and approachable way. It was this text that formed the core of the workshop.

While teaching over the two days, I was assisted by four TAs. When I saw the roster, my first reaction was that mine was the only name I didn't recognize. Having Thomas as a TA, in particular, did rather threaten to cross the line from the merely embarrassing to the faintly absurd. It was a real treat to meet and work with everyone for the first time.

The materials from the workshop are available at the GitHub repository for the course. The repo contains all the code we went through as well as PDFs of all of the slides. The code and the slides also include additional examples and other extensions that we did not have time to cover over the two days, or that I just mentioned in passing.

One of the benefits of teaching a short course like this is that I get a (sometimes sharp!) reminder of what works best and what needs tweaking across the various topics covered. Revisiting the code, in particular, is always necessary just because the best way to do something will change over time. For example, a few of the small tricks and workarounds that I show for dealing with boxplots will shortly become unnecessary, thanks to the work of Thomas, Dewey, and others on the next version of ggplot. I’m looking forward to incorporating those elements and more into the next version of the workshop.

Data visualization is a powerful way to explore your own data and communicate your results to other people. One of the themes of the book, and the workshop, is that it is in most ways a tool like any other. It won’t magically render you immune to error or make it impossible for you to fool others, or fool yourself. But once you get a feel for how to work with it, it makes your work easier and better in many ways. The great strength of the approach taken by the grammar of graphics in general and ggplot in particular is that it gives people a powerful “flow of action” to follow. It provides a set of concepts—mappings, geoms, scales, facets, layers, and so on—that let you look at other people’s graphics and really see how their component pieces fit together. And it implements those concepts as a series of functions that let you coherently assemble graphics yourself. The goal of the workshop was to bring people to the point where they could comfortably write code that would clearly say what they wanted to see.
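As a generic illustration of that flow of action (a toy sketch using ggplot2’s built-in mpg data, not an example from the workshop itself):

library(ggplot2)

## Map variables to aesthetics, add a geom layer, adjust a scale,
## then facet: the grammar's flow of action in miniature
ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point() +
  scale_x_log10() +
  facet_wrap(~ drv)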

2019

November 10

Cleaning the Table

While I’m talking about getting data into R this weekend, here’s another quick example that came up in class this week. The mortality data in the previous example were nice and clean coming in the door. That’s usually not the case. Data can be and usually is messy in all kinds of ways. One of the most common, particularly in the case of summary tables obtained from some source or other, is that the values aren’t directly usable. The following summary table was copied and pasted into Excel from an external source, saved as a CSV file, and arrived looking like this:

library(tidyverse)

rfm_tbl <- read_csv("data/rfm_table.csv")


## Parsed with column specification:
## cols(
##   SEGMENT = col_character(),
##   DESCRIPTION = col_character(),
##   R = col_character(),
##   F = col_character(),
##   M = col_character()
## )


rfm_tbl 


## # A tibble: 23 x 5
##    SEGMENT        DESCRIPTION                             R     F     M    
##    <chr>          <chr>                                   <chr> <chr> <chr>
##  1 <NA>           <NA>                                    <NA>  <NA>  <NA> 
##  2 Champions      Bought recently, buy often and spend t… 4– 5  4– 5  4– 5 
##  3 <NA>           <NA>                                    <NA>  <NA>  <NA> 
##  4 Loyal Custome… Spend good money. Responsive to promot… 2– 5  3– 5  3– 5 
##  5 <NA>           <NA>                                    <NA>  <NA>  <NA> 
##  6 Potential Loy… Recent customers, spent good amount, b… 3– 5  1– 3  1– 3 
##  7 <NA>           <NA>                                    <NA>  <NA>  <NA> 
##  8 New Customers  Bought more recently, but not often     4– 5  <= 1  <= 1 
##  9 <NA>           <NA>                                    <NA>  <NA>  <NA> 
## 10 Promising      Recent shoppers, but haven’t spent much 3– 4  <= 1  <= 1 
## # … with 13 more rows

This is messy and we can’t do anything with the values in R, F, and M. Ultimately we want a table with separate columns containing the low and high values for these variables. If no lower bound is shown, the lower bound is zero. We’re going to use a few tools, notably separate(), to get where we want to be. I’ll step through this pipeline one piece at a time, so you can see how the table is being changed from start to finish.

First let’s clean the variable names and remove the entirely blank lines.

rfm_tbl %>% 
  janitor::clean_names() %>%
  filter_all(any_vars(!is.na(.))) 


## # A tibble: 11 x 5
##    segment        description                             r     f     m    
##    <chr>          <chr>                                   <chr> <chr> <chr>
##  1 Champions      Bought recently, buy often and spend t… 4– 5  4– 5  4– 5 
##  2 Loyal Custome… Spend good money. Responsive to promot… 2– 5  3– 5  3– 5 
##  3 Potential Loy… Recent customers, spent good amount, b… 3– 5  1– 3  1– 3 
##  4 New Customers  Bought more recently, but not often     4– 5  <= 1  <= 1 
##  5 Promising      Recent shoppers, but haven’t spent much 3– 4  <= 1  <= 1 
##  6 Need Attention Above average recency, frequency & mon… 2– 3  2– 3  2– 3 
##  7 About To Sleep Below average recency, frequency & mon… 2– 3  <= 2  <= 2 
##  8 At Risk        Spent big money, purchased often but l… <= 2  2– 5  2– 5 
##  9 Can’t Lose Th… Made big purchases and often, but long… <= 1  4– 5  4– 5 
## 10 Hibernating    Low spenders, low frequency, purchased… 1– 2  1– 2  1– 2 
## 11 Lost           Lowest recency, frequency & monetary s… <= 2  <= 2  <= 2

Next we start work on the values. I thought about different ways of doing this, notably working out a way to apply or map separate() to each of the columns I want to change. I got slightly bogged down doing this, and instead decided to lengthen the r, f, and m variables into a single key-value pair, do the recoding there, and then widen the result again. First, lengthen the data:

rfm_tbl %>% 
  janitor::clean_names() %>%
  filter_all(any_vars(!is.na(.))) %>%
  pivot_longer(cols = r:m)


## # A tibble: 33 x 4
##    segment         description                                  name  value
##    <chr>           <chr>                                        <chr> <chr>
##  1 Champions       Bought recently, buy often and spend the mo… r     4– 5 
##  2 Champions       Bought recently, buy often and spend the mo… f     4– 5 
##  3 Champions       Bought recently, buy often and spend the mo… m     4– 5 
##  4 Loyal Customers Spend good money. Responsive to promotions   r     2– 5 
##  5 Loyal Customers Spend good money. Responsive to promotions   f     3– 5 
##  6 Loyal Customers Spend good money. Responsive to promotions   m     3– 5 
##  7 Potential Loya… Recent customers, spent good amount, bought… r     3– 5 
##  8 Potential Loya… Recent customers, spent good amount, bought… f     1– 3 
##  9 Potential Loya… Recent customers, spent good amount, bought… m     1– 3 
## 10 New Customers   Bought more recently, but not often          r     4– 5 
## # … with 23 more rows

I’m quite sure that there’s an elegant way to use one of the map() functions to process the r, f, and m columns in sequence. But seeing as I couldn’t quickly figure it out, this alternative strategy works just fine. In fact, as a general approach I think it’s always worth remembering that the tidyverse really “wants” your data to be in long form, and lots of things that are awkward or conceptually tricky can suddenly become much easier if you get the data into the shape that the function toolbox wants it to be in. Lengthening the data you’re working with is very often the right approach, and you know you can widen it later on once you’re done cleaning or otherwise manipulating it.
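For what it’s worth, one candidate for that more elegant version is to fold separate() over the three columns with reduce(). This is a sketch I haven’t battle-tested, with rfm_clean standing in (hypothetically) for the output of the clean_names() and filter_all() steps above:

## Hypothetical sketch: apply separate() to r, f, and m in turn,
## producing r_lo/r_hi, f_lo/f_hi, and m_lo/m_hi columns
rfm_wide <- reduce(
  c("r", "f", "m"),
  function(df, v) {
    separate(df, col = !!v, into = paste0(v, c("_lo", "_hi")),
             convert = TRUE, fill = "left")
  },
  .init = rfm_clean
)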

With our table in long format we can now use separate() on the value column. The separate() function is very handy for pulling apart variables that should be in different columns. Its defaults are good, too. In this case I didn’t have to write a regular expression to specify the characters that are dividing up the values. In the function call we use convert = TRUE to turn the results into integers, and fill = "left" because there’s an implicit zero on the left of each entry that looks like e.g. <= 2.
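To see those defaults at work in isolation, here’s a toy version using a couple of values copied from the table above:

## The default sep splits on any run of non-alphanumeric characters,
## so "4– 5" and "<= 1" both come apart without a custom regex. The
## leading "<=" produces an empty first piece, which convert = TRUE
## turns into NA; fill = "left" covers values with only one piece.
tibble(value = c("4– 5", "<= 1")) %>%
  separate(value, into = c("lo", "hi"), convert = TRUE, fill = "left")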

rfm_tbl %>% 
  janitor::clean_names() %>%
  filter_all(any_vars(!is.na(.))) %>%
  pivot_longer(cols = r:m) %>% 
  separate(col = value, into = c("lo", "hi"), 
           remove = FALSE, convert = TRUE, 
           fill = "left") 


## # A tibble: 33 x 6
##    segment       description                        name  value    lo    hi
##    <chr>         <chr>                              <chr> <chr> <int> <int>
##  1 Champions     Bought recently, buy often and sp… r     4– 5      4     5
##  2 Champions     Bought recently, buy often and sp… f     4– 5      4     5
##  3 Champions     Bought recently, buy often and sp… m     4– 5      4     5
##  4 Loyal Custom… Spend good money. Responsive to p… r     2– 5      2     5
##  5 Loyal Custom… Spend good money. Responsive to p… f     3– 5      3     5
##  6 Loyal Custom… Spend good money. Responsive to p… m     3– 5      3     5
##  7 Potential Lo… Recent customers, spent good amou… r     3– 5      3     5
##  8 Potential Lo… Recent customers, spent good amou… f     1– 3      1     3
##  9 Potential Lo… Recent customers, spent good amou… m     1– 3      1     3
## 10 New Customers Bought more recently, but not oft… r     4– 5      4     5
## # … with 23 more rows

Before widening the data again we drop the value column. We don’t need it anymore. (It will mess up the widening if we keep it, too: try it and see what happens.)

rfm_tbl %>% 
  janitor::clean_names() %>%
  filter_all(any_vars(!is.na(.))) %>%
  pivot_longer(cols = r:m) %>% 
  separate(col = value, into = c("lo", "hi"), 
           remove = FALSE, convert = TRUE, 
           fill = "left") %>%
  select(-value) 


## # A tibble: 33 x 5
##    segment        description                             name     lo    hi
##    <chr>          <chr>                                   <chr> <int> <int>
##  1 Champions      Bought recently, buy often and spend t… r         4     5
##  2 Champions      Bought recently, buy often and spend t… f         4     5
##  3 Champions      Bought recently, buy often and spend t… m         4     5
##  4 Loyal Custome… Spend good money. Responsive to promot… r         2     5
##  5 Loyal Custome… Spend good money. Responsive to promot… f         3     5
##  6 Loyal Custome… Spend good money. Responsive to promot… m         3     5
##  7 Potential Loy… Recent customers, spent good amount, b… r         3     5
##  8 Potential Loy… Recent customers, spent good amount, b… f         1     3
##  9 Potential Loy… Recent customers, spent good amount, b… m         1     3
## 10 New Customers  Bought more recently, but not often     r         4     5
## # … with 23 more rows

Now we can widen the data, with pivot_wider().

rfm_tbl %>% 
  janitor::clean_names() %>%
  filter_all(any_vars(!is.na(.))) %>%
  pivot_longer(cols = r:m) %>% 
  separate(col = value, into = c("lo", "hi"), 
           remove = FALSE, convert = TRUE, 
           fill = "left") %>%
  select(-value) %>%
  pivot_wider(names_from = name, 
              values_from = lo:hi) 


## # A tibble: 11 x 8
##    segment     description               lo_r  lo_f  lo_m  hi_r  hi_f  hi_m
##    <chr>       <chr>                    <int> <int> <int> <int> <int> <int>
##  1 Champions   Bought recently, buy of…     4     4     4     5     5     5
##  2 Loyal Cust… Spend good money. Respo…     2     3     3     5     5     5
##  3 Potential … Recent customers, spent…     3     1     1     5     3     3
##  4 New Custom… Bought more recently, b…     4    NA    NA     5     1     1
##  5 Promising   Recent shoppers, but ha…     3    NA    NA     4     1     1
##  6 Need Atten… Above average recency, …     2     2     2     3     3     3
##  7 About To S… Below average recency, …     2    NA    NA     3     2     2
##  8 At Risk     Spent big money, purcha…    NA     2     2     2     5     5
##  9 Can’t Lose… Made big purchases and …    NA     4     4     1     5     5
## 10 Hibernating Low spenders, low frequ…     1     1     1     2     2     2
## 11 Lost        Lowest recency, frequen…    NA    NA    NA     2     2     2

Finally we put back those implicit zeros using replace_na() and reorder the columns to our liking. Using replace_na() is fine here because we know that every missing value should in fact be a zero.

rfm_tbl %>% 
  janitor::clean_names() %>%
  filter_all(any_vars(!is.na(.))) %>%
  pivot_longer(cols = r:m) %>% 
  separate(col = value, into = c("lo", "hi"), 
           remove = FALSE, convert = TRUE, 
           fill = "left") %>%
  select(-value) %>%
  pivot_wider(names_from = name, 
              values_from = lo:hi) %>%
  mutate_if(is.integer, replace_na, 0) %>%
  select(segment, 
         lo_r, hi_r, 
         lo_f, hi_f, 
         lo_m, hi_m, 
         description)


## # A tibble: 11 x 8
##    segment      lo_r  hi_r  lo_f  hi_f  lo_m  hi_m description             
##    <chr>       <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <chr>                   
##  1 Champions       4     5     4     5     4     5 Bought recently, buy of…
##  2 Loyal Cust…     2     5     3     5     3     5 Spend good money. Respo…
##  3 Potential …     3     5     1     3     1     3 Recent customers, spent…
##  4 New Custom…     4     5     0     1     0     1 Bought more recently, b…
##  5 Promising       3     4     0     1     0     1 Recent shoppers, but ha…
##  6 Need Atten…     2     3     2     3     2     3 Above average recency, …
##  7 About To S…     2     3     0     2     0     2 Below average recency, …
##  8 At Risk         0     2     2     5     2     5 Spent big money, purcha…
##  9 Can’t Lose…     0     1     4     5     4     5 Made big purchases and …
## 10 Hibernating     1     2     1     2     1     2 Low spenders, low frequ…
## 11 Lost            0     2     0     2     0     2 Lowest recency, frequen…

Much nicer.