Partly because it grew out of a few code-throughs I was doing, but mostly as a classroom exercise, I pulled together a small data package for R called covdata, available at https://kjhealy.github.io/covdata/. It contains COVID-19 data from three sources:
State-level and county-level data for the United States from the New York Times.
I’ll keep it up to date for at least the near future. If I get a chance I’ll write up a little walkthrough about the process of making a package like this. I find that making data packages for R is both intrinsically useful for data that will be used more than once, and also generally a very accessible and handy way to introduce students to the mechanics of R packaging. It’s much more common for regular users of R to have some data that would benefit from packaging than for them to have some set of functions that might usefully be packaged up for other people.
Here are what the three tables look like, plus a figure at the end.
US State-Level Data from the COVID Tracking Project
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
covus#> # A tibble: 27,216 x 11#> date state fips measure count pos_neg death_increase hospitalized_in… negative_increa… positive_increa…#> <date> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>#> 1 2020-04-09 AK 02 positi… 235 7223 0 0 146 9#> 2 2020-04-09 AK 02 negati… 6988 7223 0 0 146 9#> 3 2020-04-09 AK 02 pending NA 7223 0 0 146 9#> 4 2020-04-09 AK 02 hospit… NA 7223 0 0 146 9#> 5 2020-04-09 AK 02 hospit… 27 7223 0 0 146 9#> 6 2020-04-09 AK 02 in_icu… NA 7223 0 0 146 9#> 7 2020-04-09 AK 02 in_icu… NA 7223 0 0 146 9#> 8 2020-04-09 AK 02 on_ven… NA 7223 0 0 146 9#> 9 2020-04-09 AK 02 on_ven… NA 7223 0 0 146 9#> 10 2020-04-09 AK 02 recove… 49 7223 0 0 146 9#> # … with 27,206 more rows, and 1 more variable: total_test_results_increase <dbl>
State-Level and County-Level (Cumulative) Data from the New York Times
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
nytcovstate#> # A tibble: 2,105 x 5#> date state fips cases deaths#> <date> <chr> <chr> <dbl> <dbl>#> 1 2020-01-21 Washington 53 1 0#> 2 2020-01-22 Washington 53 1 0#> 3 2020-01-23 Washington 53 1 0#> 4 2020-01-24 Illinois 17 1 0#> 5 2020-01-24 Washington 53 1 0#> 6 2020-01-25 California 06 1 0#> 7 2020-01-25 Illinois 17 1 0#> 8 2020-01-25 Washington 53 1 0#> 9 2020-01-26 Arizona 04 1 0#> 10 2020-01-26 California 06 2 0#> # … with 2,095 more rows
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
nytcovcounty#> # A tibble: 45,880 x 6#> date county state fips cases deaths#> <date> <chr> <chr> <chr> <dbl> <dbl>#> 1 2020-01-21 Snohomish Washington 53061 1 0#> 2 2020-01-22 Snohomish Washington 53061 1 0#> 3 2020-01-23 Snohomish Washington 53061 1 0#> 4 2020-01-24 Cook Illinois 17031 1 0#> 5 2020-01-24 Snohomish Washington 53061 1 0#> 6 2020-01-25 Orange California 06059 1 0#> 7 2020-01-25 Cook Illinois 17031 1 0#> 8 2020-01-25 Snohomish Washington 53061 1 0#> 9 2020-01-26 Maricopa Arizona 04013 1 0#> 10 2020-01-26 Los Angeles California 06037 1 0#> # … with 45,870 more rows
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
nytcovcounty%>%mutate(uniq_name=paste(county,state))%>%# Can't use FIPS because of how the NYT bundled citiesgroup_by(uniq_name)%>%mutate(days_elapsed=date-min(date))%>%ggplot(aes(x=days_elapsed,y=cases,group=uniq_name))+geom_line(size=0.25,color="gray20")+scale_y_log10(labels=scales::label_number_si())+guides(color=FALSE)+facet_wrap(~state,ncol=5)+labs(title="COVID-19 Cumulative Recorded Cases by US County",subtitle=paste("New York is bundled into a single area in this data.\nData as of",format(max(nytcovcounty$date),"%A, %B %e, %Y")),x="Days since first case",y="Count of Cases (log 10 scale)",caption="Data: The New York Times | Graph: @kjhealy")+theme_minimal()#> Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.#> Warning: Transformation introduced infinite values in continuous y-axis
County-level plot faceted by state, from the New York Times data.