EPI 590R - Descriptive tables with {gtsummary}

What is `{gtsummary}`?

Create tables that are publication-ready
Highly customizable
Descriptive tables, regression tables, etc.

`gtsummary::tbl_summary()`

library(gtsummary)

tbl_summary(
  nlsy,
  by = sex_cat,
  include = c(sex_cat, race_eth_cat, region_cat,
              eyesight_cat, glasses, age_bir))

Characteristic	Male, N = 6,403¹	Female, N = 6,283¹
race_eth_cat
Hispanic	1,000 (16%)	1,002 (16%)
Black	1,613 (25%)	1,561 (25%)
Non-Black, Non-Hispanic	3,790 (59%)	3,720 (59%)
region_cat
Northeast	1,296 (21%)	1,254 (20%)
North Central	1,488 (24%)	1,446 (23%)
South	2,251 (36%)	2,317 (38%)
West	1,253 (20%)	1,142 (19%)
Unknown	115	124
eyesight_cat
Excellent	1,582 (38%)	1,334 (31%)
Very good	1,470 (35%)	1,500 (35%)
Good	792 (19%)	1,002 (23%)
Fair	267 (6.4%)	365 (8.5%)
Poor	47 (1.1%)	85 (2.0%)
Unknown	2,245	1,997
glasses	1,566 (38%)	2,328 (54%)
Unknown	2,241	1,995
age_bir	25 (21, 29)	22 (19, 27)
Unknown	3,652	3,091
¹ n (%); Median (IQR)

You can also refer to variables using helper functions

library(gtsummary)

tbl_summary(
  nlsy,
  by = sex_cat,
  include = c(ends_with("cat"), glasses, age_bir))

Characteristic	Male, N = 6,403¹	Female, N = 6,283¹
region_cat
Northeast	1,296 (21%)	1,254 (20%)
North Central	1,488 (24%)	1,446 (23%)
South	2,251 (36%)	2,317 (38%)
West	1,253 (20%)	1,142 (19%)
Unknown	115	124
race_eth_cat
Hispanic	1,000 (16%)	1,002 (16%)
Black	1,613 (25%)	1,561 (25%)
Non-Black, Non-Hispanic	3,790 (59%)	3,720 (59%)
eyesight_cat
Excellent	1,582 (38%)	1,334 (31%)
Very good	1,470 (35%)	1,500 (35%)
Good	792 (19%)	1,002 (23%)
Fair	267 (6.4%)	365 (8.5%)
Poor	47 (1.1%)	85 (2.0%)
Unknown	2,245	1,997
glasses	1,566 (38%)	2,328 (54%)
Unknown	2,241	1,995
age_bir	25 (21, 29)	22 (19, 27)
Unknown	3,652	3,091
¹ n (%); Median (IQR)

We probably want to name the variables

tbl_summary(
  nlsy,
  by = sex_cat,
  include = c(sex_cat, race_eth_cat, region_cat,
              eyesight_cat, glasses, age_bir),
  label = list(
    race_eth_cat ~ "Race/ethnicity",
    region_cat ~ "Region",
    eyesight_cat ~ "Eyesight",
    glasses ~ "Wears glasses",
    age_bir ~ "Age at first birth"
  ),
  missing_text = "Missing")

Characteristic	Male, N = 6,403¹	Female, N = 6,283¹
Race/ethnicity
Hispanic	1,000 (16%)	1,002 (16%)
Black	1,613 (25%)	1,561 (25%)
Non-Black, Non-Hispanic	3,790 (59%)	3,720 (59%)
Region
Northeast	1,296 (21%)	1,254 (20%)
North Central	1,488 (24%)	1,446 (23%)
South	2,251 (36%)	2,317 (38%)
West	1,253 (20%)	1,142 (19%)
Missing	115	124
Eyesight
Excellent	1,582 (38%)	1,334 (31%)
Very good	1,470 (35%)	1,500 (35%)
Good	792 (19%)	1,002 (23%)
Fair	267 (6.4%)	365 (8.5%)
Poor	47 (1.1%)	85 (2.0%)
Missing	2,245	1,997
Wears glasses	1,566 (38%)	2,328 (54%)
Missing	2,241	1,995
Age at first birth	25 (21, 29)	22 (19, 27)
Missing	3,652	3,091
¹ n (%); Median (IQR)

And do a million other things

tbl_summary(
  nlsy,
  by = sex_cat,
  include = c(sex_cat, race_eth_cat,
              eyesight_cat, glasses, age_bir),
  label = list(
    race_eth_cat ~ "Race/ethnicity",
    eyesight_cat ~ "Eyesight",
    glasses ~ "Wears glasses",
    age_bir ~ "Age at first birth"
  ),
  missing_text = "Missing") |> 
  add_p(test = list(all_continuous() ~ "t.test", 
                    all_categorical() ~ "chisq.test")) |> 
  add_overall(col_label = "**Total**") |> 
  bold_labels() |> 
  modify_footnote(update = everything() ~ NA) |> 
  modify_header(label = "**Variable**", p.value = "**P**")

Variable	Total	Male, N = 6,403	Female, N = 6,283	P
Race/ethnicity				0.8
Hispanic	2,002 (16%)	1,000 (16%)	1,002 (16%)
Black	3,174 (25%)	1,613 (25%)	1,561 (25%)
Non-Black, Non-Hispanic	7,510 (59%)	3,790 (59%)	3,720 (59%)
Eyesight				<0.001
Excellent	2,916 (35%)	1,582 (38%)	1,334 (31%)
Very good	2,970 (35%)	1,470 (35%)	1,500 (35%)
Good	1,794 (21%)	792 (19%)	1,002 (23%)
Fair	632 (7.5%)	267 (6.4%)	365 (8.5%)
Poor	132 (1.6%)	47 (1.1%)	85 (2.0%)
Missing	4,242	2,245	1,997
Wears glasses	3,894 (46%)	1,566 (38%)	2,328 (54%)	<0.001
Missing	4,236	2,241	1,995
Age at first birth	23 (20, 28)	25 (21, 29)	22 (19, 27)	<0.001
Missing	6,743	3,652	3,091

Additional arguments

We saw include =, by =, label =, missing_text = in the example

statistic =:

The default is list(all_continuous() ~ "{median} ({p25}, {p75})", all_categorical() ~ "{n} ({p}%)")
For categorical variables, you can use {n} (frequency), {N} (denominator), {p} (formatted percentage)
For continuous variables, you can use {median}, {mean}, {sd}, {var}, {min}, {max}, {sum}, {p##} (any percentile), or any function {foo}
You can refer to individual variables with their names: list(age ~ "min = {min}; max = {max}")
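
As a rough sketch of how `statistic =` can be customized: the example below assumes the `trial` example dataset that ships with {gtsummary} (standing in for `nlsy`, which isn't bundled with the package), and the variable choices are only for illustration.

```r
library(gtsummary)

# `trial` is an example dataset included with {gtsummary};
# it is used here as a stand-in for the nlsy data (an assumption)
tbl_stat <- tbl_summary(
  trial,
  by = trt,
  include = c(age, marker, grade),
  statistic = list(
    age ~ "{mean} ({sd})",                   # mean (SD) instead of median (IQR)
    marker ~ "min = {min}; max = {max}",     # per-variable override by name
    all_categorical() ~ "{n} / {N} ({p}%)"   # frequency / denominator (percent)
  )
)

tbl_stat
```

Each formula pairs a variable (or a selector like `all_categorical()`) with a template string, and the pieces in curly braces are replaced by the computed statistics.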

Additional arguments

digits =:

It will do its best to guess the appropriate number of digits
Otherwise, you can pass a function:
- digits = everything() ~ style_sigfig
Or a value for each statistic shown
- statistic = list(age ~ "min = {min}; max = {max}", year_of_birth ~ "{median} ({p25}, {p75})"):
- digits = list(age ~ c(1, 1), year_of_birth ~ c(0, 0, 0))
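
A minimal sketch of matching `digits =` to `statistic =`, again assuming the bundled `trial` dataset rather than `nlsy`. The point is that each variable gets one digits value per statistic in its template.

```r
library(gtsummary)

# `trial` ships with {gtsummary}; used here as a stand-in dataset (an assumption)
tbl_digits <- tbl_summary(
  trial,
  include = c(age, marker),
  statistic = list(
    age ~ "min = {min}; max = {max}",       # two statistics
    marker ~ "{median} ({p25}, {p75})"      # three statistics
  ),
  digits = list(
    age ~ c(1, 1),        # one decimal place for {min} and {max}
    marker ~ c(2, 2, 2)   # two decimal places for {median}, {p25}, {p75}
  )
)

tbl_digits
```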

Additional arguments

type =:

One of “continuous”, “continuous2”, “categorical”, “dichotomous”
- If a variable only has 0/1, TRUE/FALSE, or yes/no values, it will be treated as dichotomous
  - You can override this with type = list(varname ~ "categorical")
  - Dichotomous variables only show one row (i.e., the percentage of 1’s) unless you change to categorical
    - You can change which level to show with value = list(varname ~ "level to show")
- “continuous2” variables can have multiple rows of statistics

missing =:

Whether to show a row with the number of missing (NA) values: “no”, “ifany” (the default), or “always”
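
To see `type =`, `value =`, and `missing =` working together, here is a small sketch. It assumes the `trial` dataset bundled with {gtsummary} (not `nlsy`), where `response` is a 0/1 variable and `grade` has levels "I", "II", and "III".

```r
library(gtsummary)

# `trial` ships with {gtsummary}; used here as a stand-in dataset (an assumption)
tbl_type <- tbl_summary(
  trial,
  include = c(response, grade, age),
  type = list(
    response ~ "categorical",   # 0/1 variable: show a row for each level
    grade ~ "dichotomous",      # collapse to a single row...
    age ~ "continuous2"         # allow several rows of statistics
  ),
  value = list(grade ~ "III"),  # ...showing only the "III" level
  statistic = list(
    age ~ c("{median} ({p25}, {p75})", "{min}, {max}")
  ),
  missing = "always"            # show a missing row even when there are no NAs
)

tbl_type
```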

Additional functions

add_overall(): In a stratified table, add a column for all strata combined
bold_labels(): Bold the variable names (also bold_levels())
add_p(): Add a p-value (required by some journals 🤷‍♀️)
modify_footnote(update = everything() ~ NA): Remove the footnotes (can also add footnotes!)
modify_header(): Change the column headers

`tbl_summary()`

Incredibly customizable
- So many options can be overwhelming
- The FAQ/gallery is an incredible resource
To save, I often just view in the web browser and copy and paste into a Word document
- Can also be used within quarto/R Markdown
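
If copying and pasting gets tedious, one possible alternative is to convert the table and write it to a file. The sketch below assumes the {flextable} and {gt} packages are installed, uses the bundled `trial` dataset, and writes to a temporary directory with made-up file names.

```r
library(gtsummary)

# `trial` ships with {gtsummary}; used here as a stand-in dataset (an assumption)
tbl <- tbl_summary(trial, by = trt, include = c(age, grade))

# Hypothetical output locations, chosen only for this example
docx_file <- file.path(tempdir(), "table1.docx")

# Word: convert to a flextable, then write a .docx (requires {flextable})
tbl |>
  as_flex_table() |>
  flextable::save_as_docx(path = docx_file)

# HTML: convert to a gt table, then save it (requires {gt})
tbl |>
  as_gt() |>
  gt::gtsave(filename = "table1.html", path = tempdir())
```

Inside a Quarto or R Markdown document, printing the table object in a code chunk is usually enough.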

Exercises

1. Download the script with some examples and save it in your in-class project directory.
2. Install {gtsummary} and run the examples.

3-7. You’re on your own! Work with your neighbors, and we’ll come back together to go over these.

Extra time? Start a table using the data you downloaded for your final project! Make sure you switch to that R project!
