[1] "595 (95% CI 538, 652; p<0.001)"
Characteristic | N | Beta | 95% CI¹ | p-value |
---|---|---|---|---|
age_bir | 4,773 | 595 | 538, 652 | <0.001 |
sex_cat | 10,195 | | | |
    Male | | — | — | |
    Female | | -358 | -844, 128 | 0.15 |
race_eth_cat | 10,195 | | | |
    Hispanic | | — | — | |
    Black | | -1,747 | -2,507, -988 | <0.001 |
    Non-Black, Non-Hispanic | | 3,863 | 3,195, 4,530 | <0.001 |
eyesight_cat | 6,789 | | | |
    Excellent | | — | — | |
    Very good | | -578 | -1,319, 162 | 0.13 |
    Good | | -1,863 | -2,719, -1,006 | <0.001 |
    Fair | | -4,674 | -5,910, -3,439 | <0.001 |
    Poor | | -6,647 | -9,154, -4,140 | <0.001 |

¹ CI = Confidence Interval
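For reference, a table of univariate regressions like this one can be produced with {gtsummary}; this is just a sketch of one possible approach (assuming the nlsy data used throughout these slides), not necessarily the exact code behind the table above:

library(gtsummary)

# Sketch: univariate linear regressions of income on each characteristic
tbl_uvregression(
  nlsy,
  y = income,
  include = c(age_bir, sex_cat, race_eth_cat, eyesight_cat),
  method = lm
)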
We might want to dig in a little more to those regressions. One option in {gtsummary} is to extract data from the table directly (we'll look at this again later!). Behind the scenes, {gtsummary} is using the {broom} package to extract the statistics from the various models. We could look at the model summary:
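Assuming the income ~ sex_cat model object is named mod_sex_cat (as it is later in these slides), the output below comes from:

# base R summary of lm(income ~ sex_cat, data = nlsy)
summary(mod_sex_cat)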
Call:
lm(formula = income ~ sex_cat, data = nlsy)
Residuals:
Min 1Q Median 3Q Max
-14880 -8880 -3943 5477 60478
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14880.3 172.6 86.237 <2e-16 ***
sex_catFemale -357.8 247.8 -1.444 0.149
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12510 on 10193 degrees of freedom
(2491 observations deleted due to missingness)
Multiple R-squared: 0.0002044, Adjusted R-squared: 0.0001064
F-statistic: 2.084 on 1 and 10193 DF, p-value: 0.1488
If we want to do something with the various values, we could extract each statistic individually:
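For a model object named mod_sex_cat, the four pieces of output below would come from calls like these:

# coefficient estimates
coef(mod_sex_cat)

# 95% confidence intervals
confint(mod_sex_cat)

# R-squared
summary(mod_sex_cat)$r.squared

# estimate, standard error, t value, and p-value for each term
summary(mod_sex_cat)$coefficients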
(Intercept) sex_catFemale
14880.3152 -357.8029
2.5 % 97.5 %
(Intercept) 14542.079 15218.5512
sex_catFemale -843.608 128.0022
[1] 0.0002044429
Estimate Std. Error t value Pr(>|t|)
(Intercept) 14880.3152 172.5521 86.236672 0.0000000
sex_catFemale -357.8029 247.8349 -1.443715 0.1488499
{broom} has three main functions: augment(), glance(), and tidy().

augment() adds fitted values, residuals, and other statistics to the original data.
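For example, applied to the income ~ sex_cat model (again assuming the object name mod_sex_cat), a call like this returns the tibble below:

library(broom)

# one row per observation used in the model, with fitted values, residuals,
# leverage (.hat), Cook's distance, and standardized residuals
augment(mod_sex_cat)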
# A tibble: 10,195 × 9
.rownames income sex_cat .fitted .resid .hat .sigma .cooksd .std.resid
<chr> <dbl> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 30000 Female 14523. 15477. 0.000202 12506. 1.55e-4 1.24
2 2 20000 Female 14523. 5477. 0.000202 12507. 1.94e-5 0.438
3 3 22390 Female 14523. 7867. 0.000202 12507. 4.01e-5 0.629
4 4 22390 Female 14523. 7867. 0.000202 12507. 4.01e-5 0.629
5 5 36000 Male 14880. 21120. 0.000190 12505. 2.72e-4 1.69
6 6 35000 Male 14880. 20120. 0.000190 12505. 2.46e-4 1.61
7 7 8502 Male 14880. -6378. 0.000190 12507. 2.48e-5 -0.510
8 8 7227 Female 14523. -7296. 0.000202 12507. 3.44e-5 -0.583
9 9 17000 Male 14880. 2120. 0.000190 12507. 2.74e-6 0.170
10 10 3548 Female 14523. -10975. 0.000202 12506. 7.79e-5 -0.878
# ℹ 10,185 more rows
glance() creates a table of statistics that pertain to the entire model.
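No glance() output is shown here, but (assuming mod_sex_cat) a call like this would return a one-row tibble with model-level statistics such as R-squared, the F statistic, AIC, and BIC:

# one row of whole-model statistics
glance(mod_sex_cat)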
tidy() is the most useful to me, and probably to you! It extracts coefficients and confidence intervals from models.

tidy() works on over 100 statistical methods in R: ANOVA, ARIMA, Cox, factor analysis, fixed effects, GAM, GEE, IV, kappa, k-means, multinomial, proportional odds, principal components, survey methods, …
For example, we might want exponentiated coefficients:
logistic_model <- glm(glasses ~ eyesight_cat + sex_cat + income,
                      data = nlsy, family = binomial())
tidy(logistic_model, conf.int = TRUE, exponentiate = TRUE)
# A tibble: 7 × 7
term estimate std.error statistic p.value conf.low conf.high
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.499 5.96e-2 -11.7 1.74e-31 0.444 0.560
2 eyesight_catVery good 0.920 5.96e-2 -1.39 1.64e- 1 0.819 1.03
3 eyesight_catGood 0.916 6.91e-2 -1.27 2.04e- 1 0.800 1.05
4 eyesight_catFair 0.802 1.00e-1 -2.20 2.77e- 2 0.658 0.976
5 eyesight_catPoor 1.03 2.01e-1 0.147 8.83e- 1 0.694 1.53
6 sex_catFemale 2.04 5.00e-2 14.2 5.46e-46 1.85 2.25
7 income 1.00 1.93e-6 7.49 6.95e-14 1.00 1.00
# we already made mod_sex_cat
mod_race_eth_cat <- lm(income ~ race_eth_cat, data = nlsy)
mod_eyesight_cat <- lm(income ~ eyesight_cat, data = nlsy)
mod_age_bir <- lm(income ~ age_bir, data = nlsy)
tidy_sex_cat <- tidy(mod_sex_cat, conf.int = TRUE)
tidy_race_eth_cat <- tidy(mod_race_eth_cat, conf.int = TRUE)
tidy_eyesight_cat <- tidy(mod_eyesight_cat, conf.int = TRUE)
tidy_age_bir <- tidy(mod_age_bir, conf.int = TRUE)
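The stacked 12-row table below, with its model column, can be built by binding the tidied results together; one way (a sketch, not necessarily the original code) is dplyr::bind_rows() with an .id column:

# stack the four tidied outputs, labeling each with its model name
dplyr::bind_rows(
  sex_cat = tidy_sex_cat,
  race_eth_cat = tidy_race_eth_cat,
  eyesight_cat = tidy_eyesight_cat,
  age_bir = tidy_age_bir,
  .id = "model"
)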
# A tibble: 12 × 8
model term estimate std.error statistic p.value conf.low conf.high
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 sex_cat (Inter… 14880. 173. 86.2 0 14542. 15219.
2 sex_cat Female -358. 248. -1.44 1.49e- 1 -844. 128.
3 race_eth_cat (Inter… 12867. 302. 42.7 0 12276. 13459.
4 race_eth_cat Black -1747. 387. -4.51 6.58e- 6 -2507. -988.
5 race_eth_cat Non-Bl… 3863. 341. 11.3 1.20e-29 3195. 4530.
6 eyesight_cat (Inter… 17683. 270. 65.6 0 17155. 18212.
7 eyesight_cat Very g… -578. 378. -1.53 1.26e- 1 -1319. 162.
8 eyesight_cat Good -1863. 437. -4.26 2.05e- 5 -2719. -1006.
9 eyesight_cat Fair -4674. 630. -7.42 1.35e-13 -5910. -3439.
10 eyesight_cat Poor -6647. 1279. -5.20 2.07e- 7 -9154. -4140.
11 age_bir (Inter… 1707. 733. 2.33 1.99e- 2 270. 3143.
12 age_bir age_bir 595. 29.1 20.4 3.71e-89 538. 652.
We could instead clean up the names and add reference rows with the {tidycat} package:
tidy(logistic_model, conf.int = TRUE, exponentiate = TRUE) |>
  tidycat::tidy_categorical(logistic_model, exponentiate = TRUE) |>
  dplyr::select(-c(3:5))
# A tibble: 9 × 8
term estimate conf.low conf.high variable level effect reference
<chr> <dbl> <dbl> <dbl> <chr> <fct> <chr> <chr>
1 (Intercept) 0.499 0.444 0.560 (Interc… (Int… main Non-Base…
2 <NA> 1 1 1 eyesigh… Exce… main Baseline…
3 eyesight_catVery … 0.920 0.819 1.03 eyesigh… Very… main Non-Base…
4 eyesight_catGood 0.916 0.800 1.05 eyesigh… Good main Non-Base…
5 eyesight_catFair 0.802 0.658 0.976 eyesigh… Fair main Non-Base…
6 eyesight_catPoor 1.03 0.694 1.53 eyesigh… Poor main Non-Base…
7 <NA> 1 1 1 sex_cat Male main Baseline…
8 sex_catFemale 2.04 1.85 2.25 sex_cat Fema… main Non-Base…
9 income 1.00 1.00 1.00 income inco… main Non-Base…
library(ggplot2)

tidy(logistic_model, conf.int = TRUE, exponentiate = TRUE) |>
  tidycat::tidy_categorical(logistic_model, exponentiate = TRUE) |>
  dplyr::slice(-1) |> # remove intercept
  ggplot(mapping = aes(x = level, y = estimate,
                       ymin = conf.low, ymax = conf.high)) +
  geom_point() +
  geom_errorbar() +
  facet_grid(cols = vars(variable), scales = "free", space = "free") +
  scale_y_log10()
Use broom::tidy() to extract the results of the Poisson regression with robust standard errors and combine them with the results of the log-binomial regression.
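A minimal sketch of the combining step, assuming the two fits live in objects named mod_poisson_robust and mod_log_binomial (hypothetical names; use whatever objects hold your Poisson-with-robust-SE and log-binomial results, provided {broom} has a tidy() method that accepts these arguments for them):

# hypothetical object names; follows the bind_rows() pattern used earlier
dplyr::bind_rows(
  poisson = broom::tidy(mod_poisson_robust, conf.int = TRUE, exponentiate = TRUE),
  log_binomial = broom::tidy(mod_log_binomial, conf.int = TRUE, exponentiate = TRUE),
  .id = "model"
)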