{targets}
“a Make-like pipeline tool for statistics and data science in R”
01-data.R

```r
library(tidyverse)

data <- read_csv("data.csv", col_types = cols()) %>%
  filter(!is.na(Ozone))

write_rds(data, "data.rds")
```

02-model.R

```r
library(tidyverse)

data <- read_rds("data.rds")

model <- lm(Ozone ~ Temp, data) %>%
  coefficients()

write_rds(model, "model.rds")
```
03-plot.R
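The slide does not show the contents of 03-plot.R. A hypothetical sketch, mirroring the plot_model() function that appears later in these slides, might look like this:

```r
# Hypothetical sketch of 03-plot.R (not shown in the original slides),
# consistent with the plot_model() function defined later.
library(tidyverse)

data <- read_rds("data.rds")
model <- read_rds("model.rds")

plot <- ggplot(data) +
  geom_point(aes(x = Temp, y = Ozone)) +
  geom_abline(intercept = model[1], slope = model[2])

ggsave("plot.png", plot)
```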
{targets}
workflow

R/functions.R
```r
get_data <- function(file) {
  read_csv(file, col_types = cols()) %>%
    filter(!is.na(Ozone))
}

fit_model <- function(data) {
  lm(Ozone ~ Temp, data) %>%
    coefficients()
}

plot_model <- function(model, data) {
  ggplot(data) +
    geom_point(aes(x = Temp, y = Ozone)) +
    geom_abline(intercept = model[1], slope = model[2])
}
```
{targets}
workflow

_targets.R
```r
library(targets)

tar_source()
tar_option_set(packages = c("tidyverse"))

list(
  tar_target(file, "data.csv", format = "file"),
  tar_target(data, get_data(file)),
  tar_target(model, fit_model(data)),
  tar_target(plot, plot_model(model, data))
)
```
Run tar_make() to run the pipeline.
{targets}
workflow

Targets are “hidden” away where you don’t need to manage them:
```
├── _targets.R
├── data.csv
├── R/
│   └── functions.R
└── _targets/
    └── objects/
        ├── data
        ├── model
        └── plot
```
{targets}
I put my functions in R/. I run use_targets() and edit _targets.R accordingly, so that I list the data file as a target and clean_data as the output of the cleaning function. I run tar_make(). Then I run tar_load(clean_data) so that I can work on the next step of my workflow.

_targets.R tips and tricks

{targets}
functions

- use_targets() gets you started with a _targets.R script to fill in
- tar_make() runs the pipeline and saves the results in _targets/objects/
- tar_make_future() runs the pipeline in parallel
- tar_load() loads the results of a target into the global environment: tar_load(clean_data)
- tar_read() reads and returns the results of a target: dat <- tar_read(clean_data)
- tar_visnetwork() creates a network diagram of the pipeline
- tar_outdated() checks which targets need to be updated
- tar_prune() deletes saved targets that are no longer in _targets.R
- tar_destroy() deletes the _targets/ directory if you need to burn everything down and start again
{tarchetypes}
: reports

Render documents that depend on targets loaded with tar_load() or tar_read().
- tar_render() renders an R Markdown document
- tar_quarto() renders a Quarto document (or project)

What does report.qmd look like?

---
title: "My report"
---
```{r}
library(targets)
tar_load(results)
tar_load(plots)
```
There were `r results$n` observations with a mean age of `r results$mean_age`.
```{r}
library(ggplot2)
plots$age_plot
```
Because report.qmd depends on results and plots, it will only be re-rendered if either of those targets changes.
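To make the report part of the pipeline, it can be listed as a target in _targets.R. A sketch, assuming report.qmd sits in the project root and that results and plots are defined by earlier targets:

```r
# Sketch: wiring the report into the pipeline with tarchetypes::tar_quarto().
# Assumes report.qmd is in the project root.
library(targets)
library(tarchetypes)

list(
  # ... targets that create `results` and `plots` ...
  tar_quarto(report, "report.qmd")
)
```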
{tarchetypes}
: branching

Using data from the National Longitudinal Survey of Youth, we want to investigate the relationship between age at first birth and hours of sleep on weekdays and weekends, among moms and dads separately.
Create (and name) a separate target for each combination of sleep variable ("sleep_wkdy", "sleep_wknd") and sex (male: 1, female: 2):
```r
targets_1 <- list(
  tar_target(
    model_1,
    model_function(outcome_var = "sleep_wkdy", sex_val = 1, dat = dat)
  ),
  tar_target(
    coef_1,
    coef_function(model_1)
  )
)
```
… and so on…
Use tarchetypes::tar_map() to map over the combinations for you (static branching):
```r
targets_2 <- tar_map(
  values = tidyr::crossing(
    outcome = c("sleep_wkdy", "sleep_wknd"),
    sex = 1:2
  ),
  tar_target(
    model_2,
    model_function(outcome_var = outcome, sex_val = sex, dat = dat)
  ),
  tar_target(
    coef_2,
    coef_function(model_2)
  )
)
```
```r
tar_load(starts_with("coef_2"))
c(coef_2_sleep_wkdy_1, coef_2_sleep_wkdy_2, coef_2_sleep_wknd_1, coef_2_sleep_wknd_2)
```
Use tarchetypes::tar_combine() to combine the results of a call to tar_map():
```r
combined <- tar_combine(
  combined_coefs_2,
  targets_2[["coef_2"]],
  command = vctrs::vec_c(!!!.x)
)

# after tar_make():
tar_read(combined_coefs_2)
```
command = vctrs::vec_c(!!!.x) is the default, but you can supply your own function to combine the results.
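For instance, if each coef_2_* target returned a one-row data frame, a custom combiner could stack them. A hypothetical sketch using dplyr::bind_rows():

```r
# Sketch: combining results with a custom function instead of the default,
# assuming each coef_2_* target returns a data frame.
combined <- tar_combine(
  combined_coefs_2,
  targets_2[["coef_2"]],
  command = dplyr::bind_rows(!!!.x, .id = "target")
)
```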
Use the pattern = argument of tar_target() (dynamic branching):
```r
targets_3 <- list(
  tar_target(
    outcome_target,
    c("sleep_wkdy", "sleep_wknd")
  ),
  tar_target(
    sex_target,
    1:2
  ),
  tar_target(
    model_3,
    model_function(outcome_var = outcome_target, sex_val = sex_target, dat = dat),
    pattern = cross(outcome_target, sex_target)
  ),
  tar_target(
    coef_3,
    coef_function(model_3),
    pattern = map(model_3)
  )
)
```
```r
tar_read(coef_3)
```
| Dynamic | Static |
|---|---|
| Pipeline creates new targets at runtime. | All targets defined in advance. |
| Cryptic target names. | Friendly target names. |
| Scales to hundreds of branches. | Does not scale as easily for tar_visnetwork() etc. |
| No metaprogramming required. | Familiarity with metaprogramming is helpful. |
You can also combine the two approaches:

```r
tar_map(values = ..., tar_target(..., pattern = map(...)))
```
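Sticking with the sleep example, one hedged sketch of such a hybrid: static branching over the outcome variable and dynamic branching over sex (the target names here are illustrative, not from the original slides):

```r
# Sketch: static branching over outcomes, dynamic branching over sex values.
targets_hybrid <- tar_map(
  values = list(outcome = c("sleep_wkdy", "sleep_wknd")),
  tar_target(sex_vals, 1:2),
  tar_target(
    model_hybrid,
    model_function(outcome_var = outcome, sex_val = sex_vals, dat = dat),
    pattern = map(sex_vals)
  )
)
```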
{tarchetypes}
: repetition

tar_rep() repeats a target multiple times with the same arguments:
```r
targets_4 <- list(
  tar_rep(
    bootstrap_coefs,
    dat |>
      dplyr::slice_sample(prop = 1, replace = TRUE) |>
      model_function(outcome_var = "sleep_wkdy", sex_val = 1, dat = _) |>
      coef_function(),
    batches = 10,
    reps = 10
  )
)
```
The pipeline gets split into batches × reps chunks, each with its own random seed.
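After tar_make(), the replicates can be summarized in an interactive session. A sketch, assuming each rep returns a single numeric slope estimate so the combined result is a numeric vector:

```r
# Sketch: a 95% percentile bootstrap interval from the combined reps.
# Assumes each rep yields one numeric estimate (e.g., the slope only).
library(targets)

boot <- tar_read(bootstrap_coefs)  # combined across all batches and reps
quantile(boot, probs = c(0.025, 0.975))
```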
{tarchetypes}
: mapping over iterations

tar_map_rep() repeats a target multiple times with different arguments:

```r
sensitivity_scenarios <- tibble::tibble(
  error = c("small", "medium", "large"),
  mean  = c(1, 2, 3),
  sd    = c(0.5, 0.75, 1)
)
```
Ideal for sensitivity analyses that require multiple iterations of the same pipeline with different parameters.
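A sketch of how tar_map_rep() might map over the scenarios above; sim_function() and its arguments are hypothetical placeholders, not from the original slides:

```r
# Sketch: repeat a simulation for every row of sensitivity_scenarios,
# with replication split into batches. `sim_function()` is hypothetical.
targets_5 <- tarchetypes::tar_map_rep(
  sensitivity_results,
  sim_function(error_mean = mean, error_sd = sd),
  values = sensitivity_scenarios,
  batches = 5,
  reps = 10
)
```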
- {targets} is a great tool for managing complex workflows
- {tarchetypes} makes it even more powerful

We’ll clone a repo with {targets} already set up and add some additional steps to the analysis.