---
title: "Programming with tidyr"
output: rmarkdown::html_vignette
description: |
  Notes on programming with tidy evaluation as it relates to tidyr.
vignette: >
  %\VignetteIndexEntry{Programming with tidyr}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---

```{r setup, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
options(tibble.print_min = 6L, tibble.print_max = 6L)
set.seed(1014)

# Manually "import"; only needed for old dplyr which uses old tidyselect
# which doesn't attach automatically in tidy-select contexts
all_of <- tidyselect::all_of
```

## Introduction

Most tidyr verbs use **tidy evaluation** to make interactive data exploration fast and fluid. Tidy evaluation is a special type of non-standard evaluation used throughout the tidyverse. Here's some typical tidyr code:

```{r}
library(tidyr)

iris %>%
  nest(data = !Species)
```

Tidy evaluation is why we can use `!Species` to say "all the columns except `Species`", without having to quote the column name (`"Species"`) or refer to the enclosing data frame (`iris$Species`).

Two basic forms of tidy evaluation are used in tidyr:

* **Tidy selection**: `drop_na()`, `fill()`, `pivot_longer()`/`pivot_wider()`,
  `nest()`/`unnest()`, `separate()`/`extract()`, and `unite()` let you select
  variables based on position, name, or type (e.g. `1:3`, `starts_with("x")`,
  or `where(is.numeric)`). You can use all the same techniques as with
  `dplyr::select()`.

* **Data masking**: `expand()`, `crossing()` and `nesting()` let you refer to
  data variables as if they were variables in the environment (i.e. you
  write `my_variable` not `df$my_variable`).
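
To make the distinction concrete, here is a minimal sketch on a made-up tibble. The name `df` and its columns `g`, `x`, and `y` are assumptions for illustration only. `drop_na()` receives a selection of columns, while `expand()` evaluates expressions that refer to columns:

```{r}
library(tidyr)

# Illustrative data, not used elsewhere in this vignette
df <- tibble(
  g = c("a", "a", "b"),
  x = c(1, 2, 3),
  y = c(NA, 5, 6)
)

# Tidy selection: the argument picks columns, here by name prefix.
# Rows with a missing value in any selected column are dropped.
dropped <- drop_na(df, starts_with("y"))
dropped

# Data masking: the arguments are expressions computed from columns.
# `big` is a new variable derived from `x`.
grid <- expand(df, g, big = x > 1)
grid
```
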

We focus on tidy selection here, since it's the most common. You can learn more about data masking in the equivalent vignette in dplyr: <https://dplyr.tidyverse.org/dev/articles/programming.html>. For other considerations when writing tidyr code in packages, please see `vignette("in-packages")`.

We've pointed out that tidyr's tidy evaluation interface is optimized for interactive exploration. The flip side is that this adds some challenges to indirect use, i.e. when you're working inside a `for` loop or a function. This vignette shows you how to overcome those challenges. We'll first go over the basics of tidy selection, then talk about how to use it indirectly, and finally show you a number of recipes to solve common problems.

Before we go on, we show the version of tidyr we're using and make a small dataset to use in examples.

```{r}
packageVersion("tidyr")

mini_iris <- as_tibble(iris)[c(1, 2, 51, 52, 101, 102), ]
mini_iris
```

## Tidy selection

Underneath all functions that use tidy selection is the [tidyselect](https://tidyselect.r-lib.org/) package. It provides a miniature domain-specific language that makes it easy to select columns by name, position, or type. For example:

* `select(df, 1)` selects the first column;
  `select(df, last_col())` selects the last column.

* `select(df, c(a, b, c))` selects columns `a`, `b`, and `c`.

* `select(df, starts_with("a"))` selects all columns whose name starts with "a";
  `select(df, ends_with("z"))` selects all columns whose name ends with "z".

* `select(df, where(is.numeric))` selects all numeric columns.

You can see more details in `?tidyr_tidy_select`.
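
The same selection language applies to tidyr verbs. As a small sketch, assuming a made-up tibble `df` (its column names are illustrative) and a reasonably recent tidyselect that supports `where()`, the three calls below pick the same two columns by position, by name, and by type:

```{r}
library(tidyr)

# Illustrative data: two numeric columns between two character columns
df <- tibble(
  id = c("a", "b"),
  x1 = c(1, NA),
  x2 = c(3, 4),
  note = c("u", "v")
)

# By position: the second and third columns
by_position <- pivot_longer(df, cols = 2:3)

# By name: every column whose name starts with "x"
by_name <- pivot_longer(df, cols = starts_with("x"))

# By type: every numeric column
by_type <- pivot_longer(df, cols = where(is.numeric))

by_type
```
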

### Indirection

Tidy selection makes a common task easier at the cost of making a less common task harder. When you want to use tidy selection indirectly, with the column specification stored in an intermediate variable, you'll need to learn some new tools. There are three main cases where this comes up:

*   When you have the tidy-select specification in a function argument, you
    must **embrace** the argument by surrounding it in doubled braces.

    ```{r}
    nest_egg <- function(df, cols) {
      nest(df, egg = {{ cols }})
    }

    nest_egg(mini_iris, !Species)
    ```

*   When you have a character vector of variable names, you must use `all_of()`
    or `any_of()` depending on whether you want the function to error if a
    variable is not found. These functions allow you to write `for` loops or a
    function that takes variable names as a character vector.

    ```{r}
    nest_egg <- function(df, cols) {
      nest(df, egg = all_of(cols))
    }

    vars <- c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")
    nest_egg(mini_iris, vars)
    ```

*   In more complicated cases, you might want to use tidyselect directly:

    ```{r}
    sel_vars <- function(df, cols) {
      tidyselect::eval_select(rlang::enquo(cols), df)
    }
    sel_vars(mini_iris, !Species)
    ```

    Learn more in `vignette("tidyselect")`.
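
The difference between `all_of()` and `any_of()` shows up when a requested name is missing. Here is a small sketch, assuming a made-up tibble `df` and a character vector `wanted` that deliberately includes a non-existent column `"z"`. Both names are illustrative.

```{r}
library(tidyr)

df <- tibble(id = 1:2, x = c(1, NA), y = c(NA, 2))
wanted <- c("x", "z") # "z" is not a column of df

# any_of() skips names that don't match a column, so only `x` is checked.
lenient <- drop_na(df, any_of(wanted))
lenient

# all_of() is strict: the missing "z" makes the call fail.
# tryCatch() turns the error into a string so the chunk keeps running.
strict <- tryCatch(
  drop_na(df, all_of(wanted)),
  error = function(e) "error: column not found"
)
strict

# Because all_of() takes strings, it also works inside a `for` loop.
for (col in c("x", "y")) {
  print(nrow(drop_na(df, all_of(col))))
}
```
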

Note that many tidyr functions use `...` so you can easily select many variables, e.g. `fill(df, x, y, z)`. I now believe that the disadvantages of this approach outweigh the benefits, and that this interface would have been better as `fill(df, c(x, y, z))`. For new functions that select columns, please just use a single argument and not `...`.
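
As a sketch of that recommendation, a wrapper can expose one tidy-select argument and embrace it. `fill_down()` is a hypothetical helper invented for this illustration, and the tibble `df` is made up. Neither is part of tidyr.

```{r}
library(tidyr)

# Hypothetical wrapper: a single tidy-select argument instead of `...`
fill_down <- function(df, cols) {
  fill(df, {{ cols }}, .direction = "down")
}

df <- tibble(x = c(1, NA), y = c("a", NA), z = c(TRUE, NA))

# Several columns are passed together with c(); `z` is left untouched.
filled <- fill_down(df, c(x, y))
filled
```

One possible benefit of this shape is that `...` stays free for other purposes, such as passing additional arguments through to another function.
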