# tidyselect 1.1.1

* Fix for CRAN checks.

* tidyselect has been re-licensed as MIT (#217).


# tidyselect 1.1.0

* Predicate functions must now be wrapped with `where()`.

  ```{r}
  iris %>% select(where(is.factor))
  ```

  We made this change to avoid puzzling error messages when a variable
  is unexpectedly missing from the data frame and there is a
  corresponding function in the environment:

  ```{r}
  # Attempts to invoke `data()` function
  data.frame(x = 1) %>% select(data)
  ```

  Now tidyselect will correctly complain about a missing variable
  rather than trying to invoke a function.

  For compatibility we will support predicate functions starting with
  `is` for one version.

* `eval_select()` gains an `allow_rename` argument. If set to `FALSE`,
  renaming variables with the `c(foo = bar)` syntax is an error.
  This is useful to implement purely selective behaviour (#178).

* Fixed an issue preventing repeated deprecation messages when
  `tidyselect_verbosity` is set to `"verbose"` (#184).

* `any_of()` now preserves the order of the input variables (#186).

* The return value of `eval_select()` is now always named, even when
  inputs are constant (#173).


# tidyselect 1.0.0

This is the 1.0.0 release of tidyselect. It features a more solidly
defined and implemented syntax, support for predicate functions, new
boolean operators, and much more.


## Documentation

* New Get started vignette for client packages. Read it with
  `vignette("tidyselect")` or at
  <https://tidyselect.r-lib.org/articles/tidyselect.html>.

* The definition of the tidyselect language has been consolidated. A
  technical description is now available:
  <https://tidyselect.r-lib.org/articles/syntax.html>.


## Breaking changes

* Selecting non-column variables with bare names now triggers an
  informative message suggesting that you use `all_of()` instead.
  Referring to contextual objects with a bare name is brittle because it
  might be masked by a data frame column. Using `all_of()` is safe (#76).

tidyselect now uses vctrs for validating inputs. These changes may
reveal programming errors that were previously silent. They may also
cause failures if your unit tests make faulty assumptions about the
content of error messages created in tidyselect:

* Out-of-bounds errors are thrown when a name doesn't exist or a
  location is too large for the input.

* Logical vectors now fail properly.

* Selected variables now must be unique. It was previously possible to
  return duplicate selections in some circumstances.

* The input names can no longer contain `NA` values.

Note that we recommend `testthat::verify_output()` for monitoring
error messages thrown from packages that you don't control. Unlike
`expect_error()`, `verify_output()` does not cause CMD check failures
when error messages have changed. See
<https://www.tidyverse.org/blog/2019/11/testthat-2-3-0/> for more
information.


## Syntax

* The boolean operators can now be used to create selections (#106).

  - `!` negates a selection.
  - `|` takes the union of two selections.
  - `&` takes the intersection of two selections.

  These patterns can currently be achieved using `-`, `c()` and
  `intersect()` respectively. The boolean operators should be more
  intuitive to use.

  Many thanks to Irene Steves (@isteves) for suggesting this UI.

* You can now use predicate functions in selection contexts:

  ```r
  iris %>% select(is.factor)
  iris %>% select(is.factor | is.numeric)
  ```

  This feature is not available in functions that use the legacy
  interface of tidyselect. These need to be updated to use
  the new `eval_select()` function instead of `vars_select()`.

* Unary `-` inside nested `c()` is now consistently syntax for set
  difference (#130).

* Improved support for named elements. It is now possible to assign
  the same name to multiple elements, if the input data structure
  doesn't require unique names (i.e. anything but a data frame).

* The selection engine has been rewritten to support a clearer
  separation between data-expressions (calls to `:`, `-`, and `c()`) and
  env-expressions (anything else). This means you can now safely use
  expressions of the type:

  ```r
  data %>% select(1:ncol(data))
  data %>% pivot_longer(1:ncol(data))
  ```

  Even if the data frame `data` contains a column also named `data`,
  the subexpression `ncol(data)` is still correctly evaluated.
  The `data:ncol(data)` expression is equivalent to `2:3` because
  each occurrence of `data` is looked up in the relevant context
  without ambiguity:

  ```r
  data <- tibble(foo = 1, data = 2, bar = 3)
  data %>% dplyr::select(data:ncol(data))
  #> # A tibble: 1 x 2
  #>    data   bar
  #>   <dbl> <dbl>
  #> 1     2     3
  ```

  While the example above is a bit contrived, there are many realistic
  cases where these changes make it easier to write safe code:

  ```{r}
  select_from <- function(data, var) {
    data %>% dplyr::select({{ var }} : ncol(data))
  }
  data %>% select_from(data)
  #> # A tibble: 1 x 2
  #>    data   bar
  #>   <dbl> <dbl>
  #> 1     2     3
  ```


## User-facing improvements

* The new selection helpers `all_of()` and `any_of()` are strict
  variants of `one_of()`. The former always fails if some variables
  are unknown, while the latter does not. `all_of()` is safer to use
  when you expect all selected variables to exist.
  `any_of()` is useful in other cases, for instance to ensure that
  variables are excluded from the selection:

  ```
  vars <- c("Species", "Genus")
  iris %>% dplyr::select(-any_of(vars))
  ```

  Note that `all_of()` and `any_of()` are a bit more conservative in
  their function signature than `one_of()`: they do not accept dots.
  The equivalent of `one_of("a", "b")` is `all_of(c("a", "b"))`.

* Selection helpers like `all_of()` and `starts_with()` are now
  available in all selection contexts, even when they haven't been
  attached to the search path. The most visible consequence of this
  change is that it is now easier to use selection functions without
  attaching the host package:

  ```r
  # Before
  dplyr::select(mtcars, dplyr::starts_with("c"))

  # After
  dplyr::select(mtcars, starts_with("c"))
  ```

  It is still recommended to export the helpers from your package so
  that users can easily look up the documentation with `?`.

* `starts_with()`, `ends_with()`, `contains()`, and `matches()` now
  accept vector inputs (#50). For instance these are now equivalent
  ways of selecting all variables that start with either `"a"` or `"b"`:

  ```{r}
  starts_with(c("a", "b"))
  starts_with("a") | starts_with("b")
  ```

* `matches()` has a new argument `perl` to allow for Perl-like regular
  expressions (@fmichonneau, #71).

* Better support for selecting with S3 vectors. For instance, factors
  are treated as characters.


## API

New `eval_select()` and `eval_rename()` functions for client
packages. These replace `vars_select()` and `vars_rename()`, which are
now deprecated. These functions:

* Take the full data rather than just names. This makes it possible to
  use function predicates in selection context.

* Return a numeric vector of locations rather than a vector of
  names.
  This makes it possible to use tidyselect with inputs that
  support duplicate names, like regular vectors.


## Other features and fixes

* The `.strict` argument of `vars_select()` now works more robustly
  and consistently.

* Using arithmetic operators in selection context now fails more
  informatively (#84).

* It is now possible to select columns in data frames containing
  duplicate variables (#94). However, the duplicates can't be part of
  the final selection.

* `eval_rename()` no longer ignores the names of unquoted character
  vectors of length 1 (#79).

* `eval_rename()` now fails when a variable is renamed to an existing
  name (#70).

* `eval_rename()` has better support for existing duplicates (but
  creating new duplicates is an error).

* `eval_select()`, `eval_rename()` and `vars_pull()` now detect
  missing values uniformly (#72).

* `vars_pull()` now includes the faulty expression in error messages.

* The performance issues of `eval_rename()` with many arguments have
  been fixed. This makes `dplyr::rename_all()` with many columns much
  faster (@zkamvar, #92).

* tidyselect is now much faster with many columns, thanks to a
  performance fix in `rlang::env_bind()` as well as internal fixes.

* `vars_select()` ignores vectors with only zeros (#82).


# tidyselect 0.2.5

This is a maintenance release for compatibility with rlang 0.3.0.


# tidyselect 0.2.4

* Fixed a warning that occurred when a vector of column positions was
  supplied to `vars_select()` or functions depending on it such as
  `tidyr::gather()` (#43 and tidyverse/tidyr#374).

* Fixed compatibility issue with rlang 0.2.0 (#51).


# tidyselect 0.2.3

* Internal fixes in anticipation of using `tidyselect` within `dplyr`.

* `vars_select()` and `vars_rename()` now correctly support unquoting
  character vectors that have names.

* `vars_select()` now ignores missing variables.


# tidyselect 0.2.2

* `dplyr` is now correctly mentioned as a suggested package.


# tidyselect 0.2.1

* `-` now supports character vectors in addition to strings. This
  makes it easy to unquote column names to exclude from the set:

  ```{r}
  vars <- c("cyl", "am", "disp", "drat")
  vars_select(names(mtcars), - !!vars)
  ```

* `last_col()` now issues an error when the variable vector is empty.

* `last_col()` now returns column positions rather than column names
  for consistency with other helpers. This also makes it compatible
  with functions like `seq()`.

* `c()` now supports character vectors the same way as `-` and `seq()`.
  (#37 @gergness)


# tidyselect 0.2.0

The main point of this release is to revert a troublesome behaviour
introduced in tidyselect 0.1.0. It also includes a few features.


## Evaluation rules

The special evaluation semantics for selection have been changed
back to the old behaviour because the new rules were causing too
much trouble and confusion. From now on data expressions (symbols
and calls to `:` and `c()`) can refer both to registered variables
and to objects from the context.

However the semantics for context expressions (any calls other than
to `:` and `c()`) remain the same. Those expressions are evaluated
in the context only and cannot refer to registered variables.

If you're writing functions and refer to contextual objects, it is
still a good idea to avoid data expressions. Registered variables
change as a function of user input, so you never know whether your
local objects might be shadowed by a variable. Consider:

```
n <- 2
vars_select(letters, 1:n)
```

Should that select up to the second element of `letters` or up to
the 14th?
Since the variables have precedence in a data expression,
this will select the first 14 letters. This can be made more robust
by turning the data expression into a context expression:

```
vars_select(letters, seq(1, n))
```

You can also use quasiquotation since unquoted arguments are
guaranteed to be evaluated without any user data in scope. While
equivalent because of the special rules for context expressions,
this may be clearer to the reader accustomed to tidy eval:

```{r}
vars_select(letters, seq(1, !! n))
```

Finally, you may want to be more explicit in the opposite direction.
If you expect a variable to be found in the data but not in the
context, you can use the `.data` pronoun:

```{r}
vars_select(names(mtcars), .data$cyl : .data$drat)
```


## New features

* The new select helper `last_col()` is helpful to select over a
  custom range: `vars_select(vars, 3:last_col())`.

* `:` and `-` now handle strings as well. This makes it easy to
  unquote a column name: `(!!name) : last_col()` or `- !!name`.

* `vars_select()` gains a `.strict` argument similar to
  `rename_vars()`. If set to `FALSE`, errors about unknown variables
  are ignored.

* `vars_select()` now treats `NULL` as empty inputs. This follows a
  trend in the tidyverse tools.

* `vars_rename()` now handles variable positions (integers or round
  doubles) just like `vars_select()` (#20).

* `vars_rename()` is now implemented with the tidy eval framework.
  Like `vars_select()`, expressions are evaluated without any user
  data in scope. In addition a variable context is now established so
  you can write rename helpers. Those should return a single round
  number or a string (variable position or variable name).

* `has_vars()` is a predicate that tests whether a variable context
  has been set (#21).

* The selection helpers are now exported in a list
  `vars_select_helpers`. This is intended for APIs that embed the
  helpers in the evaluation environment.


## Fixes

* `one_of()` argument `vars` has been renamed to `.vars` to avoid
  spurious matching.


# tidyselect 0.1.1

tidyselect is the new home for the legacy functions
`dplyr::select_vars()`, `dplyr::rename_vars()` and
`dplyr::select_var()`.


## API changes

We took this opportunity to make a few changes to the API:

* `select_vars()` and `rename_vars()` are now `vars_select()` and
  `vars_rename()`. This follows the tidyverse convention that a prefix
  corresponds to the input type while suffixes indicate the output
  type. Similarly, `select_var()` is now `vars_pull()`.

* The arguments are now prefixed with dots to limit argument matching
  issues. While the dots help, it is still a good idea to splice a
  list of captured quosures to make sure dotted arguments are never
  matched to `vars_select()`'s named arguments:

  ```
  vars_select(vars, !!! quos(...))
  ```

* Error messages can now be customised. For consistency with dplyr,
  error messages refer to "columns" by default. This assumes that the
  variables being selected come from a data frame. If this is not
  appropriate for your DSL, you can now add an attribute `vars_type`
  to the `.vars` vector to specify alternative names. This must be a
  character vector of length 2 whose first component is the singular
  form and the second is the plural. For example, `c("variable",
  "variables")`.


## Establishing a variable context

tidyselect provides a few more ways of establishing a variable
context:

* `scoped_vars()` sets up a variable context along with an exit
  hook that automatically restores the previous variables. It is the
  preferred way of changing the variable context.

  `with_vars()` takes variables and an expression and evaluates the
  latter in the context of the former.

* `poke_vars()` establishes a new variable context. It returns the
  previous context invisibly and it is your responsibility to restore
  it after you are done. This is for expert use only.

  `current_vars()` has been renamed to `peek_vars()`. This naming is a
  reference to [peek and poke](https://en.wikipedia.org/wiki/PEEK_and_POKE)
  from legacy languages.


## New evaluation semantics

The evaluation semantics for selecting verbs have changed. Symbols are
now evaluated in a data-only context that is isolated from the calling
environment. This means that you can no longer refer to local variables
unless you are explicitly unquoting these variables with `!!`, which
is mostly for expert use.

Note that since dplyr 0.7, helper calls (like `starts_with()`) obey
the opposite behaviour and are evaluated in the calling context
isolated from the data context. To sum up, symbols can only refer to
data frame objects, while helpers can only refer to contextual
objects. This differs from usual R evaluation semantics where both
the data and the calling environment are in scope (with the former
prevailing over the latter).
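
The two rules described in this section can be illustrated with a minimal
sketch. It assumes that tidyselect is installed and uses the column names of
`mtcars` as the variable vector. The local variables `prefix` and `first` are
hypothetical names introduced only for illustration. Because the data-only
rule for symbols was relaxed again in 0.2.0, the exact behaviour may differ
across versions, and recent releases may signal that `vars_select()` is
deprecated.

```r
library(tidyselect)

vars <- names(mtcars)

# Helper calls are evaluated in the calling context, so the local
# variable `prefix` is visible inside `starts_with()`.
prefix <- "d"
by_helper <- vars_select(vars, starts_with(prefix))

# `!!` inlines the value of the local variable `first` before the
# selection is evaluated, so the symbol `first` is never looked up
# among the registered variables.
first <- "cyl"
by_unquote <- vars_select(vars, !!first)

print(by_helper)
print(by_unquote)
```

Unquoting is the explicit route: whatever the data contains, the value of the
local variable is what reaches the selection.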