% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/doc-params.R
\name{dplyr_data_masking}
\alias{dplyr_data_masking}
\title{Argument type: data-masking}
\description{
This page describes the \verb{<data-masking>} argument modifier, which
indicates that the argument uses tidy evaluation with \strong{data masking}.
If you've never heard of tidy evaluation before, start with
\code{vignette("programming")}.
}
\section{Key terms}{
The primary motivation for tidy evaluation in dplyr is that it provides
\strong{data masking}, which blurs the distinction between two types of variables:
\itemize{
\item \strong{env-variables} are "programming" variables and live in an environment.
They are usually created with \verb{<-}. Env-variables can be any type of R
object.
\item \strong{data-variables} are "statistical" variables and live in a data frame.
They usually come from data files (e.g. \code{.csv}, \code{.xls}), or are created by
manipulating existing variables. Data-variables live inside data frames,
so they must be vectors.
}
}

\section{General usage}{
Data masking allows you to refer to variables in the "current" data frame
(usually supplied in the \code{.data} argument) without any other prefix.
For example, it lets you type \code{filter(diamonds, x == 0 & y == 0 & z == 0)}
instead of \code{diamonds[diamonds$x == 0 & diamonds$y == 0 & diamonds$z == 0, ]}.
}

\section{Indirection}{
The main challenge of data masking arises when you introduce some
indirection, i.e. when, instead of directly typing the name of a variable,
you want to supply it in a function argument or a character vector.

There are two main cases:
\itemize{
\item If you want the user to supply the variable (or function of variables)
in a function argument, embrace the argument, e.g. \code{filter(df, {{ var }})}.\preformatted{dist_summary <- function(df, var) \{
  df \%>\%
    summarise(n = n(), min = min(\{\{ var \}\}), max = max(\{\{ var \}\}))
\}
mtcars \%>\% dist_summary(mpg)
mtcars \%>\% group_by(cyl) \%>\% dist_summary(mpg)
}
\item If you have the column name as a character vector, use the \code{.data}
pronoun, e.g. \code{summarise(df, mean = mean(.data[[var]]))}.\preformatted{for (var in names(mtcars)) \{
  mtcars \%>\% count(.data[[var]]) \%>\% print()
\}

lapply(names(mtcars), function(var) mtcars \%>\% count(.data[[var]]))
}
}
}

\section{Dot-dot-dot (...)}{
When this modifier is applied to \code{...}, there is one other useful technique,
which solves the problem of creating a new variable with a name supplied by
the user. Use the interpolation syntax from the glue package: \code{"{var}" := expression}.
(Note the use of \verb{:=} instead of \code{=} to enable this syntax.)\preformatted{var_name <- "l100km"
mtcars \%>\% mutate("\{var_name\}" := 235 / mpg)
}

Note that \code{...} automatically provides indirection, so you can use it as is
(i.e. without embracing) inside a function:\preformatted{grouped_mean <- function(df, var, ...) \{
  df \%>\%
    group_by(...) \%>\%
    summarise(mean = mean(\{\{ var \}\}))
\}
mtcars \%>\% grouped_mean(mpg, cyl)
}
}

\keyword{internal}
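
\section{Disambiguating env-variables and data-variables}{
Because the data frame masks the environment, a data-variable takes precedence over an
env-variable with the same name inside a data-masking argument. The following is a minimal
sketch, assuming dplyr is attached. The names \code{df} and \code{x} are illustrative only.
It shows how the \code{.data} and \code{.env} pronouns make each reference explicit.\preformatted{library(dplyr)

# Illustrative data: a data-variable `x` (a column of df) ...
df <- data.frame(x = c(1, 5, 10))
# ... and an env-variable that happens to share the same name.
x <- 4

# Unqualified `x` on both sides refers to the column, so `x > x` is
# never TRUE and no rows are kept.
filter(df, x > x)

# .data$x is the column and .env$x is the env-variable (4), so this
# keeps only the rows where the column exceeds 4 (x = 5 and x = 10).
filter(df, .data$x > .env$x)

# Mixing both pronouns: y is the column minus the env-variable,
# i.e. c(-3, 1, 6).
mutate(df, y = .data$x - .env$x)
}
}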