Correct usage of dplyr::select in dplyr 0.7.0+, selecting columns using character vector


Question

Suppose we have a character vector cols_to_select containing the names of some columns we want to select from a data frame df, e.g.

df <- tibble::tibble(a = 1:3, b = 1:3, c = 1:3, d = 1:3, e = 1:3)  # data_frame() is deprecated in later tibble releases
cols_to_select <- c("b", "d")

Suppose also that we want to use dplyr::select because it is part of an operation that uses %>%, so using select keeps the code easy to read.

There seem to be a number of ways in which this can be achieved, but some are more robust than others. Could you please let me know which is the 'correct' version, and why? Or perhaps there is another, better way?

dplyr::select(df, cols_to_select) #Fails if 'cols_to_select' happens to be the name of a column in df 
dplyr::select(df, !!cols_to_select) # i.e. using UQ()
dplyr::select(df, !!!cols_to_select) # i.e. using UQS()

cols_to_select_syms <- rlang::syms(c("b", "d"))  #See [here](https://stackoverflow.com/questions/44656993/how-to-pass-a-named-vector-to-dplyrselect-using-quosures/44657171#44657171)
dplyr::select(df, !!!cols_to_select_syms)

p.s. I realise this can be achieved in base R using simply df[, cols_to_select]

Recommended answer

There is an example with dplyr::select in https://cran.r-project.org/web/packages/rlang/vignettes/tidy-evaluation.html that uses:

dplyr::select(df, !!cols_to_select)

Why? Let's explore the options you mention:

dplyr::select(df, cols_to_select)

As you say, this breaks if cols_to_select happens to be the name of a column in df: select then picks that column instead of the columns named in the vector, so this is wrong.
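To make that failure mode concrete, here is a small sketch. The data frame df2 is hypothetical (it is not part of the question) and deliberately contains a column that shares its name with the character vector:

```r
library(dplyr)

# Hypothetical data frame with a column literally named "cols_to_select"
df2 <- tibble::tibble(b = 1:3, d = 1:3, cols_to_select = 4:6)
cols_to_select <- c("b", "d")

# The bare name is matched against the columns of df2 first
ambiguous <- names(select(df2, cols_to_select))

# Unquoting forces the character vector from the calling environment
unquoted <- names(select(df2, !!cols_to_select))

print(ambiguous)
print(unquoted)
```

Here ambiguous should hold only the column cols_to_select, while unquoted should hold b and d.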

cols_to_select_syms <- rlang::syms(c("b", "d"))  
dplyr::select(df, !!!cols_to_select_syms)

This works, but it looks more convoluted than the other solutions.

dplyr::select(df, !!cols_to_select)
dplyr::select(df, !!!cols_to_select)

These two solutions give the same result in this case. You can see the output of !!cols_to_select and !!!cols_to_select by doing:

dput(rlang::`!!`(cols_to_select)) # c("b", "d")
dput(rlang::`!!!`(cols_to_select)) # pairlist("b", "d")

The !! or UQ() operator evaluates its argument immediately in its context, and that is what you want.

The !!! or UQS() operator is used to pass multiple arguments at once to a function.
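One way to see the difference between the two operators is to capture the call instead of running it. The sketch below uses rlang::expr() for that; this is my choice for illustration and is not something the answer itself does. The select call is only captured, never evaluated, so df does not need to exist here:

```r
library(rlang)

cols_to_select <- c("b", "d")
cols_to_select_syms <- syms(cols_to_select)

# !! inserts the whole character vector as ONE argument
single_arg <- expr(select(df, !!cols_to_select))

# !!! splices the list of symbols in as SEPARATE arguments
spliced <- expr(select(df, !!!cols_to_select_syms))

print(single_arg)
print(spliced)
```

The captured call built with !!! has one argument per column, whereas the one built with !! carries the vector as a single argument.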

For character column names like those in your example, it does not matter whether you give them as a single vector of length 2 (using !!) or as a list of two length-one vectors (using !!!). For more complex use cases you will need to pass multiple arguments as a list (using !!!):

a <- rlang::quos(dplyr::contains("c"), dplyr::starts_with("b"))  # a list of quosures, one per selection helper
dplyr::select(df, !!a) # does not work
dplyr::select(df, !!!a) # does work
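As for whether there is another way: releases newer than the ones discussed here (tidyselect 1.0.0 and later, an assumption about your installed version) provide all_of(), which dplyr re-exports. A minimal sketch, assuming such a version is installed:

```r
library(dplyr)

df <- tibble::tibble(a = 1:3, b = 1:3, c = 1:3, d = 1:3, e = 1:3)
cols_to_select <- c("b", "d")

# all_of() marks cols_to_select as an external vector of column names
selected <- df %>% select(all_of(cols_to_select))

print(names(selected))
```

This states the intent explicitly and fits into a %>% chain; on dplyr 0.7.x itself, !! remains the option described above.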
