How NOT to select columns using dplyr's select() when you have a character vector of column names?
Question
I am trying to deselect columns in my dataset using dplyr, but I have not been able to do it since last night.
I am well aware of workarounds, but I am strictly trying to find an answer using dplyr alone.
library(dplyr)
df <- tibble(x = c(1,2,3,4), y = c('a','b','c','d'))
df %>% select(-c('x'))
This gives me an error: Error in -c("x") : invalid argument to unary operator
Now, I know that select() takes unquoted column names, but I am not able to sub-select columns this way.
Please note that the dataset above is just an example; the real one can have many columns.
Thanks,
Prerit
Recommended answer
Edit: The OP's actual question was about how to use a character vector to select or deselect columns from a data frame. Use the one_of() helper function for that:
colnames(iris)
# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
cols <- c("Petal.Length", "Sepal.Length")
select(iris, one_of(cols)) %>% colnames
# [1] "Petal.Length" "Sepal.Length"
select(iris, -one_of(cols)) %>% colnames
# [1] "Sepal.Width" "Petal.Width" "Species"
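On newer versions of dplyr (1.0.0 and later), the tidyselect documentation marks one_of() as superseded in favour of all_of() and any_of(). A minimal sketch on the OP's own df, assuming such a version is installed; drop_cols is just an illustrative variable name:

```r
library(dplyr)

# The OP's example data: two columns, x and y
df <- tibble(x = c(1, 2, 3, 4), y = c('a', 'b', 'c', 'd'))

# Column names to drop, stored as a character vector
drop_cols <- c("x")

# all_of(): strict, errors if any name in drop_cols is absent from df
kept_strict <- df %>% select(-all_of(drop_cols))

# any_of(): lenient, silently skips names that are not present ("z" here)
kept_lenient <- df %>% select(-any_of(c("x", "z")))

colnames(kept_strict)
# [1] "y"
colnames(kept_lenient)
# [1] "y"
```

The choice between them is about failure mode: all_of() stops with an error when a name is missing, which catches typos, while any_of() is convenient when some of the names may legitimately be absent.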
You should have a look at the select helpers (type ?select_helpers) because they're incredibly useful. From the docs:
starts_with(): starts with a prefix
ends_with(): ends with a suffix
contains(): contains a literal string
matches(): matches a regular expression
num_range(): a numerical range like x01, x02, x03
one_of(): variables in a character vector
everything(): all variables
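The helpers above can be tried out on the built-in iris data. A small sketch, assuming dplyr is installed; the wide tibble and its x01 to x04 columns are invented purely to illustrate num_range():

```r
library(dplyr)

# starts_with(): columns whose names begin with "Petal"
petal <- select(iris, starts_with("Petal"))
colnames(petal)
# [1] "Petal.Length" "Petal.Width"

# contains(): literal substring match
widths <- select(iris, contains("Width"))
colnames(widths)
# [1] "Sepal.Width" "Petal.Width"

# matches(): regular expression; "\\." escapes the literal dot
sepal <- select(iris, matches("^Sepal\\."))
colnames(sepal)
# [1] "Sepal.Length" "Sepal.Width"

# everything(): all remaining columns, handy for moving one column to the front
reordered <- select(iris, Species, everything())
colnames(reordered)
# [1] "Species" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"

# num_range(): zero-padded numbered columns ('wide' is a made-up example)
wide <- tibble(x01 = 1, x02 = 2, x03 = 3, x04 = 4)
first_three <- select(wide, num_range("x", 1:3, width = 2))
colnames(first_three)
# [1] "x01" "x02" "x03"
```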
Given a data frame with columns named a through z, use select() like this:
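df <- as_tibble(setNames(as.list(1:26), letters))  # a one-row example with columns a to z
df %>% select(-a, -b, -c, -d, -e)
# OR
df %>% select(-c(a, b, c, d, e))
# OR
df %>% select(-(a:e))
# OR if you want to keep b
df %>% select(-a, -(c:e))
# OR a different way to keep b, by just putting it back in
df %>% select(-(a:e), b)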
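To check that these forms really are interchangeable, here is a small sketch, assuming a recent dplyr; the one-row df with columns a to z is made up for the purpose:

```r
library(dplyr)

# Hypothetical one-row data frame with columns named a through z
df <- as_tibble(setNames(as.list(1:26), letters))

# Three of the equivalent ways to drop a to e
v1 <- colnames(select(df, -a, -b, -c, -d, -e))
v2 <- colnames(select(df, -c(a, b, c, d, e)))
v3 <- colnames(select(df, -(a:e)))

# All three keep exactly f to z
identical(v1, letters[6:26])  # TRUE
identical(v2, v1)             # TRUE
identical(v3, v1)             # TRUE

# Keeping b: drop a and c to e, so b stays in its original position
keep_b <- colnames(select(df, -a, -(c:e)))
keep_b[1:2]
# [1] "b" "f"
```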
So if I wanted to omit two of the columns from the iris dataset, I could say:
colnames(iris)
# [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
select(iris, -c(Sepal.Length, Petal.Length)) %>% colnames()
# [1] "Sepal.Width" "Petal.Width" "Species"
But of course, the most concise way to achieve that is to use one of select()'s helper functions:
select(iris, -ends_with(".Length")) %>% colnames()
# [1] "Sepal.Width" "Petal.Width" "Species"
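A quick sketch confirming that the helper gives the same result as listing the bare names, assuming dplyr 1.0 or later; it also shows that ends_with() accepts several suffixes at once and drops the union of the matches:

```r
library(dplyr)

# Explicit bare names vs. a suffix helper: both drop the two *.Length columns
explicit <- colnames(select(iris, -c(Sepal.Length, Petal.Length)))
by_suffix <- colnames(select(iris, -ends_with(".Length")))
identical(explicit, by_suffix)  # TRUE

# ends_with() with a vector of suffixes: every *.Length and *.Width column goes
only_species <- colnames(select(iris, -ends_with(c(".Length", ".Width"))))
only_species
# [1] "Species"
```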
P.S. It's odd that you are passing quoted values to dplyr; one of its big niceties is that you don't have to keep typing out quotes all the time. As you can see, bare column names work fine with dplyr and ggplot2.