Filter multiple values on a string column in dplyr
Question
I have a data.frame with character data in one of the columns. I would like to filter multiple options in the data.frame from the same column. Is there an easy way to do this that I'm missing?
Example:
data.frame name = dat
days name
88 Lynn
11 Tom
2 Chris
5 Lisa
22 Kyla
1 Tom
222 Lynn
2 Lynn
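To make the sample reproducible, it can be rebuilt in R roughly as follows. This is a sketch that assumes `days` is numeric and `name` is a plain character column (hence `stringsAsFactors = FALSE`, which only matters on R versions before 4.0); the original post does not show how `dat` was created.

```r
# Rebuild the sample data frame from the question (column types are assumed)
dat <- data.frame(
  days = c(88, 11, 2, 5, 22, 1, 222, 2),
  name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn"),
  stringsAsFactors = FALSE  # keep `name` as character rather than factor
)
str(dat)
```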
I'd like to filter out Tom and Lynn, for example. When I do:
library(dplyr)
target <- c("Tom", "Lynn")
# `==` compares element by element, recycling `target` along `name`
filt <- filter(dat, name == target)
I get this error:
longer object length is not a multiple of shorter object length
Answer
You need %in% instead of ==:
library(dplyr)
target <- c("Tom", "Lynn")
filter(dat, name %in% target) # equivalently, dat %>% filter(name %in% target)
produces:
days name
1 88 Lynn
2 11 Tom
3 1 Tom
4 222 Lynn
5 2 Lynn
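"Filter out" can also mean dropping those rows instead of keeping them. A minimal sketch of both directions, assuming the sample `dat` above and the dplyr package; negating the `%in%` test with `!` keeps every row whose name is not in `target`. The names `keep_rows` and `drop_rows` are illustrative only.

```r
library(dplyr)

dat <- data.frame(
  days = c(88, 11, 2, 5, 22, 1, 222, 2),
  name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn"),
  stringsAsFactors = FALSE
)
target <- c("Tom", "Lynn")

# Keep only the rows whose name appears in `target`
keep_rows <- filter(dat, name %in% target)

# Drop those rows instead: negate the membership test
drop_rows <- filter(dat, !(name %in% target))
```

The parentheses around `name %in% target` are optional, because `%in%` binds more tightly than `!` in R, but they make the intent easier to read.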
To understand why, consider what happens here:
dat$name == target
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
Basically, we're recycling the length-two target vector four times to match the length of dat$name. In other words, we are doing:
Lynn == Tom
Tom == Lynn
Chris == Tom
Lisa == Lynn
... continue repeating Tom and Lynn until end of data frame
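The recycling can be made explicit with base R's `rep_len()`, which stretches `target` to the length of `dat$name`. This sketch (assuming the sample `dat` above; `recycled`, `explicit`, `implicit` and `msg` are illustrative names) checks that the explicit version gives the same result as `dat$name == target`, and shows the warning R raises when the longer length is not a multiple of the shorter one, here by using only the first 7 rows.

```r
dat <- data.frame(
  days = c(88, 11, 2, 5, 22, 1, 222, 2),
  name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn"),
  stringsAsFactors = FALSE
)
target <- c("Tom", "Lynn")

# Stretch `target` to 8 elements: alternates "Tom", "Lynn", ...
recycled <- rep_len(target, length(dat$name))

explicit <- dat$name == recycled   # comparison against the stretched vector
implicit <- dat$name == target     # R recycles `target` automatically
identical(explicit, implicit)

# With 7 rows, 7 is not a multiple of 2, so R warns about the recycling
msg <- tryCatch(dat$name[1:7] == target,
                warning = function(w) conditionMessage(w))
msg
```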
In this case we don't get an error because, I suspect, your real data frame has a number of rows that doesn't allow recycling, whereas the sample you provide does (8 rows). If the sample had had an odd number of rows, I would have gotten the same error as you. But even when recycling works, this is clearly not what you want. Basically, the statement dat$name == target is equivalent to saying:
return TRUE for every odd-positioned value that is equal to "Tom" or every even-positioned value that is equal to "Lynn".
It so happens that the last value in your sample data frame is in an even position and equal to "Lynn", hence the one TRUE above.
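The odd/even reading of `dat$name == target` can be checked directly. This is a sketch assuming the sample `dat` above; `pos`, `is_odd` and `rule` are illustrative names, and the check only holds for a length-two `target` whose recycling lines up with the rows, as it does for these 8 rows.

```r
dat <- data.frame(
  days = c(88, 11, 2, 5, 22, 1, 222, 2),
  name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn"),
  stringsAsFactors = FALSE
)
target <- c("Tom", "Lynn")

pos <- seq_along(dat$name)   # row positions 1..8
is_odd <- pos %% 2 == 1      # TRUE at positions 1, 3, 5, 7

# "Tom" at odd positions or "Lynn" at even positions
rule <- (is_odd & dat$name == "Tom") | (!is_odd & dat$name == "Lynn")

identical(rule, dat$name == target)
```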
Contrast that with dat$name %in% target, which says:
for each value in dat$name, check that it exists in target.
Very different. Here is the result:
[1] TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE
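One way to see why `%in%` does no recycling: base R documents `x %in% table` as `match(x, table, nomatch = 0) > 0`. `match()` looks up each element of `x` in `table` and returns its position there (or `NA`), so the result always has the length of `x`, whatever the length of `table`. A sketch assuming the sample `dat` above; `via_in`, `via_match` and `idx` are illustrative names.

```r
dat <- data.frame(
  days = c(88, 11, 2, 5, 22, 1, 222, 2),
  name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn"),
  stringsAsFactors = FALSE
)
target <- c("Tom", "Lynn")

# Position of each name within `target`: 2 for "Lynn", 1 for "Tom", NA otherwise
idx <- match(dat$name, target)

via_in    <- dat$name %in% target
via_match <- match(dat$name, target, nomatch = 0) > 0

identical(via_in, via_match)
length(via_in) == length(dat$name)   # one answer per row, no recycling
```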
Note that your problem has nothing to do with dplyr, just the misuse of ==.
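Since the issue lies with `==` rather than dplyr, the same selection works in plain base R. This sketch, assuming the sample `dat` above, uses bracket subsetting with `%in%` and, as an alternative for a short fixed list, explicit comparisons joined with `|`. Both give the same five rows as the dplyr answer; `base_rows` and `or_rows` are illustrative names.

```r
dat <- data.frame(
  days = c(88, 11, 2, 5, 22, 1, 222, 2),
  name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn"),
  stringsAsFactors = FALSE
)
target <- c("Tom", "Lynn")

# Membership test, scales to any number of target values
base_rows <- dat[dat$name %in% target, ]

# Explicit OR of single-value comparisons; each `==` compares against
# a length-one string, so no misleading recycling happens
or_rows <- dat[dat$name == "Tom" | dat$name == "Lynn", ]

identical(base_rows, or_rows)
```

The `%in%` form is usually preferable because `target` can grow without rewriting the condition.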