Filter multiple values on a string column in dplyr

Question

I have a data.frame with character data in one of the columns. I would like to filter multiple options in the data.frame from the same column. Is there an easy way to do this that I'm missing?

Example: a data.frame named dat

days  name
88    Lynn
11    Tom
2     Chris
5     Lisa
22    Kyla
1     Tom
222   Lynn
2     Lynn

I'd like to filter out Tom and Lynn for example.
When I do:

target <- c("Tom", "Lynn")
filt <- filter(dat, name == target)

I get this error:

longer object length is not a multiple of shorter object length

Recommended answer

You need %in% instead of ==:

library(dplyr)
target <- c("Tom", "Lynn")
filter(dat, name %in% target)  # equivalently, dat %>% filter(name %in% target)

which produces:

  days name
1   88 Lynn
2   11  Tom
3    1  Tom
4  222 Lynn
5    2 Lynn
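
For anyone who wants to reproduce this, here is a minimal sketch. The data.frame() call is a hand reconstruction of the table in the question, not the asker's actual code, and the variable names kept and kept_dplyr are only illustrative. It applies the same %in% test with base R subsetting and, if dplyr happens to be installed, checks that filter() keeps the same rows.

```r
# Rebuild the sample data from the question (reconstructed by hand).
dat <- data.frame(
  days = c(88, 11, 2, 5, 22, 1, 222, 2),
  name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn"),
  stringsAsFactors = FALSE
)
target <- c("Tom", "Lynn")

# Base R equivalent of the filter() call: keep rows whose name is in target.
kept <- dat[dat$name %in% target, ]

# Same rows with dplyr, checked only when the package is available.
if (requireNamespace("dplyr", quietly = TRUE)) {
  kept_dplyr <- dplyr::filter(dat, name %in% target)
  stopifnot(identical(kept_dplyr$days, kept$days))
}
```

One difference to be aware of: base subsetting keeps the original row names, whereas the output shown above is renumbered.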

To understand why, consider what happens here:

dat$name == target
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

Basically, we're recycling the length-two target vector four times to match the length of dat$name. In other words, we are doing:

 Lynn == Tom
  Tom == Lynn
Chris == Tom
 Lisa == Lynn
 ... continue repeating Tom and Lynn until end of data frame
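
The pairing above can be made explicit with a short sketch. The name vector is copied from the sample data, and recycled is a hypothetical helper variable that spells out what == does implicitly.

```r
name   <- c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn")
target <- c("Tom", "Lynn")

# Spell out the recycling that == performs implicitly.
recycled <- rep(target, length.out = length(name))

# Pair each name with the target value it is actually compared against.
data.frame(name = name, compared_to = recycled, equal = name == recycled)

# The explicit version gives the same answer as the implicit one.
stopifnot(identical(name == target, name == recycled))
```

Printing the data frame shows each name next to the value it was compared with, which makes it easy to see why only the last row matches.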

In this case we don't get an error, because the sample you provide has 8 rows, a multiple of the length of target, so the recycling works out evenly. I suspect your actual data frame has a number of rows that is not a multiple of two. If the sample had had an odd number of rows, I would have gotten the same error as you. But even when recycling works, this is clearly not what you want. Basically, the statement dat$name == target is equivalent to saying:
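
To see the message from the question, here is a small sketch using a hypothetical three-element vector (odd_names is not from the original data). Note that the question calls it an error, but in base R the comparison signals it as a warning and still returns a result, so the sketch captures it with tryCatch().

```r
target    <- c("Tom", "Lynn")
odd_names <- c("Lynn", "Tom", "Chris")  # hypothetical column of odd length

# Capture the condition signalled by the comparison instead of printing it.
msg <- tryCatch(
  {
    odd_names == target
    "no message"
  },
  warning = function(w) conditionMessage(w)
)

msg
```

With lengths 3 and 2, target cannot be recycled a whole number of times, which is what triggers the message.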

return TRUE for every value in an odd position that is equal to "Tom", or every value in an even position that is equal to "Lynn".

It so happens that the last value in your sample data frame is in an even position and equal to "Lynn", hence the one TRUE above.

In contrast, dat$name %in% target says:

for each value in dat$name, check that it exists in target.

Very different. Here is the result:

[1]  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE
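
As a rough illustration of what this membership test amounts to, the sketch below compares %in% with two spelled-out alternatives: a chain of scalar comparisons joined with |, and match(), which is how the R documentation defines %in%. The variable names in_target, by_or and by_match are only illustrative.

```r
name   <- c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn")
target <- c("Tom", "Lynn")

# Membership test: is each name present anywhere in target?
in_target <- name %in% target

# Same thing as one scalar comparison per target value, combined with |.
by_or <- name == "Tom" | name == "Lynn"

# Same thing via match(): position in target, or 0 when absent.
by_match <- match(name, target, nomatch = 0L) > 0L

stopifnot(identical(in_target, by_or), identical(in_target, by_match))
```

The scalar comparisons are safe because a length-one value recycles cleanly against a vector of any length; the trouble only starts when the right-hand side has more than one element.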

Note that your problem has nothing to do with dplyr, just the misuse of ==.
