Filter multiple conditions dplyr

Question

I have a data.frame with character data in one of the columns. I would like to filter multiple options in the data.frame from the same column. Is there an easy way to do this that I'm missing?

Example: data.frame name = dat

days  name
88    Lynn
11    Tom
2     Chris
5     Lisa
22    Kyla
1     Tom
222   Lynn
2     Lynn

I'd like to filter out Tom and Lynn for example.
When I do:

target <- c("Tom", "Lynn")
filt <- filter(dat, name == target)

I get this error:

longer object length is not a multiple of shorter object length


Recommended answer

You need %in% instead of ==:

library(dplyr)
target <- c("Tom", "Lynn")
filter(dat, name %in% target)  # equivalently, dat %>% filter(name %in% target)

This produces:

  days name
1   88 Lynn
2   11  Tom
3    1  Tom
4  222 Lynn
5    2 Lynn
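
Because %in% is a base R operator, the same subset can be sketched without dplyr. The block below assumes dat holds exactly the eight sample rows from the question; kept and dropped are names introduced here for illustration only.

```r
# Rebuild the sample data from the question (assumed to be the whole data set).
dat <- data.frame(
  days = c(88, 11, 2, 5, 22, 1, 222, 2),
  name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn"),
  stringsAsFactors = FALSE
)
target <- c("Tom", "Lynn")

# Keep the rows whose name appears anywhere in target.
kept <- dat[dat$name %in% target, ]
print(kept)

# Negate the condition to drop those rows instead.
dropped <- dat[!(dat$name %in% target), ]
print(dropped)
```

Base R keeps the original row names in kept, so the row labels differ from the renumbered dplyr output above even though the selected rows are the same.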

To understand why, consider what happens here:

dat$name == target
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

Basically, we're recycling the length-two target vector four times to match the length of dat$name. In other words, we are doing:

 Lynn == Tom
  Tom == Lynn
Chris == Tom
 Lisa == Lynn
 ... continue repeating Tom and Lynn until end of data frame
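
The recycling described above can be made explicit with rep(). This is a small sketch using the eight sample names; recycled is a name introduced here for illustration.

```r
name <- c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn")
target <- c("Tom", "Lynn")

# What == does implicitly: stretch target to the length of name.
recycled <- rep(target, length.out = length(name))

# Show the pairs that are actually compared, side by side.
print(data.frame(name = name, compared_with = recycled, equal = name == recycled))

# The explicit comparison gives the same result as the implicit one.
stopifnot(identical(name == target, name == recycled))
```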

In this case we don't get an error because I suspect your data frame actually has a different number of rows that don't allow recycling, but the sample you provide does (8 rows). If the sample had had an odd number of rows I would have gotten the same error as you. But even when recycling works, this is clearly not what you want. Basically, the statement dat$name == target is equivalent to saying:


return TRUE for every odd-positioned value that is equal to "Tom" or every even-positioned value that is equal to "Lynn".

It so happens that the last value in your sample data frame sits at an even position and equals "Lynn", hence the one TRUE above.
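
The odd-length case mentioned above can be reproduced by dropping the last sample row. In plain base R the mismatch surfaces as a warning; how it is reported inside filter() may depend on the dplyr version, so treat this only as a sketch of the underlying behaviour. name7 and msg are names introduced here for illustration.

```r
# First seven sample names only, so the length is no longer a multiple of 2.
name7 <- c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn")
target <- c("Tom", "Lynn")

# Capture the warning raised by the incomplete recycling.
msg <- tryCatch(
  {
    name7 == target
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
print(msg)
```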

To contrast, dat$name %in% target says:


for each value in dat$name, check whether it exists in target.

Very different. Here is the result:

[1]  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE
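
One way to see why %in% behaves as a membership test is to compare it with match(), which is how base R documents it. The sketch below uses the eight sample names; via_in and via_match are names introduced here for illustration.

```r
name <- c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn")
target <- c("Tom", "Lynn")

# Membership test: each name is looked up in the whole of target.
via_in <- name %in% target

# The same test written with match(): a position of 0 means "not found".
via_match <- match(name, target, nomatch = 0L) > 0L

print(via_in)
stopifnot(identical(via_in, via_match))

# Unlike ==, the order of target makes no difference.
stopifnot(identical(via_in, name %in% rev(target)))
```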

Note your problem has nothing to do with dplyr, just the mis-use of ==.
