Filter multiple values on a string column in dplyr


Question

I have a data.frame with character data in one of the columns. I would like to filter on multiple values from that same column. Is there an easy way to do this that I'm missing?

Example: data.frame name = dat

days  name
88    Lynn
11    Tom
2     Chris
5     Lisa
22    Kyla
1     Tom
222   Lynn
2     Lynn

I'd like to filter for Tom and Lynn, for example (that is, keep only those rows).

When I do:

target <- c("Tom", "Lynn")
filt <- filter(dat, name == target)

I get this error:

longer object length is not a multiple of shorter object length


Answer

You need %in% instead of ==:

library(dplyr)
target <- c("Tom", "Lynn")
filter(dat, name %in% target)  # equivalently, dat %>% filter(name %in% target)

which produces:

  days name
1   88 Lynn
2   11  Tom
3    1  Tom
4  222 Lynn
5    2 Lynn

To understand why, consider what happens here:

dat$name == target
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

Basically, we're recycling the length-two target vector four times to match the length of dat$name. In other words, we are doing:

 Lynn == Tom
  Tom == Lynn
Chris == Tom
 Lisa == Lynn
 ... continue repeating Tom and Lynn until end of data frame
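
The recycling can be made explicit in base R. This is only a sketch: the vector name is rebuilt by hand from the sample data in the question, and recycled, explicit and implicit are names introduced here for illustration.

```r
# Names column from the sample data frame in the question
name <- c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn")
target <- c("Tom", "Lynn")

# What R does implicitly: repeat target until it is as long as name
recycled <- rep(target, length.out = length(name))

# Position-by-position comparison against the recycled vector
explicit <- name == recycled
implicit <- name == target

print(recycled)
print(identical(explicit, implicit))  # both comparisons give the same vector
```

Because the comparison is purely positional, a match is found only where a name happens to line up with the same value in the repeated target.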

In this case we don't get an error, because the sample you provided has 8 rows, a multiple of 2, so target recycles cleanly. I suspect your actual data frame has a number of rows that does not allow clean recycling. If the sample had had an odd number of rows, I would have gotten the same error as you. But even when recycling works, this is clearly not what you want. Basically, the statement dat$name == target is equivalent to saying:


return TRUE for every value in an odd position that equals "Tom", or every value in an even position that equals "Lynn".

It so happens that the last value in your sample data frame is in an even position and equal to "Lynn", hence the one TRUE above.
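
To see the length mismatch in isolation, here is a small base-R sketch with a hypothetical three-element vector (odd_names is invented for illustration and is not part of the question's data). In plain base R the mismatch surfaces as a warning; how it is reported inside filter() may depend on the dplyr version, so treat that part as an assumption.

```r
# Hypothetical column with an odd number of entries
odd_names <- c("Lynn", "Tom", "Chris")
target <- c("Tom", "Lynn")

# Capture the condition instead of letting it print to the console
msg <- tryCatch(
  {
    odd_names == target
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)

print(msg)
```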

In contrast, dat$name %in% target says:

for each value in dat$name, check whether it exists in target.

Very different. Here is the result:

[1]  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE
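
As a cross-check, the membership test can be written out by hand with one == per target value. The sketch below uses only base R; dat is rebuilt here from the sample in the question, and in_target, by_hand and kept are illustrative names.

```r
# Sample data from the question, rebuilt by hand
dat <- data.frame(
  days = c(88, 11, 2, 5, 22, 1, 222, 2),
  name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn"),
  stringsAsFactors = FALSE
)
target <- c("Tom", "Lynn")

# Membership test: one TRUE/FALSE per row, regardless of the length of target
in_target <- dat$name %in% target

# The same condition spelled out with one comparison per target value
by_hand <- dat$name == "Tom" | dat$name == "Lynn"
print(identical(in_target, by_hand))

# Base-R subsetting that keeps the same rows as filter(dat, name %in% target)
kept <- dat[in_target, ]
print(kept)
```

The hand-written form only scales to a few values, which is why %in% is the more convenient choice when target is a vector.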

Note that your problem has nothing to do with dplyr; it is just a misuse of ==.
