Filter multiple values on a string column in dplyr

Published: 2020/10/26 2:26:43. Tags: r, dplyr, string-matching, multiple-conditions


Question

I have a data.frame with character data in one of the columns. I would like to filter on multiple values from that same column. Is there an easy way to do this that I'm missing?

Example:
data.frame name = dat

days  name
88    Lynn
11    Tom
2     Chris
5     Lisa
22    Kyla
1     Tom
222   Lynn
2     Lynn

I'd like to filter for Tom and Lynn, for example, keeping only their rows.
When I do:

target <- c("Tom", "Lynn")
filt <- filter(dat, name == target)

I get this error:

longer object length is not a multiple of shorter object length

Recommended answer

You need %in% instead of ==:

library(dplyr)
target <- c("Tom", "Lynn")
filter(dat, name %in% target)  # equivalently, dat %>% filter(name %in% target)

which produces:

  days name
1   88 Lynn
2   11  Tom
3    1  Tom
4  222 Lynn
5    2 Lynn
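
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample data and applies the same filter. The data.frame() call is my reconstruction of the table in the question, not code the asker posted. The dplyr call is guarded in case the package is not installed, and kept_base shows a base R subset that should select the same rows.

```r
# Rebuild the sample data (reconstructed from the table in the question)
dat <- data.frame(
  days = c(88, 11, 2, 5, 22, 1, 222, 2),
  name = c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn"),
  stringsAsFactors = FALSE
)
target <- c("Tom", "Lynn")

# Base R: keep the rows whose name appears anywhere in target
kept_base <- dat[dat$name %in% target, ]

# Same filter with dplyr, run only if the package is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  print(dplyr::filter(dat, name %in% target))
} else {
  print(kept_base)
}
```

Note that the base R subset keeps the original row names (1, 2, 6, 7, 8), while the output above shows them renumbered from 1.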

To understand why, consider what happens here:

dat$name == target
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

Basically, we're recycling the length-two target vector four times to match the length of dat$name. In other words, we are doing:
 Lynn == Tom
  Tom == Lynn
Chris == Tom
 Lisa == Lynn
 ... continue repeating Tom and Lynn until end of data frame
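
The pairing above can be made visible by spelling out the recycling by hand. This is only an illustrative sketch: the vector name is copied from the sample data, and recycled and comparison are names introduced here for the demonstration.

```r
name   <- c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn")
target <- c("Tom", "Lynn")

# Spell out the recycling that == performs implicitly
recycled <- rep(target, length.out = length(name))

# Show each name next to the value it is actually compared against
comparison <- data.frame(
  name        = name,
  compared_to = recycled,
  equal       = name == recycled
)
print(comparison)

# The explicit version gives the same result as the implicit one
same_result <- identical(name == target, name == recycled)
print(same_result)
```

Only the last row has the same value in both columns, which is where the single TRUE comes from.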

In this case we don't get an error, because the sample you provide has 8 rows, a multiple of the length of target, so recycling works. I suspect your actual data frame has a number of rows that does not allow clean recycling. If the sample had had an odd number of rows, I would have gotten the same message as you. But even when recycling works, this is clearly not what you want. Basically, the statement dat$name == target is equivalent to saying:

  return TRUE for every odd-positioned value that equals "Tom" or every even-positioned value that equals "Lynn".

It so happens that the last value in your sample data frame is in an even position and equal to "Lynn", hence the one TRUE above.
In contrast, dat$name %in% target says:

  for each value in dat$name, check that it exists in target.

Very different. Here is the result:
[1]  TRUE  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE
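
One way to convince yourself of that reading is to compare %in% with the test you would otherwise write by hand, one == per candidate value joined with |. A small sketch, using the names from the sample data; the variable names are introduced here for illustration only.

```r
name   <- c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn", "Lynn")
target <- c("Tom", "Lynn")

# Membership test: is each name found anywhere in target?
in_target <- name %in% target

# The hand-written equivalent: one comparison per candidate, combined with OR
either <- name == "Tom" | name == "Lynn"

# %in% is documented as a thin wrapper around match()
via_match <- match(name, target, nomatch = 0L) > 0L

print(in_target)
print(identical(in_target, either))
print(identical(in_target, via_match))
```

The hand-written form stops scaling once target has more than a few values, which is the practical reason to prefer %in%.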

Note that your problem has nothing to do with dplyr, just the misuse of ==.
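
Because the comparison is plain base R, the message can be reproduced without dplyr. The sketch below assumes a hypothetical 7-row version of the data (the first seven names of the sample) so that the length is no longer a multiple of 2. In base R the message is reported as a warning; tryCatch() is used here only to capture its text.

```r
# Hypothetical odd-length variant: first seven names of the sample
name7  <- c("Lynn", "Tom", "Chris", "Lisa", "Kyla", "Tom", "Lynn")
target <- c("Tom", "Lynn")

# Capture the warning text raised by the mismatched-length comparison
msg <- tryCatch(
  {
    name7 == target
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
print(msg)

# %in% has no length requirement, so it works for any number of rows
in_target7 <- name7 %in% target
print(in_target7)
```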
