Using filter inside filter in dplyr gives unexpected results

Views: 123 | Published: 2020/10/26 4:07:39 | Tags: r, dplyr


Question

Using R 3.1.2, dplyr 0.4.0.

I'm trying to use a filter within a filter, which sounds very simple, and I don't understand why it doesn't give me the result I expect. This is code I wrote about 6 months ago and I'm fairly certain it worked then, so it presumably stopped working because of an updated version of R, dplyr, or some other dependency. Anyway, here is some simple code that filters rows from df1 based on a condition that is found with a filter on a column in df2.

df1 <- data.frame(x = c("A", "B"), stringsAsFactors = FALSE)
df2 <- data.frame(x = "A", y = TRUE, stringsAsFactors = FALSE)
dplyr::filter(df1, x %in% (dplyr::filter(df2, y)$x))

I expect this to show the first row of df1, but instead I get

# [1] x
# <0 rows> (or 0-length row.names)

I'm not sure what to make of this. Why is it returning a vector AND an empty data.frame?

If I break up the filter code into two separate statements, I get what I expect:

xval <- dplyr::filter(df2, y)$x
dplyr::filter(df1, x %in% xval)

#   x
# 1 A

Can anyone help me figure out why this behaviour is happening? I'm not saying it's a bug, but I don't understand it.

Recommended answer

It's a valid question why your approach doesn't work (any more, apparently). I can't answer that, but I would suggest a different approach, as commented above, that avoids nested function calls (filter inside another filter). IMO that is what dplyr is made for: being expressive through syntax that is easy to read and understand, from left to right and top to bottom.

So for your example, because the columns you are interested in are both named "x", you can do:

filter(df2, y) %>% select(x) %>% inner_join(df1)

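Filter the df2 data by column y.

Select only column x.

Perform an inner_join with df1 on the common column (x). inner_join means: return all rows from x where there are matching values in y, and all columns from x and y. (In that description, x and y are the two tables passed to the join, not the column names in this example.)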
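The three steps above can be sketched one at a time, so that each intermediate result can be inspected. This is only an illustration built on the question's df1 and df2; the names step1, step2 and result are made up here, and by = "x" is spelled out (the answer relies on the default) to avoid the "Joining by" message.

```r
library(dplyr)

# Data from the question
df1 <- data.frame(x = c("A", "B"), stringsAsFactors = FALSE)
df2 <- data.frame(x = "A", y = TRUE, stringsAsFactors = FALSE)

# Step 1: keep only the rows of df2 where y is TRUE
step1 <- filter(df2, y)

# Step 2: keep only the x column
step2 <- select(step1, x)

# Step 3: join to df1 on the shared column x
result <- inner_join(step2, df1, by = "x")
result
```

The result contains the single row with x equal to "A", which is what the question expected from the nested call.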

And if they were different, for example "z" in df2 and "x" in df1, you could use:

filter(df2, y) %>% select(z) %>% inner_join(df1, by = c("z" = "x"))
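As a hedged illustration of the differently named case: the sketch below assumes a hypothetical table df2z whose key column is called z (this table is not in the question). In by = c("z" = "x"), the name on the left refers to the first table in the join and the name on the right to the second, so the column selected before the join has to be z.

```r
library(dplyr)

df1 <- data.frame(x = c("A", "B"), stringsAsFactors = FALSE)

# Hypothetical variant of df2 whose key column is named z instead of x
df2z <- data.frame(z = "A", y = TRUE, stringsAsFactors = FALSE)

# Left-hand column z (from df2z) is matched against right-hand column x (from df1)
result_z <- filter(df2z, y) %>%
  select(z) %>%
  inner_join(df1, by = c("z" = "x"))
result_z
```

The joined key keeps the name from the left-hand table, so the result has a single column named z.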

As noted by Hadley in his comment below, it would be safer to use a semi_join instead of inner_join here. The documentation says:

semi_join: return all rows from x where there are matching values in y, keeping just columns from x.

A semi join differs from an inner join because an inner join will return one row of x for each matching row of y, whereas a semi join will never duplicate rows of x.
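The difference only shows up when the second table has more than one matching row. A small sketch, assuming a hypothetical df2_dup in which "A" appears twice (this table is not in the question):

```r
library(dplyr)

df1 <- data.frame(x = c("A", "B"), stringsAsFactors = FALSE)

# Hypothetical table with two rows that both match x == "A"
df2_dup <- data.frame(x = c("A", "A"), y = c(TRUE, TRUE),
                      stringsAsFactors = FALSE)

# inner_join: the "A" row of df1 is returned once per matching row of df2_dup,
# and the columns of both tables are kept
inner <- inner_join(df1, df2_dup, by = "x")

# semi_join: the "A" row of df1 is returned once, with df1's columns only
semi <- semi_join(df1, df2_dup, by = "x")

nrow(inner)  # 2
nrow(semi)   # 1
```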

Hence, for the example case you could do:

filter(df2, y) %>% select(x) %>% semi_join(df1)
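Since semi_join keeps the rows and columns of its first argument, a variant that puts df1 first may be closer to the original intent of filtering df1. The following is a sketch of that reading, not the answer's own code; by = "x" is written out explicitly.

```r
library(dplyr)

# Data from the question
df1 <- data.frame(x = c("A", "B"), stringsAsFactors = FALSE)
df2 <- data.frame(x = "A", y = TRUE, stringsAsFactors = FALSE)

# Keep the rows of df1 whose x appears among the rows of df2 where y is TRUE;
# this plays the role of x %in% (...) without nesting filter inside filter
result_semi <- df1 %>% semi_join(filter(df2, y), by = "x")
result_semi
```

With the question's data this returns the first row of df1 (x equal to "A"). Any additional columns df1 might have would be kept, because df1 is the first argument.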
