R: How to efficiently find out whether data.frame A is contained in data.frame B?

Views: 317; Posted: 2017/3/26 3:01:15; Tags: r, dataframe, subset, set-intersection

Question

In order to find out whether data frame df.a is a subset of data frame df.b, I did the following:

df.a <- data.frame( x=1:5, y=6:10 )
df.b <- data.frame( x=1:7, y=6:12 )
inds.x <- as.integer( lapply( df.a$x, function(x) which(df.b$x == x) ))
inds.y <- as.integer( lapply( df.a$y, function(y) which(df.b$y == y) ))
identical( inds.x, inds.y )

The last line gave TRUE, hence df.a is contained in df.b.

Now I wonder whether there is a more elegant, and possibly more efficient, way to answer this question?

This task is also easily extended to finding the intersection between two given data frames, possibly based on only a subset of columns.

Any help would be appreciated.

Recommended answer

I am going to hazard a guess at an answer.

I think semi_join from dplyr will do what you want, even taking duplicated rows into account.

First, note what the help file ?semi_join says:

return all rows from x where there are matching values in y, keeping just columns from x.

A semi join differs from an inner join because an inner join will return one row of x for each matching row of y, where a semi join will never duplicate rows of x.
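The difference described in the help text can be seen on a small example. This is a minimal sketch on made-up data (the data frames x and y and the columns k and v are hypothetical, not from the question), assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical toy data: the key 1 appears twice in y.
x <- data.frame(k = c(1, 2, 3), v = c("a", "b", "c"))
y <- data.frame(k = c(1, 1, 3))

# Inner join: one row of x for each matching row of y.
inner <- inner_join(x, y, by = "k")

# Semi join: each matching row of x appears at most once,
# and only the columns of x are kept.
semi <- semi_join(x, y, by = "k")

nrow(inner)  # 3 (k = 1 twice, k = 3 once)
nrow(semi)   # 2 (k = 1 and k = 3)
```

Because the semi join never multiplies rows of x, the number of rows it returns reflects x alone, which is what makes it usable for a containment check.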

OK, this suggests that the following should correctly fail:

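library(dplyr)  # provides semi_join

# df.a contains the row (1, 6) twice; df.b contains it only once
df.a <- data.frame( x=c(1:5,1), y=c(6:10,6) )
df.b <- data.frame( x=1:7, y=6:12 )
identical(semi_join(df.b, df.a),  semi_join(df.a, df.a))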

which is FALSE, because

> semi_join(df.b, df.a)
Joining by: c("x", "y")
  x  y
1 1  6
2 2  7
3 3  8
4 4  9
5 5 10

However, with:

df.c <- data.frame( x=c(1:7, 1), y= c(6:12, 6) )
identical(semi_join(df.c, df.a), semi_join(df.a, df.a))

it is indeed TRUE.

The second semi_join(df.a, df.a) is required to get the canonical sorting on df.a.
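If duplicated rows do not matter, a looser check is possible. The sketch below is not part of the original answer: is_contained is a hypothetical helper that uses dplyr's anti_join and only tests whether every distinct row of one data frame occurs in the other, ignoring how often it occurs. The last line shows how the by argument might be used for the intersection on a subset of columns that the question mentions.

```r
library(dplyr)

# Hypothetical helper (not from the original answer): TRUE when every row
# of `a` has a match in `b` on the columns in `by`. Multiplicity is ignored.
is_contained <- function(a, b, by = intersect(names(a), names(b))) {
  nrow(anti_join(a, b, by = by)) == 0
}

df.a <- data.frame(x = c(1:5, 1), y = c(6:10, 6))
df.b <- data.frame(x = 1:7, y = 6:12)

is_contained(df.a, df.b)  # TRUE: the duplicated row (1, 6) is not counted twice
is_contained(df.b, df.a)  # FALSE: rows (6, 11) and (7, 12) are not in df.a

# Intersection based on a subset of columns: rows of df.b whose x occurs in df.a
common.x <- semi_join(df.b, df.a, by = "x")
```

Note that this gives a different result from the identical(...) comparison above for the duplicated-row case, so which one to use depends on whether repeated rows should count.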
