Compare two data frames, column-order-independent, to get non-duplicated rows

Problem description

I want to compare two data frames and find the rows that are not duplicated across them. The order of the columns doesn't matter, so a row (71, 78) counts as a duplicate of (78, 71). For example, if df1 looks like this:

 V2 V3
 71 78
 90 13
 12 67
 56 32

and df2 looks like this:

V2 V3
89 45
77 88
78 71
90 13

then the non-duplicated rows from both data frames are:

12 67
56 32
89 45
77 88

How can I achieve this in a simple way?

Solution

Here's a dplyr solution, which will probably still be reasonably quick on larger datasets:

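library(dplyr)

# Example data from the question (tibble() replaces the deprecated data_frame())
df1 <- tibble(v1 = c(71, 90, 12, 56), v2 = c(78, 13, 67, 32))
df2 <- tibble(v1 = c(89, 77, 78, 90), v2 = c(45, 88, 71, 13))

# Stack both data frames into one
df3 <- bind_rows(df1, df2)

df3 %>%
  rowwise() %>%
  # Order-independent key: smaller value first, larger second.
  # The separator avoids collisions such as (12, 34) vs (1, 234).
  mutate(key = paste(min(v1, v2), max(v1, v2), sep = "_")) %>%
  ungroup() %>%
  group_by(key) %>%
  mutate(size = n()) %>%
  # Keep only the rows whose key occurs exactly once
  filter(size == 1) %>%
  ungroup()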
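Because rowwise() evaluates the key one row at a time, it can become slow on large inputs. As a rough sketch of a vectorized alternative (not part of the original answer), the same key can be built with base R's pmin() and pmax(), which operate on whole columns at once. The "_" separator and the names `size` and `unique_rows` are illustrative choices.

```r
# Example data from the question
df1 <- data.frame(v1 = c(71, 90, 12, 56), v2 = c(78, 13, 67, 32))
df2 <- data.frame(v1 = c(89, 77, 78, 90), v2 = c(45, 88, 71, 13))
df3 <- rbind(df1, df2)

# pmin()/pmax() work element-wise over whole columns, so no per-row loop is needed
df3$key <- paste(pmin(df3$v1, df3$v2), pmax(df3$v1, df3$v2), sep = "_")

# Count how often each key occurs, then keep the rows whose key occurs once
df3$size <- ave(seq_along(df3$key), df3$key, FUN = length)
unique_rows <- df3[df3$size == 1, c("v1", "v2")]
unique_rows
```

On the example data this keeps the four rows listed in the question.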

This solution only works for two columns. To extend it to more columns, you basically just need to change how the key is built.
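One possible way to build such a key for any number of columns is to sort the values within each row before pasting them together. The sketch below uses base R only and assumes all columns are numeric. The helper name `row_key` and the "_" separator are hypothetical, introduced here for illustration.

```r
# Example data from the question
df1 <- data.frame(v1 = c(71, 90, 12, 56), v2 = c(78, 13, 67, 32))
df2 <- data.frame(v1 = c(89, 77, 78, 90), v2 = c(45, 88, 71, 13))
df3 <- rbind(df1, df2)

# Hypothetical helper: sort each row's values and join them with a separator,
# so (71, 78) and (78, 71) produce the same key. Assumes numeric columns.
row_key <- function(df, sep = "_") {
  apply(as.matrix(df), 1, function(r) paste(sort(r), collapse = sep))
}

key <- row_key(df3)

# A key is duplicated if it appears again anywhere, before or after
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
result <- df3[!dup, , drop = FALSE]
result
```

Because the helper sorts whole rows, it needs no changes when the data frames gain a third or fourth column.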

Edit: As the comments below point out, I misunderstood the problem.
