Compare two data frames, column-order-independent, to get non-duplicated rows
Problem description
I want to compare two data frames and check whether they have duplicated rows. The order of the columns does not matter, so a row 78 71 counts as a duplicate of 71 78. If df1 looks like this:
V2 V3
71 78
90 13
12 67
56 32
and df2 looks like this:
V2 V3
89 45
77 88
78 71
90 13
then the non-duplicated rows from both data frames are:
12 67
56 32
89 45
77 88
How can I achieve this in an easy way?
Solution
Here's a dplyr solution, which will probably be fairly quick on larger datasets:
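library(dplyr)

df1 <- tibble(v1 = c(71, 90, 12, 56), v2 = c(78, 13, 67, 32))
df2 <- tibble(v1 = c(89, 77, 78, 90), v2 = c(45, 88, 71, 13))

# stack both data frames so duplicates across them are counted together
df3 <- bind_rows(df1, df2)

df3 %>%
  rowwise() %>%
  # order-independent key: smaller value first, larger value second;
  # the "_" separator keeps e.g. (12, 34) and (1, 234) from both becoming "1234"
  mutate(key = paste(min(v1, v2), max(v1, v2), sep = "_")) %>%
  ungroup() %>%
  group_by(key) %>%
  # count the rows sharing each key and keep only the unique ones
  mutate(size = n()) %>%
  filter(size == 1)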
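A vectorized variant of the same key idea, sketched under the assumption that both columns are numeric: pmin() and pmax() compute the smaller and larger value of every row at once, so rowwise() is not needed, and add_count() (which appends a count column named n) replaces the group_by()/mutate(n()) pair. This is an alternative formulation, not the original answer's code.

```r
library(dplyr)

df1 <- tibble(v1 = c(71, 90, 12, 56), v2 = c(78, 13, 67, 32))
df2 <- tibble(v1 = c(89, 77, 78, 90), v2 = c(45, 88, 71, 13))

result <- bind_rows(df1, df2) %>%
  # pmin()/pmax() work element-wise across the two columns, so no rowwise()
  mutate(key = paste(pmin(v1, v2), pmax(v1, v2), sep = "_")) %>%
  # add_count() appends n = number of rows that share each key
  add_count(key) %>%
  # keep rows whose key occurs once, then drop the helper columns
  filter(n == 1) %>%
  select(v1, v2)

print(result)
```

For the example data this keeps 12 67, 56 32, 89 45 and 77 88, matching the expected output in the question.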
This solution only works for two grouping variables; to extend it to more variables, you essentially just need to change how the key is built.
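One way to build such a key for any number of columns is to sort each row's values and paste them together. The sketch below wraps this in a hypothetical helper, non_duplicated_rows(), and assumes every column is numeric: apply() converts the data frame to a matrix, and mixed column types would be coerced to character and may not compare as intended. The three-column data is made up for illustration.

```r
library(dplyr)

# Hypothetical helper: keeps rows of a and b whose values, ignoring column
# order, appear exactly once in the combined data. Assumes numeric columns.
non_duplicated_rows <- function(a, b) {
  combined <- bind_rows(a, b)
  # sort each row's values so the key does not depend on column order
  combined$key <- apply(combined, 1, function(row) paste(sort(row), collapse = "_"))
  combined %>%
    add_count(key) %>%
    filter(n == 1) %>%
    select(-key, -n)
}

# illustrative data: (3, 1, 2) in df2 matches (1, 2, 3) in df1
df1 <- tibble(a = c(1, 4, 7), b = c(2, 5, 8), c = c(3, 6, 9))
df2 <- tibble(a = c(3, 10), b = c(1, 11), c = c(2, 12))

result <- non_duplicated_rows(df1, df2)
print(result)
```

Here the rows 4 5 6, 7 8 9 and 10 11 12 remain, because 3 1 2 and 1 2 3 produce the same key "1_2_3".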
Edit: I had misunderstood the problem, as pointed out in the comments below.