How to find differences in elements of 2 data frames based on 2 unique identifiers
Question
I have 2 very large data frames similar to the following:
df1<-data.frame(DS.ID=c(123,214,543,325,123,214),OP.ID=c("xxab","xxac","xxad","xxae","xxaf","xxaq"),P.ID=c("AAC","JGK","DIF","ADL","AAC","JGR"))
> df1
DS.ID OP.ID P.ID
1 123 xxab AAC
2 214 xxac JGK
3 543 xxad DIF
4 325 xxae ADL
5 123 xxaf AAC
6 214 xxaq JGR
df2<-data.frame(DS.ID=c(123,214,543,325,123,214),OP.ID=c("xxab","xxac","xxad","xxae","xxaf","xxaq"),P.ID=c("AAC","JGK","DIF","ADL","AAC","JGS"))
> df2
DS.ID OP.ID P.ID
1 123 xxab AAC
2 214 xxac JGK
3 543 xxad DIF
4 325 xxae ADL
5 123 xxaf AAC
6 214 xxaq JGS
The unique ID is the combination of DS.ID and OP.ID: DS.ID can be repeated, but the combination of DS.ID and OP.ID cannot. I want to find the instances where P.ID changes. Also, a given combination of DS.ID and OP.ID will not necessarily be in the same row in both data frames.
In the example above, it would return row 6, as the P.ID changed (JGR in df1, JGS in df2). I'd want to write both the initial and final values to a data frame.
I have a feeling the initial step would be
rbind.fill(df1,df2)
(.fill because there are added columns in the data frames I'm trying to loop through).
Edit: Assume there are other columns that have different values as well. Thus, duplicated would not work unless you isolated the relevant columns in their own data frame. But I'll be doing this for many columns and many data frames, so I'd rather not go with that method, for speed's sake.
Recommended answer
If ident is 0 in the following code, then P.ID differs between the two data frames for that combination of DS.ID and OP.ID:
ll<-merge(df1,df2,by=c("DS.ID", "OP.ID"))
library(plyr)
ddply(ll,.(DS.ID, OP.ID),summarize,ident=match(P.ID.x, P.ID.y,nomatch=0))
DS.ID OP.ID ident
1 123 xxab 1
2 123 xxaf 1
3 214 xxac 1
4 214 xxaq 0
5 325 xxae 1
6 543 xxad 1
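The question also asks for the initial and final values to be written to a data frame. One possible way to get there, sketched in base R rather than plyr, is to merge on the two key columns and keep only the rows where the two versions of P.ID disagree. This is not part of the original answer. The names key_cols, merged and changed, and the .initial/.final suffixes, are chosen here for illustration, and the sketch assumes each DS.ID/OP.ID combination appears at most once per data frame, as the question states.

```r
# Example data copied from the question (strings kept as character, not factor)
df1 <- data.frame(DS.ID = c(123, 214, 543, 325, 123, 214),
                  OP.ID = c("xxab", "xxac", "xxad", "xxae", "xxaf", "xxaq"),
                  P.ID  = c("AAC", "JGK", "DIF", "ADL", "AAC", "JGR"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(DS.ID = c(123, 214, 543, 325, 123, 214),
                  OP.ID = c("xxab", "xxac", "xxad", "xxae", "xxaf", "xxaq"),
                  P.ID  = c("AAC", "JGK", "DIF", "ADL", "AAC", "JGS"),
                  stringsAsFactors = FALSE)

# Composite key; only the key and the column being compared are carried into the merge,
# so other columns that differ between the data frames do not interfere
key_cols <- c("DS.ID", "OP.ID")

# Inner join on the key; the suffixes label the df1 and df2 versions of P.ID
merged <- merge(df1[, c(key_cols, "P.ID")],
                df2[, c(key_cols, "P.ID")],
                by = key_cols,
                suffixes = c(".initial", ".final"))

# Keep the keys whose P.ID differs; which() drops comparisons that evaluate to NA
changed <- merged[which(merged$P.ID.initial != merged$P.ID.final), ]
print(changed)
```

For the example data, changed holds a single row: DS.ID 214, OP.ID xxaq, with P.ID.initial JGR and P.ID.final JGS. Because merge matches on key values, the row order of the two data frames does not matter. Keys present in only one data frame are dropped by the default inner join; passing all = TRUE to merge would keep them with NA in the missing column.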