Find duplicates, compare a condition, erase one row in R
Problem description
Using the following reproducible example:
ID1<-c("a1","a4","a6","a6","a5", "a1" )
ID2<-c("b8","b99","b5","b5","b2","b8" )
Value1<-c(2,5,6,6,2,7)
Value2<- c(23,51,63,64,23,23)
Year<- c(2004,2004,2004,2004,2005,2004)
df<-data.frame(ID1,ID2,Value1,Value2,Year)
I want to select the rows where ID1, ID2, and Year have the same values in their respective columns. For these rows, I want to compare Value1 and Value2 across the duplicate rows and, if the values are not the same, erase the row with the smaller value.
Expected result:
  ID1 ID2 Value1 Value2 Year         new
2  a4 b99      5     51 2004 a4_b99_2004
4  a6  b5      6     64 2004  a6_b5_2004
5  a5  b2      2     23 2005  a5_b2_2005
6  a1  b8      7     23 2004  a1_b8_2004
I tried the following. First, build a unique identifier for the conditions I am interested in:
df$new<-paste(df$ID1,df$ID2, df$Year, sep="_")
I can use the unique identifier to find the rows of the data frame that contain duplicates:
IND<-which(duplicated(df$new) | duplicated(df$new, fromLast = TRUE))
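As a quick check of this step, the sketch below rebuilds the identifier on the example data and lists the flagged row positions. It only restates the code above on the same vectors; nothing new is assumed.

```r
# Same example vectors as in the question
ID1  <- c("a1", "a4", "a6", "a6", "a5", "a1")
ID2  <- c("b8", "b99", "b5", "b5", "b2", "b8")
Year <- c(2004, 2004, 2004, 2004, 2005, 2004)

# Unique identifier per (ID1, ID2, Year) combination
new <- paste(ID1, ID2, Year, sep = "_")

# Flag every member of a duplicated group, not only the later occurrences
IND <- which(duplicated(new) | duplicated(new, fromLast = TRUE))
IND
# [1] 1 3 4 6
```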
In a for loop, if a unique identifier has duplicates, I compare the values and erase the rows, but the loop is too complicated and I cannot get it to work.
for (i in df$new) {
  if (sum(df$new == i) > 1) {
    ind <- which(df$new == i)
    m <- min(df$Value1[ind])
    df <- df[-which.min(df$Value1[ind]), ]
    m <- min(df$Value2[ind])
    df <- df[-which.min(df$Value2[ind]), ]
  }
}
Recommended answer
Consider aggregate, taking the max of each value column grouped by ID1, ID2, and Year:
df_new <- aggregate(.~ID1 + ID2 + Year, df, max)
df_new
# ID1 ID2 Year Value1 Value2
# 1 a6 b5 2004 6 64
# 2 a1 b8 2004 7 23
# 3 a4 b99 2004 5 51
# 4 a5 b2 2005 2 23
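Note that aggregate with max takes the maximum of each value column independently, so if one duplicate had the larger Value1 and the other the larger Value2, the result would combine values from two different rows. If whole rows should be kept instead, a minimal base R sketch could look like the following. It assumes that "smaller" means smaller Value1, with ties broken by Value2 (an interpretation, not something the question states); the names ord, df_sorted, and kept are illustrative.

```r
# Example data from the question
ID1    <- c("a1", "a4", "a6", "a6", "a5", "a1")
ID2    <- c("b8", "b99", "b5", "b5", "b2", "b8")
Value1 <- c(2, 5, 6, 6, 2, 7)
Value2 <- c(23, 51, 63, 64, 23, 23)
Year   <- c(2004, 2004, 2004, 2004, 2005, 2004)
df <- data.frame(ID1, ID2, Value1, Value2, Year)
df$new <- paste(df$ID1, df$ID2, df$Year, sep = "_")

# Within each identifier, put the row with the largest Value1 first
# (assumption: ties on Value1 are broken by the largest Value2)
ord <- order(df$new, -df$Value1, -df$Value2)
df_sorted <- df[ord, ]

# Keep only the first row of each identifier, i.e. drop the smaller duplicates
kept <- df_sorted[!duplicated(df_sorted$new), ]

# Restore the original row order using the preserved row names
kept <- kept[order(as.integer(rownames(kept))), ]
kept
```

On the example data this keeps rows 2, 4, 5, and 6, which matches the expected result in the question.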