Find duplicates, compare a condition, and erase one row in R


Problem description

Using the following reproducible example:

ID1<-c("a1","a4","a6","a6","a5", "a1" )
ID2<-c("b8","b99","b5","b5","b2","b8" )
Value1<-c(2,5,6,6,2,7)
Value2<- c(23,51,63,64,23,23)
Year<- c(2004,2004,2004,2004,2005,2004)
df<-data.frame(ID1,ID2,Value1,Value2,Year)

I want to select the rows in which ID1, ID2, and Year all have the same values as in another row. For these duplicate rows, I want to compare Value1 and Value2 and, if the values are not the same, erase the row with the smaller value.

Expected result:

  ID1 ID2 Value1 Value2 Year         new
2  a4 b99      5     51 2004 a4_b99_2004
4  a6  b5      6     64 2004  a6_b5_2004
5  a5  b2      2     23 2005  a5_b2_2005
6  a1  b8      7     23 2004  a1_b8_2004

I tried the following. First, build a unique identifier for the conditions I am interested in:

df$new<-paste(df$ID1,df$ID2, df$Year, sep="_")

I can use the unique identifier to find the rows of the data frame that contain the duplicates:

IND<-which(duplicated(df$new) | duplicated(df$new, fromLast = TRUE))

Then, in a for loop, if a unique identifier has duplicates, I compare the values and erase the rows. However, the loop is too complicated and I cannot get it to work:

for (i in df$new) {
  if (sum(df$new == i) > 1) {
    ind <- which(df$new == i)
    m <- min(df$Value1[ind])
    df <- df[-which.min(df$Value1[ind]), ]
    m <- min(df$Value2[ind])
    df <- df[-which.min(df$Value2[ind]), ]
  }
}
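
A note on the loop above: which.min() returns a position within the vector it is given (here the subset indexed by ind), not a row number of df. A minimal sketch of the mismatch, using the two a6_b5_2004 rows (rows 3 and 4); the names local_pos and row_idx are hypothetical and only for illustration:

```r
# Value2 column from the example data.
Value2 <- c(23, 51, 63, 64, 23, 23)

# Rows that share the identifier a6_b5_2004.
ind <- c(3, 4)

# Position of the minimum within the subset Value2[ind].
local_pos <- which.min(Value2[ind])

# Mapping that position back through ind gives the actual row of df.
row_idx <- ind[local_pos]

local_pos  # 1
row_idx    # 3
```

Indexing df with -local_pos would therefore drop row 1 rather than row 3. The loop also removes a row once per value column and reuses ind after df has already shrunk, which compounds the problem.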

Recommended answer

Consider aggregate(), grouping by ID1, ID2, and Year and taking the maximum of each value column within every group:

df_new <- aggregate(.~ID1 + ID2 + Year, df, max)
df_new

#   ID1 ID2 Year Value1 Value2
# 1  a6  b5 2004      6     64
# 2  a1  b8 2004      7     23
# 3  a4 b99 2004      5     51
# 4  a5  b2 2005      2     23
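
One caveat: aggregate() with max takes the maximum of each column independently, so in general the result can combine values from different rows. If whole rows should be kept instead, a base R sketch along the following lines may help. It assumes that "the row with the smaller value" means the row with the smaller Value1, with ties broken by Value2; that rule and the names ord, sorted, keep, and df_rows are assumptions for illustration, not part of the original answer.

```r
# Rebuild the example data from the question.
df <- data.frame(
  ID1    = c("a1", "a4", "a6", "a6", "a5", "a1"),
  ID2    = c("b8", "b99", "b5", "b5", "b2", "b8"),
  Value1 = c(2, 5, 6, 6, 2, 7),
  Value2 = c(23, 51, 63, 64, 23, 23),
  Year   = c(2004, 2004, 2004, 2004, 2005, 2004)
)

# Sort so that, within each ID1/ID2/Year group, the row with the largest
# Value1 (ties broken by the largest Value2) comes first.
ord <- order(df$ID1, df$ID2, df$Year, -df$Value1, -df$Value2)
sorted <- df[ord, ]

# duplicated() flags every row after the first in each group, so negating
# it keeps exactly one complete row per group.
keep <- !duplicated(sorted[, c("ID1", "ID2", "Year")])
df_rows <- sorted[keep, ]

# Restore the original row order using the preserved row names.
df_rows <- df_rows[order(as.integer(rownames(df_rows))), ]
df_rows
```

On the example data this keeps rows 2, 4, 5, and 6, the same rows as in the expected result from the question (without the new column, which can be added again with paste() if needed).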
