Removing duplicates for each ID


Problem description

Suppose that there are three variables in my data frame (mydata): 1) id, 2) case, and 3) value.

mydata <- data.frame(id=c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4), case=c("a","b","c","c","b","a","b","c","c","a","b","c","c","a","b","c","a"), value=c(1,34,56,23,34,546,34,67,23,65,23,65,23,87,34,321,87))


mydata 
    id case value
1   1    a     1
2   1    b    34
3   1    c    56
4   1    c    23
5   1    b    34
6   2    a   546
7   2    b    34
8   2    c    67
9   2    c    23
10  3    a    65
11  3    b    23
12  3    c    65
13  3    c    23
14  4    a    87
15  4    b    34
16  4    c   321
17  4    a    87

Within each id, the same case can appear more than once, and the values for those repeated cases may be the same or different. If a row has the same id, case, and value as an earlier row, I want to keep only one copy and remove the duplicates.

My final data would then be:

    id case value
1   1    a     1
2   1    b    34
3   1    c    56
4   1    c    23
5   2    a   546
6   2    b    34
7   2    c    67
8   2    c    23
9   3    a    65
10  3    b    23
11  3    c    65
12  3    c    23
13  4    a    87
14  4    b    34
15  4    c   321

Solution

You could try duplicated(), which flags every row that repeats an earlier row, and keep only the rows that are not flagged:

 mydata[!duplicated(mydata[,c('id', 'case', 'value')]),]
 #     id case value
 #1   1    a     1
 #2   1    b    34
 #3   1    c    56
 #4   1    c    23
 #6   2    a   546
 #7   2    b    34
 #8   2    c    67
 #9   2    c    23
 #10  3    a    65
 #11  3    b    23
 #12  3    c    65
 #13  3    c    23
 #14  4    a    87
 #15  4    b    34
 #16  4    c   321
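
To see which rows are being dropped, it can help to look at the logical vector that duplicated() returns before using it as an index. The following is a minimal sketch on the same mydata; the names dup and deduped are introduced here for illustration only and are not part of the original answer.

```r
mydata <- data.frame(
  id    = c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
  case  = c("a","b","c","c","b","a","b","c","c","a","b","c","c","a","b","c","a"),
  value = c(1,34,56,23,34,546,34,67,23,65,23,65,23,87,34,321,87)
)

# TRUE marks a row whose (id, case, value) combination already appeared above it
dup <- duplicated(mydata[, c("id", "case", "value")])

# Positions of the rows that will be removed
which(dup)

# Keep only the first occurrence of each combination
deduped <- mydata[!dup, ]

# mydata has no columns other than these three, so unique() on the
# whole data frame should select the same rows
identical(deduped, unique(mydata))

# Optional: renumber the rows 1..n, as in the desired output
rownames(deduped) <- NULL
```

The original row names (1 to 4, 6 to 16) are kept after subsetting, which is why the output above skips row 5; resetting rownames gives the consecutive numbering shown in the question.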

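Or use unique() with the by argument from data.table. This deduplicates on the listed columns only and keeps the remaining columns (here value1) from the first row of each combination: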

 library(data.table)
 set.seed(25)
 mydata1 <- cbind(mydata, value1=rnorm(17))
 DT <- as.data.table(mydata1)
 unique(DT, by=c('id', 'case', 'value'))
 #   id case value      value1
 #1:  1    a     1 -0.21183360
 #2:  1    b    34 -1.04159113
 #3:  1    c    56 -1.15330756
 #4:  1    c    23  0.32153150
 #5:  2    a   546 -0.44553326
 #6:  2    b    34  1.73404543
 #7:  2    c    67  0.51129562
 #8:  2    c    23  0.09964504
 #9:  3    a    65 -0.05789111
 #10: 3    b    23 -1.74278763
 #11: 3    c    65 -1.32495298
 #12: 3    c    23 -0.54793388
 #13: 4    a    87 -1.45638428
 #14: 4    b    34  0.08268682
 #15: 4    c   321  0.92757895
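
The same "deduplicate on some columns, keep all columns" behaviour is also available in base R, because duplicated() can be computed on a column subset and then used to index the full data frame. A minimal sketch, assuming the same mydata1 construction as above (key_cols and result are illustrative names, not from the original answer):

```r
mydata <- data.frame(
  id    = c(1,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4),
  case  = c("a","b","c","c","b","a","b","c","c","a","b","c","c","a","b","c","a"),
  value = c(1,34,56,23,34,546,34,67,23,65,23,65,23,87,34,321,87)
)

set.seed(25)
mydata1 <- cbind(mydata, value1 = rnorm(17))

# Columns that define what counts as a duplicate
key_cols <- c("id", "case", "value")

# Flag duplicates using only the key columns, then subset the full data frame,
# so value1 is carried along from the first row of each combination
result <- mydata1[!duplicated(mydata1[, key_cols]), ]

dim(result)
```

Calling unique(mydata1) without restricting the columns would not remove anything here, since the random value1 column makes every row distinct.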
