Removing all duplicates with R

Problem description

For example, I have two columns:

 Var1 Var2
 1     12
 1     65
 2     68
 2     98
 3     49
 3     24
 4      8
 5     67
 6     12

I need to display only the rows whose Var1 value is unique, i.e. appears exactly once:

 Var1 Var2
 4      8
 5     67
 6     12

I can do it like this:

 mydata=mydata[!unique(mydata$Var1),]

But when I use the same formula on my large data set, with about 1 million observations, nothing happens: the sample size stays the same. Could you please explain why?

Thank you!
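A short base-R check of the "why", sketched under the assumption that Var1 is numeric as in the sample above: unique() returns the distinct values of Var1, not how often each occurs, and the ! operator applied to numbers is TRUE only for 0. So !unique(mydata$Var1) never tests whether a value is repeated. The snippet rebuilds the sample data as mydata and shows what that index actually contains:

```r
# Sample data from the question (assumed to be in a data frame called mydata)
mydata <- data.frame(
  Var1 = c(1, 1, 2, 2, 3, 3, 4, 5, 6),
  Var2 = c(12, 65, 68, 98, 49, 24, 8, 67, 12)
)

u <- unique(mydata$Var1)  # the distinct values 1..6, not their counts
idx <- !u                 # `!` on numbers is TRUE only for 0, so all FALSE here
print(idx)

# A logical row index shorter than nrow(mydata) is recycled;
# an all-FALSE index therefore selects no rows at all.
print(nrow(mydata[idx, ]))
```

Because the expression only looks at the distinct values, it cannot pick out the rows whose Var1 occurs once, whatever the size of the data.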

Solution

With data.table (the question seems to be tagged with it), I would do:

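library(data.table)
# DT holds the question's data (Var1, Var2); setDT() converts it to a data.table by reference.
# .I[.N == 1] returns the row numbers of every Var1 group that has exactly one row.
indx <- setDT(DT)[, .I[.N == 1], by = Var1]$V1
DT[indx]
#    Var1 Var2
# 1:    4    8
# 2:    5   67
# 3:    6   12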

Or, as @eddi reminded me, you can simply do:

# Return the group's rows (.SD) only when the Var1 group has a single row
DT[, if (.N == 1) .SD, by = Var1]
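The same "keep groups of size one" idea can also be sketched in base R without data.table. This is an alternative, not part of the answer above: ave() is used here to compute each row's group size, and the data frame name mydata is assumed from the question.

```r
# Sample data from the question
mydata <- data.frame(
  Var1 = c(1, 1, 2, 2, 3, 3, 4, 5, 6),
  Var2 = c(12, 65, 68, 98, 49, 24, 8, 67, 12)
)

# For every row, the number of rows sharing its Var1 value (its group size).
# Passing seq_along() as the value keeps the result integer whatever Var1's type is.
group_size <- ave(seq_along(mydata$Var1), mydata$Var1, FUN = length)

# Keep only rows whose Var1 group has exactly one row (the analogue of .N == 1)
res <- mydata[group_size == 1, ]
print(res)
```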

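Or (following the approach in the linked duplicate questions), with v >= 1.9.5 you could also do something like the following. Newer data.table releases make duplicated() compare all columns by default rather than the key, so by = "Var1" is passed explicitly:

setDT(DT, key = "Var1")[!(duplicated(DT, by = "Var1") | duplicated(DT, by = "Var1", fromLast = TRUE))]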
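For readers not using data.table, the same duplicated()/fromLast idea works on a plain data frame in base R. A minimal sketch, again assuming the question's data is in mydata: a row is dropped if its Var1 value appears anywhere earlier or anywhere later.

```r
# Sample data from the question
mydata <- data.frame(
  Var1 = c(1, 1, 2, 2, 3, 3, 4, 5, 6),
  Var2 = c(12, 65, 68, 98, 49, 24, 8, 67, 12)
)

# TRUE for every row whose Var1 value also occurs in another row:
# duplicated() flags later copies, fromLast = TRUE flags earlier copies.
dup_any <- duplicated(mydata$Var1) | duplicated(mydata$Var1, fromLast = TRUE)

# Keep only the rows whose Var1 value occurs exactly once
res <- mydata[!dup_any, ]
print(res)
```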
