Removing all duplicates (every occurrence) with R
Problem description
For example, I have two columns:
Var1 Var2
1    12
1    65
2    68
2    98
3    49
3    24
4    8
5    67
6    12
And I need to display only the rows whose value in column Var1 is unique (appears exactly once):
Var1 Var2
4    8
5    67
6    12
I tried doing it like this:
mydata=mydata[!unique(mydata$Var1),]
But when I use the same formula on my large data set with about 1 million observations, nothing happens: the number of rows stays the same. Could you please explain why?
Thank you!
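The question asks why the attempt fails. A minimal base-R sketch (rebuilding the sample data as a data frame named mydata, matching the name used in the question) inspects the index expression step by step: unique() returns the distinct values themselves, not row flags, and ! turns every nonzero number into FALSE. On the sample data the attempt therefore selects no rows at all rather than the unique ones; this sketch does not reproduce the unchanged row count reported for the large data set.

```r
# Rebuild the sample data from the question as a plain data frame
mydata <- data.frame(
  Var1 = c(1, 1, 2, 2, 3, 3, 4, 5, 6),
  Var2 = c(12, 65, 68, 98, 49, 24, 8, 67, 12)
)

# unique() returns the distinct values, not positions or TRUE/FALSE flags
u <- unique(mydata$Var1)
print(u)    # [1] 1 2 3 4 5 6

# ! on a numeric vector is TRUE only where the value is 0, so all are FALSE here
idx <- !u
print(idx)  # [1] FALSE FALSE FALSE FALSE FALSE FALSE

# The all-FALSE index is recycled over the 9 rows and selects nothing
res <- mydata[idx, ]
print(nrow(res))  # [1] 0
```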
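Solution

With data.table (as the question seems to be tagged with it), and your data stored in DT, I would do

library(data.table)
# .I holds row numbers; keep only those from groups whose size (.N) is 1
indx <- setDT(DT)[, .I[.N == 1], by = Var1]$V1
DT[indx]
#    Var1 Var2
# 1:    4    8
# 2:    5   67
# 3:    6   12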
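For readers not using data.table, the same idea (compute each row's group size, then keep rows whose group has exactly one member) can be sketched in base R with ave(). This is a swapped-in base-R technique, not part of the original answer, and mydata is the sample data from the question rebuilt for illustration.

```r
# Sample data from the question
mydata <- data.frame(
  Var1 = c(1, 1, 2, 2, 3, 3, 4, 5, 6),
  Var2 = c(12, 65, 68, 98, 49, 24, 8, 67, 12)
)

# For every row, the number of rows sharing its Var1 value (like .N per group)
group_size <- ave(mydata$Var1, mydata$Var1, FUN = length)

# Keep only rows from groups of size one (like .I[.N == 1])
result <- mydata[group_size == 1, ]
print(result)
```

Because ave() returns a vector aligned with the original rows, the filter keeps the original row order and row names.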
Or... as @eddi reminded me, you can simply do
DT[, if(.N == 1) .SD, by = Var1]
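The if(.N == 1) .SD variant returns a group's rows only when that group has a single row. A rough base-R counterpart, again a swapped-in technique rather than part of the answer, is split, filter, combine; mydata is the rebuilt sample data.

```r
# Sample data from the question
mydata <- data.frame(
  Var1 = c(1, 1, 2, 2, 3, 3, 4, 5, 6),
  Var2 = c(12, 65, 68, 98, 49, 24, 8, 67, 12)
)

# One data frame per Var1 value (like grouping with by = Var1)
groups <- split(mydata, mydata$Var1)

# Keep only the groups with exactly one row (like returning .SD when .N == 1)
single <- Filter(function(g) nrow(g) == 1, groups)

# Stack the surviving groups back into one data frame
result <- do.call(rbind, single)
print(result)
```

This materialises one data frame per group, so on about 1 million rows the data.table versions should be considerably faster.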
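Or (per the mentioned duplicates) with v >= 1.9.5 you could also do something like the following. Since data.table v1.9.8, duplicated() compares all columns by default instead of the key, so the grouping column is passed explicitly via by:

# Drop every row whose Var1 value occurs more than once, checking from both ends
setDT(DT)[!(duplicated(DT, by = "Var1") | duplicated(DT, by = "Var1", fromLast = TRUE))]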
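The core idiom here, duplicated(...) | duplicated(..., fromLast = TRUE), works the same way with base R's duplicated() on a plain vector. A minimal sketch using the Var1 values from the question shows why both directions are needed: duplicated() alone flags only the repeats, leaving the first occurrence of each duplicated value in place.

```r
x <- c(1, 1, 2, 2, 3, 3, 4, 5, 6)

# Flags the 2nd, 3rd, ... occurrence of each value
first_dup <- duplicated(x)
# -> FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE

# Scanning from the end flags every occurrence except the last one
last_dup <- duplicated(x, fromLast = TRUE)
# -> TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE

# Together they flag every occurrence of any repeated value
any_dup <- first_dup | last_dup

# Apply the flags to the sample data frame
mydata <- data.frame(Var1 = x, Var2 = c(12, 65, 68, 98, 49, 24, 8, 67, 12))
result <- mydata[!any_dup, ]
print(result)
```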