data.table DT[i, j, by] does not select groups using criteria in i
Problem description
I am not sure what is going on with my data; maybe I am missing something simple.
In my dataset, each id has a cluster point assigned to it, and I want to filter out the ids that have only a single cluster assigned to them.
For example, BS:100021 has only 1 cluster point assigned to it, whereas BS:100135 has 6 cluster points assigned to it.
To reproduce my example:
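library(data.table)
set.seed(1)
# cbind() of a character and a numeric vector gives a character matrix,
# so Cluster ends up as a character column
xx <- cbind(rep("BS:100021", 30), rep(1, 30))
yy <- cbind(rep("BS:100135", 60), rep(1:6, 10))
mm <- as.data.table(rbind(xx, yy))
names(mm) <- c("dSc", "Cluster")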
Now I want to filter out "BS:100021" and similar ids, so I am trying
mm[length(unique(Cluster)) > 1, .SD, dSc]
but I am still getting
          dSc Cluster
 1: BS:100021       1
 2: BS:100021       1
 3: BS:100021       1
 4: BS:100021       1
 5: BS:100021       1
 6: BS:100021       1
 7: BS:100021       1
 8: BS:100021       1
 9: BS:100021       1
10: BS:100021       1
11: BS:100021       1
12: BS:100021       1
13: BS:100021       1
14: BS:100021       1
15: BS:100021       1
16: BS:100021       1
17: BS:100021       1
18: BS:100021       1
19: BS:100021       1
20: BS:100021       1
21: BS:100021       1
22: BS:100021       1
23: BS:100021       1
24: BS:100021       1
25: BS:100021       1
26: BS:100021       1
27: BS:100021       1
28: BS:100021       1
29: BS:100021       1
30: BS:100021       1
31: BS:100135       1
32: BS:100135       2
33: BS:100135       3
34: BS:100135       4
35: BS:100135       5
36: BS:100135       6
37: BS:100135       1
38: BS:100135       2
39: BS:100135       3
40: BS:100135       4
41: BS:100135       5
42: BS:100135       6
43: BS:100135       1
44: BS:100135       2
45: BS:100135       3
46: BS:100135       4
47: BS:100135       5
48: BS:100135       6
49: BS:100135       1
50: BS:100135       2
51: BS:100135       3
52: BS:100135       4
53: BS:100135       5
54: BS:100135       6
55: BS:100135       1
56: BS:100135       2
57: BS:100135       3
58: BS:100135       4
59: BS:100135       5
60: BS:100135       6
61: BS:100135       1
62: BS:100135       2
63: BS:100135       3
64: BS:100135       4
65: BS:100135       5
66: BS:100135       6
67: BS:100135       1
68: BS:100135       2
69: BS:100135       3
70: BS:100135       4
71: BS:100135       5
72: BS:100135       6
73: BS:100135       1
74: BS:100135       2
75: BS:100135       3
76: BS:100135       4
77: BS:100135       5
78: BS:100135       6
79: BS:100135       1
80: BS:100135       2
81: BS:100135       3
82: BS:100135       4
83: BS:100135       5
84: BS:100135       6
85: BS:100135       1
86: BS:100135       2
87: BS:100135       3
88: BS:100135       4
89: BS:100135       5
90: BS:100135       6
          dSc Cluster
Solution
You could use:
list.of.dSc <- NULL
ids <- unique(as.character(mm$dSc))
for (i in seq_along(ids)) {
  # Pull the Cluster values for this id as a plain vector, so the check
  # works whether mm is a data.frame or a data.table
  if (length(unique(mm$Cluster[mm$dSc == ids[i]])) == 1) {
    list.of.dSc <- c(list.of.dSc, ids[i])
  }
}
# Keep only the rows whose dSc is not in the single-cluster list
mm[!(mm$dSc %in% list.of.dSc), ]
         dSc Cluster
31 BS:100135       1
32 BS:100135       2
33 BS:100135       3
34 BS:100135       4
35 BS:100135       5
...
This finds every dSc that has only one cluster class, adds it to a list (list.of.dSc), and then uses %in% to filter mm down to the entries that are not in that list. It's not very pretty, but I think it solves your question.
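As for why the original attempt returns every row: in DT[i, j, by], i is evaluated once against the whole table, before any grouping happens, so length(unique(Cluster)) > 1 is a single TRUE there and all rows pass. A more data.table-style way to express the filter is to move the test into j, where it runs once per group. The following is only a sketch on the question's toy data, not the answerer's method; the names whole_table_test, kept and kept2 are made up for illustration, and it assumes a data.table version that provides uniqueN().

```r
library(data.table)

# Same toy data as in the question
xx <- cbind(rep("BS:100021", 30), rep(1, 30))
yy <- cbind(rep("BS:100135", 60), rep(1:6, 10))
mm <- as.data.table(rbind(xx, yy))
setnames(mm, c("dSc", "Cluster"))

# What the original i expression sees: the whole Cluster column at once.
# It has more than one distinct value overall, so the test is one TRUE.
whole_table_test <- mm[, length(unique(Cluster)) > 1]

# Put the test in j so it is evaluated per dSc group.
# A group for which j evaluates to NULL contributes no rows.
kept <- mm[, if (uniqueN(Cluster) > 1L) .SD, by = dSc]

# Alternative: collect the row numbers (.I) of the qualifying groups
# and subset mm with them, which keeps the original row order.
kept2 <- mm[mm[, .I[uniqueN(Cluster) > 1L], by = dSc]$V1]
```

Both kept and kept2 contain only the rows of ids that have more than one distinct Cluster value. The .I variant is sometimes preferred on large tables because it avoids materialising .SD for every group.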