data.table DT[i, j, by] does not select groups using criteria in i


Problem description

I am not sure what is going on here with my data; maybe I am missing something simple.

In my dataset, each id has a cluster point assigned to it, and I want to filter out the ids that have only a single cluster assigned to them.

For example,

BS:100021 has only 1 cluster point assigned to it

whereas

BS:100135 has 6 cluster points assigned to it.

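To reproduce my example:

library(data.table)

set.seed(1)
# cbind() on mixed input builds a character matrix, so Cluster is stored as character
xx <- cbind(rep("BS:100021", 30), rep(1, 30))
yy <- cbind(rep("BS:100135", 60), rep(1:6, 10))
mm <- as.data.table(rbind(xx, yy))
names(mm) <- c("dSc", "Cluster")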

Now I want to filter out "BS:100021" and similar ids, so I am trying

mm[length(unique(Cluster)) > 1,.SD,dSc]

I am still getting

          dSc Cluster
 1: BS:100021       1
 2: BS:100021       1
 3: BS:100021       1
 4: BS:100021       1
 5: BS:100021       1
 6: BS:100021       1
 7: BS:100021       1
 8: BS:100021       1
 9: BS:100021       1
10: BS:100021       1
11: BS:100021       1
12: BS:100021       1
13: BS:100021       1
14: BS:100021       1
15: BS:100021       1
16: BS:100021       1
17: BS:100021       1
18: BS:100021       1
19: BS:100021       1
20: BS:100021       1
21: BS:100021       1
22: BS:100021       1
23: BS:100021       1
24: BS:100021       1
25: BS:100021       1
26: BS:100021       1
27: BS:100021       1
28: BS:100021       1
29: BS:100021       1
30: BS:100021       1
31: BS:100135       1
32: BS:100135       2
33: BS:100135       3
34: BS:100135       4
35: BS:100135       5
36: BS:100135       6
37: BS:100135       1
38: BS:100135       2
39: BS:100135       3
40: BS:100135       4
41: BS:100135       5
42: BS:100135       6
43: BS:100135       1
44: BS:100135       2
45: BS:100135       3
46: BS:100135       4
47: BS:100135       5
48: BS:100135       6
49: BS:100135       1
50: BS:100135       2
51: BS:100135       3
52: BS:100135       4
53: BS:100135       5
54: BS:100135       6
55: BS:100135       1
56: BS:100135       2
57: BS:100135       3
58: BS:100135       4
59: BS:100135       5
60: BS:100135       6
61: BS:100135       1
62: BS:100135       2
63: BS:100135       3
64: BS:100135       4
65: BS:100135       5
66: BS:100135       6
67: BS:100135       1
68: BS:100135       2
69: BS:100135       3
70: BS:100135       4
71: BS:100135       5
72: BS:100135       6
73: BS:100135       1
74: BS:100135       2
75: BS:100135       3
76: BS:100135       4
77: BS:100135       5
78: BS:100135       6
79: BS:100135       1
80: BS:100135       2
81: BS:100135       3
82: BS:100135       4
83: BS:100135       5
84: BS:100135       6
85: BS:100135       1
86: BS:100135       2
87: BS:100135       3
88: BS:100135       4
89: BS:100135       5
90: BS:100135       6
          dSc Cluster

Solution

You could use:

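list.of.dSc <- NULL

# Collect every dSc whose rows all share a single Cluster value.
# mm$Cluster[...] returns a plain vector for both data.frame and data.table,
# whereas mm[rows, "Cluster"] on a data.table returns a one-column table of length 1.
for (d in unique(as.character(mm$dSc))) {
    if (length(unique(mm$Cluster[mm$dSc == d])) == 1) {
        list.of.dSc <- c(list.of.dSc, d)
    }
}

# Keep only the rows whose dSc is not in that vector
mm[!(mm$dSc %in% list.of.dSc), ]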

         dSc Cluster
31 BS:100135       1
32 BS:100135       2
33 BS:100135       3
34 BS:100135       4
35 BS:100135       5
...

This finds every dSc that has only one distinct Cluster value, collects those ids in the character vector list.of.dSc, and then uses %in% to keep only the rows of mm whose dSc is not in that vector.

It's not very pretty, but I think it solves your question.
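
As for why the original attempt keeps every row: in DT[i, j, by], data.table evaluates i once against the whole table, before any grouping by by. Here length(unique(Cluster)) > 1 is therefore a single TRUE, which selects all rows. The sketch below is not part of the answer above; it is one possible alternative that moves the condition into j so that it is evaluated per group. It assumes a data.table version that provides uniqueN(), and it rebuilds the toy data directly as a data.table for convenience; the name res is only illustrative.

```r
library(data.table)

# Same toy data as in the question, built directly as a data.table
mm <- data.table(
  dSc     = c(rep("BS:100021", 30), rep("BS:100135", 60)),
  Cluster = c(rep(1, 30), rep(1:6, 10))
)

# i is evaluated once on the whole table, before grouping,
# so this condition is a single TRUE and every row is kept
all_rows <- mm[length(unique(Cluster)) > 1, .SD, by = dSc]

# Evaluate the condition per group inside j instead:
# groups for which `if` yields NULL are dropped from the result
res <- mm[, if (uniqueN(Cluster) > 1) .SD, by = dSc]
```

With this form, res should contain only the rows of ids that have more than one distinct Cluster value, without building the intermediate vector in a loop.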
