How do you sample groups with different sample sizes with data.table
Question
I am trying to use data.table to speed up some calculations on a relatively large dataset. The example below replicates the situation:
library(data.table)
DT = data.table(a=sample(1:2), b=sample(1:1000,20))
> DT
a b
1: 2 440
2: 1 5
3: 2 795
4: 1 138
5: 2 941
6: 1 929
7: 2 759
8: 1 192
9: 2 994
10: 1 176
11: 2 152
12: 1 893
13: 2 28
14: 1 884
15: 2 467
16: 1 761
17: 2 879
18: 1 964
19: 2 802
20: 1 271
I want to sample a different number of replicates from each of the groups a==1 and a==2, e.g., n1=5 and n2=3 replicates without replacement, and obtain something like
a b
1: 2 440
2: 2 879
3: 2 994
4: 2 152
5: 2 879
6: 1 884
7: 1 964
8: 1 929
But I cannot work out how to do it with data.table, i.e., I cannot insert the different sample sizes into a data.table command. Is there any way to do it? I'm new to data.table and R, so any constructive guidance would be greatly appreciated.
Recommended answer
One option is to split the 'b' column by 'a', pass the 'size' as a vector in Map, and get the sample of 'b' using the corresponding 'size'. The output is a list, which can be converted to a 'data.frame' with 2 columns using stack.
set.seed(24)
stack(Map(sample, split(DT$b, DT$a), size=c(5,3),MoreArgs=list(replace=FALSE)))
# values ind
#1 279 1
#2 93 1
#3 665 1
#4 797 1
#5 317 1
#6 542 2
#7 761 2
#8 893 2
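The call above pairs size=c(5,3) with the groups by position, which works because split returns the groups in sorted order of 'a'. As a minimal sketch of the same idea with the pairing made explicit, the sizes can be stored in a named vector and looked up by group name. The names `sizes`, `groups`, and `res` are assumptions introduced here for illustration, and the example data is regenerated so the block runs on its own.

```r
library(data.table)

set.seed(24)
# Illustrative data with the same shape as the question's DT
DT <- data.table(a = rep(1:2, 10), b = sample(1:1000, 20))

# Hypothetical lookup: names are values of 'a', entries are sample sizes
sizes <- c("1" = 5, "2" = 3)

# split() returns a list named by the values of 'a' ("1" and "2")
groups <- split(DT$b, DT$a)

# Index sizes by the list names so the pairing does not depend on order;
# sampling positions with sample.int() avoids sample()'s special case
# for a length-one numeric vector
res <- stack(Map(function(x, n) x[sample.int(length(x), n)],
                 groups, sizes[names(groups)]))
res
```

Here `res` has the same two columns as before, values and ind, with five rows for group 1 and three rows for group 2.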
Or, using data.table methods, we melt the list output we got with Map.
set.seed(24)
DT[, melt(Map(sample, split(b, a), size=c(5,3), MoreArgs=list(replace=FALSE)))]
# value L1
#1 279 1
#2 93 1
#3 665 1
#4 797 1
#5 317 1
#6 542 2
#7 761 2
#8 893 2
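Depending on the installed data.table version, calling melt on a plain list may rely on the reshape2 package rather than on data.table itself. As an alternative sketch, not part of the original answer, the sampling can be done directly inside a grouped data.table call, looking up each group's size from a named vector. The names `sizes` and `out` are assumptions for illustration, and the example data is regenerated so the block runs on its own.

```r
library(data.table)

set.seed(24)
# Illustrative data with the same shape as the question's DT
DT <- data.table(a = rep(1:2, 10), b = sample(1:1000, 20))

# Hypothetical lookup: names are values of 'a', entries are sample sizes
sizes <- c("1" = 5, "2" = 3)

# Within each group, draw row positions without replacement.
# .N is the group's row count and .BY$a is the current value of 'a'.
out <- DT[, .(b = b[sample.int(.N, sizes[[as.character(.BY$a)]])]), by = a]
out
```

This keeps the result as a data.table with columns 'a' and 'b', and the size for each group is found by name rather than by position, so it should not matter in which order the groups appear.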