How do you sample groups with different sample sizes with data.table

Question

I am trying to use data.table to speed up some calculations on a relatively large dataset. The example below replicates the situation:

> library(data.table)
> # a = sample(1:2) has length 2, so it is recycled across the 20 values of b
> DT = data.table(a=sample(1:2), b=sample(1:1000,20))
> DT
    a   b
 1: 2 440
 2: 1   5
 3: 2 795
 4: 1 138
 5: 2 941
 6: 1 929
 7: 2 759
 8: 1 192
 9: 2 994
10: 1 176
11: 2 152
12: 1 893
13: 2  28
14: 1 884
15: 2 467
16: 1 761
17: 2 879
18: 1 964
19: 2 802
20: 1 271

I want to sample a different number of replicates from each of the groups a==1 and a==2, e.g., n1=5 and n2=3 replicates without replacement, and obtain something like:

   a   b
1: 2 440
2: 2 879
3: 2 994
4: 2 152
5: 2 802
6: 1 884
7: 1 964
8: 1 929

But I cannot seem to get around it with data.table, i.e., I cannot pass the different sample sizes into a data.table command. Is there any way to do it? I'm new to data.table and R, so any constructive guidance would be greatly appreciated.

Recommended answer

One option is to split the 'b' column by 'a', pass the sample sizes as a vector to Map, and draw a sample from each piece of 'b' using the corresponding 'size'. The sizes are matched to the groups by position, in the order split returns them (here a==1 first, then a==2). The output is a list, which stack converts to a 'data.frame' with 2 columns.

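set.seed(24)
# split b by a, then draw 5 values from group 1 and 3 values from group 2
stack(Map(sample, split(DT$b, DT$a), size = c(5, 3), MoreArgs = list(replace = FALSE)))
# (DT was generated without a seed, so the values depend on your own DT)
#  values ind
#1    279   1
#2     93   1
#3    665   1
#4    797   1
#5    317   1
#6    542   2
#7    761   2
#8    893   2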
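To make the split, Map and stack steps easier to follow, here is a minimal base R sketch that runs them one at a time. The names `df`, `groups`, `sizes`, `picked` and `out` are illustrative and not part of the original answer. It also makes two changes on purpose: the sizes are named and reordered to match the groups instead of relying on position, and rows are drawn by index with sample.int, because sample(x, k) treats a length-one numeric x as 1:x.

```r
set.seed(24)
# Toy data shaped like the question's DT (explicit rep() instead of recycling)
df <- data.frame(a = rep(1:2, times = 10), b = sample(1:1000, 20))

# Step 1: one vector of b values per level of a; the list is named "1" and "2"
groups <- split(df$b, df$a)

# Step 2: sizes keyed by group label, reordered to match the split order
sizes <- c("2" = 3, "1" = 5)
sizes <- sizes[names(groups)]

# Step 3: draw k positions per group without replacement, then subset
picked <- Map(function(x, k) x[sample.int(length(x), k)], groups, sizes)

# Step 4: flatten the named list into a two-column data.frame
out <- stack(picked)
names(out) <- c("b", "a")
out
```

Keying the sizes by group label means the result does not change if the group order does, which is easy to get wrong with a bare size=c(5,3).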

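Or, starting from a data.table call, we can melt the list output we got with Map. Note that the melt method for lists comes from the reshape2 package; data.table's own melt only handles data.tables, so reshape2 needs to be installed for this to work.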

set.seed(24)
# reshape2::melt is called explicitly because it provides the method for lists
DT[, reshape2::melt(Map(sample, split(b, a), size = c(5, 3), MoreArgs = list(replace = FALSE)))]
#  value L1
#1   279  1
#2    93  1
#3   665  1
#4   797  1
#5   317  1
#6   542  2
#7   761  2
#8   893  2
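If the goal is to stay entirely within data.table and keep the original column names a and b, grouping with by and looking up each group's size is another possibility. The following is a sketch under stated assumptions, not the answerer's method: `sizes` and `res` are hypothetical names, and the lookup assumes the values of a can be used as character keys.

```r
library(data.table)

set.seed(24)
# Toy data shaped like the question's DT (explicit rep() instead of recycling)
DT <- data.table(a = rep(1:2, times = 10), b = sample(1:1000, 20))

# Hypothetical lookup: group label -> number of rows to draw
sizes <- c("1" = 5L, "2" = 3L)

# Within each group, .BY$a is the group's value and .N is its row count;
# sample.int() picks row positions without replacement
res <- DT[, .(b = b[sample.int(.N, sizes[[as.character(.BY$a)]])]), by = a]
res
```

A requested size larger than the group's .N makes sample.int fail, so for uneven groups the size could be capped with min(.N, ...).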
