Sample random rows within each group in a data.table
Question
How would you use data.table to efficiently take a sample of rows within each group in a data frame?
library(data.table)
DT = data.table(a = sample(1:2), b = sample(1:1000,20))
DT
a b
1: 2 562
2: 1 183
3: 2 180
4: 1 874
5: 2 533
6: 1 21
7: 2 57
8: 1 20
9: 2 39
10: 1 948
11: 2 799
12: 1 893
13: 2 993
14: 1 69
15: 2 906
16: 1 347
17: 2 969
18: 1 130
19: 2 118
20: 1 732
I was thinking of something like: DT[ , sample(??, 3), by = a]
that would return a sample of three rows for each "a" (the order of the returned rows isn't significant):
a b
1: 2 180
2: 2 57
3: 2 799
4: 1 69
5: 1 347
6: 1 732
Recommended answer
Maybe something like this?
> DT[,.SD[sample(.N, min(3,.N))],by = a]
a b
1: 1 744
2: 1 497
3: 1 167
4: 2 888
5: 2 950
6: 2 343
(Thanks to Josh for the correction, below.)
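To spell out how the one-liner works, here is a minimal sketch. The seed, the `rep()`-based grouping column, and the names `res_sd`, `idx`, and `res_i` are assumptions made for illustration and do not come from the original post. Inside `j`, `.N` is the number of rows in the current group, so `sample(.N, min(3, .N))` draws up to three distinct row positions, and `.SD[...]` keeps those rows of the group. The second variant, which collects global row numbers through `.I` and subsets once, is a commonly used alternative that may be faster with many groups; it is not part of the answer above.

```r
library(data.table)

set.seed(42)  # assumed seed, only so the sketch is reproducible
DT <- data.table(a = rep(1:2, each = 10), b = sample(1:1000, 20))

# Same idea as the answer: sample up to 3 row positions per group
# and use them to subset .SD (the group's sub-table).
# min(3, .N) keeps the call valid for groups with fewer than 3 rows.
res_sd <- DT[, .SD[sample(.N, min(3, .N))], by = a]

# Variant: .I holds each group's row numbers in the full table,
# so sample those, then subset DT once with the collected indices.
idx <- DT[, .I[sample(.N, min(3, .N))], by = a]$V1
res_i <- DT[idx]

print(res_sd)
print(res_i)
```

Both results should contain three rows for each value of a; which rows are drawn depends on the random seed.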