Sample random rows within each group in a data.table
Question
How would you use data.table to efficiently take a sample of rows within each group in a data frame?
DT = data.table(a = sample(1:2), b = sample(1:1000,20))
DT
a b
1: 2 562
2: 1 183
3: 2 180
4: 1 874
5: 2 533
6: 1 21
7: 2 57
8: 1 20
9: 2 39
10: 1 948
11: 2 799
12: 1 893
13: 2 993
14: 1 69
15: 2 906
16: 1 347
17: 2 969
18: 1 130
19: 2 118
20: 1 732
I was thinking of something like: DT[ , sample(??, 3), by = a]
that would return a sample of three rows for each "a" (the order of the returned rows isn't significant):
a b
1: 2 180
2: 2 57
3: 2 799
4: 1 69
5: 1 347
6: 1 732
Recommended answer
Maybe something like this?
> DT[,.SD[sample(.N, min(3,.N))],by = a]
a b
1: 1 744
2: 1 497
3: 1 167
4: 2 888
5: 2 950
6: 2 343
(Thanks to Josh for the correction, below.)
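A minimal, reproducible sketch of how the answer's expression behaves, plus a commonly used `.I`-based variant. Inside `j`, `.N` is the current group's row count, so `sample(.N, min(3, .N))` draws up to 3 distinct row positions within that group, and `min()` stops `sample()` from erroring on a group with fewer than 3 rows. The fixed seed, the extra small group `a = 3`, and the variable `k` are illustrative assumptions, not part of the original question; column `a` is built explicitly with `rep()` rather than relying on recycling.

```r
library(data.table)

set.seed(42)  # fixed seed only so the sketch is reproducible
# Same shape as the question's table: 20 rows, groups a = 1 and a = 2
DT <- data.table(a = rep(1:2, 10), b = sample(1:1000, 20))
# Hypothetical extra group with only 2 rows, to exercise the min() guard
DT <- rbind(DT, data.table(a = 3L, b = c(5L, 6L)))

k <- 3  # rows to draw per group

# Approach from the answer: subset .SD by k random row positions per group.
# sample(.N, size) draws positions from 1:.N without replacement;
# min(k, .N) keeps groups smaller than k from raising an error.
res_sd <- DT[, .SD[sample(.N, min(k, .N))], by = a]

# Variant: collect the sampled global row numbers (.I) per group,
# then subset DT once instead of materialising .SD for every group.
idx <- DT[, .I[sample(.N, min(k, .N))], by = a]$V1
res_i <- DT[idx]

res_sd[, .N, by = a]
res_i[, .N, by = a]
```

Both results contain `min(3, group size)` rows per group. The `.I` form is often reported to be faster on large tables because it avoids building `.SD` per group, but it is worth benchmarking on your own data.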