在data.table中的每个组中采样随机行 [英] Sample random rows within each group in a data.table

查看:58
本文介绍了在data.table中的每个组中采样随机行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

您将如何使用data.table有效地对数据帧中每个组内的行进行采样?

How would you use data.table to efficiently take a sample of rows within each group in a data frame?

DT = data.table(a = sample(1:2), b = sample(1:1000,20))
DT
    a   b
 1: 2 562
 2: 1 183
 3: 2 180
 4: 1 874
 5: 2 533
 6: 1  21
 7: 2  57
 8: 1  20
 9: 2  39
10: 1 948
11: 2 799
12: 1 893
13: 2 993
14: 1  69
15: 2 906
16: 1 347
17: 2 969
18: 1 130
19: 2 118
20: 1 732

我当时想到这样的事情: DT [,sample(??,3),by = a] 将为每个 a返回三行样本(顺序为返回的行并不重要)

I was thinking of something like: DT[ , sample(??, 3), by = a] that would return a sample of three rows for each "a" (the order of the returned rows isn't significant):

    a   b
 1: 2 180
 2: 2  57
 3: 2 799
 4: 1  69
 5: 1 347
 6: 1 732


推荐答案

也许是这样吗?

> DT[,.SD[sample(.N, min(3,.N))],by = a]
   a   b
1: 1 744
2: 1 497
3: 1 167
4: 2 888
5: 2 950
6: 2 343

(感谢乔希所做的更正,如下。)

(Thanks to Josh for the correction, below.)

这篇关于在data.table中的每个组中采样随机行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆