How do you extract a few random rows from a data.table on the fly

Question

I have a large data.table (about 24000 rows and growing). I want to subset that data.table based on a couple of criteria, and from that subset (which ends up being about 3000 rows) I want to randomly sample just 4 rows. I do not want to create a named data.table of 3000 or so rows, count its rows, and then sample based on row number. How can I do it on the fly? Or should I just suck it up: create the table, work on it, sample it, and then use rm() to get rid of it?

Let's simulate my issue:

require(data.table)
random.length  <-  sample(x = 15:30, size = 1)
data.table(city=sample(c("Cape Town", "New York", "Pittsburgh", "Tel Aviv", "Amsterdam"), size=random.length, replace = TRUE), score = sample(x=1:10, size = random.length, replace=TRUE)) 

That makes a table of random length, which simulates the fact that, depending on my criteria and my starting table, I do not know what the length of the subsetted table will be.

Now, if I just wanted the first three rows, I could do this:

data.table(city=sample(c("Cape Town", "New York", "Pittsburgh", "Tel Aviv", "Amsterdam"), size=random.length, replace = TRUE), score = sample(x=1:10, size = random.length, replace=TRUE))[1:3]

But let us say I did not want the first three rows but rather a random 3 rows. Then I would want to do something like this:

data.table(city=sample(c("Cape Town", "New York", "Pittsburgh", "Tel Aviv", "Amsterdam"), size=random.length, replace = TRUE), score = sample(x=1:10, size = random.length, replace=TRUE))[sample(x = 1:number of rows of that previous data.table, size = 3)]

That will not work. How do I compute, on the fly, the number of rows of the initial data.table?

Recommended answer

I have just made .N work in i. New README item:

.N is now available in i, FR#724. Thanks to newbie indirectly here and Farrel directly here.

This now works:

DT[...][...][sample(.N,3)]

For example:

> random.length  <-  sample(x = 15:30, size = 1)
> data.table(city = sample(c("Cape Town", "New York", "Pittsburgh", "Tel Aviv", "Amsterdam"),size=random.length, replace = TRUE), score = sample(x=1:10, size = random.length, replace=TRUE))[sample(.N, 3)] 
         city score
1:   New York     4
2: Pittsburgh     3
3:  Cape Town     9
> 
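
To tie this back to the original scenario (subset on a couple of criteria, then sample from the unnamed result), here is a minimal sketch. The table `DT`, the filter conditions, and the names `picked` and `few` are made up for illustration, and it assumes a data.table version in which .N is available in i, as described above. The `min(.N, 4)` guard is an optional extra for cases where the subset might contain fewer rows than requested.

```r
library(data.table)

set.seed(42)  # only so the sketch is reproducible

# Hypothetical stand-in for the large table in the question
DT <- data.table(
  city  = sample(c("Cape Town", "New York", "Pittsburgh", "Tel Aviv", "Amsterdam"),
                 size = 24000, replace = TRUE),
  score = sample(1:10, size = 24000, replace = TRUE)
)

# Subset on two (made-up) criteria, then sample 4 rows from the
# intermediate result without ever naming it. Inside i, .N is the
# number of rows of the table being subset at that step of the chain.
picked <- DT[city == "Tel Aviv" & score >= 8][sample(.N, 4)]

# Optional guard: never ask for more rows than the subset contains
few <- DT[city == "Tel Aviv" & score == 10][sample(.N, min(.N, 4))]

print(picked)
```

Because each `[...]` in the chain operates on the result of the previous one, the intermediate subset is never bound to a name, so there is nothing to clean up with rm() afterwards.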
