How do you extract a few random rows from a data.table on the fly

Question

I have a large data.table (about 24,000 rows and growing). I want to subset that data.table on a couple of criteria, and from that subset (which ends up being about 3,000 rows) I want to randomly sample just 4 rows. I do not want to create a named data.table of 3,000 or so rows, count its rows, and then sample by row number. How can I do it on the fly? Or should I just suck it up by creating the table, sampling from it, and then using rm() to get rid of it?

Let me simulate my problem:

require(data.table)
random.length  <-  sample(x = 15:30, size = 1)
data.table(city=sample(c("Cape Town", "New York", "Pittsburgh", "Tel Aviv", "Amsterdam"), size=random.length, replace = TRUE), score = sample(x=1:10, size = random.length, replace=TRUE)) 

That makes a table of random length, which simulates the fact that, depending on my criteria and on my starting table, I do not know what the length of the subsetted table will be.

Now, if I just wanted the first three rows, I could do this:

data.table(city=sample(c("Cape Town", "New York", "Pittsburgh", "Tel Aviv", "Amsterdam"), size=random.length, replace = TRUE), score = sample(x=1:10, size = random.length, replace=TRUE))[1:3]


But let us say I did not want the first three rows but rather a random 3 rows. Then I would want to do something like this:

data.table(city=sample(c("Cape Town", "New York", "Pittsburgh", "Tel Aviv", "Amsterdam"), size=random.length, replace = TRUE), score = sample(x=1:10, size = random.length, replace=TRUE))[sample(x = 1:<number of rows of that previous data.table>, size = 3)]

That will not work, because the placeholder is not real code. How do I compute, on the fly, what the length (number of rows) of the initial data.table was?

Recommended answer

I have just made .N work in i. New README item:


.N is now available in i, FR#724. Thanks to newbie indirectly here and Farrel directly here.

This now works:

DT[...][...][sample(.N,3)]

For example:

> random.length  <-  sample(x = 15:30, size = 1)
> data.table(city = sample(c("Cape Town", "New York", "Pittsburgh", "Tel Aviv", "Amsterdam"),size=random.length, replace = TRUE), score = sample(x=1:10, size = random.length, replace=TRUE))[sample(.N, 3)] 
         city score
1:   New York     4
2: Pittsburgh     3
3:  Cape Town     9
> 
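To tie this back to the original question (subset on a couple of criteria, then sample 4 rows without naming the intermediate table), here is a minimal sketch. The table `DT`, the filter `city == "New York" & score >= 8`, and the fixed seed are assumptions made up for illustration and are not from the question. The `min(.N, 4L)` guard is an added precaution so that the call does not fail when the subset has fewer than 4 rows.

```r
library(data.table)

set.seed(42)  # fixed seed, only so that this sketch is reproducible

# Hypothetical stand-in for the large table described in the question
DT <- data.table(
  city  = sample(c("Cape Town", "New York", "Pittsburgh", "Tel Aviv", "Amsterdam"),
                 size = 24000, replace = TRUE),
  score = sample(1:10, size = 24000, replace = TRUE)
)

# First [ ]: subset on the (made-up) criteria; the result is never named.
# Second [ ]: .N is the row count of that subset, not of DT.
# min(.N, 4L) caps the sample size when the subset has fewer than 4 rows.
picked <- DT[city == "New York" & score >= 8][sample(.N, min(.N, 4L))]
print(picked)
```

Because each `[ ]` in the chain operates on the result of the previous one, no temporary object has to be created or removed with rm().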
