How do you extract a few random rows from a data.table on the fly
Question
I have a large data.table (about 24,000 rows and growing). I want to subset that data.table on a couple of criteria and, from that subset (which ends up being about 3,000 rows), randomly sample just 4 rows. I do not want to create a named data.table of 3,000 or so rows, count its rows, and then sample by row number. How can I do it on the fly? Or should I just suck it up: create the table, sample from it, and then use rm() to get rid of it?
Let me simulate my problem:
require(data.table)
random.length <- sample(x = 15:30, size = 1)
data.table(city=sample(c("Cape Town", "New York", "Pittsburgh", "Tel Aviv", "Amsterdam"), size=random.length, replace = TRUE), score = sample(x=1:10, size = random.length, replace=TRUE))
That makes a table of random length, which simulates the fact that, depending on my criteria and on my starting table, I do not know in advance how many rows the subsetted table will have.
Now, if I just wanted the first three rows, I could do this:
data.table(city=sample(c("Cape Town", "New York", "Pittsburgh", "Tel Aviv", "Amsterdam"), size=random.length, replace = TRUE), score = sample(x=1:10, size = random.length, replace=TRUE))[1:3]
But say I did not want the first three rows but rather 3 random rows. Then I would want to do something like this:
data.table(city=sample(c("Cape Town", "New York", "Pittsburgh", "Tel Aviv", "Amsterdam"), size=random.length, replace = TRUE), score = sample(x=1:10, size = random.length, replace=TRUE))[sample(x= 1:number of rows of that previous data.table,size = 3 ]
That will not work. How do I compute, on the fly, the number of rows of that initial data.table?
Recommended answer
Have just made .N work in i. New README item:
.N is now available in i, FR#724. Thanks to newbie indirectly here and Farrel directly here.
So this now works:
DT[...][...][sample(.N,3)]
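Applied to the situation in the question, the pattern might look like the sketch below. The table DT, its size, and the two filter conditions are hypothetical stand-ins, not the asker's real data; set.seed() is there only to make the draw reproducible.

```r
library(data.table)

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible

# Hypothetical stand-in for the large table described in the question
DT <- data.table(
  city  = sample(c("Cape Town", "New York", "Pittsburgh", "Tel Aviv", "Amsterdam"),
                 size = 24000, replace = TRUE),
  score = sample(1:10, size = 24000, replace = TRUE)
)

# First [...] subsets on two (made-up) criteria; the second [...] draws
# 4 random rows from that unnamed intermediate result, where .N is its row count
picked <- DT[city == "Cape Town" & score >= 8][sample(.N, 4)]
print(picked)
```

No named intermediate table is created, so there is nothing to clean up with rm() afterwards.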
For example, with the table from the question:
> random.length <- sample(x = 15:30, size = 1)
> data.table(city = sample(c("Cape Town", "New York", "Pittsburgh", "Tel Aviv", "Amsterdam"),size=random.length, replace = TRUE), score = sample(x=1:10, size = random.length, replace=TRUE))[sample(.N, 3)]
         city score
1:   New York     4
2: Pittsburgh     3
3:  Cape Town     9
>
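One thing to watch for: sample(.N, 3) raises an error if the subset has fewer than 3 rows, because base R's sample() cannot draw more items than exist without replacement. A possible guard, sketched here on a tiny made-up table (DT and the filter are assumptions for illustration), is to cap the sample size with min():

```r
library(data.table)

# Hypothetical two-row table, deliberately smaller than the requested sample
DT <- data.table(city = c("Cape Town", "New York"), score = c(7L, 2L))

# min(.N, 3L) caps the sample size at the number of rows actually available,
# so this returns both rows (in random order) instead of failing
small <- DT[score > 1][sample(.N, min(.N, 3L))]
print(small)
```

Whether silently returning fewer rows is acceptable depends on the use case; an explicit check on the row count may be preferable when exactly 3 rows are required.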