Sample n random rows per group in a dataframe
Question
From these questions (Random sample of rows from subset of an R dataframe, and Sample random rows in dataframe) I can easily see how to randomly sample (select) 'n' rows from a df, or 'n' rows that originate from a specific level of a factor within a df.
Here are some example data:
df <- data.frame(matrix(rnorm(80), nrow=40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each=10)
df[sample(nrow(df), 3), ] #samples 3 random rows from df, without replacement.
To sample, e.g., just 3 random rows from the 'pink' color, using library(kimisc):
library(kimisc)
sample.rows(subset(df, color == "pink"), 3)
or by writing a custom function:
sample.df <- function(df, n) df[sample(nrow(df), n), , drop = FALSE]
sample.df(subset(df, color == "pink"), 3)
However, I want to sample 3 (or n) random rows from each level of the factor, i.e. the new df would have 12 rows (3 from blue, 3 from red, 3 from yellow, 3 from pink). It's obviously possible to run this several times, create a new df for each color, and then bind them together, but I am looking for a simpler solution.
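For reference, the repeat-and-bind approach mentioned above might look roughly like the sketch below. It reuses the sample.df helper and the example data from this question. The names `pieces` and `newdf` are made up for illustration.

```r
# Example data, as defined in the question
df <- data.frame(matrix(rnorm(80), nrow = 40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 10)

# Helper from the question: n random rows of a data frame, without replacement
sample.df <- function(df, n) df[sample(nrow(df), n), , drop = FALSE]

# Sample 3 rows from each color separately ...
pieces <- lapply(unique(df$color), function(lev) {
  sample.df(df[df$color == lev, , drop = FALSE], 3)
})

# ... then bind the per-color pieces back into one data frame
newdf <- do.call(rbind, pieces)
table(newdf$color)
```

This works, but the rows come back grouped by color and shuffled within each group rather than in their original order.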
Recommended answer
You can assign a random ID to each element that has a particular factor level using ave. Then you can select all random IDs in a certain range.
rndid <- with(df, ave(X1, color, FUN=function(x) {sample.int(length(x))}))
df[rndid<=3,]
This has the advantage of preserving the original row order and row names, if that's something you are interested in. Plus, you can reuse the rndid vector to create subsets of different lengths fairly easily.
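As a small illustration of that reuse, the sketch below builds rndid once and takes two subsets of different sizes from it. The set.seed call and the names `sub3` and `sub5` are assumptions added here for reproducibility and readability. They are not part of the original answer.

```r
set.seed(42)  # assumption: fixed seed only so the sketch is reproducible

# Example data, as defined in the question
df <- data.frame(matrix(rnorm(80), nrow = 40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 10)

# Random rank 1..k within each color group (k = number of rows in the group)
rndid <- with(df, ave(X1, color, FUN = function(x) sample.int(length(x))))

sub3 <- df[rndid <= 3, ]  # 3 rows per color
sub5 <- df[rndid <= 5, ]  # 5 rows per color, reusing the same random IDs

table(sub3$color)
table(sub5$color)
```

Because both subsets come from the same rndid vector, they are nested: every row in sub3 also appears in sub5. If independent samples are needed, regenerate rndid before each subset.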