Selecting n random rows across all levels of a factor within a dataframe

Problem description

From these questions ("Random sample of rows from subset of an R dataframe" and "Random rows in dataframe in R") I can easily see how to randomly sample n rows from a df, or n rows that originate from a specific level of a factor within a df.

Here is some example data:

df <- data.frame(matrix(rnorm(80), nrow=40))
df$color <-  rep(c("blue", "red", "yellow", "pink"), each=10)

df[sample(nrow(df), 3), ] #samples 3 random rows from df, without replacement.

For example, to sample just 3 random rows from the 'pink' color, using library(kimisc):

library(kimisc)
sample.rows(subset(df, color == "pink"), 3)

Or write a custom function:

sample.df <- function(df, n) df[sample(nrow(df), n), , drop = FALSE]
sample.df(subset(df, color == "pink"), 3)

However, what I am trying to do is create a new df that contains 3 (or n) random rows from each level of the factor, i.e. the new df would have 12 rows (3 from blue, 3 from red, 3 from yellow, 3 from pink). It is obviously possible to run this once per color, create a new df for each, and then bind them together. However, I am trying to work out a simpler solution for when there are many, many levels that I need to do this across.

Recommended answer

You can use ave to assign a random ID to each row within its factor level, so that the rows of each level are numbered 1 to k in random order. Then you can select all rows whose random ID falls in a certain range.

rndid <- with(df, ave(X1, color, FUN=function(x) {sample.int(length(x))}))
df[rndid<=3,]

This has the advantage of preserving the original row order and row names, if that's something you are interested in. Plus, you can re-use the rndid vector to create subsets of different sizes fairly easily.
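
To illustrate re-using rndid, here is a minimal sketch built on the example data from the question. The set.seed call and the names sub3 and sub5 are not part of the original answer; they are added here only to make the example reproducible and readable.

```r
# Example data as in the question (the seed is an addition, for reproducibility only)
set.seed(42)
df <- data.frame(matrix(rnorm(80), nrow = 40))
df$color <- rep(c("blue", "red", "yellow", "pink"), each = 10)

# Within each color level, number the rows 1..k in random order
rndid <- with(df, ave(X1, color, FUN = function(x) sample.int(length(x))))

# Re-use the same rndid vector for subsets of different sizes
sub3 <- df[rndid <= 3, ]   # 3 random rows per color
sub5 <- df[rndid <= 5, ]   # 5 random rows per color

# Count the rows drawn from each level
table(sub3$color)
table(sub5$color)
```

Note that subsets taken from the same rndid are nested: every row in sub3 also appears in sub5. If independent samples are needed, regenerate rndid before each draw.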
