Stratified random sampling from data frame

Views: 22 | Published: 2022/1/30 22:23:34 | Tags: r, random, sampling

Question

I have a data frame in the following format:

head(subset)
# ants  0 1 1 0 1 
# age   1 2 2 1 3
# lc    1 1 0 1 0

I need to create a new data frame with random samples drawn according to age and lc. For example, I want 30 samples from age:1 and lc:1, 30 samples from age:1 and lc:0, and so on.

I did look at random sampling methods such as:

newdata <- function(subset, age, 30)

But this is not the code that I want.

Recommended answer

I would suggest using either stratified from my "splitstackshape" package, or sample_n from the "dplyr" package:

## Sample data
library(data.table)  # provides data.table()
set.seed(1)
n <- 1e4
d <- data.table(age = sample(1:5, n, replace = TRUE),
                lc = rbinom(n, 1, 0.5),
                ants = rbinom(n, 1, 0.7))
# table(d$age, d$lc)
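
Before drawing a fixed number of rows per stratum, it can help to confirm that every age/lc combination actually has that many rows. The following is a minimal base R sketch, assuming data shaped like the sample above; the names n_per_group, sizes and enough are illustrative and not part of the original answer.

```r
# Illustrative data with the same columns as the sample data
set.seed(1)
n <- 1e4
d <- data.frame(age  = sample(1:5, n, replace = TRUE),
                lc   = rbinom(n, 1, 0.5),
                ants = rbinom(n, 1, 0.7))

n_per_group <- 30                       # hypothetical target size per stratum
sizes <- table(age = d$age, lc = d$lc)  # rows available in each age x lc cell

# TRUE when every stratum can supply n_per_group rows without replacement
enough <- all(sizes >= n_per_group)
enough
```

If enough is FALSE, the options are to lower the per-group size, to sample with replacement, or to accept fewer rows from the small strata.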

For stratified, you specify the dataset, the stratifying columns, and either an integer giving the number of rows you want from each group or a decimal giving the fraction you want returned (for example, .1 returns 10% of each group).

library(splitstackshape)
set.seed(1)
out <- stratified(d, c("age", "lc"), 30)
head(out)
#    age lc ants
# 1:   1  0    1
# 2:   1  0    0
# 3:   1  0    1
# 4:   1  0    1
# 5:   1  0    0
# 6:   1  0    1

table(out$age, out$lc)
#    
#      0  1
#   1 30 30
#   2 30 30
#   3 30 30
#   4 30 30
#   5 30 30
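
The fractional form of the size argument can be sketched in the same way. This example assumes the splitstackshape and data.table packages are installed and rebuilds the sample data so that it runs on its own; out_frac is an illustrative name.

```r
library(data.table)
library(splitstackshape)

set.seed(1)
n <- 1e4
d <- data.table(age  = sample(1:5, n, replace = TRUE),
                lc   = rbinom(n, 1, 0.5),
                ants = rbinom(n, 1, 0.7))

# A size below 1 is read as a fraction of each group, not a row count
out_frac <- stratified(d, c("age", "lc"), 0.1)

# Compare sampled rows per stratum with the rows available
table(out_frac$age, out_frac$lc)
table(d$age, d$lc)
```

Because each group is sampled at the same rate, the strata keep roughly the same relative sizes in out_frac as in d.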

For sample_n, you first create a grouped table (using group_by) and then specify the number of observations you want from each group. If you want proportional sampling instead, use sample_frac.

library(dplyr)
set.seed(1)
out2 <- d %>%
  group_by(age, lc) %>%
  sample_n(30)

# table(out2$age, out2$lc)
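
In dplyr 1.0.0 and later, sample_n and sample_frac are superseded by slice_sample, which takes either n or prop. A sketch of the same grouped sampling follows, assuming dplyr 1.0.0 or newer is installed and using a plain data frame in place of the data.table above; out3 and out4 are illustrative names.

```r
library(dplyr)

set.seed(1)
n <- 1e4
d <- data.frame(age  = sample(1:5, n, replace = TRUE),
                lc   = rbinom(n, 1, 0.5),
                ants = rbinom(n, 1, 0.7))

# Fixed number of rows per age x lc group
out3 <- d %>%
  group_by(age, lc) %>%
  slice_sample(n = 30) %>%
  ungroup()

# Fixed proportion of rows per age x lc group
out4 <- d %>%
  group_by(age, lc) %>%
  slice_sample(prop = 0.1) %>%
  ungroup()

# Rows drawn from each group
count(out3, age, lc)
```

The ungroup() call at the end is a design choice: it returns an ordinary tibble so that later operations are not silently applied per group.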
