Stratified random sampling from a data frame

Question

I have a data frame in the format:

head(subset)
#   ants age lc
# 1    0   1  1
# 2    1   2  1
# 3    1   2  0
# 4    0   1  1
# 5    1   3  0

I need to create a new data frame with random samples stratified by age and lc. For example, I want 30 samples from age = 1 and lc = 1, 30 samples from age = 1 and lc = 0, and so on.

I did look at random sampling methods like this:

newdata <- function(subset, age, 30)

But that is not what I am looking for.

Recommended answer

I would suggest using either stratified from my "splitstackshape" package, or sample_n from the "dplyr" package:

## Sample data
library(data.table)  # needed for data.table()
set.seed(1)
n <- 1e4
d <- data.table(age = sample(1:5, n, TRUE),
                lc = rbinom(n, 1, .5),
                ants = rbinom(n, 1, .7))
# table(d$age, d$lc)

For stratified, you basically specify the dataset, the stratifying columns, and an integer representing the size you want from each group OR a decimal representing the fraction you want returned (for example, .1 represents 10% from each group).

library(splitstackshape)
set.seed(1)
out <- stratified(d, c("age", "lc"), 30)
head(out)
#    age lc ants
# 1:   1  0    1
# 2:   1  0    0
# 3:   1  0    1
# 4:   1  0    1
# 5:   1  0    0
# 6:   1  0    1

table(out$age, out$lc)
#    
#      0  1
#   1 30 30
#   2 30 30
#   3 30 30
#   4 30 30
#   5 30 30
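If installing splitstackshape is not an option, the size-or-fraction behaviour described above can be approximated in base R. The sketch below is an assumption-laden stand-in, not the package's implementation: `stratified_sample()` is a hypothetical helper name, groups smaller than the requested size are simply returned whole, and fractional sizes are rounded to the nearest integer per group (splitstackshape may handle both cases differently).

```r
# Hypothetical helper (not part of splitstackshape): base-R sketch of
# stratified sampling over one or more grouping columns.
stratified_sample <- function(df, strata, size) {
  # One data frame per observed combination of the strata columns
  groups <- split(df, df[strata], drop = TRUE)
  sampled <- lapply(groups, function(g) {
    # size >= 1: fixed number of rows per group, capped at the group size
    # size <  1: fraction of each group's rows, rounded to nearest integer
    k <- if (size >= 1) min(size, nrow(g)) else round(size * nrow(g))
    g[sample(nrow(g), k), , drop = FALSE]
  })
  out <- do.call(rbind, sampled)
  rownames(out) <- NULL
  out
}

# Same sample data as above, built as a plain data.frame
set.seed(1)
n <- 1e4
d <- data.frame(age = sample(1:5, n, TRUE),
                lc = rbinom(n, 1, .5),
                ants = rbinom(n, 1, .7))

out_base <- stratified_sample(d, c("age", "lc"), 30)
table(out_base$age, out_base$lc)  # every age x lc cell should hold 30 rows
```

Splitting on `df[strata]` lets `split()` build the age x lc interaction itself, so the helper works for any number of stratifying columns.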

For sample_n, you first create a grouped table (using group_by) and then specify the number of observations you want. If you want proportional sampling instead, use sample_frac.

library(dplyr)
set.seed(1)
out2 <- d %>%
  group_by(age, lc) %>%
  sample_n(30)

# table(out2$age, out2$lc)
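In dplyr 1.0.0 and later, sample_n() and sample_frac() are superseded by slice_sample(), which takes either `n` (rows per group) or `prop` (fraction of each group). A minimal sketch of both modes, assuming a recent dplyr and rebuilding the sample data as a plain data.frame so it does not depend on data.table:

```r
library(dplyr)

# Same sample data as above, as a plain data.frame
set.seed(1)
n <- 1e4
d <- data.frame(age = sample(1:5, n, TRUE),
                lc = rbinom(n, 1, .5),
                ants = rbinom(n, 1, .7))

# Fixed size: 30 rows from every age x lc group (replaces sample_n(30))
out_n <- d %>%
  group_by(age, lc) %>%
  slice_sample(n = 30) %>%
  ungroup()

# Proportional: roughly 10% of the rows of every group (replaces sample_frac(.1))
out_prop <- d %>%
  group_by(age, lc) %>%
  slice_sample(prop = 0.1) %>%
  ungroup()

table(out_n$age, out_n$lc)
```

Calling ungroup() at the end is optional; it just keeps later operations on the result from being silently applied per group.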
