Stratified sampling with group size below sample size in R

Problem description

I have response data by market in the format:

head(df)
    ID  market  q1  q2
    470 France  1   3
    625 Germany 0   2
    155 Italy   1   6
    648 Spain   0   5
    862 France  1   7
    699 Germany 0   8
    460 Italy   1   6
    333 Spain   1   5
    776 Spain   1   4

with the following frequencies:

table(df$market)
    France  140
    Germany 300
    Italy   50
    Spain   75

I need to create a data frame with a sample of 100 responses per market, drawn without replacement, keeping all of a market's responses when it has fewer than 100 of them.

So the result should be:

table(df_new$market)
        France  100
        Germany 100
        Italy   50
        Spain   75
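A minimal sketch of the requirement, assuming a mock `df` built to the question's market sizes (the `ID`, `q1` and `q2` values are random placeholders, not the asker's data). Instead of sampling each market separately, it gives every row a random key, ranks the keys within each market with base R's `ave()`, and keeps rows ranked 100 or lower; a market with fewer than 100 rows never reaches rank 101, so all of its rows are kept.

```r
# Mock data with the question's market sizes (140/300/50/75).
# ID, q1 and q2 are random placeholders, not real responses.
set.seed(1)
sizes <- c(France = 140, Germany = 300, Italy = 50, Spain = 75)
df <- data.frame(
  ID     = sample(1000, sum(sizes)),
  market = rep(names(sizes), times = sizes),
  q1     = sample(0:1, sum(sizes), replace = TRUE),
  q2     = sample(1:8, sum(sizes), replace = TRUE)
)

# Random key per row, ranked within each market; keep ranks <= cap.
# Each row is kept at most once, so this is sampling without replacement.
cap <- 100
within_rank <- ave(runif(nrow(df)), df$market, FUN = rank)
df_new <- df[within_rank <= cap, ]

table(df_new$market)  # France 100, Germany 100, Italy 50, Spain 75
```

Because rows are filtered with a logical index rather than re-assembled group by group, `df_new` keeps the original row order.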

Thanks in advance!

Recommended answer

The following seems to work (shown on toy data, with a cap of 5 rows per group instead of 100):

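set.seed(10)
# Toy data: 25 rows in 4 groups (A-D); c1 plays the role of df$market
DF <- data.frame(c1 = sample(LETTERS[1:4], 25, replace = TRUE), c2 = runif(25))
# Per-group sample size: 5, or the whole group when it has fewer than 5 rows
freqs <- as.data.frame(table(DF$c1))
freqs$ss <- pmin(freqs$Freq, 5)
#> freqs
#  Var1 Freq ss
#1    A    4  4
#2    B   11  5
#3    C    7  5
#4    D    3  3
# For each group, draw ss of its row numbers without replacement.
# idx[sample.int(...)] is used instead of sample(idx, ...) because a group
# with a single row would make sample() draw from 1:idx instead.
res <- mapply(function(x, y) {
  idx <- which(DF$c1 %in% x)
  DF[idx[sample.int(length(idx), y)], ]
}, x = freqs$Var1, y = freqs$ss, SIMPLIFY = FALSE)
do.call(rbind, res)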
#   c1        c2
#5   A 0.3558977
#17  A 0.2289039
#6   A 0.5355970
#13  A 0.9546536
#3   B 0.2395891
#25  B 0.8015470
#10  B 0.4226376
#15  B 0.5005032
#19  B 0.7289646
#11  C 0.7477465
#9   C 0.8998325
#12  C 0.8226526
#1   C 0.7066469
#4   C 0.7707715
#23  D 0.4861003
#20  D 0.2498805
#21  D 0.1611833
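The same approach can be wrapped in a reusable function. `sample_up_to()` below is a hypothetical helper, not a base R or package function: it splits the row numbers by group, samples at most `n` of them per group without replacement, and indexes through `sample.int()` so that single-row groups come back intact.

```r
# Hypothetical helper: up to n rows per group, without replacement;
# groups with fewer than n rows are returned whole.
sample_up_to <- function(data, group, n) {
  pieces <- lapply(split(seq_len(nrow(data)), data[[group]]), function(idx) {
    # sample(idx, k) would draw from 1:idx when idx has length 1,
    # so sample positions within idx instead.
    idx[sample.int(length(idx), min(length(idx), n))]
  })
  rows <- unlist(pieces, use.names = FALSE)
  data[rows, , drop = FALSE]
}

# Toy check: group B has a single row
set.seed(10)
DF <- data.frame(c1 = c(rep("A", 7), "B", rep("C", 3)), c2 = runif(11))
res <- sample_up_to(DF, "c1", 5)
table(res$c1)  # A: 5, B: 1, C: 3
```

With the question's data the call would be `sample_up_to(df, "market", 100)`.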
