Stratified sampling with group size below sample size in R
Question
I have response data by market in the following format:
head(df)
ID market q1 q2
470 France 1 3
625 Germany 0 2
155 Italy 1 6
648 Spain 0 5
862 France 1 7
699 Germany 0 8
460 Italy 1 6
333 Spain 1 5
776 Spain 1 4
with the following frequencies:
table(df$market)
France 140
Germany 300
Italy 50
Spain 75
I need to create a data frame with a sample of 100 responses per market, drawn without replacement. For markets with fewer than 100 responses, all of their responses should be kept.
So the result should look like:
table(df_new$market)
France 100
Germany 100
Italy 50
Spain 75
Thanks in advance!
Recommended answer
The following seems to work:
# Example data: 25 rows in four groups of unequal size
set.seed(10)
DF = data.frame(c1 = sample(LETTERS[1:4], 25, TRUE), c2 = runif(25))
# Group sizes, and the sample size per group capped at 5
freqs = as.data.frame(table(DF$c1))
freqs$ss = pmin(freqs$Freq, 5)
#> freqs
# Var1 Freq ss
#1 A 4 4
#2 B 11 5
#3 C 7 5
#4 D 3 3
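# For each group, draw ss of its rows without replacement
res = mapply(function(x, y) {
  idx = which(DF$c1 %in% x)  # sample(idx, y) would draw from 1:idx if idx had length one
  DF[idx[sample.int(length(idx), y)], ]
}, x = freqs$Var1, y = freqs$ss, SIMPLIFY = FALSE)
do.call(rbind, res)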
# c1 c2
#5 A 0.3558977
#17 A 0.2289039
#6 A 0.5355970
#13 A 0.9546536
#3 B 0.2395891
#25 B 0.8015470
#10 B 0.4226376
#15 B 0.5005032
#19 B 0.7289646
#11 C 0.7477465
#9 C 0.8998325
#12 C 0.8226526
#1 C 0.7066469
#4 C 0.7707715
#23 D 0.4861003
#20 D 0.2498805
#21 D 0.1611833
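The same idea can be applied to the data described in the question. The following is only a sketch under stated assumptions: `sample_per_group` is a hypothetical helper name, not part of the original answer, and the data frame is simulated to match the group sizes from the question (140, 300, 50, 75), since the real data is not available.

```r
# Hypothetical helper (not from the original answer): keep up to `n` rows
# per level of the column named by `group`, sampled without replacement.
sample_per_group <- function(df, group, n) {
  # Row indices belonging to each group
  idx_by_group <- split(seq_len(nrow(df)), df[[group]], drop = TRUE)
  keep <- lapply(idx_by_group, function(idx) {
    # Index through sample.int() so a group with a single row is handled
    # correctly (sample(x, ...) treats a length-one numeric x as 1:x)
    idx[sample.int(length(idx), min(n, length(idx)))]
  })
  df[unlist(keep, use.names = FALSE), , drop = FALSE]
}

# Simulated data with the group sizes from the question (values are made up)
set.seed(1)
sizes <- c(France = 140, Germany = 300, Italy = 50, Spain = 75)
df <- data.frame(
  ID = seq_len(sum(sizes)),
  market = rep(names(sizes), times = sizes),
  q1 = rbinom(sum(sizes), 1, 0.5),
  q2 = sample(1:8, sum(sizes), replace = TRUE)
)

df_new <- sample_per_group(df, "market", 100)
table(df_new$market)
```

With these group sizes the resulting counts are 100 for France and Germany, 50 for Italy and 75 for Spain, which matches the target table in the question. Using `min(n, length(idx))` plays the same role as the `ss` column in the answer above.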