Stratified sampling on a factor
Question
I have a dataset of 1000 rows with the following structure:
   device geslacht leeftijd type1 type2
1     mob        0       53     C     3
2     tab        1       64     G     7
3      pc        1       50     G     7
4     tab        0       75     C     3
5     mob        1       54     G     7
6      pc        1       58     H     8
7      pc        1       57     A     1
8      pc        0       68     E     5
9      pc        0       66     G     7
10    mob        0       45     C     3
11    tab        1       77     E     5
12    mob        1       16     A     1
I would like to draw a sample of 80 rows, made up of 10 rows with type1 = A, 10 rows with type1 = B, and so on. Can anyone help me?
Recommended answer
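Base R solution. Because it uses replace = TRUE, sampling is with replacement: a row can be drawn more than once, but a level with fewer than 10 rows still yields 10 rows.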
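# Split df by type1, draw 10 rows from each group, then stack the groups again
do.call(rbind,
        lapply(split(df, df$type1), function(i)
          # replace = TRUE: rows may repeat within a group
          i[sample(seq_len(nrow(i)), size = 10, replace = TRUE), ]))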
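The approach above can be checked end to end. This is a minimal sketch that assumes a simulated data frame with the same column names; the values are random and purely hypothetical, not the asker's data, and the extra id column is a hypothetical addition used only to track which original rows were drawn. It assumes eight type1 levels, A to H, so 8 × 10 = 80 rows come out. It also shows a variant with replace = FALSE, which avoids drawing any row twice but requires every level to have at least 10 rows.

```r
set.seed(1)  # reproducible hypothetical example

# Hypothetical stand-in for the asker's 1000-row data frame (random values);
# id is an extra column used only to track which original rows were drawn
n <- 1000
df <- data.frame(
  id       = seq_len(n),
  device   = sample(c("mob", "tab", "pc"), n, replace = TRUE),
  geslacht = sample(0:1, n, replace = TRUE),
  leeftijd = sample(16:80, n, replace = TRUE),
  type1    = factor(sample(LETTERS[1:8], n, replace = TRUE))
)
df$type2 <- as.integer(df$type1)  # mirrors the rows shown: A = 1, ..., H = 8

# The answer's approach: 10 rows per type1 level, drawn with replacement
strat <- do.call(rbind,
                 lapply(split(df, df$type1), function(i)
                   i[sample(seq_len(nrow(i)), size = 10, replace = TRUE), ]))

# Variant without replacement: no original row is drawn twice,
# but every level must contain at least 10 rows
strat_norep <- do.call(rbind,
                       lapply(split(df, df$type1), function(i)
                         i[sample(seq_len(nrow(i)), size = 10), ]))

print(table(strat$type1))
```

Whether to sample with or without replacement depends on whether duplicated rows are acceptable in the 80-row sample. The question does not say.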
Edit:
Other solutions suggested by @BrodieG, which sample row indices within each level instead of splitting the data frame:
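# Sample 10 row indices per type1 level, then subset df by those indices
with(df, df[unlist(lapply(split(seq_along(type1), type1), sample, 10, TRUE)), ])
# sapply returns a 10 x (number of levels) matrix; c() flattens it to one index vector
with(df, df[c(sapply(split(seq_along(type1), type1), sample, 10, TRUE)), ])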
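The index-based variants share a known pitfall of base R's sample(). When its first argument is a single number x >= 1, it samples from 1:x instead of repeating x. A level with exactly one row therefore gets the wrong row indices. The following is a minimal sketch of a guard. It assumes a small hypothetical data frame in which level B has a single row, and it uses a hypothetical helper, sample_idx(), that draws positions with sample.int():

```r
set.seed(42)

# Hypothetical small data frame; level "B" deliberately has a single row (row 4)
df <- data.frame(
  type1 = factor(c("A", "A", "A", "B", "C", "C", "C", "C")),
  value = 1:8
)

# Hypothetical helper: pick n entries of idx with replacement.
# Indexing via sample.int() avoids sample()'s "single number means 1:x" behaviour.
sample_idx <- function(idx, n) idx[sample.int(length(idx), n, replace = TRUE)]

groups <- split(seq_along(df$type1), df$type1)  # list(A = 1:3, B = 4, C = 5:8)
rows   <- unlist(lapply(groups, sample_idx, n = 10))
strat  <- df[rows, ]  # 10 rows per level; every "B" row is row 4

# The plain version: for level B, sample(4L, 10, TRUE) draws from 1:4,
# so it can return rows belonging to level A instead of repeating row 4
naive <- unlist(lapply(groups, sample, 10, TRUE))
```

For levels with two or more rows, sample_idx() behaves like the plain sample() call used in the answers.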