Randomly select groups (and all cases per group) in R?
Question
I have an R dataframe with two levels of data: id and year. Within groups defined by id, the years increase (every group in the dataset has the same number of years), like so:
id year var1 var2
11A 2001 ... ...
11A 2002 ... ...
11A 2003 ... ...
11A 2004 ... ...
13B 2001 ... ...
13B 2002 ... ...
13B 2003 ... ...
13B 2004 ... ...
22Z 2001 ... ...
I have about 20,000 groups in my data, of course far too many to make nice plots of growth curves. How do I randomly select about 20 of my ids (and, with each one, all 4 year rows corresponding to that id)?
Recommended answer
This is pretty straightforward if you use sample and then index. Here's a made-up example that looks similar to what you've presented. It's really only two lines of code and could be done in one if you wanted.
dat <- data.frame(id=paste0(LETTERS[1:8], rep(1:1250, 8)),
    year=as.factor(as.character(sample(1990:2012, 20000, TRUE))),
    var1=rnorm(20000), var2=rnorm(20000))
#a look at the data
head(dat)
#sample 20 id's randomly
(ids <- sample(unique(dat$id), 20))
#narrow your data set
dat2 <- dat[dat$id %in% ids, ]
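The one-line variant mentioned above might look like the sketch below. It is only an illustration: the toy data frame, its column values, and the names toy and toy_sub are made up here, and set.seed is included purely so the random draw can be repeated.

```r
# Hypothetical toy data: 50 ids, each observed in the same 4 years
set.seed(42)  # assumption: fixed seed only to make the draw reproducible
toy <- data.frame(id = rep(sprintf("id%02d", 1:50), each = 4),
                  year = rep(2001:2004, times = 50),
                  var1 = rnorm(200),
                  stringsAsFactors = FALSE)

# Sample 20 ids and keep every row belonging to them, in a single step
toy_sub <- toy[toy$id %in% sample(unique(toy$id), 20), ]

# Check: number of rows kept per sampled id (should be 4 for each)
table(toy_sub$id)
```

Sampling from unique(toy$id) rather than from the rows themselves is what guarantees that whole groups are kept: each id is either drawn or not, and %in% then pulls in all of its years.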