Randomly select groups (and all cases per group) in R?

Question

I have an R data frame with two levels of data: id and year. Within the groups defined by id, the years increase, and every group in the dataset covers the same number of years, like so:

id    year    var1    var2
11A   2001    ...     ...
11A   2002    ...     ...
11A   2003    ...     ...
11A   2004    ...     ...
13B   2001    ...     ...
13B   2002    ...     ...
13B   2003    ...     ...
13B   2004    ...     ...
22Z   2001    ...     ...

I have about 20,000 groups in my data, which is of course far too many to make nice plots of growth curves. How do I randomly select about 20 of my ids, and with them all 4 yearly rows that belong to each selected id?

Recommended answer

This is pretty straightforward if you use sample and then index. Here's a made-up example that looks similar to what you've presented. It's really only two lines of code, and it could be done in one if you wanted.

dat <- data.frame(id=paste0(LETTERS[1:8], rep(1:1250, 8)), 
   year=as.factor(as.character(sample(1990:2012, 20000, replace=TRUE))), 
   var1=rnorm(20000), var2=rnorm(20000))

#a look at the data
head(dat)

#sample 20 id's randomly
(ids <- sample(unique(dat$id), 20))

#narrow your data set
dat2 <- dat[dat$id %in% ids, ]
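
The one-line variant mentioned above could look roughly like the sketch below. The toy data frame `toy`, its size (50 ids with 4 years each), and the seed value are assumptions made up for illustration, not part of the original answer; `set.seed()` is added only so that the same ids are drawn on every run.

```r
# Hypothetical toy data: 50 ids with 4 yearly rows each
# (column names mirror the question; the values are made up)
set.seed(42)  # fix the seed so the random draw is reproducible
toy <- data.frame(
  id   = rep(sprintf("id%02d", 1:50), each = 4),
  year = rep(2001:2004, times = 50),
  var1 = rnorm(200),
  stringsAsFactors = FALSE
)

# sample and index in a single step: draw 5 ids, keep all of their rows
toy_sub <- toy[toy$id %in% sample(unique(toy$id), 5), ]

# each sampled id should still have all 4 of its yearly rows
table(toy_sub$id)
```

Because the sampling is done on `unique(toy$id)` rather than on the rows, each selected group comes through whole: `%in%` keeps every row whose id was drawn.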
