Sampling different numbers of rows by group in dplyr tidyverse
Question
I'd like to sample rows from a data frame by group. But here's the catch: the number of rows to sample from each group should come from another table. Here is my reproducible data:
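library(dplyr)  # also provides tibble() and %>%

# Data to sample from: 30 rows, 10 per stratum
df <- tibble(
  Stratum = rep(c("High", "Medium", "Low"), 10),
  id = 1:30,
  Value = runif(30)
)

# Number of rows to sample from each stratum
sampleGuide <- tibble(
  Stratum = c("High", "Medium", "Low"),
  Surveys = c(3, 2, 5)
)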
The output should look like this:
# A tibble: 10 × 2
   Stratum      Value
     <chr>      <dbl>
1     High 0.21504972
2     High 0.71069005
3     High 0.09286843
4   Medium 0.52553056
5   Medium 0.06682459
6      Low 0.38793128
7      Low 0.01285081
8      Low 0.87865734
9      Low 0.09100829
10     Low 0.14851919
Here is my non-working attempt:
> df %>%
+ left_join(sampleGuide, by = "Stratum") %>%
+ group_by(Stratum) %>%
+ sample_n(unique(Surveys))
Error in unique(Surveys) : object 'Surveys' not found
Also:
> df %>%
+ group_by(Stratum) %>%
+ nest() %>%
+ left_join(sampleGuide, by = "Stratum") %>%
+ mutate(sample = map(., ~ sample_n(data, Surveys)))
Error in mutate_impl(.data, dots) :
Don't know how to sample from objects of class function
It seems like sample_n requires the size to be a single number. Any ideas?
I'm only looking for tidyverse solutions. Extra points for purrr!
This was a similar problem, but I am not satisfied with the accepted answer because in real life the number of strata I'm dealing with is large.
Recommended answer
With purrr:
df %>%
  nest(-Stratum) %>%
  left_join(sampleGuide, by = "Stratum") %>%
  mutate(Sample = map2(data, Surveys, sample_n)) %>%
  unnest(Sample)
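On recent package versions the same nest-and-map idea may need small adjustments. The sketch below assumes tidyr 1.0 or later (where nest() expects a named argument and unnest() keeps other list-columns unless they are dropped first) and dplyr 1.0.0 or later (where slice_sample() supersedes sample_n()). The set.seed() call and the result name sampled are illustrative additions, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(42)  # illustrative only: makes the random draw repeatable

df <- tibble(
  Stratum = rep(c("High", "Medium", "Low"), 10),
  id = 1:30,
  Value = runif(30)
)
sampleGuide <- tibble(
  Stratum = c("High", "Medium", "Low"),
  Surveys = c(3, 2, 5)
)

sampled <- df %>%
  # One row per stratum, with that stratum's rows in the list-column `data`
  nest(data = -Stratum) %>%
  # Attach the per-stratum sample size
  left_join(sampleGuide, by = "Stratum") %>%
  # Draw `Surveys` rows from each nested data frame
  mutate(Sample = map2(data, Surveys, ~ slice_sample(.x, n = .y))) %>%
  # Drop the full data and the size column before expanding the samples
  select(Stratum, Sample) %>%
  unnest(Sample)
```

Running count(sampled, Stratum) should report 3 High, 2 Medium, and 5 Low rows. One difference worth keeping in mind: when sampling without replacement, slice_sample() truncates a request that exceeds the group size, whereas sample_n() raises an error, so it may be worth checking sampleGuide against the group sizes beforehand.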