Bootstrapping by multiple groups in dplyr
Problem description
I'm trying to bootstrap a bivariate correlation grouped by multiple variables in a tidy fashion. So far I've got:
paks <- c('dplyr','tidyr','broom')
lapply(paks, require, character.only=TRUE)
set.seed(123)
df <- data.frame(
rep(c('group1','group2','group3','group4'),25),
rep(c('subgroup1','subgroup2','subgroup3','subgroup4'),25),
rnorm(25),
rnorm(25)
)
colnames(df) <- c('group','subgroup','v1','v2')
cors_boot <- df %>%
group_by(., group,subgroup) %>%
bootstrap(., 10) %>%
do(tidy(cor.test(.$v1,.$v2)))
cors_boot
This successfully runs 10 replications, but it does not maintain the group_by conditions. Any help would be appreciated.
Recommended answer
One option is to use nested tibbles (via nest() from tidyr) and iterate over them with functions from the purrr package. Here's an example:
df %>%
nest(-group, -subgroup) %>%
mutate(cors_boot = map(data, ~ bootstrap(., 10) %>% do(tidy(cor.test(.$v1,.$v2))))) %>%
unnest(cors_boot)
#> # A tibble: 40 × 11
#>     group  subgroup replicate   estimate statistic    p.value parameter
#>    <fctr>    <fctr>     <int>      <dbl>     <dbl>      <dbl>     <int>
#> 1  group1 subgroup1         1 0.30199080 1.5192285 0.14233305        23
#> 2  group1 subgroup1         2 0.24782068 1.2267744 0.23231801        23
#> 3  group1 subgroup1         3 0.05697887 0.2737057 0.78675375        23
#> 4  group1 subgroup1         4 0.14141925 0.6851084 0.50012255        23
#> 5  group1 subgroup1         5 0.14769543 0.7161768 0.48109119        23
#> 6  group1 subgroup1         6 0.23407050 1.1546390 0.26009439        23
#> 7  group1 subgroup1         7 0.09388988 0.4522780 0.65530564        23
#> 8  group1 subgroup1         8 0.38602977 2.0068956 0.05665478        23
#> 9  group1 subgroup1         9 0.20248790 0.9916399 0.33169177        23
#> 10 group1 subgroup1        10 0.27430083 1.3679706 0.18453909        23
#> # ... with 30 more rows, and 4 more variables: conf.low <dbl>,
#> #   conf.high <dbl>, method <fctr>, alternative <fctr>
Note that the data setup is the same as in the question, except that the purrr package is also loaded:
paks <- c('dplyr','tidyr','broom','purrr')
lapply(paks, require, character.only=TRUE)
set.seed(123)
df <- data.frame(
rep(c('group1','group2','group3','group4'),25),
rep(c('subgroup1','subgroup2','subgroup3','subgroup4'),25),
rnorm(25),
rnorm(25)
)
colnames(df) <- c('group','subgroup','v1','v2')
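If bootstrap() is not available in your version of broom (as far as I know it was deprecated in later releases), the same nest-and-map idea can be sketched with plain row resampling. This is only a rough sketch under that assumption: boot_cor and its n_boot argument are hypothetical names introduced here for illustration, and base sample() stands in for broom's bootstrap().

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(123)
df <- data.frame(
  group    = rep(c('group1','group2','group3','group4'), 25),
  subgroup = rep(c('subgroup1','subgroup2','subgroup3','subgroup4'), 25),
  v1 = rnorm(25),
  v2 = rnorm(25)
)

# Hypothetical helper (not part of any package): resample the rows of `d`
# with replacement `n_boot` times and tidy cor.test() for each resample.
boot_cor <- function(d, n_boot = 10) {
  map(seq_len(n_boot), function(i) {
    resampled <- d[sample(nrow(d), replace = TRUE), , drop = FALSE]
    bind_cols(
      tibble(replicate = i),
      tidy(cor.test(resampled$v1, resampled$v2))
    )
  }) %>%
    bind_rows()
}

cors_boot <- df %>%
  group_by(group, subgroup) %>%
  nest() %>%                                          # one row per group/subgroup
  mutate(cors_boot = map(data, boot_cor, n_boot = 10)) %>%
  select(-data) %>%
  unnest(cors_boot) %>%
  ungroup()

cors_boot
```

Each group/subgroup combination keeps its own set of replicates, so the grouping columns survive in the result. The estimates will not match the output shown above, because the resampling is done differently.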
As an aside, if nested tibbles are new to you, I've written a little about them in some blog posts (e.g., here).