Bootstrapping by multiple groups in dplyr

Views: 66 | Published: 2020/6/7 18:42:37 | Tags: r, dplyr, broom

Question

I'm trying to bootstrap a bivariate correlation, grouped by multiple variables, in a tidy fashion. So far I've got:

paks <- c('dplyr','tidyr','broom')
lapply(paks, require, character.only=TRUE)
set.seed(123)

df <- data.frame(
  rep(c('group1','group2','group3','group4'),25),
  rep(c('subgroup1','subgroup2','subgroup3','subgroup4'),25),
  rnorm(25),
  rnorm(25)
)
colnames(df) <- c('group','subgroup','v1','v2') 

cors_boot <- df %>%
  group_by(., group,subgroup) %>% 
  bootstrap(., 10) %>% 
  do(tidy(cor.test(.$v1,.$v2)))
cors_boot

This successfully runs 10 replications, but it does not maintain the group_by conditions. Any help would be appreciated.

Recommended answer

One option is to make use of nested tibbles (using nest() from tidyr) and to iterate over them with functions from the purrr package. Here's an example:

df %>% 
  nest(-group, -subgroup) %>% 
  mutate(cors_boot = map(data, ~ bootstrap(., 10) %>% do(tidy(cor.test(.$v1,.$v2))))) %>% 
  unnest(cors_boot)
#> # A tibble: 40 × 11
#>     group  subgroup replicate   estimate statistic    p.value parameter
#>    <fctr>    <fctr>     <int>      <dbl>     <dbl>      <dbl>     <int>
#> 1  group1 subgroup1         1 0.30199080 1.5192285 0.14233305        23
#> 2  group1 subgroup1         2 0.24782068 1.2267744 0.23231801        23
#> 3  group1 subgroup1         3 0.05697887 0.2737057 0.78675375        23
#> 4  group1 subgroup1         4 0.14141925 0.6851084 0.50012255        23
#> 5  group1 subgroup1         5 0.14769543 0.7161768 0.48109119        23
#> 6  group1 subgroup1         6 0.23407050 1.1546390 0.26009439        23
#> 7  group1 subgroup1         7 0.09388988 0.4522780 0.65530564        23
#> 8  group1 subgroup1         8 0.38602977 2.0068956 0.05665478        23
#> 9  group1 subgroup1         9 0.20248790 0.9916399 0.33169177        23
#> 10 group1 subgroup1        10 0.27430083 1.3679706 0.18453909        23
#> # ... with 30 more rows, and 4 more variables: conf.low <dbl>,
#> #   conf.high <dbl>, method <fctr>, alternative <fctr>
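
The same nest-then-map idea can be sketched without broom's bootstrap(), which, as far as I know, has been deprecated in later broom releases. This is only a minimal sketch under stated assumptions, not the answerer's method: boot_cor is a hypothetical helper written for illustration, the resampling uses base R's sample(), and nest(data = c(v1, v2)) assumes the tidyr 1.0.0 or later interface. The data setup is copied from the question.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

set.seed(123)

# Data setup as in the question: rnorm(25) is recycled to fill 100 rows.
df <- data.frame(
  group    = rep(c('group1', 'group2', 'group3', 'group4'), 25),
  subgroup = rep(c('subgroup1', 'subgroup2', 'subgroup3', 'subgroup4'), 25),
  v1       = rnorm(25),
  v2       = rnorm(25)
)

# Hypothetical helper (not part of broom): resample the rows of `d` with
# replacement `n_boot` times and tidy cor.test() on each resample.
boot_cor <- function(d, n_boot = 10) {
  bind_rows(lapply(seq_len(n_boot), function(i) {
    resampled <- d[sample(nrow(d), replace = TRUE), ]
    tidy(cor.test(resampled$v1, resampled$v2)) %>%
      mutate(replicate = i)
  }))
}

cors_boot <- df %>%
  nest(data = c(v1, v2)) %>%                        # one row per group/subgroup pair
  mutate(cors = map(data, ~ boot_cor(.x, n_boot = 10))) %>%
  select(-data) %>%
  unnest(cors)                                      # one row per pair and replicate

cors_boot
```

Because the resampling happens inside each nested data frame, every replicate draws only from the rows of its own group/subgroup pair, which is the grouping behaviour the question asks for.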

Note that the data setup is the same as in the question, except that the purrr package is also loaded:

paks <- c('dplyr','tidyr','broom','purrr')
lapply(paks, require, character.only=TRUE)
set.seed(123)

df <- data.frame(
  rep(c('group1','group2','group3','group4'),25),
  rep(c('subgroup1','subgroup2','subgroup3','subgroup4'),25),
  rnorm(25),
  rnorm(25)
)
colnames(df) <- c('group','subgroup','v1','v2')

As an aside, if nested tibbles are new to you, I've written a little about them in some blog posts. E.g., here.
