(R, dplyr) Select multiple columns starting with the same string and summarise mean (90% CI) by group

Question

I am new to the tidyverse. Conceptually, I would like to calculate the mean and 90% CI of every column that starts with "ab", grouped by "case". I have tried many approaches but none seem to work, and my actual data has many columns, so listing them out explicitly is not an option.

library(tidyverse)

dat <- tibble(case= c("case1", "case1", "case2", "case2", "case3"), 
              abc = c(1, 2, 3, 1, 2), 
              abe = c(1, 3, 2, 3, 4), 
              bca = c(1, 6, 3, 8, 9))
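
As a quick sanity check (a sketch, assuming the dplyr package is installed), the tidyselect helper starts_with("ab") picks up abc and abe but not bca, so it is the right selector for this data:

```r
library(dplyr)

dat <- tibble(case = c("case1", "case1", "case2", "case2", "case3"),
              abc  = c(1, 2, 3, 1, 2),
              abe  = c(1, 3, 2, 3, 4),
              bca  = c(1, 6, 3, 8, 9))

# Ungrouped select(): keep only the columns whose names start with "ab"
ab_cols <- dat %>% select(starts_with("ab"))
names(ab_cols)
#> [1] "abc" "abe"
```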

The code below is what I conceptually want to do, but it obviously doesn't work:

dat %>% group_by(`case`) %>% 
  summarise(mean=mean(select(starts_with("ab"))), 
            qt=quantile(select(starts_with("ab"), prob=c(0.05, 0.95))))

What I want is something like the following:

   case abc_mean abe_mean abc_lb abc_ub abe_lb abe_ub
  <chr>    <dbl>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 case1      1.5      2.0   1.05   1.95   1.10   2.90
2 case2      2.0      2.5   1.10   2.90   2.05   2.95
3 case3      2.0      4.0   2.00   2.00   4.00   4.00
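
The lb and ub columns are the 5% and 95% sample quantiles within each group. With R's default quantile type 7 (linear interpolation between order statistics), case1's abc values 1 and 2 give 1 + 0.05 × (2 − 1) = 1.05 and 1 + 0.95 × (2 − 1) = 1.95, matching the table. Strictly speaking, this is the central 90% range of the observations rather than a confidence interval for the mean. A minimal check in base R:

```r
# 5% and 95% quantiles of case1's abc values (default type = 7)
q <- quantile(c(1, 2), probs = c(0.05, 0.95))
unname(q)
#> [1] 1.05 1.95

# Same for case2's abe values (2 and 3)
q2 <- quantile(c(2, 3), probs = c(0.05, 0.95))
unname(q2)
#> [1] 2.05 2.95
```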

Recommended answer

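You were very close: just move that select() before the summarise(). We then use summarise_all() and pass the functions as a named list; each name becomes the suffix of the corresponding output column. (funs() was used for this in older dplyr code, but it has been deprecated since dplyr 0.8.0.)

dat %>%
    group_by(case) %>%
    select(starts_with('ab')) %>%   # the grouping column `case` is kept automatically
    summarise_all(list(mean = mean, ub = ~quantile(., .95), lb = ~quantile(., .05)))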

# # A tibble: 3 x 7
#    case abc_mean abe_mean abc_ub abe_ub abc_lb abe_lb
#   <chr>    <dbl>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
# 1 case1      1.5      2.0   1.95   2.90   1.05   1.10
# 2 case2      2.0      2.5   2.90   2.95   1.10   2.05
# 3 case3      2.0      4.0   2.00   4.00   2.00   4.00
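
In dplyr 1.0.0 and later, summarise_all() is superseded by across(). A sketch of the same summary with across(), assuming dplyr 1.0.0 or newer; note that across() orders the output column by column (abc_mean, abc_lb, abc_ub, abe_mean, ...) rather than function by function:

```r
library(dplyr)

dat <- tibble(case = c("case1", "case1", "case2", "case2", "case3"),
              abc  = c(1, 2, 3, 1, 2),
              abe  = c(1, 3, 2, 3, 4),
              bca  = c(1, 6, 3, 8, 9))

# across() applies every function in the list to every matching column;
# .names controls how the output columns are named
res <- dat %>%
  group_by(case) %>%
  summarise(across(starts_with("ab"),
                   list(mean = mean,
                        lb   = ~ quantile(.x, 0.05),
                        ub   = ~ quantile(.x, 0.95)),
                   .names = "{.col}_{.fn}"))
res
```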

We use summarise_all() instead of summarise() because we want to apply the same operations to multiple columns. This takes far less typing than a summarise() call that specifies each column and each operation separately.
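
For comparison, here is a sketch of the hand-written summarise() call for just the two "ab" columns of the example data (assuming dplyr is installed). Every additional column would need three more lines, which is why the column-wise approach scales better:

```r
library(dplyr)

dat <- tibble(case = c("case1", "case1", "case2", "case2", "case3"),
              abc  = c(1, 2, 3, 1, 2),
              abe  = c(1, 3, 2, 3, 4),
              bca  = c(1, 6, 3, 8, 9))

# Every column/statistic pair is spelled out by hand
res_manual <- dat %>%
  group_by(case) %>%
  summarise(abc_mean = mean(abc),
            abe_mean = mean(abe),
            abc_ub   = quantile(abc, 0.95),
            abe_ub   = quantile(abe, 0.95),
            abc_lb   = quantile(abc, 0.05),
            abe_lb   = quantile(abe, 0.05))
res_manual
```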