R - Aggregate (sum) totals for grouped data using dplyr

Problem description

I have a large dataset containing the names of hospitals, the hospital groups and then the number of presenting patients by month. I'm trying to use dplyr to create a summary that contains the total number of presenting patients each month, aggregated by hospital group. The data frame looks like this:

Hospital | Hospital_group | Jan 03 | Feb 03 | Mar 03 | Apr 03 | .....
---------------------------------------------------------------
Hosp 1   | Group A        |    5   |    5   |    6   |    4   | .....
---------------------------------------------------------------
Hosp 2   | Group A        |    6   |    3   |    8   |    2   | .....
---------------------------------------------------------------
Hosp 3   | Group B        |    5   |    5   |    6   |    4   | .....
---------------------------------------------------------------
Hosp 4   | Group B        |    3   |    7   |    2   |    1   | .....
---------------------------------------------------------------

I'm trying to create a new dataframe that looks like this:

Hospital_group |Jan 03 | Feb 03 | Mar 03 | Apr 03 | .....
----------------------------------------------------------
Group A        |   11  |    8   |    14  |   6    | .....
----------------------------------------------------------
Group B        |   8   |    12  |     8  |   5    | .....
----------------------------------------------------------

I'm trying to use dplyr to summarise the data but am a little stuck (am very new at this as you might have guessed). I've managed to filter out the first column (hospital name) and group_by the hospital group but am not sure how to get a cumulative sum total for each month and year (there is a large number of date columns so I'm hoping there is a quick and easy way to do this).

Sorry about posting such a basic question - any help or advice would be greatly appreciated.

Greg

Recommended answer

Use summarize_all. Example:

library(dplyr)  # provides tibble(), group_by(), summarize_all() and %>%
df <- tibble(name=c("a","b", "a","b"), colA = c(1,2,3,4), colB=c(5,6,7,8))
df

# A tibble: 4 × 3
   name  colA  colB
  <chr> <dbl> <dbl>
1     a     1     5
2     b     2     6
3     a     3     7
4     b     4     8

df %>% group_by(name) %>% summarize_all(sum)

Result:

# A tibble: 2 × 3
   name  colA  colB
  <chr> <dbl> <dbl>
1     a     4    12
2     b     6    14

Edit: In your case, the data frame contains one column that you do not want to aggregate (the hospital name). Either deselect the Hospital column first, or use summarize_at(vars(-Hospital), funs(sum)) instead of summarize_all. Note that funs() is deprecated as of dplyr 0.8.0; in newer versions, pass the function directly, as in summarize_at(vars(-Hospital), sum).
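To make this concrete, here is a minimal sketch applied to data shaped like the table in the question. The data frame name hospitals and the result name monthly_totals are made up for illustration, and only the four rows and four month columns shown in the question are included. It assumes dplyr 1.0.0 or later, where across() is the suggested replacement for the summarize_all / summarize_at family; it is one way to do it, not the only one.

```r
library(dplyr)

# Hypothetical data mirroring the four rows shown in the question.
# Column names containing spaces must be wrapped in backticks.
hospitals <- tibble(
  Hospital       = c("Hosp 1", "Hosp 2", "Hosp 3", "Hosp 4"),
  Hospital_group = c("Group A", "Group A", "Group B", "Group B"),
  `Jan 03` = c(5, 6, 5, 3),
  `Feb 03` = c(5, 3, 5, 7),
  `Mar 03` = c(6, 8, 6, 2),
  `Apr 03` = c(4, 2, 4, 1)
)

monthly_totals <- hospitals %>%
  select(-Hospital) %>%                  # drop the column that should not be summed
  group_by(Hospital_group) %>%           # one output row per hospital group
  summarise(across(everything(), sum))   # sum every remaining (non-grouping) column

monthly_totals
```

Because across(everything(), ...) skips the grouping column automatically, the same three lines should work no matter how many month columns the real dataset has. If some months contain missing values, passing na.rm = TRUE through a lambda, as in across(everything(), ~ sum(.x, na.rm = TRUE)), is probably what you want.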
