Group by two columns and summarize multiple columns

Problem description

I have a data frame and I would like to group by the columns "State" and "Date", then sum the values of the other columns within each group, something like this.

df

State  Female  Male  Date
------------------------------
Texas  2       2     01/01/04
Texas  3       1     01/01/04
Texas  5       4     02/01/04
Cali   1       1     05/06/05
Cali   2       1     05/06/05
Cali   3       1     10/06/05
Cali   1       2     10/06/05
NY     10      5     11/06/05
NY     11      6     12/06/05

Expected result

df

State  Female  Male  Date
------------------------------
Texas  5       3     01/01/04
Texas  5       4     02/01/04
Cali   3       2     05/06/05
Cali   4       3     10/06/05
NY     10      5     11/06/05
NY     11      6     12/06/05

I tried group_by and summarise, but I don't know exactly how to do the same with two grouping columns and more than one summarized column.

My attempt

df <- df_homicides %>% 
        group_by(state) %>% 
        summarise(Female = sum(Female))

Thanks for your help!

Recommended answer

We can use summarise with across, which is available in dplyr version >= 1.0.0

library(dplyr)
df %>%
   group_by(State, Date) %>%
   summarise(across(everything(), ~ sum(.x, na.rm = TRUE)), .groups = 'drop')
# A tibble: 6 x 4
#  State Date       Female  Male
#  <chr> <chr>       <int> <int>
#1 Cali  05/06/2005      3     2
#2 Cali  10/06/2005      4     3
#3 NY    11/06/2005     10     5
#4 NY    12/06/2005     11     6
#5 Texas 01/01/2004      5     3
#6 Texas 02/01/2004      5     4
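
If the real data frame holds other columns that should not be summed, everything() would try to sum those too. A minimal sketch of one way around that, assuming only Female and Male need totals, is to name the columns inside across. The object name totals is just for illustration, and the sample data is rebuilt here so the block runs on its own:

```r
library(dplyr)

# Same sample data as in the question, rebuilt so this block is self-contained
df <- data.frame(
  State  = c("Texas", "Texas", "Texas", "Cali", "Cali", "Cali", "Cali", "NY", "NY"),
  Female = c(2L, 3L, 5L, 1L, 2L, 3L, 1L, 10L, 11L),
  Male   = c(2L, 1L, 4L, 1L, 1L, 1L, 2L, 5L, 6L),
  Date   = c("01/01/2004", "01/01/2004", "02/01/2004", "05/06/2005",
             "05/06/2005", "10/06/2005", "10/06/2005", "11/06/2005", "12/06/2005")
)

# Group by both keys, then sum only the columns named inside across()
totals <- df %>%
  group_by(State, Date) %>%
  summarise(across(c(Female, Male), ~ sum(.x, na.rm = TRUE)), .groups = "drop")

print(totals)
```

Naming the columns also keeps the result stable if more columns are added to the data frame later.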


Or use base R

aggregate(.~ State + Date, df, sum, na.rm = TRUE)
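
The dot on the left-hand side of the formula means "every column not used for grouping". As a sketch of a more explicit variant, assuming only Female and Male should be summed, the columns can be listed with cbind. The name agg is illustrative, and the sample data is repeated so the block runs on its own:

```r
# Same sample data as in the question, rebuilt so this block is self-contained
df <- data.frame(
  State  = c("Texas", "Texas", "Texas", "Cali", "Cali", "Cali", "Cali", "NY", "NY"),
  Female = c(2L, 3L, 5L, 1L, 2L, 3L, 1L, 10L, 11L),
  Male   = c(2L, 1L, 4L, 1L, 1L, 1L, 2L, 5L, 6L),
  Date   = c("01/01/2004", "01/01/2004", "02/01/2004", "05/06/2005",
             "05/06/2005", "10/06/2005", "10/06/2005", "11/06/2005", "12/06/2005")
)

# cbind() on the left-hand side lists the columns to sum;
# State + Date on the right-hand side are the grouping keys
agg <- aggregate(cbind(Female, Male) ~ State + Date, data = df, FUN = sum)

print(agg)
```

The row order of the aggregate result can differ from the dplyr output, so compare the two by matching on State and Date, not by row position.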

Data

df <-  structure(list(State = c("Texas", "Texas", "Texas", "Cali", "Cali", 
"Cali", "Cali", "NY", "NY"), Female = c(2L, 3L, 5L, 1L, 2L, 3L, 
1L, 10L, 11L), Male = c(2L, 1L, 4L, 1L, 1L, 1L, 2L, 5L, 6L), 
    Date = c("01/01/2004", "01/01/2004", "02/01/2004", "05/06/2005", 
    "05/06/2005", "10/06/2005", "10/06/2005", "11/06/2005", "12/06/2005"
    )), class = "data.frame", row.names = c(NA, -9L))
