Summarize values by group, but keep original data


Question

I am trying to figure out how to sum the values belonging to categories a and b, grouped by the factor file, while also keeping the original data.

library(dplyr)
df <- data.frame(ID = 1:20, values = runif(20), category = rep(letters[1:5], 4), file = as.factor(sort(rep(1:5, 4)))) 


   ID     values category file
1   1 0.65699229        a    1
2   2 0.70506478        b    1
3   3 0.45774178        c    1
4   4 0.71911225        d    1
5   5 0.93467225        e    1
6   6 0.25542882        a    2
7   7 0.46229282        b    2
8   8 0.94001452        c    2
9   9 0.97822643        d    2
10 10 0.11748736        e    2
11 11 0.47499708        a    3
12 12 0.56033275        b    3
13 13 0.90403139        c    3
14 14 0.13871017        d    3
15 15 0.98889173        e    3
16 16 0.94666823        a    4
17 17 0.08243756        b    4
18 18 0.51421178        c    4
19 19 0.39020347        d    4
20 20 0.90573813        e    4

so that

  • df[1,2] is added to df[2,2] to give category 'ab' for file 1
  • df[6,2] is added to df[7,2] to give category 'ab' for file 2
  • etc.

So far I have:

df %>% 
    filter(category %in% c('a' , 'b')) %>%
    group_by(file) %>% 
    summarise(values = sum(values))



Problem

I would like to change the category of the summed values to "ab" and append them to the original data frame in the same pipeline.

Desired output

   ID     values category file
1   1 0.65699229        a    1
2   2 0.70506478        b    1
3   3 0.45774178        c    1
4   4 0.71911225        d    1
5   5 0.93467225        e    1
6   6 0.25542882        a    2
7   7 0.46229282        b    2
8   8 0.94001452        c    2
9   9 0.97822643        d    2
10 10 0.11748736        e    2
11 11 0.47499708        a    3
12 12 0.56033275        b    3
13 13 0.90403139        c    3
14 14 0.13871017        d    3
15 15 0.98889173        e    3
16 16 0.94666823        a    4
17 17 0.08243756        b    4
18 18 0.51421178        c    4
19 19 0.39020347        d    4
20 20 0.90573813        e    4
21 21 1.25486225       ab    1
22 22 1.87216325       ab    2
23 23 1.36548126       ab    3


Recommended answer

This will give you the result:

df %>% bind_rows(
  df %>% 
    filter(category %in% c('a' , 'b')) %>%
    group_by(file) %>% 
    mutate(values = sum(values), category = paste0(category,collapse='')) %>% 
    filter(row_number() == 1 & n() > 1)
) %>% mutate(ID = row_number())
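
As a cross-check on the pipeline above, here is a minimal sketch of the same idea written with summarise() instead of mutate() plus filter(). It assumes dplyr 1.0 or later (for the .groups argument), uses a small made-up data frame with fixed values instead of runif() so the sums can be verified by hand, and hard-codes the label "ab". The names toy, ab_rows and result are illustrative only.

```r
library(dplyr)

# Toy data with fixed values (illustrative, not the asker's data)
toy <- data.frame(
  ID       = 1:6,
  values   = c(1, 2, 3, 10, 20, 30),
  category = rep(c("a", "b", "c"), 2),
  file     = factor(rep(1:2, each = 3))
)

# One summed row per file for categories a and b
ab_rows <- toy %>%
  filter(category %in% c("a", "b")) %>%
  group_by(file) %>%
  summarise(values = sum(values), category = "ab", .groups = "drop")

# Append the summed rows to the original rows and renumber ID
result <- toy %>%
  select(-ID) %>%
  bind_rows(ab_rows) %>%
  mutate(ID = row_number()) %>%
  select(ID, values, category, file)

print(result)
```

One difference worth noting: the answer's filter(row_number() == 1 & n() > 1) drops files that contain only one of the two categories, and its paste0(category, collapse = '') builds the label from whatever categories are present. This sketch keeps every file that has at least one matching row and always labels it "ab".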

BTW, the code that produces the data frame shown in the example is this one:

df <- data.frame(ID = 1:20, values = runif(20), category = rep(letters[1:5], 4), file = as.factor(sort(rep(1:4, 5)))) 
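
A quick way to see the difference between the two constructions is to tabulate the file factor. This sketch uses only base R; the names file_q and file_a are illustrative.

```r
# file as built in the question: five files with four rows each
file_q <- as.factor(sort(rep(1:5, 4)))
# file as built in the answer: four files with five rows each
file_a <- as.factor(sort(rep(1:4, 5)))

table(file_q)
table(file_a)

# With five rows per file, each file holds exactly one of each category a..e
category <- rep(letters[1:5], 4)
table(category, file_a)
```

With the question's version, the five categories do not line up with the four-row files, so some files would lack an a or a b row.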

Now let's say you want to sum multiple columns. You need to provide the column names in a vector:

cols = c("values") # columns to be summed

df %>% bind_rows(
  df %>% 
    filter(category %in% c('a' , 'b')) %>%
    group_by(file) %>% 
    mutate_at(vars(cols), sum) %>% 
    mutate(category = paste0(category,collapse='')) %>% 
    filter(row_number() == 1 & n() > 1)
) %>% mutate(ID = row_number())
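
To show the multi-column case with more than one column actually present, here is a hedged sketch on made-up data. It assumes dplyr 1.0 or later and uses across() with all_of(), which the dplyr documentation recommends over the superseded mutate_at(). The column names v1 and v2 and the object names toy2 and result2 are hypothetical.

```r
library(dplyr)

# Toy data with two numeric columns (v1 and v2 are hypothetical names)
toy2 <- data.frame(
  v1       = c(1, 2, 3, 10, 20, 30),
  v2       = c(0.5, 0.5, 1, 1.5, 2.5, 4),
  category = rep(c("a", "b", "c"), 2),
  file     = factor(rep(1:2, each = 3))
)

cols <- c("v1", "v2")  # columns to be summed

# Sum every column named in cols per file, then append to the original rows
result2 <- toy2 %>% bind_rows(
  toy2 %>%
    filter(category %in% c("a", "b")) %>%
    group_by(file) %>%
    summarise(across(all_of(cols), sum), category = "ab", .groups = "drop")
)

print(result2)
```

Adding another column to the sum only requires extending cols; the pipeline itself stays unchanged.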
