Summarize different Columns with different Functions

Views: 70 | Published: 2020/10/16 21:41:25 | Tags: r dataframe group-by dplyr statistics

Problem description

I have the following problem: I have a data frame with many rows and columns, where the first column is the date. For each date I have more than one observation, and I want to summarize them.

My df looks like this (dates replaced by ID for ease of use):

df:
ID     Cash    Price    Weight   ...
1      0.4     0        0
1      0.2     0        82       ...
1      0       1        0        ...
1      0       3.2      80       ...
2      0.3     1        70       ...
...    ...     ...      ...      ...

I want to group the rows by the first column and then summarize each group, but with different functions for different columns:

Cash and Price should be summarized with sum, so that I get the total Cash and Price for each ID. Weight should be summarized with max, so that I get only the maximum weight for each ID.

Because I have so many columns, I cannot write out every function by hand. Only 2 columns should be summarized with max; all the others should be summarized with sum.

So I am looking for a way to group by ID and summarize everything with sum, except for 2 specific columns for which I need the max value.

I tried to use the dplyr package with:

df %>% group_by(ID = tolower(ID)) %>% summarise_each(funs(sum))

But in addition I need the 2 specified columns to be summarized with max instead of sum. Any ideas?

To be clear, the output of the example df should be:

ID     Cash    Price    Weight
1      0.6     4.2      82
2      0.3     1        70

Solution

With only a few columns, we can name each summary explicitly:

 df %>%
    group_by(ID) %>%
    summarise(Cash = sum(Cash), Price = sum(Price), Weight = max(Weight))

If we have many columns, one way would be to summarize the sum columns and the max columns separately and then join the two results on ID.

 df1 <- df %>% 
          group_by(ID) %>% 
          summarise_each(funs(sum), Cash:Price)
 df2 <- df %>%
          group_by(ID) %>% 
          summarise_each(funs(max), Weight)
 inner_join(df1, df2, by = "ID")
 #      ID  Cash Price Weight
 #   (int) (dbl) (dbl)  (int)
 #1     1   0.6   4.2     82
 #2     2   0.3   1.0     70
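
In current dplyr releases, summarise_each() and funs() are deprecated, so the same idea can also be sketched in a single summarise() call using across(). This is only a minimal sketch, not part of the original answer: it assumes dplyr 1.0.0 or later, rebuilds the example df from the question, and uses a hypothetical helper vector max_cols to hold the names of the columns that should be summarized with max (only Weight here; further names can be added to it).

```r
library(dplyr)

# Example data from the question (the real df has many more columns).
df <- data.frame(
  ID     = c(1, 1, 1, 1, 2),
  Cash   = c(0.4, 0.2, 0, 0, 0.3),
  Price  = c(0, 0, 1, 3.2, 1),
  Weight = c(0, 82, 0, 80, 70)
)

# Hypothetical helper: names of the columns to summarize with max.
max_cols <- c("Weight")

result <- df %>%
  group_by(ID) %>%
  summarise(
    # every non-grouping column except those in max_cols is summed
    across(!all_of(max_cols), sum),
    # the columns listed in max_cols are reduced with max
    across(all_of(max_cols), max)
  )

print(result)
```

Because the grouping column is excluded from across() automatically, only max_cols has to be maintained by hand, however many sum columns the data frame contains.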
