Summarize different Columns with different Functions


Question


I have the following problem: my data frame has a lot of rows and columns, and the first column holds the date. For each date I have more than one observation, and I want to summarize them per date.

My df looks like this (date replaced by ID for ease of use):

df:
ID     Cash    Price    Weight   ...
1      0.4     0        0
1      0.2     0        82       ...
1      0       1        0        ...
1      0       3.2      80       ...
2      0.3     1        70       ...
...    ...     ...      ...      ...

I want to group the rows by the first column and then summarize each group, BUT with different functions for different columns:

Cash and Price should be summarized with sum, so I get the total Cash and Price for each ID. Weight should be summarized with max, so I only get the maximum weight for each ID.

Because I have so many columns, I cannot write out every function by hand. Only 2 columns should be summarized with max; all the rest should be summarized with sum.

So I am looking for a way to group by ID and summarize everything with sum, except for 2 specific columns where I need the max value.

I tried to use the dplyr package with:

df %>% group_by(ID = tolower(ID)) %>% summarise_each(funs(sum))

But in addition I need the 2 specified columns to be summarized with max instead of sum. Any ideas?

To be clear, the output of the example df should be:

ID     Cash    Price    Weight
1      0.6     4.2      82
2      0.3     1        70

Solution

We can use summarise() and name the function for each column explicitly:

 df %>%
    group_by(ID) %>%
    summarise(Cash = sum(Cash), Price = sum(Price), Weight = max(Weight))
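
To make the call above reproducible, here is a minimal sketch that rebuilds only the rows shown in the question's table and runs the same pipeline. The data frame is an assumption based on that table (the elided "..." rows and columns are left out), and `result` is just a name chosen for this example.

```r
library(dplyr)

# Sample data copied from the rows shown in the question;
# the elided rows and columns ("...") are not reconstructed.
df <- data.frame(
  ID     = c(1, 1, 1, 1, 2),
  Cash   = c(0.4, 0.2, 0, 0, 0.3),
  Price  = c(0, 0, 1, 3.2, 1),
  Weight = c(0, 82, 0, 80, 70)
)

# One row per ID: totals for Cash and Price, maximum for Weight.
result <- df %>%
  group_by(ID) %>%
  summarise(Cash = sum(Cash), Price = sum(Price), Weight = max(Weight))

print(result)
```

On this sample the result should match the table the question asks for: one row for ID 1 and one for ID 2.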

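If we have many columns, one way is to summarise the sum columns and the max columns separately and then join the two outputs on ID.

 # Sum every column in the range Cash to Price, per ID.
 # Note: summarise_each() and funs() are deprecated in recent dplyr releases.
 df1 <- df %>% 
          group_by(ID) %>% 
          summarise_each(funs(sum), Cash:Price)
 # Take the maximum of Weight, per ID.
 df2 <- df %>%
          group_by(ID) %>% 
          summarise_each(funs(max), Weight)
 # Combine the two summaries into one row per ID.
 inner_join(df1, df2, by = "ID")
 #      ID  Cash Price Weight
 #   (int) (dbl) (dbl)  (int)
 #1     1   0.6   4.2     82
 #2     2   0.3   1.0     70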
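
The code above relies on summarise_each() and funs(), which newer dplyr releases mark as deprecated. As a hedged alternative, not part of the original answer, the same result can probably be had in a single summarise() call with across(), assuming dplyr 1.0.0 or later. The sample data below is rebuilt from the question's table, and `max_cols` is a hypothetical helper name; the question mentions 2 max columns but names only Weight, so only Weight is listed here.

```r
library(dplyr)

# Sample data copied from the rows shown in the question.
df <- data.frame(
  ID     = c(1, 1, 1, 1, 2),
  Cash   = c(0.4, 0.2, 0, 0, 0.3),
  Price  = c(0, 0, 1, 3.2, 1),
  Weight = c(0, 82, 0, 80, 70)
)

# Hypothetical helper: names of the columns that should use max.
# Add the second max column here when working with the real data.
max_cols <- c("Weight")

result <- df %>%
  group_by(ID) %>%
  summarise(
    # Every non-grouping column that is not listed in max_cols is summed.
    across(!all_of(max_cols), sum),
    # The listed columns are reduced with max instead.
    across(all_of(max_cols), max)
  )

print(result)
```

With this layout only the short list of exceptions has to be typed out; every other column is summed by default, and no join is needed.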
