Summarize different columns with different functions
Problem description
I have the following problem: I have a data frame with many rows and columns, where the first column is the date. For each date there is more than one observation, and I want to summarize them.
My df looks like this (the date is replaced by an ID for ease of use):
df:
ID Cash Price Weight ...
1 0.4 0 0 ...
1 0.2 0 82 ...
1 0 1 0 ...
1 0 3.2 80 ...
2 0.3 1 70 ...
... ... ... ... ...
I want to group them by the first column and then summarize all rows, but with different functions for different columns:
The function for Cash and Price should be sum, so that I get the sum of Cash and Price for each ID. The function for Weight should be max, so that I get only the maximum weight for each ID.
Because I have so many columns, I cannot write all the functions by hand. However, only 2 columns should be summarized by max; all the rest should be summarized by sum.
So I am looking for a way to group by ID and summarize every column with sum, except for 2 specific columns for which I need the max value.
I tried to use the dplyr package with:
df %>% group_by(ID = tolower(ID)) %>% summarise_each(funs(sum))
But for the 2 specified columns I need max instead of sum. Any ideas?
To be clear, the output of the example df should be:
ID Cash Price Weight
1 0.6 4.2 82
2 0.3 1 70
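For reference, the example df can be rebuilt in R as below. This is a sketch that keeps only the four named columns (the `...` columns are unknown) and, as an assumption, stores ID and Weight as integers to match the `(int)` types shown in the answer's printed output.

```r
# Reproducible version of the example df from the question.
# Only ID, Cash, Price and Weight are included; the "..." columns are omitted.
df <- data.frame(
  ID     = c(1L, 1L, 1L, 1L, 2L),
  Cash   = c(0.4, 0.2, 0, 0, 0.3),
  Price  = c(0, 0, 1, 3.2, 1),
  Weight = c(0L, 82L, 0L, 80L, 70L)
)
str(df)
```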
We can use
library(dplyr)
df %>%
group_by(ID) %>%
summarise(Cash = sum(Cash), Price = sum(Price), Weight = max(Weight))
If we have many columns, one way is to summarise the sum columns and the max columns separately and then join the two outputs together by ID.
df1 <- df %>%
group_by(ID) %>%
summarise_each(funs(sum), Cash:Price)
df2 <- df %>%
group_by(ID) %>%
summarise_each(funs(max), Weight)
inner_join(df1, df2, by = "ID")
# ID Cash Price Weight
# (int) (dbl) (dbl) (int)
#1 1 0.6 4.2 82
#2 2 0.3 1.0 70
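The `summarise_each()`/`funs()` calls above come from older dplyr releases and have since been deprecated. Assuming dplyr 1.0.0 or later, a single `summarise()` with two `across()` calls can select columns by name instead of by a contiguous range such as `Cash:Price`, so only the max columns need to be listed. This is a sketch rather than the original answer's method; `max_cols` is a hypothetical variable, and in the real data it would hold the names of the 2 columns that need max.

```r
library(dplyr)

df <- data.frame(
  ID     = c(1L, 1L, 1L, 1L, 2L),
  Cash   = c(0.4, 0.2, 0, 0, 0.3),
  Price  = c(0, 0, 1, 3.2, 1),
  Weight = c(0L, 82L, 0L, 80L, 70L)
)

# Hypothetical: names of the columns that should be summarised with max().
# Every other non-grouping column is summed.
max_cols <- c("Weight")

res <- df %>%
  group_by(ID) %>%
  summarise(
    across(!all_of(max_cols), sum),  # sum every column not listed in max_cols
    across(all_of(max_cols), max)    # take the maximum of the listed columns
  )
res
```

Because `across()` skips grouping variables, `ID` itself is not summed, and the result should match the expected output from the question (Cash 0.6 and 0.3, Price 4.2 and 1, Weight 82 and 70).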