How to do Group By Rollup in R? (Like SQL)

Question

I have a dataset and I want to perform something like SQL's GROUP BY ROLLUP, so that subtotal rows are added alongside the aggregated values.

Below is a reproducible example. I know aggregate works really well, as explained here, but it is not a satisfactory fit for my case.

year<- c('2016','2016','2016','2016','2017','2017','2017','2017')
month<- c('1','1','1','1','2','2','2','2')
region<- c('east','west','east','west','east','west','east','west')
sales<- c(100,200,300,400,200,400,600,800)
df<- data.frame(year,month,region,sales)
df


  year month region sales
1 2016     1   east   100
2 2016     1   west   200
3 2016     1   east   300
4 2016     1   west   400
5 2017     2   east   200
6 2017     2   west   400
7 2017     2   east   600
8 2017     2   west   800

Now I want to aggregate (sum of sales by year, month, and region) and add the new aggregate rows to the existing data frame. For example, there should be two additional rows like those below, with the region set to 'USA' for the aggregated rows:

  year month region sales
1 2016     1   east   400
2 2016     1   west   600
3 2016     1    USA  1000
4 2017     2   east   800
5 2017     2   west  1200
6 2017     2    USA  2000

I have figured out a way (below), but I am sure there is a better solution, or at least a cleaner workaround than mine:

df1<- setNames(aggregate(df$sales, by=list(df$year,df$month, df$region), FUN=sum),
    c('year','month','region', 'sales'))


df2<- setNames(aggregate(df$sales, by=list(df$year,df$month), FUN=sum),
               c('year','month', 'sales'))

df2$region<- 'USA'                  ## added a new column- region- for total USA
df2<- df2[,  c('year','month','region', 'sales')]  ## reordering the columns of df2

df3<- rbind(df1,df2)

df3<- df3[order(df3$year,df3$month,df3$region),]  ## order by
rownames(df3)<- NULL  ## renumbered the rows after order by

df3

Thanks for your support!

Recommended answer

melt/dcast in the reshape2 package can do subtotalling. After running dcast, the subtotal rows carry "(all)" in the month column; we replace that with the actual month using na.locf from the zoo package:

library(reshape2)
library(zoo)

m <- melt(df, measure.vars = "sales")
dout <- dcast(m, year + month + region ~ variable, fun.aggregate = sum, margins = "month")

dout$month <- na.locf(replace(dout$month, dout$month  == "(all)", NA))

giving:

> dout
  year month region sales
1 2016     1   east   400
2 2016     1   west   600
3 2016     1  (all)  1000
4 2017     2   east   800
5 2017     2   west  1200
6 2017     2  (all)  2000
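
The output above labels the subtotal rows "(all)", whereas the question asked for 'USA'. As a rough alternative that needs no extra packages, the same rollup can be sketched in base R with two aggregate calls. This is only a sketch under the question's assumptions: the names detail, total, rollup, and is_total are made up for illustration, and the subtotal is sorted after its detail rows explicitly rather than relying on how the label 'USA' happens to sort.

```r
# Sample data from the question
df <- data.frame(
  year   = c("2016", "2016", "2016", "2016", "2017", "2017", "2017", "2017"),
  month  = c("1", "1", "1", "1", "2", "2", "2", "2"),
  region = c("east", "west", "east", "west", "east", "west", "east", "west"),
  sales  = c(100, 200, 300, 400, 200, 400, 600, 800),
  stringsAsFactors = FALSE
)

# Detail level: one row per year/month/region
detail <- aggregate(sales ~ year + month + region, data = df, FUN = sum)

# Subtotal level: one row per year/month, labelled "USA" as in the question
total <- aggregate(sales ~ year + month, data = df, FUN = sum)
total$region <- "USA"

# Stack both levels, putting the subtotal columns in the same order as detail
rollup <- rbind(detail, total[, names(detail)])

# Sort so that each subtotal row follows its own detail rows
is_total <- rollup$region == "USA"
rollup <- rollup[order(rollup$year, rollup$month, is_total, rollup$region), ]
rownames(rollup) <- NULL

rollup
```

Sorting on the is_total flag keeps the subtotal last within each year/month group regardless of whether region is a factor or a character column.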
