cumsum per group in dplyr

Published: 2018/4/24 20:36:19 | Tags: r ggplot2 dplyr


Question

I'm starting to enjoy dplyr, but I got stuck on a use case. I want to apply cumsum to each group of a data frame using dplyr, but I can't seem to get it right.

For a demo data frame, I've generated the following data:
library(dplyr)
library(ggplot2)

set.seed(123)

len = 5  # 5 dates per group; this reproduces the 15-row table below
dates = as.Date('2014-01-01') + 1:len
grp_a = data.frame(dates=dates, group='A', sales=rnorm(len))
grp_b = data.frame(dates=dates, group='B', sales=rnorm(len))
grp_c = data.frame(dates=dates, group='C', sales=rnorm(len))
df = rbind(grp_a, grp_b, grp_c)
This creates a dataframe that looks like: 
        dates group       sales
1  2014-01-02     A -0.56047565
2  2014-01-03     A -0.23017749
3  2014-01-04     A  1.55870831
4  2014-01-05     A  0.07050839
5  2014-01-06     A  0.12928774
6  2014-01-02     B  1.71506499
7  2014-01-03     B  0.46091621
8  2014-01-04     B -1.26506123
9  2014-01-05     B -0.68685285
10 2014-01-06     B -0.44566197
11 2014-01-02     C  1.22408180
12 2014-01-03     C  0.35981383
13 2014-01-04     C  0.40077145
14 2014-01-05     C  0.11068272
15 2014-01-06     C -0.55584113
I then go on to create a dataframe for plotting, but with a for loop that I'd like to replace with something cleaner. 
pdf = data.frame(dates=as.Date(as.character()), group=as.character(), sales=as.numeric())
for(grp in unique(df$group)){
  subs = filter(df, group == grp) %>% arrange(dates)
  pdf = rbind(pdf, data.frame(dates=subs$dates, group=grp, sales=cumsum(subs$sales)))
}
I use this pdf to create a plot. 
p = ggplot() 
p = p + geom_line(data=pdf, aes(dates, sales, colour=group))
p + ggtitle("sales per group")


Is there a better way (using dplyr verbs) to create this data frame? I've looked at summarize, but it collapses each group from N rows to 1 row. A running total needs all N rows back. This use case breaks my dplyr flow at the moment. Any suggestions for a better approach?
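The N -> 1 versus N -> N distinction can be seen side by side. The sketch below assumes dplyr is installed and rebuilds the demo data with len = 5, which matches the printed table. The names totals and running are illustrative only.

```r
library(dplyr)

# Rebuild the demo data from the question (len = 5 gives the 15-row table)
set.seed(123)
len <- 5
dates <- as.Date("2014-01-01") + 1:len
grp_a <- data.frame(dates = dates, group = "A", sales = rnorm(len))
grp_b <- data.frame(dates = dates, group = "B", sales = rnorm(len))
grp_c <- data.frame(dates = dates, group = "C", sales = rnorm(len))
df <- rbind(grp_a, grp_b, grp_c)

# summarise(): each group of N rows collapses to a single row
totals <- df %>%
  group_by(group) %>%
  summarise(total = sum(sales))

# mutate(): each group keeps its N rows; cumsum() restarts in every group
running <- df %>%
  group_by(group) %>%
  mutate(cs = cumsum(sales)) %>%
  ungroup()

nrow(totals)   # 3
nrow(running)  # 15
```

The last cs value in each group equals that group's total from summarise(). This shows that a running total is a per-row (mutate) operation rather than an aggregation.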
Solution
Ah. After fiddling around I seem to have found it. 
pdf = df %>% group_by(group) %>% arrange(dates) %>% mutate(cs = cumsum(sales))
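The same pipeline can feed the plot directly and replace the loop entirely. Below is a minimal sketch, assuming dplyr and ggplot2 are installed. Sorting with arrange(group, dates) instead of arrange(dates) is a choice made here, not in the original answer. By default, arrange() ignores the grouping and sorts all rows together. The cumsum is still correct either way, because mutate() works within each group in date order. Sorting by group simply keeps the rows in blocks. ungroup() is also an added choice; it stops later steps from silently running per group.

```r
library(dplyr)
library(ggplot2)

# Demo data from the question
set.seed(123)
len <- 5
dates <- as.Date("2014-01-01") + 1:len
grp_a <- data.frame(dates = dates, group = "A", sales = rnorm(len))
grp_b <- data.frame(dates = dates, group = "B", sales = rnorm(len))
grp_c <- data.frame(dates = dates, group = "C", sales = rnorm(len))
df <- rbind(grp_a, grp_b, grp_c)

# Sort by group, then date, so cumsum() accumulates in date order
# within each group; mutate() keeps every row, unlike summarise()
pdf <- df %>%
  arrange(group, dates) %>%
  group_by(group) %>%
  mutate(cs = cumsum(sales)) %>%
  ungroup()

# One line per group showing the running total; print(p) draws it
p <- ggplot(pdf, aes(dates, cs, colour = group)) +
  geom_line() +
  ggtitle("sales per group")
```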


Output with the for loop from the question (extended to keep sales and add a cs column):

> pdf = data.frame(dates=as.Date(as.character()), group=as.character(), sales=as.numeric())
> for(grp in unique(df$group)){
+   subs = filter(df, group == grp) %>% arrange(dates)
+   pdf = rbind(pdf, data.frame(dates=subs$dates, group=grp, sales=subs$sales, cs=cumsum(subs$sales)))
+ }
> pdf
        dates group       sales         cs
1  2014-01-02     A -0.56047565 -0.5604756
2  2014-01-03     A -0.23017749 -0.7906531
3  2014-01-04     A  1.55870831  0.7680552
4  2014-01-05     A  0.07050839  0.8385636
5  2014-01-06     A  0.12928774  0.9678513
6  2014-01-02     B  1.71506499  1.7150650
7  2014-01-03     B  0.46091621  2.1759812
8  2014-01-04     B -1.26506123  0.9109200
9  2014-01-05     B -0.68685285  0.2240671
10 2014-01-06     B -0.44566197 -0.2215949
11 2014-01-02     C  1.22408180  1.2240818
12 2014-01-03     C  0.35981383  1.5838956
13 2014-01-04     C  0.40077145  1.9846671
14 2014-01-05     C  0.11068272  2.0953498
15 2014-01-06     C -0.55584113  1.5395087


Output with this line of code (arrange() is omitted here because df is already sorted by date within each group):

> pdf = df %>% group_by(group) %>% mutate(cs = cumsum(sales))
> pdf
Source: local data frame [15 x 4]
Groups: group

        dates group       sales         cs
1  2014-01-02     A -0.56047565 -0.5604756
2  2014-01-03     A -0.23017749 -0.7906531
3  2014-01-04     A  1.55870831  0.7680552
4  2014-01-05     A  0.07050839  0.8385636
5  2014-01-06     A  0.12928774  0.9678513
6  2014-01-02     B  1.71506499  1.7150650
7  2014-01-03     B  0.46091621  2.1759812
8  2014-01-04     B -1.26506123  0.9109200
9  2014-01-05     B -0.68685285  0.2240671
10 2014-01-06     B -0.44566197 -0.2215949
11 2014-01-02     C  1.22408180  1.2240818
12 2014-01-03     C  0.35981383  1.5838956
13 2014-01-04     C  0.40077145  1.9846671
14 2014-01-05     C  0.11068272  2.0953498
15 2014-01-06     C -0.55584113  1.5395087
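The two printouts agree row for row. The sketch below checks this programmatically: it rebuilds both results from the question's data and compares the cs columns with all.equal(). It assumes dplyr is installed. The names loop_pdf and dplyr_pdf are illustrative only.

```r
library(dplyr)

set.seed(123)
len <- 5
dates <- as.Date("2014-01-01") + 1:len
grp_a <- data.frame(dates = dates, group = "A", sales = rnorm(len))
grp_b <- data.frame(dates = dates, group = "B", sales = rnorm(len))
grp_c <- data.frame(dates = dates, group = "C", sales = rnorm(len))
df <- rbind(grp_a, grp_b, grp_c)

# Loop version from the question (keeps sales and adds cs)
loop_pdf <- data.frame(dates = as.Date(character()), group = character(),
                       sales = numeric(), cs = numeric())
for (grp in unique(df$group)) {
  subs <- filter(df, group == grp) %>% arrange(dates)
  loop_pdf <- rbind(loop_pdf, data.frame(dates = subs$dates, group = grp,
                                         sales = subs$sales,
                                         cs = cumsum(subs$sales)))
}

# dplyr version from the answer
dplyr_pdf <- df %>%
  group_by(group) %>%
  mutate(cs = cumsum(sales)) %>%
  ungroup()

# df is already ordered by group and date, so the rows line up directly
all.equal(loop_pdf$cs, dplyr_pdf$cs)  # TRUE
```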


                        