Rolling sum in dplyr


Question

library(dplyr)

set.seed(123)

df <- data.frame(x = sample(1:10, 20, replace = TRUE), id = rep(1:2, each = 10))

For each id, I want to create a column that holds the sum of the previous 5 x values.

df %>% group_by(id) %>% mutate(roll.sum = c(x[1:4], zoo::rollapply(x, 5, sum)))
# Groups:   id [2]
  x    id roll.sum
<int> <int>    <int>
 3     1        3
 8     1        8
 5     1        5
 9     1        9
10     1       10
 1     1       36
 6     1       39
 9     1       40
 6     1       41
 5     1       37
10     2       10
 5     2        5
 7     2        7
 6     2        6
 2     2        2
 9     2       39
 3     2       32
 1     2       28
 4     2       25
10     2       29

The 6th row should be 35 (3 + 8 + 5 + 9 + 10), the 7th row should be 33 (8 + 5 + 9 + 10 + 1), and so on.
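As a quick check of the expected numbers, the sum of the previous k values can be computed from cumulative sums: with cs = c(0, cumsum(x)), the sum of x[(i-k):(i-1)] is cs[i] - cs[i-k]. This is a minimal sketch; the helper prev_sum() is hypothetical (not part of dplyr, zoo, or tibbletime). The x values are copied from the id == 1 rows of the printed table, because the default algorithm of sample() changed in R 3.6.0 and set.seed(123) may not reproduce them on newer R versions.

```r
# Sum of the previous k values of x, excluding the current row.
# prev_sum() is a hypothetical helper written for illustration only.
prev_sum <- function(x, k) {
  cs <- c(0, cumsum(x))            # cs[i] = x[1] + ... + x[i - 1]
  out <- rep(NA_real_, length(x))  # rows without k earlier values stay NA
  i <- seq_along(x)
  ok <- i > k
  # x[i - k] + ... + x[i - 1] = cs[i] - cs[i - k]
  out[ok] <- cs[i[ok]] - cs[i[ok] - k]
  out
}

x <- c(3, 8, 5, 9, 10, 1, 6, 9, 6, 5)  # id == 1, copied from the table above
prev_sum(x, 5)
#> [1] NA NA NA NA NA 35 33 31 35 32
```

Rows 6 and 7 come out as 35 and 33, matching the hand calculation. For very long vectors of large doubles, subtracting cumulative sums can accumulate rounding error; for small whole numbers like these the result is exact.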

However, the code above also includes the current row itself in the sum. How can I fix this?
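The problem is one of alignment: a trailing window that ends at row i contains x[i], so the fix is to shift that result down by one row. The base-R sketch below shows both versions side by side. It uses stats::filter() with sides = 1 as a stand-in for the rolling sum (a swapped-in technique, not the zoo call from the question), and the x values are copied from the id == 1 rows of the printed table.

```r
x <- c(3, 8, 5, 9, 10, 1, 6, 9, 6, 5)  # id == 1

# Trailing window that INCLUDES the current row: x[i-4] + ... + x[i].
# stats:: is written explicitly because dplyr masks filter().
incl <- as.numeric(stats::filter(x, rep(1, 5), sides = 1))

# Shift down one row, so row i only sees x[i-5] + ... + x[i-1]
excl <- c(NA, head(incl, -1))

incl
#> [1] NA NA NA NA 35 33 31 35 32 27
excl
#> [1] NA NA NA NA NA 35 33 31 35 32
```

The shifted vector is exactly the "previous 5 values" column the question asks for.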

Recommended answer

You can use the rollify() function from the tibbletime package. You can read about it in this vignette: Rolling calculations in tibbletime.
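rollify() takes a summary function and a window size and returns a new function that applies the summary over a trailing window, returning NA until a full window is available (which is why the first rows in the output below are NA). The sketch below is a hypothetical base-R stand-in, rollify_base(), written only to illustrate that behaviour; it is not tibbletime's implementation, and the x values are copied from the id == 1 rows of the printed table.

```r
# Hypothetical base-R stand-in for tibbletime::rollify(): turns a summary
# function into a rolling version over a trailing window of `window` rows.
rollify_base <- function(.f, window) {
  function(x) {
    n <- length(x)
    out <- rep(NA_real_, n)  # NA until a full window exists
    if (n >= window) {
      for (i in window:n) {
        out[i] <- .f(x[(i - window + 1):i])  # window ends at (and includes) row i
      }
    }
    out
  }
}

roll_sum_base <- rollify_base(sum, window = 5)
x <- c(3, 8, 5, 9, 10, 1, 6, 9, 6, 5)  # id == 1

roll_sum_base(x)                  # includes the current row
#> [1] NA NA NA NA 35 33 31 35 32 27
c(NA, head(roll_sum_base(x), -1)) # shifted by one row, as lag() does
#> [1] NA NA NA NA NA 35 33 31 35 32
```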

library(tibbletime)
library(dplyr)
rollig_sum <- rollify(.f = sum, window = 5)

df %>% 
  group_by(id) %>% 
  mutate(roll.sum = lag(rollig_sum(x))) # lag() shifts the result down one row, so row i excludes x[i]
# A tibble: 20 x 3
# Groups:   id [2]
#       x    id roll.sum
#   <int> <int>    <int>
# 1     3     1       NA
# 2     8     1       NA
# 3     5     1       NA
# 4     9     1       NA
# 5    10     1       NA
# 6     1     1       35
# 7     6     1       33
# 8     9     1       31
# 9     6     1       35
#10     5     1       32
#11    10     2       NA
#12     5     2       NA
#13     7     2       NA
#14     6     2       NA
#15     2     2       NA
#16     9     2       30
#17     3     2       29
#18     1     2       27
#19     4     2       21
#20    10     2       19
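The group_by(id) step matters: without it, row 11 (the first id == 2 row) would be filled with a sum of id == 1 values. As a base-R sketch of the same grouped computation, ave() applies a function within each id and returns the results in the original row order. The helper prev_sum5() is hypothetical, and the x values are copied from the printed table rather than regenerated with set.seed().

```r
# Hypothetical helper: sum of the previous 5 values within one group
prev_sum5 <- function(v) {
  incl <- as.numeric(stats::filter(v, rep(1, 5), sides = 1))  # trailing sum incl. current row
  c(NA, head(incl, -1))                                      # shift so row i excludes itself
}

df <- data.frame(
  x  = c(3, 8, 5, 9, 10, 1, 6, 9, 6, 5,   # id == 1, from the table above
         10, 5, 7, 6, 2, 9, 3, 1, 4, 10), # id == 2
  id = rep(1:2, each = 10)
)

# ave() splits x by id, applies FUN per group, and reassembles in row order
df$roll.sum <- ave(df$x, df$id, FUN = prev_sum5)
df$roll.sum
#> [1] NA NA NA NA NA 35 33 31 35 32 NA NA NA NA NA 30 29 27 21 19
```

The values match the roll.sum column in the tibble output, with each id starting its own window.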

If you want the NA values replaced with something else, you can use, for example, if_else(). Here each NA is replaced with the row's own x value:

df %>% 
  group_by(id) %>% 
  mutate(roll.sum = lag(rollig_sum(x))) %>%
  mutate(roll.sum = if_else(is.na(roll.sum), x, roll.sum)) # fall back to x where there are fewer than 5 previous rows
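The same NA replacement can be sketched in base R with ifelse(), applied here to the id == 1 values copied from the printed table. The rolling sum is again computed with stats::filter() as a stand-in for rollify(), so this illustrates the logic rather than reproducing the tibbletime pipeline.

```r
x <- c(3, 8, 5, 9, 10, 1, 6, 9, 6, 5)  # id == 1

incl <- as.numeric(stats::filter(x, rep(1, 5), sides = 1))  # trailing sum incl. current row
roll <- c(NA, head(incl, -1))                              # previous 5 rows only

# Where no full window of 5 previous values exists, fall back to x itself
roll_filled <- ifelse(is.na(roll), x, roll)
roll_filled
#> [1]  3  8  5  9 10 35 33 31 35 32
```

dplyr::coalesce(roll, x) expresses the same "use x when roll is NA" rule in a single call, if you prefer it to the if_else() form.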
