Rolling sum in dplyr
Question
set.seed(123)
df <- data.frame(x = sample(1:10, 20, replace = TRUE), id = rep(1:2, each = 10))
For each id, I want to create a column that holds the sum of the previous 5 x values.
df %>% group_by(id) %>% mutate(roll.sum = c(x[1:4], zoo::rollapply(x, 5, sum)))
# Groups: id [2]
x id roll.sum
<int> <int> <int>
3 1 3
8 1 8
5 1 5
9 1 9
10 1 10
1 1 36
6 1 39
9 1 40
6 1 41
5 1 37
10 2 10
5 2 5
7 2 7
6 2 6
2 2 2
9 2 39
3 2 32
1 2 28
4 2 25
10 2 29
The 6th row should be 35 (3 + 8 + 5 + 9 + 10), the 7th row should be 33 (8 + 5 + 9 + 10 + 1), and so on.
However, the code above also includes the current row in each sum. How can I fix it?
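A base-R way to check the expected numbers, assuming no extra packages are wanted: with a leading 0 prepended to cumsum(x), the difference of two cumulative sums five positions apart is the total of x[i - 5] through x[i - 1], so the previous-5 sum for every row comes out in one vectorised step. The helper name prev_sum is hypothetical, and the x values are hard-coded from the printed output above so the sketch does not depend on the random number generator.

```r
# Hypothetical helper: for each position i, the sum of the k values
# strictly before i, i.e. sum(x[(i - k):(i - 1)]); NA when fewer than k exist.
prev_sum <- function(x, k = 5) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n > k) {
    cs <- c(0, cumsum(x))        # cs[j + 1] is sum(x[1:j]); cs[1] is 0
    i <- (k + 1):n               # rows that have k earlier values
    out[i] <- cs[i] - cs[i - k]  # sum(x[1:(i - 1)]) - sum(x[1:(i - k - 1)])
  }
  out
}

# x values copied from the question's printed output
df <- data.frame(
  x  = c(3, 8, 5, 9, 10, 1, 6, 9, 6, 5, 10, 5, 7, 6, 2, 9, 3, 1, 4, 10),
  id = rep(1:2, each = 10)
)

# ave() applies prev_sum separately within each id
df$roll.sum <- ave(df$x, df$id, FUN = prev_sum)
print(df$roll.sum[6:7])  # 35 33
```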
Answer
You could use the rollify function from the tibbletime package. You can read about it in this vignette: Rolling calculations in tibbletime.
library(tibbletime)
library(dplyr)
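rollig_sum <- rollify(.f = sum, window = 5)  # sums each 5-row window ending at the current row
df %>%
  group_by(id) %>%
  mutate(roll.sum = lag(rollig_sum(x)))  # lag() shifts each sum down one row, so row i gets rows i-5 to i-1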
# A tibble: 20 x 3
# Groups: id [2]
# x id roll.sum
# <int> <int> <int>
# 1 3 1 NA
# 2 8 1 NA
# 3 5 1 NA
# 4 9 1 NA
# 5 10 1 NA
# 6 1 1 35
# 7 6 1 33
# 8 9 1 31
# 9 6 1 35
#10 5 1 32
#11 10 2 NA
#12 5 2 NA
#13 7 2 NA
#14 6 2 NA
#15 2 2 NA
#16 9 2 30
#17 3 2 29
#18 1 2 27
#19 4 2 21
#20 10 2 19
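The answer works in two steps: a full 5-row window sum, which still includes the current row, followed by lag(), which moves each sum down one row so it covers the five rows before. The sketch below mimics both steps in base R to make that mechanism visible. my_rollify and lag1 are hypothetical helpers written for illustration, not tibbletime's or dplyr's implementation, and ave() stands in for group_by(id).

```r
# Hypothetical stand-in for tibbletime::rollify(), for illustration only:
# returns a function that applies .f to each run of `window` consecutive
# values ending at position i, and NA until the window is full.
my_rollify <- function(.f, window) {
  function(x) {
    n <- length(x)
    out <- rep(NA_real_, n)
    if (n >= window) {
      for (i in window:n) out[i] <- .f(x[(i - window + 1):i])
    }
    out
  }
}

# Base-R equivalent of dplyr::lag(v) with the default n = 1:
# shift down one row and put NA in front
lag1 <- function(v) c(NA, head(v, -1))

rolling_sum <- my_rollify(sum, window = 5)

df <- data.frame(
  x  = c(3, 8, 5, 9, 10, 1, 6, 9, 6, 5, 10, 5, 7, 6, 2, 9, 3, 1, 4, 10),
  id = rep(1:2, each = 10)
)

# Window sum (includes row i), then lag (excludes row i), per id.
# ave() keeps the shift inside each group, as group_by(id) does.
df$roll.sum <- ave(df$x, df$id, FUN = function(v) lag1(rolling_sum(v)))

# Without grouping, the last window of id 1 leaks into id 2
ungrouped <- lag1(rolling_sum(df$x))
print(ungrouped[11])  # 27
```

The last lines show why the grouping matters: without it, row 11 (the first row of id 2) would receive 27, a sum taken from rows 6 to 10 of id 1.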
If you want the NAs to be some other value, you can use, for example, if_else:
df %>%
  group_by(id) %>%
  mutate(roll.sum = lag(rollig_sum(x))) %>%
  mutate(roll.sum = if_else(is.na(roll.sum), x, roll.sum))
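For the first id, that replacement keeps the row's own x value wherever fewer than 5 earlier rows exist and the previous-5 sum everywhere else. A base-R sketch of the same rule, with roll.sum hard-coded from the answer's output for id 1; dplyr::coalesce(roll.sum, x) should express the same fill as well.

```r
# Values for id 1, copied from the answer's output (roll.sum after lag())
x        <- c(3, 8, 5, 9, 10, 1, 6, 9, 6, 5)
roll.sum <- c(NA, NA, NA, NA, NA, 35, 33, 31, 35, 32)

# Same rule as if_else(is.na(roll.sum), x, roll.sum)
filled <- ifelse(is.na(roll.sum), x, roll.sum)
print(filled)  # 3 8 5 9 10 35 33 31 35 32
```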