Calculate rolling sum based on row index in R


Problem description


I am trying to calculate a grouped rolling sum based on a window size k but, when the within-group row index (n) is less than k, I want to calculate the rolling sum using the condition k = min(n, k).

My issue is similar to the question "R dplyr rolling sum", but I am looking for a solution that provides a non-NA value for each row.

I can get part of the way there using dplyr and rollsum:

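library(zoo)
library(dplyr)
df <- data.frame(Date = rep(seq(as.Date("2000-01-01"),
                                as.Date("2000-12-01"), by = "month"), 2),
                 ID = c(rep(1, 12), rep(2, 12)), value = 1)
df <- as_tibble(df)  # tbl_df() is deprecated in current dplyr
df <- df %>%
        group_by(ID) %>%
        # fill = NA pads the first k - 1 rows of each group with missing values
        mutate(total3mo = rollsum(x = value, k = 3, align = "right", fill = NA))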

df
Source: local data frame [24 x 4]
Groups: ID [2]

     Date    ID value total3mo
   (date) (dbl) (dbl)   (dbl)
1  2000-01-01     1     1      NA
2  2000-02-01     1     1      NA
3  2000-03-01     1     1       3
4  2000-04-01     1     1       3
5  2000-05-01     1     1       3
6  2000-06-01     1     1       3
7  2000-07-01     1     1       3
8  2000-08-01     1     1       3
9  2000-09-01     1     1       3
10 2000-10-01     1     1       3
..        ...   ...   ...     ...

In this case, what I would like is to return the value 1 for observations on 2000-01-01 and the value 2 for observations on 2000-02-01. More generally, I would like the rolling sum to be calculated over the largest window possible but no larger than k.
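To make the target concrete, the rule k = min(n, k) can be written out directly in base R. This is only a minimal sketch of the intended result, not a method from the original post; the helper name `rolling_sum_capped` is hypothetical.

```r
# Hypothetical helper (not from the original post): for each position i,
# sum the last min(i, k) elements ending at i.
rolling_sum_capped <- function(x, k) {
  vapply(seq_along(x), function(i) {
    start <- max(1L, i - k + 1L)  # the window shrinks near the start of the vector
    sum(x[start:i])
  }, numeric(1))
}

# One group of the example data: twelve months, each with value 1.
value <- rep(1, 12)
rolling_sum_capped(value, 3)
```

For the vector above this gives 1 and 2 for the first two rows and 3 from the third row on, which is the output asked for.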

In this particular case it's not too difficult to change a few NA values by hand. However, ultimately I would like to add several more columns to my data frame that will be rolling sums calculated over various windows. In this more general case it will get quite tedious to go back and change many NA values by hand.

Solution

Use the partial = TRUE argument of rollapplyr, which lets the window shrink at the start of each group instead of returning NA:

df %>%
   group_by(ID) %>%
   mutate(roll = rollapplyr(value, 3, sum, partial = TRUE)) %>%
   ungroup()

Or without dplyr (zoo is still needed):

roll <- function(x) rollapplyr(x, 3, sum, partial = TRUE)
transform(df, roll = ave(value, ID, FUN = roll))
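The question also mentions adding several rolling-sum columns over different windows. One possible way to extend the base R variant is to loop over the window sizes. This is a sketch under assumptions: the window sizes `c(3, 6)` and the generated column names are illustrative and do not come from the original post.

```r
library(zoo)

# Example data from the question.
df <- data.frame(
  Date  = rep(seq(as.Date("2000-01-01"), as.Date("2000-12-01"), by = "month"), 2),
  ID    = c(rep(1, 12), rep(2, 12)),
  value = 1
)

# Window sizes are illustrative assumptions; the question only uses k = 3.
windows <- c(3, 6)

for (k in windows) {
  # Column names such as "total3mo" are hypothetical, modelled on the question.
  col <- paste0("total", k, "mo")
  # ave() applies the function within each ID group and keeps the row order.
  df[[col]] <- ave(df$value, df$ID,
                   FUN = function(x) rollapplyr(x, k, sum, partial = TRUE))
}

head(df, 4)
```

Each pass adds one column, so supporting another window only means adding a value to `windows`.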
