Rolling sum on an unbalanced time series

Problem description

I have a series of annual incident counts per category, with no rows for years in which the category did not see an incident. I would like to add a column that shows, for each year, how many incidents occurred in the previous three years.

One way to handle this is to add empty rows for all years with zero incidents, then use rollapply() with a left-aligned four year window, but that would expand my data set more than I want to. Surely there's a way to use ddply() and transform for this?

The following two lines of code build a dummy data set, then execute a simple plyr sum by category:

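library(plyr)

# Dummy data: three categories observed in the same six, unevenly spaced years
dat <- data.frame(
   category=c(rep('A',6), rep('B',6), rep('C',6)), 
   year=rep(c(2000,2001,2004,2005,2009, 2010),3), 
   incidents=rpois(18, 3)
   )

# Attach each category's total to every one of its rows
ddply(dat, .(category) , transform, i_per_c=sum(incidents) )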

That works, but it only shows a per-category total.

I want a total that's year-dependent.

So I tried to expand the ddply() call with the function() syntax, like so:

ddply(dat, .(category) , transform, 
      function(x) i_per_c=sum(ifelse(x$year >= year - 4 & x$year < year,  x$incidents, 0) )
      )

This just returns the original data frame, unmodified.

I must be missing something in the plyr syntax, but I don't know what it is.

Thanks, Matt

Recommended answer

This is sorta ugly, but it works. Nested ply calls:

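library(plyr)

# For each category, go row by row and sum that category's incidents
# over a window ending at the row's year (the row's year and the year before it).
ddply(dat, .(category), 
    function(datc) adply(datc, 1, 
         function(x) data.frame(run_incidents =
                                sum(subset(datc, year>(x$year-2) & year<=x$year)$incidents))))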
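
A note on the window: the condition in the call above keeps the row's own year and the year before it. If the goal is the three years preceding each year, as the question describes, the bounds need adjusting. Here is a minimal sketch under that assumption. The names `prev_window_sum` and `prev_incidents` are made up for illustration, and the counts are fixed values rather than the `rpois()` draws from the question, so the result is reproducible.

```r
library(plyr)

# Fixed illustrative counts (not from the original post, which uses rpois())
dat <- data.frame(
  category  = rep(c("A", "B", "C"), each = 6),
  year      = rep(c(2000, 2001, 2004, 2005, 2009, 2010), 3),
  incidents = c(3, 1, 4, 1, 5, 9,  2, 6, 5, 3, 5, 8,  9, 7, 9, 3, 2, 3)
)

# Hypothetical helper: for each year y, sum the incidents recorded in
# years y - width through y - 1 (the current year is excluded).
prev_window_sum <- function(year, incidents, width = 3) {
  sapply(year, function(y) sum(incidents[year >= y - width & year < y]))
}

# Apply the helper to each category's rows and return the augmented piece
res <- ddply(dat, .(category), function(d) {
  d$prev_incidents <- prev_window_sum(d$year, d$incidents)
  d
})
```

As for the attempt in the question: as far as I can tell, transform() only acts on arguments of the form name = value, so an unnamed function passed to it is never called and the data frame comes back unchanged. Passing a function of the per-category piece directly to ddply(), as in the sketch, sidesteps that.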

There might be a slightly cleaner way to do it, and there are definitely ways that execute much faster.
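
One possible faster route, sketched in base R: take cumulative sums within each category and use findInterval() to locate the window edges, which avoids subsetting the data once per row. This assumes the rows of each category are sorted by year in ascending order and that the window is the three years before each year. `prev_window_sum_fast` and `prev_incidents` are made-up names, and the counts are fixed illustrative values rather than the question's random draws.

```r
# Fixed illustrative counts (not from the original post, which uses rpois())
dat <- data.frame(
  category  = rep(c("A", "B", "C"), each = 6),
  year      = rep(c(2000, 2001, 2004, 2005, 2009, 2010), 3),
  incidents = c(3, 1, 4, 1, 5, 9,  2, 6, 5, 3, 5, 8,  9, 7, 9, 3, 2, 3)
)

# Hypothetical helper. Assumes `year` is sorted ascending within the group.
prev_window_sum_fast <- function(year, incidents, width = 3) {
  cs <- c(0, cumsum(incidents))               # cs[k + 1] = total of the first k rows
  hi <- findInterval(year - 1, year)          # rows with year <= y - 1
  lo <- findInterval(year - width - 1, year)  # rows with year <= y - width - 1
  cs[hi + 1] - cs[lo + 1]                     # total over years y - width .. y - 1
}

# Fill the new column one category at a time
dat$prev_incidents <- NA_real_
for (idx in split(seq_len(nrow(dat)), dat$category)) {
  dat$prev_incidents[idx] <- prev_window_sum_fast(dat$year[idx], dat$incidents[idx])
}
```

No rows are added for the missing years, so the data set keeps its original size.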
