Rolling sum on an unbalanced time series
Question
I have a series of annual incident counts per category, with no rows for years in which the category did not see an incident. I would like to add a column that shows, for each year, how many incidents occurred in the previous three years.
One way to handle this is to add empty rows for all years with zero incidents, then use rollapply() with a left-aligned four-year window, but that would expand my data set more than I want to. Surely there's a way to use ddply() and transform for this?
The following two lines of code build a dummy data set, then execute a simple plyr sum by category:
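library(plyr)  # provides ddply() and .()

# Dummy data: three categories, each observed in the same six non-consecutive years
dat <- data.frame(
  category  = c(rep('A', 6), rep('B', 6), rep('C', 6)),
  year      = rep(c(2000, 2001, 2004, 2005, 2009, 2010), 3),
  incidents = rpois(18, 3)
)

# Per-category total, repeated on every row of that category
ddply(dat, .(category), transform, i_per_c = sum(incidents))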
That works, but it only shows a per-category total.
I want a total that's year-dependent.
So I try to expand the ddply() call with the function() syntax, like so:
ddply(dat, .(category) , transform,
function(x) i_per_c=sum(ifelse(x$year >= year - 4 & x$year < year, x$incidents, 0) )
)
This just returns the original data frame, unmodified.
I must be missing something in the plyr syntax, but I don't know what it is.
Thanks, Matt
Recommended answer
This is sorta ugly, but it works. Nested ply calls:
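# For each category (outer ddply), visit each row (inner adply over margin 1)
# and sum incidents in the window (x$year - 2, x$year] within that category.
# Adjust the bounds to change the window, e.g. to cover only earlier years.
ddply(dat, .(category),
      function(datc) adply(datc, 1,
        function(x) data.frame(run_incidents =
          sum(subset(datc, year > (x$year - 2) & year <= x$year)$incidents))))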
There might be a slightly cleaner way to do it, and there are definitely ways that execute much faster.
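As one possible lighter-weight variant, here is a minimal base R sketch that computes the same kind of per-row window sum without building a one-row data frame for every row. It is not benchmarked and is not the answer author's method. The helper name prev_window_sum, the column name prev3, and the fixed incident values are hypothetical and chosen for illustration. The window is assumed to be the three years strictly before the current one, as the question describes.

```r
# Illustrative data with gaps in the years (fixed values so the result is reproducible).
dat <- data.frame(
  category  = rep(c("A", "B"), each = 4),
  year      = rep(c(2000, 2001, 2004, 2005), times = 2),
  incidents = c(3, 1, 4, 2, 5, 0, 2, 6)
)

# Hypothetical helper: for each element of `year`, sum the incidents whose
# year lies in [y - width, y - 1], i.e. the `width` years before year y.
# Missing years simply contribute nothing, so no zero-padding is needed.
prev_window_sum <- function(year, incidents, width = 3) {
  vapply(
    year,
    function(y) sum(incidents[year >= y - width & year < y]),
    numeric(1)
  )
}

# Apply the helper separately within each category, writing results in place
# so the original row order is preserved.
dat$prev3 <- NA_real_
for (g in unique(dat$category)) {
  idx <- dat$category == g
  dat$prev3[idx] <- prev_window_sum(dat$year[idx], dat$incidents[idx])
}

print(dat)
```

If plyr is preferred, the same helper should also fit the named-argument form of transform that the question's first example uses, along the lines of ddply(dat, .(category), transform, prev3 = prev_window_sum(year, incidents)).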