dplyr custom lag function for irregular time series


Question

I have an irregular time series with gaps in the dataset, and the data is grouped. The lag functions I have been able to find lag by observation (they return the prior record in the dataset), but I want to specify a time variable and compute the lag by matching the lagged time. The question "R lag/lead irregular time series data" does something similar. However, I can't use the zoo solution (I have some sort of package incompatibility and can't use zoo at all), and I have been unsuccessful in turning the data.table solution into something flexible enough to use as a function that takes the lag amount as an input and handles grouped data.

Test data:

testdf <- data.frame(group = c(1,1,1,1,1,2,2,2,2,2),
                 counter = c(1,2,3,5,6,7,8,9,11,12),
                 xval = seq(100, 1000, 100))
lagamount <- 1

The output should be the vector: NA 100 200 NA 400 NA 600 700 NA 900

This is what I am using at the moment:

library(dplyr)
testout <- group_by(testdf, group) %>%
  mutate(testout = function(x) which((testdf$counter - x) == lagamount))

This gives me a data type error saying that something (unspecified) is not a vector.

Is there a way to make this construction work? Alternatively, how can I lag an irregular time series with grouped variables?

Recommended answer

One way to do this within dplyr, without resorting to do(), is to first make the implicit missing values explicit and then filter them out afterwards.

Supply a vector to mutate(), and use ifelse() (or perhaps the newer dplyr::if_else()) to check whether the lag is what you want it to be. Example:

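library(dplyr)
library(tidyr)
lagamount <- 2

testout <- group_by(testdf, group) %>%
  # Make the missing time steps explicit within each group (xval is NA there).
  # The grouping column is left out of complete(): recent tidyr releases do
  # not allow completing on a grouping column.
  complete(counter = min(counter):max(counter)) %>%
  # After completion, lagging by rows equals lagging by time; the check is a guard.
  mutate(testout = if_else(counter - lag(counter, lagamount) == lagamount,
                           lag(xval, lagamount),
                           NA_real_)) %>%
  # Drop the filler rows again.
  filter(!is.na(xval))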

This produces:

Source: local data frame [10 x 4]
Groups: group [2]

   group counter  xval testout
   <dbl>   <dbl> <dbl>   <dbl>
1      1       1   100      NA
2      1       2   200      NA
3      1       3   300     100
4      1       5   400     300
5      1       6   500      NA
6      2       7   600      NA
7      2       8   700      NA
8      2       9   800     600
9      2      11   900     800
10     2      12  1000      NA
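
As an alternative sketch that is not part of the original answer, the lagged time can be matched directly with base R's match(), which avoids completing and filtering the data. The helper name lag_by_time is hypothetical, and the sketch assumes that time stamps are unique within each group and can be compared exactly (integer-valued counters, as in the test data).

```r
library(dplyr)

# Hypothetical helper (not from the original answer): for each element, return
# the value of x whose time stamp equals `time - n`, or NA if there is none.
# Assumes time stamps are unique within the vector and compare exactly.
lag_by_time <- function(x, time, n = 1) {
  x[match(time - n, time)]
}

testdf <- data.frame(group = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
                     counter = c(1, 2, 3, 5, 6, 7, 8, 9, 11, 12),
                     xval = seq(100, 1000, 100))
lagamount <- 1

testout <- testdf %>%
  group_by(group) %>%
  # The helper is evaluated per group, so lags never cross group boundaries.
  mutate(testout = lag_by_time(xval, counter, lagamount)) %>%
  ungroup()

testout$testout
# NA 100 200 NA 400 NA 600 700 NA 900
```

With lagamount <- 1 this gives the vector requested in the question, and with lagamount <- 2 it gives the same testout column as the table above.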
