Convert from annual to quarterly data, constrained to annual average

Problem description

I have several variables at annual frequency in R that I would like to include in a regression analysis with other variables available at quarterly frequency. Additionally, I would like to be able to convert the quarterly data back to annual frequency in a way that reproduces the original annual data.

My current approach when converting from low frequency to high frequency time series data is to use the na.spline function in the zoo package. However, I don’t see how to constrain the quarterly data to match the corresponding annual average. As a result, when I convert the data back from quarterly to annual frequency, I get annual values that differ from the original series.

Reproducible example:

library(zoo)

# create annual example series
a <- as.numeric(c("100", "110", "111"))
b <- as.Date(c("2000-01-01", "2001-01-01", "2002-01-01"))
z_a <- zoo(a, b); z_a

# current approach using na.spline in zoo package
end_z <- as.Date(as.yearqtr(end(z_a))+ 3/4)
z_q <- na.spline(z_a, xout = seq(start(z_a), end_z, by = "quarter"), method = "hyman")

# result, with first quarter equal to annual value
c <- merge(z_a, z_q); c

# convert back to annual using aggregate in zoo package 
# At this point I would want both series to be equal, but they aren't. 
d <- aggregate(c, as.integer(format(index(c),"%Y")), mean, na.rm=TRUE); d

Storing the original annual data is one solution, or I could convert back by taking the first-quarter value as the annual value. But either approach adds complexity, because I would need to keep track of which of my quarterly series had originally been converted from annual data.

I would prefer a solution within the zoo or xts packages, but alternative suggestions are also welcome.

Edited to include Approach #1 Proposed by G. Grothendieck

# Approach 1
yr <- format(time(c), "%Y")
c$z_q_adj <- ave(coredata(c$z_q), yr, FUN = function(x) x - mean(x) + x[1]); c

# simple plot (tidyr supplies gather() and the %>% pipe; ggplot2 draws the chart)
library(tidyr)
library(ggplot2)
dat <- c %>%
  data.frame(date = time(.), .) %>%
  gather(variable, value, -date)
ggplot(data = dat, aes(x = date, y = value, group = variable, color = variable)) +
  geom_line() +
  geom_point() +
  theme(legend.position = c(.7, .4)) +
  geom_point(data = subset(dat, variable == "z_a"), colour = "red", shape = 1, size = 7)

This is a clean, effective suggestion. However, the initial challenge I have with Approach 1 is that it has the potential to result in jump-offs between Q4 and Q1 (e.g. 2001Q1 relative to the prior quarter as shown in the plot). These would imply fast growth in a single quarter. Part of the solution may be to convert from annual to monthly, using the annual value for June, then spline, then apply Approach 1 as proposed by G. Grothendieck, and then convert to quarterly.
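
To make the trade-off concrete, here is a minimal base R sketch of the same steps: spline through the annual values, apply the Approach 1 level shift, then compare annual means and measure the Q4-to-Q1 steps. It is an illustration only. The names `annual`, `t_q`, `q_raw` and `q_adj` are made up for this sketch, and `splinefun()` stands in for `na.spline()`, so the values may not match the plot exactly.

```r
# Annual values, observed at the first quarter of each year (as in the question).
annual <- c(100, 110, 111)
years  <- 2000:2002

# Quarterly time axis: 2000.00, 2000.25, ..., 2002.75
t_q <- seq(2000, 2002.75, by = 0.25)

# Monotone cubic spline through the annual points; quarters after the last
# annual point are extrapolated.
f     <- splinefun(years, annual, method = "hyman")
q_raw <- f(t_q)

# Approach 1: shift each year's quarters so their mean equals the Q1 value,
# which is the original annual value.
yr    <- floor(t_q)
q_adj <- ave(q_raw, yr, FUN = function(x) x - mean(x) + x[1])

# Annual means before and after the adjustment.
means_raw <- tapply(q_raw, yr, mean)
means_adj <- tapply(q_adj, yr, mean)
print(rbind(means_raw, means_adj))

# Quarter-on-quarter changes at the year boundaries (Q4 -> next Q1).
step       <- diff(q_adj)
jump_q4_q1 <- step[yr[-1] != yr[-length(yr)]]
print(jump_q4_q1)
```

The adjusted means reproduce the annual values by construction, because each year is shifted by a single constant. The same constant differs from year to year, which is what produces the discontinuity at each year boundary.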

Other research:

  • I've reviewed the documentation for zoo and searched extensively through frequency conversion discussions in R. Maybe there is an argument in na.approx or na.spline that I'm overlooking?
  • I've looked at the cobs package ("COnstrained B-Splines"). Maybe it would work, but the option to constrain values to average to a particular series is not readily apparent to me. I'm willing to invest more time to learn how to use it, if it's the best approach.
  • Related questions include:
    • https://stackoverflow.com/questions/26888433/spline-constraint
    • https://stackoverflow.com/questions/32577348/interpolating-annual-data-to-quarterly-with-tidyr

      Recommended answer

      A bit late here, but the tempdisagg package does what you want. It ensures that either the sum, the average, the first or the last value of the resulting high frequency series is consistent with the low frequency series.

      It also allows you to use external indicator series, e.g., by the Chow-Lin technique. If you don't have one, the Denton-Cholette method produces a better result than the method in Eviews.
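
The four consistency rules mentioned above correspond to four ways of collapsing a quarterly series back to annual frequency. As a rough base R illustration (the series `x_q` is made up, and tempdisagg is not involved here):

```r
# Hypothetical quarterly series covering two full years.
x_q <- ts(c(1, 2, 3, 4, 5, 6, 7, 8), start = c(2000, 1), frequency = 4)

# Four ways to turn quarterly values into one annual value per year.
annual_sum   <- aggregate(x_q, nfrequency = 1, FUN = sum)   # sum of the four quarters
annual_mean  <- aggregate(x_q, nfrequency = 1, FUN = mean)  # average of the four quarters
annual_first <- x_q[cycle(x_q) == 1]                        # first-quarter value
annual_last  <- x_q[cycle(x_q) == 4]                        # fourth-quarter value

print(annual_sum)
print(annual_mean)
print(annual_first)
print(annual_last)
```

In `td()`, the `conversion` argument selects which of these rules the disaggregated series has to satisfy; the question asks for the average.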

      Here is your example:

      # need ts object as input
      z_a <- ts(c(100, 110, 111), start = 2000)
      
      library(tempdisagg)
      z_q <- predict(td(z_a ~ 1, method = "denton-cholette", conversion = "average"))
      
      z_q
      #           Qtr1      Qtr2      Qtr3      Qtr4
      # 2000  97.65795  98.59477 100.46841 103.27887
      # 2001 107.02614 109.71460 111.34423 111.91503
      # 2002 111.42702 111.06100 110.81699 110.69499
      
      # which has the same means as your original series:
      
      tapply(z_q, floor(time(z_q)), mean)
      # 2000 2001 2002 
      #  100  110  111 
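
For intuition about where these numbers come from: with no indicator series, the output above appears to coincide with the smoothest quarterly path, meaning the one with the smallest sum of squared quarter-to-quarter changes, whose annual averages match the input. The base R sketch below solves that small constrained least-squares problem directly. It is an illustration under that assumption, not tempdisagg's implementation, and the names `D`, `C` and `K` are introduced only for this sketch.

```r
annual <- c(100, 110, 111)
n_y <- length(annual)   # number of years
n_q <- 4 * n_y          # number of quarters

# First-difference matrix: row i computes x[i + 1] - x[i].
D <- diff(diag(n_q))

# Annual-average matrix: row j averages the four quarters of year j.
C <- kronecker(diag(n_y), matrix(1 / 4, nrow = 1, ncol = 4))

# Minimise sum((D x)^2) subject to C x = annual by solving the
# Lagrangian (KKT) linear system for x and the multipliers.
K   <- rbind(cbind(2 * crossprod(D), t(C)),
             cbind(C, matrix(0, n_y, n_y)))
sol <- solve(K, c(rep(0, n_q), annual))

x_q <- ts(sol[seq_len(n_q)], start = c(2000, 1), frequency = 4)
print(round(x_q, 5))

# Annual means of the quarterly path equal the original annual values.
print(tapply(x_q, floor(time(x_q)), mean))
```

Unlike a per-year level shift, the smoothness penalty here spans year boundaries, so the path has no built-in jump between Q4 and the following Q1.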
      
