Interpolating yearly time series data to quarterly values in R


Problem description

I have a data set containing a list of IDs, years, and incomes. I am trying to interpolate the yearly values to quarterly values.

id = c(2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
year = c(2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001, 2002, 2000, 2002)
income = c(20, 24, 26, 30, 34, 36, 40, 46, 48, 53, 56)
df = data.frame(id, year, income)

For example, I want the interpolated income for each year-quarter: 2000Q1, 2000Q2, 2000Q3, 2000Q4, 2001Q1, ..., 2001Q4. The resulting data frame would have the columns id, year-quarter, and income, where income is the interpolated value.

I realize that when interpolating linearly, the trend must be based only on the values for the respective ID. Any suggestions on how to do this interpolation in R?

Recommended answer

Here's an example using dplyr:

library(dplyr)

annual_data <- data.frame(
    person=c(1, 1, 1, 2, 2),
    year=c(2010, 2011, 2012, 2010, 2012),
    y=c(1, 2, 3, 1, 3)
    )

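# Build a full year x quarter grid for one person and attach the annual
# values to Q1 of each year; the other quarters get NA.
expand_data <- function(x) {
    years <- min(x$year):max(x$year)
    quarters <- 1:4
    grid <- expand.grid(quarter=quarters, year=years)
    x$quarter <- 1
    merged <- grid %>% left_join(x, by=c('year', 'quarter'))
    merged$person <- x$person[1]
    return(merged)
}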

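# Linearly interpolate y over the row index, using only the non-missing rows
# as known points. Rows after the last known point stay NA, because approx
# does not extrapolate by default.
interpolate_data <- function(data) {
    xout <- 1:nrow(data)
    y <- data$y
    interpolation <- approx(x=xout[!is.na(y)], y=y[!is.na(y)], xout=xout)
    data$yhat <- interpolation$y
    return(data)
}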

expand_and_interpolate <- function(x) interpolate_data(expand_data(x))

quarterly_data <- annual_data %>% group_by(person) %>% do(expand_and_interpolate(.))

print(as.data.frame(quarterly_data))

The output of this approach is:

   quarter year person  y yhat
1        1 2010      1  1 1.00
2        2 2010      1 NA 1.25
3        3 2010      1 NA 1.50
4        4 2010      1 NA 1.75
5        1 2011      1  2 2.00
6        2 2011      1 NA 2.25
7        3 2011      1 NA 2.50
8        4 2011      1 NA 2.75
9        1 2012      1  3 3.00
10       2 2012      1 NA   NA
11       3 2012      1 NA   NA
12       4 2012      1 NA   NA
13       1 2010      2  1 1.00
14       2 2010      2 NA 1.25
15       3 2010      2 NA 1.50
16       4 2010      2 NA 1.75
17       1 2011      2 NA 2.00
18       2 2011      2 NA 2.25
19       3 2011      2 NA 2.50
20       4 2011      2 NA 2.75
21       1 2012      2  3 3.00
22       2 2012      2 NA   NA
23       3 2012      2 NA   NA
24       4 2012      2 NA   NA
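The same idea can be carried over to the data frame from the question. The following is only a minimal sketch in base R, not the answer's own code: the helper name to_quarterly is hypothetical, and it assumes, like the answer, that each annual value belongs to Q1 of its year. Quarters after the last annual observation are left as NA, as in the output above.

```r
# Data from the question: one row per id and year (id 5 has no row for 2001).
df <- data.frame(
    id     = c(2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5),
    year   = c(2000, 2001, 2002, 2000, 2001, 2002, 2000, 2001, 2002, 2000, 2002),
    income = c(20, 24, 26, 30, 34, 36, 40, 46, 48, 53, 56)
)

# Hypothetical helper: interpolate one id's annual values onto a quarterly grid.
# Assumption: each annual value is placed at Q1 of its year.
to_quarterly <- function(x) {
    # Time measured in years, stepping by one quarter: 2000.00, 2000.25, ...
    t_out <- seq(min(x$year), max(x$year) + 0.75, by = 0.25)
    fit <- approx(x = x$year, y = x$income, xout = t_out)
    yr <- floor(t_out)
    qtr <- round((t_out - yr) * 4) + 1
    data.frame(
        id = x$id[1],
        year_quarter = paste0(yr, "Q", qtr),
        income = fit$y
    )
}

# Interpolate each id separately so no trend leaks across ids.
quarterly <- do.call(rbind, lapply(split(df, df$id), to_quarterly))
rownames(quarterly) <- NULL
print(quarterly)
```

Splitting by id before calling approx keeps each interpolation restricted to that id's own observations, which is the constraint stated in the question. For id 5, the missing 2001 row is simply bridged by the straight line between 2000 and 2002.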

There are probably a number of ways to clean this up. The key functions are expand.grid, approx, and dplyr::group_by. The approx function is a little tricky; looking at the implementation of zoo::na.approx.default was quite helpful in figuring out how to work with it.
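One part of approx that tends to surprise people is how it treats points outside the range of the known data. The short sketch below mirrors the positions used in the answer (known values at rows 1, 5, and 9 of a 12-row grid); the variable names are illustrative only. With the default rule = 1, positions past the last known point return NA, which is why the last three quarters of 2012 are NA in the output. Passing rule = 2 instead holds the nearest known value constant.

```r
# Known values sit at positions 1, 5 and 9 of a 12-quarter grid.
pos <- c(1, 5, 9)
val <- c(1, 2, 3)

# Default (rule = 1): positions outside [1, 9] come back as NA.
default_fit <- approx(x = pos, y = val, xout = 1:12)$y

# rule = 2: positions outside the range take the value at the nearest end point.
held_fit <- approx(x = pos, y = val, xout = 1:12, rule = 2)$y

print(default_fit)
print(held_fit)
```

Whether holding the last value is appropriate depends on the data; it is a flat carry-forward, not an extrapolation of the trend.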
