Excel or R: Preparing time series from multiple sources?

Question

Lately I have often had to handle time series data from multiple .csv sources in the same analysis. For simplicity, assume that all series are regular quarterly series with no missing values in between. Typically the original .csv data contains a date column plus one to three variables. Unfortunately, the series are not of equal length across the .csv files.

I started to organize my dataset in R and ended up with a big mess containing lots of window() commands. I also had to concatenate NAs and the original series before turning them into ts() objects, because I found concatenating (multivariate) ts() objects so counter-intuitive. I added the NAs because I wanted all series to be of the same length. Of course I could have trimmed the longer ones, but then I would lose observations whenever an analysis does not use the shorter series.

I thought about writing a function that reads a .csv file and uses its date column to create a ts() object, and perhaps another function that merges all the single series into one multivariate series containing NAs where data is missing. I found myself switching data types all the time and reading through the ts and zoo manuals. I just could not believe it was that complex.

I thought this problem must be really common, and I even considered doing the preparation in Excel. I really hate Excel, but this time I wonder what more experienced useRs do: R or Excel?

Added some example data (the daily data needs to be aggregated). file1:

27.05.11;5965.95
26.05.11;5947.06
25.05.11;5942.82
24.05.11;5939.98

file2 (without a date column, but I know the start and the frequency):

Germany;Switzerland;USA;OECDEurope
69,90974;61,8241;55,60966;64,96157
67,0394;62,18966;56,47361;64,15152
70,56651;63,6347;56,87237;65,43568

file3:

1984-04-01,33.3238396624473
1984-07-01,63.579833082501
1984-10-01,35.8375401560349

I admit that example data helps to illustrate the question, but this is really a best-practice question addressed to users more experienced than myself. How do you prepare your data for multivariate ts analysis?

Recommended answer

I do this in R all the time. You may find it easier to do in Excel, but if your data change, you have to repeat the whole process. Using R makes it much easier to update and reproduce your results.

Dealing with monthly or quarterly frequencies is made significantly easier by zoo's yearmon and yearqtr index classes, respectively. Once you have your data in zoo objects with yearqtr indexes, all you have to do is merge the objects.
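As a small illustration of the yearqtr index class, the sketch below converts quarter-start dates to yearqtr, shifts a quarter by simple arithmetic, and merges two series of different length. The series names `a` and `b` and their values are made up for the example.

```r
library(zoo)

# Quarter-start dates like those in file3 of the question
d <- as.Date(c("1984-04-01", "1984-07-01", "1984-10-01"))
q <- as.yearqtr(d)

# A yearqtr value is stored as year + (quarter - 1)/4,
# so adding 1/4 moves one quarter ahead.
q_next <- as.yearqtr("1984 Q4") + 1/4

# Two made-up series of different length on yearqtr indexes
a <- zoo(c(1, 2, 3), q)
b <- zoo(c(10, 20), as.yearqtr("1984 Q4") + (0:1)/4)

# merge() takes the union of both indexes and fills the gaps with NA
m <- merge(a, b)
```

Here `m` covers 1984 Q2 through 1985 Q1, with NA wherever one of the two series has no observation.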

Here is your example data:

# file1 uses ";" as separator but "." as decimal mark, so override
# read.csv2's default of dec=","; otherwise the values are read as text.
Lines1 <-
"27.05.11;5965.95
26.05.11;5947.06
25.05.11;5942.82
24.05.11;5939.98"
f1 <- read.csv2(con <- textConnection(Lines1), header=FALSE, dec=".")
close(con)

# file2 uses ";" and a decimal comma, which is read.csv2's default.
Lines2 <-
"Germany;Switzerland;USA;OECDEurope
69,90974;61,8241;55,60966;64,96157
67,0394;62,18966;56,47361;64,15152
70,56651;63,6347;56,87237;65,43568"
f2 <- read.csv2(con <- textConnection(Lines2), header=TRUE)
close(con)

# file3 is a plain comma-separated file.
Lines3 <-
"1984-04-01,33.3238396624473
1984-07-01,63.579833082501
1984-10-01,35.8375401560349"
f3 <- read.csv(con <- textConnection(Lines3), header=FALSE)
close(con)

The example below assumes that file2 starts in 1984 Q4; file3 carries its own dates and starts in 1984 Q2. You can see that merge.zoo takes care of aligning all the dates for you. After everything is aligned in your zoo object, you can use the as.ts method to create an mts object.

library(xts)  # attaches zoo as well

z1 <- zoo(f1[,-1], as.Date(f1[,1], "%d.%m.%y"))
# file2 has no date column: build the index from the assumed start, one step per row of f2
z2 <- zoo(f2, as.yearqtr("1984Q4")+(seq_len(NROW(f2))-1)/4)
z3 <- zoo(f3[,-1], as.yearqtr(as.Date(f3[,1])))

# Use xts::apply.quarterly to aggregate series with higher periodicity.
# Here I just take the last obs but you could use another function (e.g. mean).
z1 <- apply.quarterly(z1, last)
index(z1) <- as.yearqtr(index(z1))  # convert the index to yearqtr

(Z <- merge(z1,z2,z3))
#              z1  Germany Switzerland      USA OECDEurope       z3
# 1984 Q2      NA       NA          NA       NA         NA 33.32384
# 1984 Q3      NA       NA          NA       NA         NA 63.57983
# 1984 Q4      NA 69.90974    61.82410 55.60966   64.96157 35.83754
# 1985 Q1      NA 67.03940    62.18966 56.47361   64.15152       NA
# 1985 Q2      NA 70.56651    63.63470 56.87237   65.43568       NA
# 2011 Q2 5965.95       NA          NA       NA         NA       NA

# Note that ts will create an object with an observation for every period,
# even if all the columns are missing.
TS <- as.ts(Z)
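
The reader function the question asks about could be sketched along the same lines. This is only a sketch under stated assumptions: `to_quarterly` is a hypothetical helper name, it expects a two-column data frame with the dates in the first column, and it uses zoo's `aggregate` with `as.yearqtr` in place of `xts::apply.quarterly`. The data frames `daily` and `quarterly` are shortened stand-ins for file1 and file3.

```r
library(zoo)

# Hypothetical helper: turn a two-column data frame (date, value) into a
# quarterly zoo series, by default keeping the last observation per quarter.
to_quarterly <- function(df, date_format = "%Y-%m-%d",
                         FUN = function(x) tail(x, 1)) {
  dates <- as.Date(as.character(df[[1]]), format = date_format)
  z <- zoo(df[[2]], dates)        # zoo orders the observations by date
  aggregate(z, as.yearqtr, FUN)   # one value per quarter, yearqtr index
}

# Stand-ins for file1 (daily, newest first) and file3 (quarterly)
daily <- data.frame(
  date  = c("27.05.11", "26.05.11", "25.05.11", "24.05.11"),
  value = c(5965.95, 5947.06, 5942.82, 5939.98)
)
quarterly <- data.frame(
  date  = c("1984-04-01", "1984-07-01", "1984-10-01"),
  value = c(33.32, 63.58, 35.84)
)

s1 <- to_quarterly(daily, date_format = "%d.%m.%y")
s3 <- to_quarterly(quarterly)

Z2  <- merge(s1, s3)   # union of both indexes, NA where a series has no data
TS2 <- as.ts(Z2)       # regular quarterly grid from 1984 Q2 to 2011 Q2
```

Passing `FUN = mean` would average within each quarter instead of keeping the last value. A file without a date column, such as file2, still needs its index built from a known start and frequency as shown in the answer.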
