Aggregating time series in R


Question

I have the following OHLC data (by 3-minute intervals)

library(tseries)
library(xts)
library(quantmod)
> str(tickmin)
An ‘xts’ object from 2010-06-30 15:47:00 to 2010-09-08 15:14:00 containing:
  Data: num [1:8776, 1:5] 9215 9220 9205 9195 9195 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:5] "zv.Open" "zv.High" "zv.Low" "zv.Close" ...
  Indexed by objects of class: [POSIXct,POSIXt] TZ: 
  xts Attributes:  
 NULL


>tickmin
2010-09-08 15:02:00        20
2010-09-08 15:04:00        77
2010-09-08 15:08:00        86
2010-09-08 15:11:00         7
2010-09-08 15:14:00        43
> start(tickmin)
[1] "2010-06-30 15:47:00 EDT"
> end(tickmin)
[1] "2010-09-08 15:14:00 EDT"

I am trying to aggregate it using the following:

> by <-timeSequence(from = start(tickmin), to = end(tickmin), format="%Y-%m-%d %H%M", by = "day")
>by
[61] [2010-08-29 19:47:00] [2010-08-30 19:47:00] [2010-08-31 19:47:00]
[64] [2010-09-01 19:47:00] [2010-09-02 19:47:00] [2010-09-03 19:47:00]
[67] [2010-09-04 19:47:00] [2010-09-05 19:47:00] [2010-09-06 19:47:00]
[70] [2010-09-07 19:47:00]

> aggregate(Vo(tickmin),by,sum)
Error: length(time(x)) == length(by[[1]]) is not TRUE

I would appreciate any suggestions on how to fix this error.

Answer

I'll explain your error and tell you how to fix it, but there's a better way to do what you're doing. So make sure you read my entire answer!

The error message says that your by does not have the same length as Vo(tickmin): by holds one timestamp per day (about 70 values), while tickmin has 8776 rows. aggregate needs by to contain one value per observation in tickmin, namely the day that observation falls on.
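
The length requirement can be illustrated with base R alone. The following is a minimal sketch on made-up data (the timestamps, the constant volume, and the names tms, vol, days_only and day_key are all assumptions for illustration, not the asker's objects); it uses tapply in place of aggregate only to avoid depending on any package.

```r
# Hypothetical data: 1000 observations at 3-minute spacing, volume 1 each
tms <- seq(as.POSIXct("2010-06-30 09:00:00", tz = "UTC"),
           by = "3 min", length.out = 1000)
vol <- rep(1, length(tms))

# A daily sequence between the first and last timestamp has one entry
# per day, so it is far shorter than the data (this mirrors the error)
days_only <- seq(min(tms), max(tms), by = "day")
length(days_only) == length(tms)   # FALSE

# A usable grouping key has one entry per observation
day_key <- as.Date(tms, tz = "UTC")
length(day_key) == length(tms)     # TRUE

# Sum the volume within each day
daily_vol <- tapply(vol, day_key, sum)
```

The same idea carries over to aggregate: the grouping vector is derived from the timestamps of the data itself rather than generated independently.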

As an example here I generate an xts object:

# generate a set of times from 2010-06-30 onwards at 20 minute intervals
tms <- as.POSIXct(seq(0,3600*24*30,by=60*20),origin="2010-06-30")
n   <- length(tms)
# generate volumes for those intervals, random 0 -- 100, turn into xts object
xts.ts <- xts(sample.int(100,n,replace=TRUE),tms)
colnames(xts.ts)<-'Volume'

This produces:

> head(xts.ts)
                    Volume
2010-06-30 00:00:00     97
2010-06-30 00:20:00     78
2010-06-30 00:40:00     38
2010-06-30 01:00:00     86
2010-06-30 01:20:00     79
2010-06-30 01:40:00     55

To access the timestamps of xts.ts, use index(xts.ts). It returns a vector of POSIXct date-times that print like "2010-07-30 00:00:00 EST".

To reduce these to calendar days you can use as.Date (it drops the time of day rather than rounding to the nearest day):

> as.Date(index(xts.ts))
   [1] "2010-06-29" "2010-06-29" "2010-06-29" "2010-06-29" "2010-06-29"
    ....
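
The 2010-06-29 dates in this output are most likely a time zone effect: for POSIXct input, as.Date converts using UTC unless a tz argument is given. A small base R sketch of that behaviour follows; the Australia/Sydney zone is only an assumed example of a zone ahead of UTC, not necessarily the one used in the answer.

```r
# A timestamp shortly after local midnight in a zone ahead of UTC
# (the zone is an illustrative assumption)
t_local <- as.POSIXct("2010-06-30 00:20:00", tz = "Australia/Sydney")

# tz = "UTC" is the documented default for POSIXct input; it is written
# out here so the behaviour is explicit
d_utc   <- as.Date(t_local, tz = "UTC")               # previous UTC day
d_local <- as.Date(t_local, tz = "Australia/Sydney")  # local calendar day

# as.Date truncates: a late-evening time stays on the same day
t_late <- as.POSIXct("2010-06-30 23:40:00", tz = "UTC")
d_late <- as.Date(t_late, tz = "UTC")
```

If the daily totals should follow local calendar days, passing the time zone of the index as tz is one way to get that.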



Solution to your problem

Then to use aggregate you do:

> aggregate(Vo(xts.ts),as.Date(index(xts.ts)),sum)

2010-06-29 1858
2010-06-30 3733
2010-07-01 3906
2010-07-02 3359
2010-07-03 3838
...



Better solution to your problem

The xts package has functions apply.daily, apply.monthly, etc (use ls('package:xts') to see what functions it has -- there may be ones you're interested in).

apply.daily(x,FUN,...) does exactly what you want. See ?apply.daily. To use it you can do:

> apply.daily(xts.ts,sum)

                    Volume
2010-06-30 23:40:00   4005
2010-07-01 23:40:00   4093
2010-07-02 23:40:00   3419
2010-07-03 23:40:00   3737
...

Or if your xts object has other columns like Open, Close etc, you can do apply.daily(xts.ts, function(x) sum(Vo(x))).
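
For a multi-column object, a sketch along these lines may help. It assumes the xts package is installed and uses made-up bars (the constant Open, Close and Volume values and the names tms, bars and daily_vol are hypothetical). The column is selected by name here instead of with quantmod's Vo so that only xts is needed.

```r
library(xts)

# Hypothetical data: three full days at 20-minute spacing (72 bars per day)
tms <- seq(as.POSIXct("2010-06-30 00:00:00", tz = "UTC"),
           by = "20 min", length.out = 216)
bars <- xts(cbind(Open = 100, Close = 101, Volume = rep(10, 216)),
            order.by = tms, tzone = "UTC")

# Sum only the Volume column within each day; the result is indexed by
# the last timestamp of each day
daily_vol <- apply.daily(bars, function(x) sum(x[, "Volume"]))
daily_vol
```

Passing a function that picks out one column keeps the other columns (which should not be summed) out of the result.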

Note that the results differ slightly between apply.daily and the aggregate ... as.Date method. The most likely reason is where each method draws the day boundary: apply.daily splits at the end of each day as the index displays it (it calls endpoints(x, "days")) and labels each sum with the last timestamp of that day, whereas as.Date converts POSIXct times to UTC dates by default, so observations near midnight can land on a different day.

Looking at your question, apply.daily seems to match most closely what you want to do (and is provided with xts anyway, so why not use it?)
