Aggregating time series in R


Question

I have the following OHLC data (at 3-minute intervals):

library(tseries)
library(xts)
library(quantmod)
> str(tickmin)
An ‘xts’ object from 2010-06-30 15:47:00 to 2010-09-08 15:14:00 containing:
  Data: num [1:8776, 1:5] 9215 9220 9205 9195 9195 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:5] "zv.Open" "zv.High" "zv.Low" "zv.Close" ...
  Indexed by objects of class: [POSIXct,POSIXt] TZ: 
  xts Attributes:  
 NULL


> tickmin
2010-09-08 15:02:00        20
2010-09-08 15:04:00        77
2010-09-08 15:08:00        86
2010-09-08 15:11:00         7
2010-09-08 15:14:00        43
> start(tickmin)
[1] "2010-06-30 15:47:00 EDT"
> end(tickmin)
[1] "2010-09-08 15:14:00 EDT"

I am trying to aggregate it using the following:

> by <- timeSequence(from = start(tickmin), to = end(tickmin), format = "%Y-%m-%d %H%M", by = "day")
> by
[61] [2010-08-29 19:47:00] [2010-08-30 19:47:00] [2010-08-31 19:47:00]
[64] [2010-09-01 19:47:00] [2010-09-02 19:47:00] [2010-09-03 19:47:00]
[67] [2010-09-04 19:47:00] [2010-09-05 19:47:00] [2010-09-06 19:47:00]
[70] [2010-09-07 19:47:00]

> aggregate(Vo(tickmin),by,sum)
Error: length(time(x)) == length(by[[1]]) is not TRUE

I would appreciate any suggestions on how I can fix the error.

Recommended answer

I'll explain your error and tell you how to fix it, but there's a better way to do what you're doing, so make sure you read my entire answer!

The error message says that your by is not the same length as Vo(tickmin): by has 70 elements (one per day), whereas tickmin has 8776 rows. You have to generate by so that it holds one value for each row of tickmin, namely the day that row belongs to.
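The length requirement can be seen on a toy series. This is only a minimal sketch: the six timestamps, the values and the names x, bad_by and good_by are made up for illustration and are not the asker's data.

```r
library(xts)  # also attaches zoo, which supplies the aggregate() method used here

# Hypothetical toy data: 6 observations spread over 2 days
idx <- as.POSIXct(c("2010-06-30 10:00", "2010-06-30 12:00", "2010-06-30 14:00",
                    "2010-07-01 10:00", "2010-07-01 12:00", "2010-07-01 14:00"),
                  tz = "UTC")
x <- xts(c(1, 2, 3, 4, 5, 6), order.by = idx)

# One value per day (length 2) does not line up with the 6 rows, so this fails
bad_by <- as.Date(c("2010-06-30", "2010-07-01"))
res <- tryCatch(aggregate(x, bad_by, sum), error = function(e) e)
inherits(res, "error")

# One value per row (length 6) is what aggregate() expects
good_by <- as.Date(index(x))
daily <- aggregate(x, good_by, sum)
daily
```

The grouping vector acts as a label attached to every observation, not as a list of break points.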

As an example, here I generate an xts object:

library(xts)       # xts objects, index(), apply.daily()
library(quantmod)  # Vo()
# generate a set of times from 2010-06-30 onwards at 20 minute intervals
tms <- as.POSIXct(seq(0,3600*24*30,by=60*20),origin="2010-06-30")
n   <- length(tms)
# generate volumes for those intervals, random 1 -- 100, turn into xts object
xts.ts <- xts(sample.int(100,n,replace=TRUE),tms)
colnames(xts.ts) <- 'Volume'

This produces:

> head(xts.ts)
                    Volume
2010-06-30 00:00:00     97
2010-06-30 00:20:00     78
2010-06-30 00:40:00     38
2010-06-30 01:00:00     86
2010-06-30 01:20:00     79
2010-06-30 01:40:00     55

To access the timestamps of xts.ts, use index(xts.ts), which returns a vector of date-times (POSIXct values) that print like "2010-07-30 00:00:00 EST".

To reduce these to the day they fall on, you can use as.Date:

> as.Date(index(xts.ts))
   [1] "2010-06-29" "2010-06-29" "2010-06-29" "2010-06-29" "2010-06-29"
    ....

Solution to your problem

Then, to use aggregate, you do:

> aggregate(Vo(xts.ts),as.Date(index(xts.ts)),sum)

2010-06-29 1858
2010-06-30 3733
2010-07-01 3906
2010-07-02 3359
2010-07-03 3838
...

Better solution to your problem

The xts package has functions such as apply.daily and apply.monthly (use ls('package:xts') to see what it provides; there may be others you're interested in).

apply.daily(x, FUN, ...) does exactly what you want. See ?apply.daily. To use it you can do:

> apply.daily(xts.ts,sum)

                    Volume
2010-06-30 23:40:00   4005
2010-07-01 23:40:00   4093
2010-07-02 23:40:00   3419
2010-07-03 23:40:00   3737
...
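The other period helpers mentioned above are called in the same way. The following is a minimal sketch on made-up data (an hourly series of ones over 60 days in UTC, so the expected totals are easy to check by hand); the object names are illustrative only.

```r
library(xts)

# Hypothetical data: one observation per hour for 60 days, every value equal to 1
idx <- seq(as.POSIXct("2010-07-01 00:00:00", tz = "UTC"),
           by = "hour", length.out = 24 * 60)
x <- xts(rep(1, length(idx)), order.by = idx)
colnames(x) <- "Volume"

daily   <- apply.daily(x, sum)    # one row per calendar day
weekly  <- apply.weekly(x, sum)   # one row per week
monthly <- apply.monthly(x, sum)  # one row per calendar month

head(daily, 3)
monthly

# List the apply.* helpers that the installed xts version exports
grep("^apply\\.", ls("package:xts"), value = TRUE)
```

Each result is stamped with the last observation of its period, which is why the daily output above shows times such as 23:40:00 rather than midnight.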

Or, if your xts object has other columns such as Open, Close, etc., you can do apply.daily(xts.ts, function(x) sum(Vo(x))).
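As a rough illustration of the multi-column case, the sketch below builds a small made-up object with a Close and a Volume column and sums only the volume per day. It selects the column by name instead of using quantmod's Vo(), purely so that the example needs nothing beyond xts; the data and the name bars are hypothetical.

```r
library(xts)

# Hypothetical two-column data: 4 bars per day over 2 days
idx <- seq(as.POSIXct("2010-07-01 00:00:00", tz = "UTC"),
           by = "6 hours", length.out = 8)
bars <- xts(cbind(Close  = c(10, 11, 12, 13, 20, 21, 22, 23),
                  Volume = c(1, 2, 3, 4, 10, 20, 30, 40)),
            order.by = idx)

# Sum only the Volume column within each day
daily_vol <- apply.daily(bars, function(d) sum(d[, "Volume"]))
daily_vol
```

With quantmod loaded, function(d) sum(Vo(d)) should be equivalent here, since Vo() picks out the column whose name contains "Volume".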

Note that the answers from apply.daily differ slightly from those of the aggregate ... as.Date method. This appears to be a time zone effect: apply.daily splits the series into calendar days in the time zone of the index (hence the 23:40:00 end-of-day timestamps), whereas as.Date on a POSIXct index takes the date in UTC by default, so aggregate grouped by UTC days. That is also why the aggregate output starts with a partial day, 2010-06-29.
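If you want the aggregate route to agree with apply.daily, one option is to pass the index's time zone to as.Date. This is a minimal sketch under stated assumptions: the series is made up (ones every 20 minutes for three days), and "America/New_York" is only an assumed zone chosen to match the EDT timestamps in the question.

```r
library(xts)

tz  <- "America/New_York"  # assumed time zone, for illustration only
idx <- seq(as.POSIXct("2010-06-30 00:00:00", tz = tz),
           by = "20 min", length.out = 72 * 3)  # 72 observations per day, 3 days
x <- xts(rep(1, length(idx)), order.by = idx)
colnames(x) <- "Volume"

by_utc   <- as.Date(index(x))           # dates taken in UTC (the default)
by_local <- as.Date(index(x), tz = tz)  # dates taken in the index's own time zone

agg_utc   <- aggregate(x, by_utc, sum)    # local days get split across UTC dates
agg_local <- aggregate(x, by_local, sum)  # one group per local calendar day
daily     <- apply.daily(x, sum)

agg_utc
agg_local
daily
```

With the tz argument supplied, the aggregate totals line up with the apply.daily totals; only the labels differ (a Date per group versus the timestamp of each day's last observation).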

Looking at your question, apply.daily seems to match most closely what you want to do (and it comes with xts anyway, so why not use it?).
