Aggregating time series in R
Question
I have the following OHLC data (at 3-minute intervals):
library(tseries)
library(xts)
library(quantmod)
> str(tickmin)
An ‘xts’ object from 2010-06-30 15:47:00 to 2010-09-08 15:14:00 containing:
Data: num [1:8776, 1:5] 9215 9220 9205 9195 9195 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:5] "zv.Open" "zv.High" "zv.Low" "zv.Close" ...
Indexed by objects of class: [POSIXct,POSIXt] TZ:
xts Attributes:
NULL
>tickmin
2010-09-08 15:02:00 20
2010-09-08 15:04:00 77
2010-09-08 15:08:00 86
2010-09-08 15:11:00 7
2010-09-08 15:14:00 43
> start(tickmin)
[1] "2010-06-30 15:47:00 EDT"
> end(tickmin)
[1] "2010-09-08 15:14:00 EDT"
I am trying to aggregate it using the following:
> by <-timeSequence(from = start(tickmin), to = end(tickmin), format="%Y-%m-%d %H%M", by = "day")
>by
[61] [2010-08-29 19:47:00] [2010-08-30 19:47:00] [2010-08-31 19:47:00]
[64] [2010-09-01 19:47:00] [2010-09-02 19:47:00] [2010-09-03 19:47:00]
[67] [2010-09-04 19:47:00] [2010-09-05 19:47:00] [2010-09-06 19:47:00]
[70] [2010-09-07 19:47:00]
> aggregate(Vo(tickmin),by,sum)
Error: length(time(x)) == length(by[[1]]) is not TRUE
I would appreciate any suggestions on how I can fix the error.
Recommended answer
I'll explain your error and tell you how to fix it, but there's a better way to do what you're doing. So make sure you read my entire answer!
From the error message, the length of your by is not the same as the length of Vo(tickmin). Your by has one entry per day (about 70 values), while tickmin has 8776 rows. You have to generate by so that it holds one value, the day, for each corresponding observation in tickmin.
As an example, here I generate an xts object:
# generate a set of times from 2010-06-30 onwards at 20 minute intervals
tms <- as.POSIXct(seq(0,3600*24*30,by=60*20),origin="2010-06-30")
n <- length(tms)
# generate volumes for those intervals, random 0 -- 100, turn into xts object
xts.ts <- xts(sample.int(100,n,replace=TRUE),tms)
colnames(xts.ts)<-'Volume'
which produces:
> head(xts.ts)
Volume
2010-06-30 00:00:00 97
2010-06-30 00:20:00 78
2010-06-30 00:40:00 38
2010-06-30 01:00:00 86
2010-06-30 01:20:00 79
2010-06-30 01:40:00 55
To access the timestamps of xts.ts you use index(xts.ts), which returns the full vector of date-times (POSIXct values), e.g. "2010-07-30 00:00:00 EST".
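To reduce each timestamp to its calendar day you can use as.Date. Note that as.Date truncates rather than rounds, and for POSIXct input it works in UTC by default, which is why the first dates below come out as 2010-06-29 in a time zone ahead of UTC: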
> as.Date(index(xts.ts))
[1] "2010-06-29" "2010-06-29" "2010-06-29" "2010-06-29" "2010-06-29"
....
Solution to your problem
Then, to use aggregate, you do this:
> aggregate(Vo(xts.ts),as.Date(index(xts.ts)),sum)
2010-06-29 1858
2010-06-30 3733
2010-07-01 3906
2010-07-02 3359
2010-07-03 3838
...
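The requirement above (one grouping value per observation) can be sketched in a self-contained way. This is only an illustration under stated assumptions: the data are random, the variable names vol, day and daily are made up for the example, and the index is pinned to UTC so that as.Date and the index agree on where each day starts.

```r
library(xts)  # attaches zoo as well, which provides aggregate() for xts/zoo objects

# Hypothetical data: three days of 20-minute volumes, index fixed to UTC
set.seed(1)
tms <- seq(as.POSIXct("2010-06-30 00:00:00", tz = "UTC"),
           by = "20 min", length.out = 3 * 72)
vol <- xts(sample.int(100, length(tms), replace = TRUE), order.by = tms)
colnames(vol) <- "Volume"

# The grouping vector must have one entry per row of the series
day <- as.Date(index(vol), tz = "UTC")
stopifnot(length(day) == nrow(vol))

# Sum the volume within each calendar day; the result is indexed by Date
daily <- aggregate(vol, day, sum)
print(daily)
```

A by built with one entry per day, as in the question, fails the length check because it has far fewer entries than the series has rows.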
Better solution to your problem
The xts package has functions apply.daily, apply.monthly, etc. (use ls('package:xts') to see what functions it has; there may be others you're interested in).
apply.daily(x,FUN,...) does exactly what you want. See ?apply.daily. To use it you can do:
> apply.daily(xts.ts,sum)
Volume
2010-06-30 23:40:00 4005
2010-07-01 23:40:00 4093
2010-07-02 23:40:00 3419
2010-07-03 23:40:00 3737
...
Or, if your xts object has other columns like Open, Close, etc., you can do apply.daily(xts.ts, function(x) sum(Vo(x))).
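As a rough sketch of the multi-column case, the following builds a made-up OHLCV series and sums only its volume per day. The prices, the column names and the object names ohlcv and daily_vol are assumptions for illustration. The column is selected by name so the example needs only xts; with quantmod loaded, Vo(x) would pick the same column.

```r
library(xts)

# Hypothetical 3-minute OHLCV data covering two full days (index in UTC)
set.seed(42)
tms <- seq(as.POSIXct("2010-06-30 00:00:00", tz = "UTC"),
           by = "3 min", length.out = 2 * 480)
price <- 9200 + cumsum(rnorm(length(tms)))
ohlcv <- xts(cbind(Open   = price,
                   High   = price + 1,
                   Low    = price - 1,
                   Close  = price,
                   Volume = sample.int(100, length(tms), replace = TRUE)),
             order.by = tms)

# apply.daily passes each day's rows to FUN; here only Volume is summed
daily_vol <- apply.daily(ohlcv, function(x) sum(x[, "Volume"]))
print(daily_vol)
```

Each row of the result is stamped with the last observation of its day, which matches the 23:40:00 timestamps in the output shown earlier.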
Note that the results differ slightly between apply.daily and the aggregate ... as.Date method. That's because apply.daily splits on calendar days in the time zone of the index and labels each total with the last timestamp of that day, whereas as.Date converts POSIXct values to dates in UTC by default, so the aggregate buckets above run from midnight to midnight UTC (hence the partial 2010-06-29 bucket).
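The difference between the two methods can be reproduced with a small sketch. It assumes an index in the Australia/Sydney time zone (chosen only for illustration, and assuming the system's time zone database includes it) and random volumes. Grouping with as.Date in UTC gives an extra partial day, while passing the index's own time zone to as.Date gives the same totals as apply.daily.

```r
library(xts)

tz <- "Australia/Sydney"  # assumed zone, about 10 hours ahead of UTC in June/July
set.seed(7)
tms <- seq(as.POSIXct("2010-06-30 00:00:00", tz = tz),
           by = "20 min", length.out = 3 * 72)
vol <- xts(sample.int(100, length(tms), replace = TRUE), order.by = tms)

# Day boundaries taken in UTC: the three local days spread over four UTC dates
by_utc <- aggregate(vol, as.Date(index(vol), tz = "UTC"), sum)

# Day boundaries taken in the index's own time zone
by_local <- aggregate(vol, as.Date(index(vol), tz = tz), sum)

# apply.daily also splits on days in the index's time zone
daily <- apply.daily(vol, sum)

print(by_utc)
print(by_local)
print(daily)
```

With matching time zones the two approaches differ only in how the result is labelled: aggregate yields a Date index, apply.daily keeps the last timestamp of each day.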
Looking at your question, apply.daily seems to match most closely what you want to do (and it comes with xts anyway, so why not use it?).