Aggregating time series in R
Question
I have the following OHLC data (at 3-minute intervals):
library(tseries)
library(xts)
library(quantmod)
> str(tickmin)
An ‘xts’ object from 2010-06-30 15:47:00 to 2010-09-08 15:14:00 containing:
Data: num [1:8776, 1:5] 9215 9220 9205 9195 9195 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:5] "zv.Open" "zv.High" "zv.Low" "zv.Close" ...
Indexed by objects of class: [POSIXct,POSIXt] TZ:
xts Attributes:
NULL
> tickmin
2010-09-08 15:02:00 20
2010-09-08 15:04:00 77
2010-09-08 15:08:00 86
2010-09-08 15:11:00 7
2010-09-08 15:14:00 43
> start(tickmin)
[1] "2010-06-30 15:47:00 EDT"
> end(tickmin)
[1] "2010-09-08 15:14:00 EDT"
I am trying to aggregate it using the following:
> by <-timeSequence(from = start(tickmin), to = end(tickmin), format="%Y-%m-%d %H%M", by = "day")
> by
[61] [2010-08-29 19:47:00] [2010-08-30 19:47:00] [2010-08-31 19:47:00]
[64] [2010-09-01 19:47:00] [2010-09-02 19:47:00] [2010-09-03 19:47:00]
[67] [2010-09-04 19:47:00] [2010-09-05 19:47:00] [2010-09-06 19:47:00]
[70] [2010-09-07 19:47:00]
> aggregate(Vo(tickmin),by,sum)
Error: length(time(x)) == length(by[[1]]) is not TRUE
I would appreciate any suggestions on how I can fix the error.
Recommended answer
I'll explain your error and tell you how to fix it, but there's a better way to do what you're doing. So make sure you read my entire answer!
From the error message, the length of your by argument is not the same as the length of Vo(tickmin). You have to generate by so that it has one value for each corresponding value in tickmin, giving the day that value belongs to.
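To make the length requirement concrete, here is a minimal sketch on made-up data. The names vol, by_days and day_key are hypothetical and the six observations are invented for illustration. It assumes the xts package is installed; as far as I know, aggregate on an xts object is handled by the zoo method, which is where the length check in the error message comes from.

```r
library(xts)

# Invented toy data: three hourly observations on each of two days (UTC).
times <- as.POSIXct("2010-06-30 09:00:00", tz = "UTC") +
  c(0, 1, 2, 24, 25, 26) * 3600
vol <- xts(c(10, 20, 30, 40, 50, 60), order.by = times)

# A sequence with one entry per day is shorter than the data, so the
# length check inside aggregate fails.
by_days <- seq(as.Date("2010-06-30"), as.Date("2010-07-01"), by = "day")
bad <- try(aggregate(vol, by_days, sum), silent = TRUE)
inherits(bad, "try-error")

# One grouping value per observation passes the check.
day_key <- as.Date(index(vol), tz = "UTC")
daily <- aggregate(vol, day_key, sum)
daily
```

The key point is that by labels every row with its group; it is not a list of break points.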
As an example, here I generate an xts object:
# generate a set of times from 2010-06-30 onwards at 20 minute intervals
tms <- as.POSIXct(seq(0,3600*24*30,by=60*20),origin="2010-06-30")
n <- length(tms)
# generate volumes for those intervals (random integers from 1 to 100), turn into xts object
xts.ts <- xts(sample.int(100,n,replace=TRUE),tms)
colnames(xts.ts)<-'Volume'
This produces:
> head(xts.ts)
Volume
2010-06-30 00:00:00 97
2010-06-30 00:20:00 78
2010-06-30 00:40:00 38
2010-06-30 01:00:00 86
2010-06-30 01:20:00 79
2010-06-30 01:40:00 55
To access the dates of xts.ts you use index(xts.ts), which gives a vector of date-times (POSIXct values) that print like "2010-07-30 00:00:00 EST".
To reduce these to the day (dropping the time of day) you can use as.Date:
> as.Date(index(xts.ts))
[1] "2010-06-29" "2010-06-29" "2010-06-29" "2010-06-29" "2010-06-29"
....
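Note that the dates printed above are a day behind the local timestamps shown earlier, which suggests as.Date converted them in a different time zone (for POSIXct input it accepts a tz argument). The sketch below uses an invented timestamp and assumes the "America/New_York" zone name is available on your system; it only illustrates how tz changes the resulting date.

```r
# Invented example: 22:00 in New York in June is already the next day in UTC.
t <- as.POSIXct("2010-06-30 22:00:00", tz = "America/New_York")

d_utc <- as.Date(t, tz = "UTC")
d_local <- as.Date(t, tz = "America/New_York")

format(d_utc)    # "2010-07-01"
format(d_local)  # "2010-06-30"
```

If your daily totals should follow exchange-local days, passing the matching tz when building the grouping dates is one way to keep them aligned.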
Solution to your problem
Then, to use aggregate, you do:
> aggregate(Vo(xts.ts),as.Date(index(xts.ts)),sum)
2010-06-29 1858
2010-06-30 3733
2010-07-01 3906
2010-07-02 3359
2010-07-03 3838
...
Better solution to your problem
The xts package has functions apply.daily, apply.monthly, etc. (use ls('package:xts') to see what functions it has; there may be others you're interested in).
apply.daily(x, FUN, ...) does exactly what you want. See ?apply.daily.
To use it you can do:
> apply.daily(xts.ts,sum)
Volume
2010-06-30 23:40:00 4005
2010-07-01 23:40:00 4093
2010-07-02 23:40:00 3419
2010-07-03 23:40:00 3737
...
Or, if your xts object has other columns like Open, Close, etc., you can do apply.daily(xts.ts, function(x) sum(Vo(x))).
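As a sketch of the multi-column case, the example below builds a small invented object (bars, with made-up Open, Close and Volume values) and sums only the volume per day. It selects the column by name rather than with Vo, so it needs only xts; Vo comes from quantmod and could be used instead when that package is loaded.

```r
library(xts)

# Invented toy data with three columns; two days, three rows each (UTC).
times <- as.POSIXct("2010-06-30 09:00:00", tz = "UTC") +
  c(0, 1, 2, 24, 25, 26) * 3600
bars <- xts(
  cbind(Open   = c(1, 2, 3, 4, 5, 6),
        Close  = c(2, 3, 4, 5, 6, 7),
        Volume = c(10, 20, 30, 40, 50, 60)),
  order.by = times
)

# Sum only the Volume column within each day.
daily_vol <- apply.daily(bars, function(d) sum(d[, "Volume"]))
daily_vol
```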
Note that the answers are slightly different using apply.daily compared with the aggregate ... as.Date method. The likely reason is that the two draw day boundaries differently: apply.daily splits the series at the last observation of each day in the index's time zone (hence the 23:40:00 labels), whereas as.Date appears to have converted the times to dates in a different time zone (the local 2010-06-30 00:00:00 became 2010-06-29 above), so its midnight-to-midnight days are shifted.
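To see where apply.daily draws its day boundaries, it can help to look at endpoints, which, as far as I know, apply.daily uses internally before handing off to period.apply. A minimal sketch on invented data (vol, ep and by_block are hypothetical names):

```r
library(xts)

# Invented toy data: three hourly observations on each of two days (UTC).
times <- as.POSIXct("2010-06-30 09:00:00", tz = "UTC") +
  c(0, 1, 2, 24, 25, 26) * 3600
vol <- xts(c(10, 20, 30, 40, 50, 60), order.by = times)

# Row positions of the last observation in each day, with a leading 0.
ep <- endpoints(vol, on = "days")
ep

# Apply sum over each block between consecutive endpoints; each result is
# labelled with the last timestamp in its block.
by_block <- period.apply(vol, INDEX = ep, FUN = sum)
index(by_block)
```

Comparing index(by_block) with the dates produced by as.Date on the same index shows directly whether the two groupings agree.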
Looking at your question, apply.daily seems to match most closely what you want to do (and it is provided with xts anyway, so why not use it?).