Aggregating price data to a different time horizon in R data.table

Question

Hi, I'm looking to roll up minute-level data in a data.table to a 5-minute (or 10-minute) horizon. I know this is easily done with xts and its to.minutes5 function, but I'd prefer not to use xts in this instance because the data set is rather large. Is there an easy way to do this in data.table?

Data example: in this example, the period from 21:30 to 21:34 (both inclusive) would have just one row with t = 21:30, open = 0.88703, high = 0.88799, low = 0.88702, close = 0.88798, volume = 43 (note that the row for 21:35 itself is excluded).

                      t    open    high     low   close volume
 1: 2010-01-03 21:27:00 0.88685 0.88688 0.88685 0.88688      2
 2: 2010-01-03 21:28:00 0.88688 0.88688 0.88686 0.88688      5
 3: 2010-01-03 21:29:00 0.88688 0.88704 0.88687 0.88703      7
 4: 2010-01-03 21:30:00 0.88703 0.88795 0.88702 0.88795     10
 5: 2010-01-03 21:31:00 0.88795 0.88795 0.88774 0.88778      7
 6: 2010-01-03 21:32:00 0.88778 0.88778 0.88753 0.88760      8
 7: 2010-01-03 21:33:00 0.88760 0.88781 0.88760 0.88775     11
 8: 2010-01-03 21:34:00 0.88775 0.88799 0.88775 0.88798      7
 9: 2010-01-03 21:35:00 0.88798 0.88803 0.88743 0.88782      8
10: 2010-01-03 21:36:00 0.88782 0.88782 0.88770 0.88778      6
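
For comparison, here is a minimal data.table-only sketch (no xts) that reproduces the 21:30 bar described above. It uses a different technique from the answer below: each timestamp is floored to the start of its k-minute bucket and rows are grouped by that value, so bars are labelled by their start time, as the question asks. The helper floor_minutes() is hypothetical, and the sample assumes the printed times are UTC, since the printout does not show a time zone.

```r
library(data.table)

# The ten rows printed above (assumption: the times are in UTC).
dt <- data.table(
  t      = as.POSIXct("2010-01-03 21:27:00", tz = "UTC") + 60 * (0:9),
  open   = c(0.88685, 0.88688, 0.88688, 0.88703, 0.88795,
             0.88778, 0.88760, 0.88775, 0.88798, 0.88782),
  high   = c(0.88688, 0.88688, 0.88704, 0.88795, 0.88795,
             0.88778, 0.88781, 0.88799, 0.88803, 0.88782),
  low    = c(0.88685, 0.88686, 0.88687, 0.88702, 0.88774,
             0.88753, 0.88760, 0.88775, 0.88743, 0.88770),
  close  = c(0.88688, 0.88688, 0.88703, 0.88795, 0.88778,
             0.88760, 0.88775, 0.88798, 0.88782, 0.88778),
  volume = c(2L, 5L, 7L, 10L, 7L, 8L, 11L, 7L, 8L, 6L)
)

# Hypothetical helper: floor each timestamp to the start of its k-minute bucket.
floor_minutes <- function(t, k) {
  secs <- 60 * k
  .POSIXct(floor(as.numeric(t) / secs) * secs, tz = attr(t, "tzone"))
}

# Sort by time so first()/last() pick the opening and closing rows of each bucket.
setorder(dt, t)
bars <- dt[, .(open   = first(open),
               high   = max(high),
               low    = min(low),
               close  = last(close),
               volume = sum(volume)),
           by = .(t = floor_minutes(t, 5))]
bars
```

The 21:25 and 21:35 bars are partial only because the sample starts at 21:27 and stops at 21:36; calling floor_minutes(t, 10) instead gives 10-minute bars. Flooring epoch seconds aligns buckets to UTC clock minutes, which matches local clock minutes in any time zone whose UTC offset is a multiple of k minutes.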

Output from dput(head(myData)), as requested by GSee. I want to use the data.table to store more fields derived from this original data. So even if I did use xts to roll up these price bars, I would have to get them into a data.table somehow, and I'd appreciate any tips on the correct way to combine data.table with xts objects.

structure(list(t = structure(c(1241136000, 1241136060, 1241136120, 
1241136180, 1241136240, 1241136300), class = c("POSIXct", "POSIXt"
), tzone = "Europe/London"), open = c(0.89467, 0.89467, 0.89472, 
0.89473, 0.89504, 0.895), high = c(0.89481, 0.89475, 0.89473, 
0.89506, 0.8951, 0.895), low = c(0.89457, 0.89465, 0.89462, 0.89473, 
0.89486, 0.89486), close = c(0.89467, 0.89472, 0.89473, 0.89504, 
0.895, 0.89488), volume = c(96L, 14L, 123L, 49L, 121L, 36L)), .Names = c("t", 
"open", "high", "low", "close", "volume"), class = c("data.table", 
"data.frame"), row.names = c(NA, -6L))


Answer

You can use the endpoints function from xts (which is written in C) on POSIXt vectors. endpoints finds the position of the last element of each time period. By convention, 1:05 is not included in the same bar as 1:00: a 5-minute bar runs from 1:00 up to, but not including, 1:05. So the data you provided with dput (which is different from the printed data above it) will have 2 bars.
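
To see what endpoints returns, here is a rough base-R sketch of the same idea for a sorted POSIXct vector and a k-minute period: timestamps are integer-divided into k-minute period numbers, and a row is an endpoint wherever the period number changes, plus the final row. The helper endpoints_sketch() is hypothetical and only illustrates the calculation; it assumes a non-empty, time-sorted input and is not a replacement for the C implementation in xts.

```r
# Hypothetical helper approximating xts::endpoints(x, "minutes", k):
# a leading 0, then the row index of the last observation in each
# k-minute period (assumes times is non-empty and sorted).
endpoints_sketch <- function(times, k) {
  period <- as.numeric(times) %/% (60 * k)  # k-minute period number since the epoch
  c(0L, which(diff(period) != 0), length(times))
}

# The six timestamps from the dput() output in the question.
times <- .POSIXct(c(1241136000, 1241136060, 1241136120,
                    1241136180, 1241136240, 1241136300),
                  tz = "Europe/London")

ep <- endpoints_sketch(times, 5)
ep  # 0 5 6: rows 1-5 (01:00-01:04) form one bar, row 6 (01:05) starts the next
```

With k = 10 all six rows fall into one period, so the sketch returns only 0 and 6.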

Assuming dt is your data.table:

library(data.table)
library(xts)  # also attaches zoo, which provides na.locf()

setkey(dt, t)                               # make sure the data.table is sorted by time
ep <- endpoints(dt$t, "minutes", 5)[-1]     # drop the leading 0; use 10 for 10-minute bars
dt[ep, grp := seq_along(ep)]                # give the last row of each bar its bar number
dt[, grp := na.locf(grp, fromLast = TRUE)]  # back-fill the NAs so every row gets a bar number

# t is labelled with the last timestamp in each bar
dt[, list(t = last(t), open = open[1], high = max(high), low = min(low),
          close = last(close), volume = sum(volume)), by = grp]

   grp                   t    open   high     low   close volume
1:   1 2009-05-01 01:04:00 0.89467 0.8951 0.89457 0.89500    403
2:   2 2009-05-01 01:05:00 0.89500 0.8950 0.89486 0.89488     36
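
If you would rather not depend on zoo's na.locf for the back-fill step, the group ids can also be built directly from the endpoints: bar i covers rows ep[i-1] + 1 through ep[i], so its length is diff(c(0, ep))[i], and rep() expands the bar numbers to one id per row. This is a small base-R sketch using the endpoints found for the dput sample (5 and 6); inside data.table the same expression could be assigned with dt[, grp := rep(seq_along(ep), diff(c(0L, ep)))], assuming dt is sorted by t as above.

```r
# Endpoint rows for the dput sample, with the leading 0 already dropped.
ep <- c(5L, 6L)

# Bar i spans rows (ep[i - 1] + 1):ep[i], so each bar's row count is
# the difference between consecutive endpoints.
grp <- rep(seq_along(ep), times = diff(c(0L, ep)))
grp  # 1 1 1 1 1 2
```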
