data.table time subset vs xts time subset

Question

Hi, I am looking to subset some minutely data by time. I normally use xts, doing something like:

subset.string <- 'T10:00/T13:00' 
xts.min.obj[subset.string]

to get all the rows between 10am and 1pm (inclusive) on EACH DAY, with the output in xts format. But this is a bit slow for my purposes, e.g.:

library(xts)
j <- xts(rnorm(10e6), Sys.time() - (10e6:1))  # 10 million timestamps, one second apart
system.time(j['T10:00/T16:00'])
   user  system elapsed 
  5.704   0.577  17.115 
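As a rough illustration of what a time-of-day subset such as 'T10:00/T13:00' selects, here is a base-R sketch. The helper name in_clock_window is hypothetical (it is not an xts function): it keeps timestamps whose clock time, written as "HH:MM", lies between the two bounds inclusive. Because the upper bound is compared at minute resolution, everything up to 13:00:59 is kept, which is the "inclusive" behaviour described above. It only illustrates the semantics; formatting every timestamp as a string is not a fast way to do this.

```r
# Hypothetical helper (not part of xts): TRUE where the clock time of `times`,
# written as "HH:MM" in time zone `tz`, lies within [from, to].
in_clock_window <- function(times, from, to, tz = "UTC") {
  hm <- format(times, "%H:%M", tz = tz)  # e.g. "13:00" for 13:00:59
  hm >= from & hm <= to                  # fixed-width "HH:MM" strings sort like times
}

times <- as.POSIXct(c("2024-01-01 09:59:59", "2024-01-01 10:00:00",
                      "2024-01-01 13:00:59", "2024-01-01 13:01:00",
                      "2024-01-02 11:30:00"), tz = "UTC")
keep <- in_clock_window(times, "10:00", "13:00")
print(keep)
```

The second day's 11:30 timestamp is kept too, since the window is applied to every day independently.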

I know that data.table is very fast at subsetting large datasets, so I am wondering whether, in conjunction with the fasttime package for fast POSIXct creation, it would be worth creating a function like

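dt.time.subset <- function(xts.min.obj, subset.string){
  require(xts)
  require(data.table)
  require(fasttime)
  # fastPOSIXct() reads its input as UTC, so format the index in UTC as well
  x.dt <- data.table(ts=format(index(xts.min.obj), "%Y-%m-%d %H:%M:%S", tz="UTC"),
                     coredata(xts.min.obj))
  # e.g. 'T10:00/T13:00' -> c("10:00", "13:00"), compared with each row's "HH:MM"
  rng <- strsplit(sub("^T", "", subset.string), "/T")[[1]]
  out <- x.dt[substr(ts, 12, 16) %between% rng]
  xts(as.matrix(out[, !"ts"]), fastPOSIXct(out$ts))
}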

to convert xts.min.obj into a data.table, add some sort of character index, use data.table to subset the relevant rows, and then use the resulting rows with fasttime to recreate an xts output? Or is this too many extra operations for something that is already highly optimised and written in C?

Answer

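If you're OK with specifying your range in UTC, you can work on the raw numeric index directly. .index(j) gives the time in seconds since the epoch (UTC), so .index(j) %% 86400 is the number of seconds since UTC midnight, which data.table's %between% can compare against the window bounds: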

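library(data.table)  # provides %between%
j[(.index(j) %% 86400) %between% c(10*3600, 16*3600 + 60)]
# +60 because xts includes the whole 16:00 minute (%between% is inclusive, so
# a point at exactly 16:01:00 is kept as well); you'll need to offset the times
# appropriately to match with xts unless you live in UTC :)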
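The arithmetic behind this answer can be checked without xts. POSIX time counts seconds from 1970-01-01 00:00 UTC, which is a UTC midnight, and every POSIX day is exactly 86400 seconds (leap seconds are not counted), so x %% 86400 is the number of seconds since the most recent UTC midnight. The sketch below uses as.numeric() on a POSIXct vector as a stand-in for .index(), and models the "offset the times" step with a hypothetical utc_offset parameter that is assumed constant. A zone with daylight saving time changes its offset twice a year, so a single constant shift would misplace part of the data there.

```r
# Seconds since midnight, computed arithmetically from epoch seconds.
# utc_offset (hypothetical parameter): seconds to add to UTC to get local
# clock time, assumed constant, e.g. -5 * 3600 for a fixed UTC-5 zone.
seconds_of_day <- function(times, utc_offset = 0) {
  (as.numeric(times) + utc_offset) %% 86400
}

times <- as.POSIXct(c("2024-03-05 09:59:59", "2024-03-05 10:00:00",
                      "2024-03-05 16:00:30", "2024-03-05 16:01:00",
                      "2024-03-05 16:01:30"), tz = "UTC")
sod <- seconds_of_day(times)
print(sod)

# Same inclusive window as the answer: 10:00:00 up to 16:01:00 UTC
keep <- sod >= 10 * 3600 & sod <= 16 * 3600 + 60
print(keep)

# 15:30 UTC is 10:30 local clock time in a fixed UTC-5 zone
local_hours <- seconds_of_day(as.POSIXct("2024-03-05 15:30:00", tz = "UTC"),
                              utc_offset = -5 * 3600) / 3600
print(local_hours)
```

Note that the 16:01:00 point is selected, because the upper bound is inclusive.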

j <- xts(rnorm(10e6),Sys.time()-(10e6:1))
system.time(j[(.index(j) %% 86400) %between% c(10*3600, 16*3600 + 60)])
#  user  system elapsed 
#  1.17    0.08    1.25 
# likely faster on your machine, as mine takes minutes to run the OP's benchmark
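To check that the arithmetic filter selects the same rows as a clock-time comparison, a base-R sketch along these lines can be used. It works on a plain POSIXct vector in UTC rather than an xts object, uses deterministic consecutive whole-second timestamps instead of Sys.time(), and uses a half-open upper bound (< 16*3600 + 60) so that a point at exactly 16:01:00 is excluded, just as it is by the "HH:MM" string comparison. Timings from system.time() vary by machine, so only the equality of the two selections is asserted.

```r
start <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
times <- start + 0:299999  # 300,000 consecutive seconds (about 3.5 days)

# Clock-string approach: format every timestamp as "HH:MM", then compare
t_str <- system.time({
  hm <- format(times, "%H:%M", tz = "UTC")
  by_string <- hm >= "10:00" & hm <= "16:00"
})

# Arithmetic approach: seconds since UTC midnight via the modulo trick
t_num <- system.time({
  sod <- as.numeric(times) %% 86400
  by_arith <- sod >= 10 * 3600 & sod < 16 * 3600 + 60
})

print(rbind(string = t_str[1:3], arithmetic = t_num[1:3]))
stopifnot(identical(by_string, by_arith))
cat("rows kept:", sum(by_arith), "\n")
```

Both filters are vectorised, but the arithmetic one avoids building a character string for every row, which is where most of the time goes in the string version.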
