R - Unexpected output for rolling join with data.table

Question

I'm trying to use the rolling join from the data.table package, but I can't seem to get the output I want.

My data is:

library(data.table)

dt <- fread('    datetime   price
"2016-05-01 18:58:49.078" 2059.25
"2016-05-01 18:58:49.078" 2059.25
"2016-05-01 18:58:49.078" 2059.25
"2016-05-01 18:58:49.078" 2059.25
"2016-05-01 18:58:51.085" 2059.25
"2016-05-01 18:58:51.085" 2059.25
"2016-05-01 18:58:51.085" 2059.25
"2016-05-01 18:58:51.085" 2059.25
"2016-05-01 18:58:51.085" 2059.25
"2016-05-01 18:58:51.085" 2059.25
"2016-05-01 18:58:51.085" 2059.25
"2016-05-01 18:58:53.703" 2059.25
"2016-05-01 18:58:53.757" 2059.25
"2016-05-01 18:58:53.757" 2059.25
"2016-05-01 18:58:53.757" 2059.25
"2016-05-01 18:58:54.155" 2059.50
"2016-05-01 18:59:07.013" 2059.25
"2016-05-01 18:59:07.013" 2059.25
"2016-05-01 18:59:07.015" 2059.25
"2016-05-01 18:59:08.604" 2059.25
"2016-05-01 18:59:31.500" 2059.50
"2016-05-01 18:59:40.723" 2059.25
"2016-05-01 18:59:40.723" 2059.25
"2016-05-01 19:00:00.003" 2059.50
"2016-05-01 19:00:00.003" 2059.50
"2016-05-01 19:00:00.003" 2059.50
"2016-05-01 19:00:00.359" 2059.50
"2016-05-01 19:00:00.381" 2059.50
"2016-05-01 19:00:02.390" 2059.50
"2016-05-01 19:00:04.355" 2059.50
"2016-05-01 19:00:06.230" 2059.50', header = T)

dt$datetime <- as.POSIXct(dt$datetime)

I want to know the most recent price at each minute:

dt_minutes <- data.table(datetime = c(as.POSIXct("2016-05-01 18:59:00"),as.POSIXct("2016-05-01 19:00:00"),as.POSIXct("2016-05-01 19:01:00")))

> dt_minutes
              datetime
1: 2016-05-01 18:59:00
2: 2016-05-01 19:00:00
3: 2016-05-01 19:01:00

The output I get is:

> dt[dt_minutes, roll = TRUE, on = "datetime"]
              datetime  price
1: 2016-05-01 18:59:00 2059.5
2: 2016-05-01 19:00:00 2059.5
3: 2016-05-01 19:00:00 2059.5
4: 2016-05-01 19:00:00 2059.5
5: 2016-05-01 19:01:00 2059.5

But I expected:

1: 2016-05-01 18:59:00 2059.50
2: 2016-05-01 19:00:00 2059.25
3: 2016-05-01 19:01:00 2059.50

Does anyone know why I am getting repeated "2016-05-01 19:00:00" rows in my output, and the wrong price for that time?

Recommended answer

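Building on Frank's comment: if you run setNumericRounding(0), you will get the results you want. With a rounding setting of 2, data.table rounds off the last two bytes of numeric columns (including POSIXct) when joining, so the three 19:00:00.003 rows compare equal to the 19:00:00 key. They are returned as exact matches, and the join never rolls back to the 18:59:40.723 price.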
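To get a feel for how much precision two bytes represent at this timestamp magnitude, here is a rough base R sketch. It is only an emulation under stated assumptions: `drop_last_bytes` is a hypothetical helper that zeroes (truncates) the trailing bytes of each double, whereas data.table rounds them internally, and the UTC time zone is chosen just to make the example reproducible.

```r
# Crude emulation (an assumption, not data.table's own code): zero the last
# n_bytes bytes of each double's 8-byte representation.
drop_last_bytes <- function(x, n_bytes = 2L) {
  vapply(x, function(v) {
    b <- writeBin(as.numeric(v), raw(), endian = "big")  # most significant byte first
    if (n_bytes > 0L) b[(9L - n_bytes):8L] <- as.raw(0)  # clear the least significant bytes
    readBin(b, what = "double", n = 1L, endian = "big")
  }, numeric(1))
}

t_key  <- as.numeric(as.POSIXct("2016-05-01 19:00:00", tz = "UTC"))
t_tick <- t_key + 0.003  # stands in for the 19:00:00.003 rows
t_late <- t_key + 0.359  # stands in for the 19:00:00.359 row

drop_last_bytes(t_key) == drop_last_bytes(t_tick)  # TRUE: 3 ms is below the remaining precision
drop_last_bytes(t_key) == drop_last_bytes(t_late)  # FALSE: 359 ms is still distinguishable
t_key == t_tick                                    # FALSE: exact comparison keeps them apart
```

For a value around 1.46e9, losing 16 bits of the significand leaves a resolution of roughly 1/64 of a second, which is why only the 19:00:00.003 rows collapse onto the minute boundary.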

Note also that you can store a copy of the datetime column in dt to see which original timestamp each key in dt_minutes was matched to (the join column in the result takes its values from dt_minutes, so the original is otherwise lost). This also makes clear how the rounding produced the unexpected rows in the first place:

dt[ , dt_datetime_orig := datetime]  # make a copy of time variable
setNumericRounding(2)  # 2 was the default in the data.table version used here
dt[dt_minutes, roll = TRUE, on = "datetime"]
##               datetime  price    dt_datetime_orig
## 1: 2016-05-01 18:59:00 2059.5 2016-05-01 18:58:54
## 2: 2016-05-01 19:00:00 2059.5 2016-05-01 19:00:00
## 3: 2016-05-01 19:00:00 2059.5 2016-05-01 19:00:00
## 4: 2016-05-01 19:00:00 2059.5 2016-05-01 19:00:00
## 5: 2016-05-01 19:01:00 2059.5 2016-05-01 19:00:06
setNumericRounding(0)
dt[dt_minutes, roll = TRUE, on = "datetime"]
##               datetime   price    dt_datetime_orig
## 1: 2016-05-01 18:59:00 2059.50 2016-05-01 18:58:54
## 2: 2016-05-01 19:00:00 2059.25 2016-05-01 18:59:40
## 3: 2016-05-01 19:01:00 2059.50 2016-05-01 19:00:06
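Because setNumericRounding() changes a session-wide setting, it may be worth restoring the previous value after the join. The following is a minimal sketch of that idea, not part of the original answer: `roll_join_exact`, `ticks`, and `minutes` are hypothetical names, the two-row table is a cut-down stand-in for the data above, and the UTC time zone is assumed only for reproducibility.

```r
library(data.table)

# Hypothetical miniature of the data above: two ticks around the 19:00:00 boundary.
ticks <- data.table(
  datetime = as.POSIXct(c("2016-05-01 18:59:40.723", "2016-05-01 19:00:00.003"), tz = "UTC"),
  price    = c(2059.25, 2059.50)
)
minutes <- data.table(datetime = as.POSIXct("2016-05-01 19:00:00", tz = "UTC"))

# Hypothetical helper: run the rolling join with exact numeric comparison,
# then put the session-wide rounding setting back to what it was.
roll_join_exact <- function(x, i, on) {
  old <- getNumericRounding()
  on.exit(setNumericRounding(old), add = TRUE)
  setNumericRounding(0)
  x[i, roll = TRUE, on = on]
}

res <- roll_join_exact(ticks, minutes, on = "datetime")
res$price  # 2059.25, rolled forward from the 18:59:40.723 tick
```

Calling getNumericRounding() on its own is also a quick way to check which setting your installed data.table version starts with.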
