R – How to join two data frames by nearest time-date?


Problem description

I have 2 data sets, each containing a date-time value in POSIXlt format, and some other numeric and character variables.

I want to combine both data sets based on the date-time column. But the date stamps of both data sets do not match, so I need to combine them by nearest date (before or after). In my example, data value "e" from 2016-03-01 23:52:00 needs to be combined with "binH" at 2016-03-02 00:00:00, not "binG".

Is there a function that would allow me to combine my data sets by nearest date-time value, even if it is after?

I have found ways of matching each date to the previous date using the cut() function, or roll = Inf in data.table. But I couldn't get my timestamps into any format that roll = "nearest" would accept.

    >df1
    date1 value
    1 2016-03-01 17:52:00     a
    2 2016-03-01 18:01:30     b
    3 2016-03-01 18:05:00     c
    4 2016-03-01 20:42:30     d
    5 2016-03-01 23:52:00     e

    >df2
    date2 bin_name
    1 2016-03-01 17:00:00     binA
    2 2016-03-01 18:00:00     binB
    3 2016-03-01 19:00:00     binC
    4 2016-03-01 20:00:00     binD
    5 2016-03-01 21:00:00     binE
    6 2016-03-01 22:00:00     binF
    7 2016-03-01 23:00:00     binG
    8 2016-03-02 00:00:00     binH
    9 2016-03-02 01:00:00     binI

Solution

data.table should work for this (can you explain the error you're coming up against?), although it does tend to convert POSIXlt to POSIXct on its own, so it is probably safer to convert your date-time column manually with as.POSIXct() to keep data.table happy. Also make sure you set the key column on both tables before using roll.
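As a minimal sketch of that preparation step, assuming a POSIXlt vector like the one in the question (the names date1_lt and dt1 and the UTC time zone are made up for illustration):

```r
library(data.table)

# Hypothetical stand-in for the asker's POSIXlt date-times.
date1_lt <- as.POSIXlt(c("2016-03-01 17:52:00", "2016-03-01 18:01:30"), tz = "UTC")

# Convert to POSIXct explicitly before building the data.table.
dt1 <- data.table(date1 = as.POSIXct(date1_lt), value = c("a", "b"))

# Sort by date1 and mark it as the key that a rolling join will use.
setkey(dt1, date1)

class(dt1$date1)
key(dt1)
```

Doing the conversion yourself means the column type is known before the join, rather than relying on whatever data.table does with a POSIXlt column.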

(I've created my own example tables here to make my life that little bit easier. If you want to use dput on yours, I'm happy to update this example with your data):

    library(data.table)

    new <- data.table(
      date = as.POSIXct(c("2016-03-02 12:20:00", "2016-03-07 12:20:00", "2016-04-02 12:20:00")),
      data.new = c("t", "u", "v")
    )
    head(new, 2)

                      date data.new
    1: 2016-03-02 12:20:00        t
    2: 2016-03-07 12:20:00        u

    old <- data.table(
      date = as.POSIXct(c("2016-03-02 12:20:00", "2016-03-07 12:20:00", "2016-04-02 12:20:00", "2015-03-02 12:20:00")),
      data.old = c("a", "b", "c", "d")
    )
    head(old, 2)

                      date data.old
    1: 2016-03-02 12:20:00        a
    2: 2016-03-07 12:20:00        b

    # Key both tables on the date column before the rolling join
    setkey(new, date)
    setkey(old, date)

    # For each row of old, take the row of new with the nearest date
    combined <- new[old, roll = "nearest"]
    combined

                      date data.new data.old
    1: 2015-03-02 12:20:00        t        d
    2: 2016-03-02 12:20:00        t        a
    3: 2016-03-07 12:20:00        u        b
    4: 2016-04-02 12:20:00        v        c

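I've intentionally given the two tables different numbers of rows, to show how the rolling join deals with multiple matches. The result has one row for each row of the table inside the brackets, so here both d and a are matched to t. You can switch the direction of the join with: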

    combined <- old[new, roll = "nearest"]
    combined

                      date data.old data.new
    1: 2016-03-02 12:20:00        a        t
    2: 2016-03-07 12:20:00        b        u
    3: 2016-04-02 12:20:00        c        v
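For completeness, here is a sketch of the same approach applied to the data from the question. The join_time helper column, the matched name and the UTC time zone are assumptions made for illustration; copying each date into a shared key column is just one way to keep both original timestamps visible in the result.

```r
library(data.table)

# The question's tables, rebuilt with POSIXct date-times (time zone assumed).
df1 <- data.table(
  date1 = as.POSIXct(c("2016-03-01 17:52:00", "2016-03-01 18:01:30",
                       "2016-03-01 18:05:00", "2016-03-01 20:42:30",
                       "2016-03-01 23:52:00"), tz = "UTC"),
  value = c("a", "b", "c", "d", "e")
)
df2 <- data.table(
  date2 = seq(as.POSIXct("2016-03-01 17:00:00", tz = "UTC"),
              by = "hour", length.out = 9),
  bin_name = paste0("bin", LETTERS[1:9])
)

# Hypothetical helper column: a shared key that leaves date1 and date2 intact.
df1[, join_time := date1]
df2[, join_time := date2]
setkey(df1, join_time)
setkey(df2, join_time)

# For each row of df1, pick the row of df2 whose time is nearest (before or after).
matched <- df2[df1, roll = "nearest"]
matched[, .(date1, value, date2, bin_name)]
```

With these inputs, value e at 23:52:00 is paired with binH at 00:00:00 rather than binG, which is the behaviour the question asks for.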
