连接R数据表,其中键值不完全相等 - 将行与最接近的时间组合 [英] Join R data.tables where key values are not exactly equal--combine rows with closest times

查看:126
本文介绍了连接R数据表,其中键值不完全相等 - 将行与最接近的时间组合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有一个光滑的方法来连接数据表在R中的时间的键值接近,但不完全相同?例如,假设我有一个为不同时间段提供的结果数据表:

Is there a slick way to join data tables in R where key values of time are close, but not exactly the same? For example, suppose I have a data table of results that are given for different time periods:

DT1 = data.table(x=rep(c("a","b","c"),each=3), time=c(10,30,60), v=1:9)

这里,我们有不同类别(x)在不同时间(时间)拍摄的一些值(v)。现在假设我有另一个来源的数据,为不同的类别提供一些时间值:

Here, we have some values (v) for different categories (x) taken at different times (time). Now suppose that I have data from another source that provides some time values for the different categories:

DT2=data.table(x=rep(c("a","b","c"),each=1),time=c(10,10,60))

我可能有兴趣尝试匹配DT2中的时间,尽可能接近DT1来预测DT2类别的值v。我想做一些像

I might be interested in trying to match the times in DT2 as closely as I can to DT1 to predict a value, v, for my DT2 categories. I would like to do something like

setkeyv(DT2,c("x","time"))
merge(DT1,DT2,by=c("time","v")

   time x v
1:   10 a 1
2:   10 b 4
3:   60 c 9

但是如果我的时间没有相同的精度呢? p>

But what if my times didn't have the same precision? For example:

DT2=data.table(x=rep(c("a","b","c"),each=1),time=c(17,54,3))

合并,但是DT2的接收时间接近DT1的接收时间。这是17接近30,54接近60,并且3接近10?

Is there a way to perform a similar merge but pick times of DT2 that are close to those of DT1? That is 17 would be close to 30, 54 close to 60, and 3 close to 10?

这个简单的例子不清楚,我会简要解释我有更大的问题我有一个数据表与列:category,time,output1,output2 ...有数百个类别与关联的时间。想要在特定时间为所有类别拉出输出1.由于时间是没有明显逻辑的采样,有时时间四舍五入到最接近的偶数秒;在其他情况下,时间将四舍五入到最接近的分钟或甚至10分钟。

If this simple example isn't clear, I'll briefly explain the larger problem that I am having. I have a data table with columns: category, time, output1, output2... There are hundreds of categories with associated times. I might want to pull output 1 for all categories at a specific time. Since the times were sampled with no apparent logic, sometimes the time is rounded to the nearest even second; in other cases the times are rounded to the nearest minute or even 10 minutes.

我可以写一个脚本以更常见的格式重写时间,但我很好奇,如果有是一个光滑的data.table解决方案,我没有看到。

I could write a script to rewrite times in more common format, but I am curious if there is a slick data.table solution that I haven't seen. I have explored the rolling merge with no success.

推荐答案

另一个选项可能是 roll ='nearest' (在CRAN上为v1.8.8中的新功能)。

Another option may be roll='nearest' (new in v1.8.8 on CRAN).

> setkey(DT1,x,time)
> DT1
   x time v
1: a   10 1
2: a   30 2
3: a   60 3
4: b   10 4
5: b   30 5
6: b   60 6
7: c   10 7
8: c   30 8
9: c   60 9
> DT2
   x time
1: a   17
2: b   54
3: c    3
> DT1[DT2,roll="nearest"]
   x time v
1: a   17 1
2: b   54 6
3: c    3 7

请注意,17似乎更接近10比30,因此结果在第一行。

Note that 17 appears to be closer to 10 than 30, hence the result in the first row.

如果您需要滚动到下一次观察(下一次观察结果向后):

If you need to roll to the next observation (next observation carried backwards) :

> DT1[DT2,roll=-Inf]
   x time v
1: a   17 2
2: b   54 6
3: c    3 7

这篇关于连接R数据表,其中键值不完全相等 - 将行与最接近的时间组合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆