连接R数据表，其中键值不完全相等 - 将行与最接近的时间组合 [英] Join R data.tables where key values are not exactly equal--combine rows with closest times

查看：126 发布时间：2017/3/12 10:46:27 r time data.table

本文介绍了连接R数据表，其中键值不完全相等 - 将行与最接近的时间组合的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

有一个光滑的方法来连接数据表在R中的时间的键值接近，但不完全相同？例如，假设我有一个为不同时间段提供的结果数据表：

Is there a slick way to join data tables in R where key values of time are close, but not exactly the same? For example, suppose I have a data table of results that are given for different time periods:

DT1 = data.table(x=rep(c("a","b","c"),each=3), time=c(10,30,60), v=1:9)

这里，我们有不同类别（x）在不同时间（时间）拍摄的一些值（v）。现在假设我有另一个来源的数据，为不同的类别提供一些时间值：

Here, we have some values (v) for different categories (x) taken at different times (time). Now suppose that I have data from another source that provides some time values for the different categories:

DT2=data.table(x=rep(c("a","b","c"),each=1),time=c(10,10,60))

我可能有兴趣尝试匹配DT2中的时间，尽可能接近DT1来预测DT2类别的值v。我想做一些像

I might be interested in trying to match the times in DT2 as closely as I can to DT1 to predict a value, v, for my DT2 categories. I would like to do something like

setkeyv(DT2,c("x","time"))
merge(DT1,DT2,by=c("time","v")

   time x v
1:   10 a 1
2:   10 b 4
3:   60 c 9

但是如果我的时间没有相同的精度呢？ p>

But what if my times didn't have the same precision? For example:

DT2=data.table(x=rep(c("a","b","c"),each=1),time=c(17,54,3))

合并，但是DT2的接收时间接近DT1的接收时间。这是17接近30,54接近60，并且3接近10？

Is there a way to perform a similar merge but pick times of DT2 that are close to those of DT1? That is 17 would be close to 30, 54 close to 60, and 3 close to 10?

这个简单的例子不清楚，我会简要解释我有更大的问题我有一个数据表与列：category，time，output1，output2 ...有数百个类别与关联的时间。想要在特定时间为所有类别拉出输出1.由于时间是没有明显逻辑的采样，有时时间四舍五入到最接近的偶数秒;在其他情况下，时间将四舍五入到最接近的分钟或甚至10分钟。

If this simple example isn't clear, I'll briefly explain the larger problem that I am having. I have a data table with columns: category, time, output1, output2... There are hundreds of categories with associated times. I might want to pull output 1 for all categories at a specific time. Since the times were sampled with no apparent logic, sometimes the time is rounded to the nearest even second; in other cases the times are rounded to the nearest minute or even 10 minutes.

我可以写一个脚本以更常见的格式重写时间，但我很好奇，如果有是一个光滑的data.table解决方案，我没有看到。

I could write a script to rewrite times in more common format, but I am curious if there is a slick data.table solution that I haven't seen. I have explored the rolling merge with no success.

推荐答案

另一个选项可能是 roll ='nearest' （在CRAN上为v1.8.8中的新功能）。

Another option may be roll='nearest' (new in v1.8.8 on CRAN).

> setkey(DT1,x,time)
> DT1
   x time v
1: a   10 1
2: a   30 2
3: a   60 3
4: b   10 4
5: b   30 5
6: b   60 6
7: c   10 7
8: c   30 8
9: c   60 9
> DT2
   x time
1: a   17
2: b   54
3: c    3
> DT1[DT2,roll="nearest"]
   x time v
1: a   17 1
2: b   54 6
3: c    3 7

请注意，17似乎更接近10比30，因此结果在第一行。

Note that 17 appears to be closer to 10 than 30, hence the result in the first row.

如果您需要滚动到下一次观察（下一次观察结果向后）：

If you need to roll to the next observation (next observation carried backwards) :

> DT1[DT2,roll=-Inf]
   x time v
1: a   17 2
2: b   54 6
3: c    3 7

这篇关于连接R数据表，其中键值不完全相等 - 将行与最接近的时间组合的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

连接R数据表，其中键值不完全相等 - 将行与最接近的时间组合 [英] Join R data.tables where key values are not exactly equal--combine rows with closest times

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

连接R数据表，其中键值不完全相等 - 将行与最接近的时间组合 [英] Join R data.tables where key values are not exactly equal--combine rows with closest times

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭