Merge data.table by two nearest variables


Problem description


I have two data tables with x,y coordinates and some other info, which I would like to merge based on nearest-neighbour distance, i.e. on the minimum Euclidean distance over both x and y: $d_i = \min_j \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}$, where $i$ indexes the rows of DT1 and $j$ the rows of DT2. Say I have the following two sets:

library(data.table)
DT1=data.table(x=1:5,y=3:7)
DT2=data.table(x=c(2,4,2,3,6),y=c(2.5,3.1,2,3,5),Q=c('a','b','c','d','e'))

Then the desired result of the merge would be:

   x y Q
1: 1 3 a
2: 2 4 d
3: 3 5 d
4: 4 6 e
5: 5 7 e
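To make the criterion concrete, here is the arithmetic for the first row of DT1, the point (1, 3), in base R. DT2's columns are copied into plain vectors; `x2`, `y2` and `Q2` are names introduced only for this illustration. Taking the square root would not change which point is closest, so comparing squared distances is enough.

```r
# Coordinates and labels copied from DT2 (plain vectors, no package needed).
x2 <- c(2, 4, 2, 3, 6)
y2 <- c(2.5, 3.1, 2, 3, 5)
Q2 <- c("a", "b", "c", "d", "e")

# Squared distances from the first row of DT1, the point (1, 3),
# to each DT2 point: 1.25, 9.01, 2, 4, 29.
d2 <- (1 - x2)^2 + (3 - y2)^2
print(d2)

# The smallest is 1.25, at index 1, so row 1 gets label "a".
print(Q2[which.min(d2)])
```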

I could of course write a loop over DT1 to calculate the nearest neighbour for each row in DT1 and then merge based on this calculation, but that seems to defeat the purpose of data tables. Moreover, that will be very slow for data tables of several million rows.

I know that for a single column (after setting it as the key with setkey) I could do a nearest-neighbour merge like this:

DT2[DT1,roll="nearest"]

But that (logically) doesn't work when I define two keys (x and y) for the tables to be merged. Does a similar syntax exist for a two-variable nearest-neighbour merge? If not, is there a smarter way to do this than just looping, as I mentioned?
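For reference, here is a minimal sketch of the single-column case on a small hypothetical table. `lookup` and `query` are illustrative names, and the key values are unique to avoid ties. data.table applies `roll` only to the last join column; any earlier key columns must match exactly. That is why adding y as a second key does not give a two-dimensional nearest neighbour.

```r
library(data.table)

# Hypothetical one-dimensional example with unique key values.
lookup <- data.table(x = c(1, 4, 10), Q = c("a", "b", "c"))
query  <- data.table(x = c(2, 3, 8))

# The join uses the key of `lookup`; the first column of `query` supplies the values.
setkey(lookup, x)

# For each query x, take the row of `lookup` whose x is closest:
# 2 -> 1 ("a"), 3 -> 4 ("b"), 8 -> 10 ("c").
res <- lookup[query, roll = "nearest"]
res
```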

Solution

One possible solution:

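# For a single point (u, v), return Q of the nearest row of DT2
func = function(u,v)
{
    # squared Euclidean distance from (u, v) to every row of DT2
    vec = with(DT2, (u-x)^2 + (v-y)^2)
    # Q of the closest row (which.min takes the first one on ties)
    DT2[which.min(vec),]$Q
}

# apply() calls func once per row of DT1, with u[1] = x and u[2] = y
transform(DT1, Q=apply(DT1, 1, function(u) func(u[1], u[2])))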

#   x y Q
#1: 1 3 a
#2: 2 4 d
#3: 3 5 d
#4: 4 6 e
#5: 5 7 e
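The apply() version calls an R function once per row of DT1. A vectorized alternative, sketched below under the assumption that the coordinate columns are named x and y as in the example, computes a block of squared distances at once with outer() and picks each row's minimum with max.col(). `nearest_index` and `chunk_size` are hypothetical names introduced here. Processing DT1 in chunks keeps each distance matrix at most chunk_size × nrow(DT2) entries.

```r
library(data.table)

DT1 <- data.table(x = 1:5, y = 3:7)
DT2 <- data.table(x = c(2, 4, 2, 3, 6), y = c(2.5, 3.1, 2, 3, 5),
                  Q = c("a", "b", "c", "d", "e"))

# Hypothetical helper: for each row of `query`, return the row index of the
# nearest point in `ref` by squared Euclidean distance on columns x and y.
# `query` is processed in chunks of `chunk_size` rows.
nearest_index <- function(query, ref, chunk_size = 1000L) {
  n <- nrow(query)
  idx <- integer(n)
  if (n == 0L) return(idx)
  for (start in seq(1L, n, by = chunk_size)) {
    rows <- start:min(start + chunk_size - 1L, n)
    # Rows of d2 = query points in this chunk, columns = ref points.
    # sqrt is omitted: it does not change which column is smallest.
    d2 <- outer(query$x[rows], ref$x, "-")^2 +
          outer(query$y[rows], ref$y, "-")^2
    # max.col() is a row-wise argmax, so negate d2 to get the argmin;
    # ties go to the first (lowest-index) row of ref, like which.min().
    idx[rows] <- max.col(-d2, ties.method = "first")
  }
  idx
}

# Attach the label of the nearest DT2 point to DT1 by reference.
idx <- nearest_index(DT1, DT2)
DT1[, Q := DT2$Q[idx]]
DT1
```

The total work is still proportional to nrow(DT1) × nrow(DT2), so this mainly removes the per-row function-call overhead. For millions of points in both tables, a spatial index such as a k-d tree (available, for example, through the RANN or FNN packages) is the usual way to avoid comparing every pair.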

