Calculating the Euclidean distance between each row of a data frame and all rows of another data frame


Problem description

I need to generate a data frame holding the minimum Euclidean distance between each row of one data frame and all rows of another data frame. Both data frames are large (approx. 40,000 rows). This is what I have worked out so far.

x <- matrix(c(3, 6, 3, 4, 8), nrow = 5, ncol = 7, byrow = TRUE)
y <- matrix(c(1, 4, 4, 1, 9), nrow = 5, ncol = 7, byrow = TRUE)

sed.dist <- numeric(5)
for (i in seq_along(sed.dist)) {
  sed.dist[i] <- sqrt(sum((y[i, 1:7] - x[i, 1:7])^2))
}

But this only compares row i of y with row i of x, i.e. it only covers the case i = j. What I essentially need is to loop over the rows of the "y" data frame one by one (y[1, 1:7], then y[2, 1:7], and so on up to i = 5) and compare each of them with all the rows of the "x" data frame (x[j, 1:7]). For each row i of y, I need the minimum Euclidean distance to any row of x, stored in another data frame.
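Spelled out literally, the computation described above is a double loop over the rows of y and x. The following is only a minimal sketch of that description, not a recommended approach for 40,000 rows; the names min.dist, d and result are made up for illustration.

```r
x <- matrix(c(3, 6, 3, 4, 8), nrow = 5, ncol = 7, byrow = TRUE)
y <- matrix(c(1, 4, 4, 1, 9), nrow = 5, ncol = 7, byrow = TRUE)

# min.dist[i] will hold the smallest distance from row i of y to any row of x
min.dist <- numeric(nrow(y))
for (i in seq_len(nrow(y))) {
  d <- numeric(nrow(x))            # distances from y[i, ] to every row of x
  for (j in seq_len(nrow(x))) {
    d[j] <- sqrt(sum((y[i, ] - x[j, ])^2))
  }
  min.dist[i] <- min(d)
}

# store the result in a data frame, one row per row of y
result <- data.frame(y.row = seq_len(nrow(y)), min.dist = min.dist)
```

The inner loop is what the original attempt was missing: it compared y[i, ] with x[i, ] only, instead of with every x[j, ].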

Solution

Expanding on my comment on the question, a pretty fast approach would be the following, although with 40,000 rows you'll have to wait a bit, I guess:

unlist(lapply(seq_len(nrow(y)), function(i) min(sqrt(colSums((y[i, ] - t(x))^2)))))
#[1] 5.196152 5.385165 4.898979 4.898979 5.385165
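To see why the one-liner works, it may help to look at a single row of y. The sketch below assumes the x and y matrices from the question; diffs and d are illustrative names. Because t(x) has one column per row of x, and R recycles the vector y[i, ] down each column, column j of the difference is exactly y[i, ] - x[j, ].

```r
x <- matrix(c(3, 6, 3, 4, 8), nrow = 5, ncol = 7, byrow = TRUE)
y <- matrix(c(1, 4, 4, 1, 9), nrow = 5, ncol = 7, byrow = TRUE)

i <- 1
# t(x) is 7 x 5; y[i, ] (length 7) is recycled down every column,
# so column j of diffs equals y[i, ] - x[j, ]
diffs <- y[i, ] - t(x)

# column sums of the squared differences give the squared distance
# from y[i, ] to each row of x; the square root turns them into distances
d <- sqrt(colSums(diffs^2))
min(d)
```

The lapply() call simply repeats this for every i, so only one loop remains at the R level and the inner loop over the rows of x is handled by vectorised arithmetic.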

And a benchmark comparison:

x = matrix(runif(1e2*5), 1e2)
y = matrix(runif(1e2*5), 1e2)
library(microbenchmark)
alex = function() unlist(lapply(seq_len(nrow(y)), 
                           function(i) min(sqrt(colSums((y[i, ] - t(x))^2)))))
jlhoward = function() apply(y,1,function(y)
                                  min(apply(x,1,function(x,y)dist(rbind(x,y)),y)))
all.equal(alex(), jlhoward())
#[1] TRUE
microbenchmark(alex(), jlhoward(), times = 20)
#Unit: milliseconds
#       expr        min         lq     median         uq        max neval
#     alex()   3.369188   3.479011   3.600354   4.513114   4.789592    20
# jlhoward() 422.198621 431.565643 436.561057 442.643181 602.929742    20
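For inputs of around 40,000 rows, a different technique that might be worth trying is to compute the squared distances with matrix algebra, using |a - b|^2 = |a|^2 + |b|^2 - 2 a.b, and to process y in blocks so that the intermediate distance matrix stays small. This is not part of either answer benchmarked above; min_dist_chunked and its chunk argument are hypothetical, and the block size would need tuning to the available memory.

```r
# Hypothetical helper: minimum Euclidean distance from each row of y
# to any row of x, computed block by block.
min_dist_chunked <- function(x, y, chunk = 500L) {
  if (nrow(y) == 0L) return(numeric(0))
  x2 <- rowSums(x^2)                       # squared norm of every row of x
  out <- numeric(nrow(y))
  for (start in seq(1L, nrow(y), by = chunk)) {
    idx <- start:min(start + chunk - 1L, nrow(y))
    yb <- y[idx, , drop = FALSE]           # current block of rows of y
    # squared distances for the block: |y|^2 + |x|^2 - 2 * y %*% t(x)
    d2 <- outer(rowSums(yb^2), x2, "+") - 2 * tcrossprod(yb, x)
    # clamp tiny negative values caused by rounding before taking the root
    out[idx] <- sqrt(pmax(apply(d2, 1, min), 0))
  }
  out
}

# usage on random data of the same shape as in the benchmark
set.seed(1)
x <- matrix(runif(1e2 * 5), 1e2)
y <- matrix(runif(1e2 * 5), 1e2)
res <- min_dist_chunked(x, y, chunk = 30L)
```

The expansion trades a little numerical accuracy for speed, since it subtracts numbers of similar size, so it is worth checking the result against the lapply() version on a small sample before relying on it.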
