Calculating the Euclidean distance between each row of a dataframe and all rows of another dataframe
Question
I need to generate a data frame containing the minimum Euclidean distance between each row of one data frame and all rows of another data frame. Both data frames are large (approx. 40,000 rows). This is what I have worked out so far:
x <- matrix(c(3, 6, 3, 4, 8), nrow = 5, ncol = 7, byrow = TRUE)
y <- matrix(c(1, 4, 4, 1, 9), nrow = 5, ncol = 7, byrow = TRUE)
sed.dist <- numeric(5)
for (i in 1:(length(sed.dist))) {
  sed.dist[i] <- (sqrt(sum((y[i, 1:7] - x[i, 1:7])^2)))
}
But this only computes the distance between matching rows (i = j). What I actually need is to take each row of the "y" data frame in turn (y[1, 1:7], then y[2, 1:7], and so on up to i = 5), compute its Euclidean distance to every row of the "x" data frame (x[j, 1:7]), and store the minimum of those distances for each row of y in another data frame.
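To make the requirement concrete, a direct (and slow) double loop over the example matrices could look like the sketch below. The names `min_dist` and `result` are made up for illustration and are not part of the original post.

```r
x <- matrix(c(3, 6, 3, 4, 8), nrow = 5, ncol = 7, byrow = TRUE)
y <- matrix(c(1, 4, 4, 1, 9), nrow = 5, ncol = 7, byrow = TRUE)

# min_dist[i] will hold the smallest distance from row i of y to any row of x
min_dist <- numeric(nrow(y))
for (i in seq_len(nrow(y))) {
  d <- numeric(nrow(x))
  for (j in seq_len(nrow(x))) {
    # Euclidean distance between row i of y and row j of x
    d[j] <- sqrt(sum((y[i, ] - x[j, ])^2))
  }
  min_dist[i] <- min(d)
}

# Store the result in a data frame, one row per row of y (illustrative layout)
result <- data.frame(y_row = seq_len(nrow(y)), min_dist = min_dist)
```

This is only meant to pin down the desired output; with roughly 40,000 rows on each side the nested loop would be far too slow in plain R.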
Answer
Expanding on my comment on the question, a fairly fast approach is the following, although with 40,000 rows you will still have to wait a bit, I guess:
unlist(lapply(seq_len(nrow(y)), function(i) min(sqrt(colSums((y[i, ] - t(x))^2)))))
#[1] 5.196152 5.385165 4.898979 4.898979 5.385165
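For readers unsure why the one-liner works, here is a step-by-step sketch of a single iteration on the example matrices from the question. It relies on R's column-wise recycling: subtracting the 7 x 5 matrix `t(x)` from the length-7 vector `y[i, ]` yields a matrix whose column j is `y[i, ] - x[j, ]`. The names `diffs`, `d` and `res` are illustrative, and the `vapply` variant is just an alternative to `unlist(lapply(...))`, not something the answer itself uses.

```r
x <- matrix(c(3, 6, 3, 4, 8), nrow = 5, ncol = 7, byrow = TRUE)
y <- matrix(c(1, 4, 4, 1, 9), nrow = 5, ncol = 7, byrow = TRUE)

i <- 1
# t(x) is 7 x 5; y[i, ] (length 7) is recycled down each column,
# so column j of diffs equals y[i, ] - x[j, ]
diffs <- y[i, ] - t(x)
# One distance per row of x
d <- sqrt(colSums(diffs^2))
# Smallest distance from row i of y to any row of x
min(d)

# Same computation for every row of y, with a type-checked result
res <- vapply(seq_len(nrow(y)),
              function(i) min(sqrt(colSums((y[i, ] - t(x))^2))),
              numeric(1))
```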
And a benchmark comparison:
x = matrix(runif(1e2*5), 1e2)
y = matrix(runif(1e2*5), 1e2)
library(microbenchmark)
alex = function() unlist(lapply(seq_len(nrow(y)),
function(i) min(sqrt(colSums((y[i, ] - t(x))^2)))))
jlhoward = function() apply(y,1,function(y)
min(apply(x,1,function(x,y)dist(rbind(x,y)),y)))
all.equal(alex(), jlhoward())
#[1] TRUE
microbenchmark(alex(), jlhoward(), times = 20)
#Unit: milliseconds
# expr min lq median uq max neval
# alex() 3.369188 3.479011 3.600354 4.513114 4.789592 20
# jlhoward() 422.198621 431.565643 436.561057 442.643181 602.929742 20
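For the 40,000-row case mentioned in the question, one possible variation (not part of the original answer and not benchmarked here) is to compute squared distances with matrix algebra, using |y - x|^2 = |y|^2 + |x|^2 - 2 y.x, and to process the rows of y in blocks so the full 40,000 x 40,000 distance matrix never has to sit in memory at once. The function name `min_dist_chunked` and the `chunk_size` parameter are hypothetical.

```r
# Sketch under stated assumptions: x and y are numeric matrices with the
# same number of columns; chunk_size is an illustrative tuning parameter.
min_dist_chunked <- function(x, y, chunk_size = 1000L) {
  if (nrow(y) == 0L) return(numeric(0))
  x_sq <- rowSums(x^2)                      # |x_j|^2 for every row of x
  out <- numeric(nrow(y))
  for (s in seq(1L, nrow(y), by = chunk_size)) {
    idx <- s:min(s + chunk_size - 1L, nrow(y))
    yb <- y[idx, , drop = FALSE]
    # Squared distances for this block: |y_i|^2 + |x_j|^2 - 2 * y_i . x_j
    d2 <- outer(rowSums(yb^2), x_sq, "+") - 2 * tcrossprod(yb, x)
    # Clamp tiny negative values caused by rounding before taking the root
    out[idx] <- sqrt(pmax(apply(d2, 1, min), 0))
  }
  out
}

set.seed(1)
x <- matrix(runif(1e2 * 5), 1e2)
y <- matrix(runif(1e2 * 5), 1e2)
chunked <- min_dist_chunked(x, y, chunk_size = 30L)
```

Because the expanded formula subtracts nearly equal quantities when two rows are very close, its results can differ from the direct computation in the last few digits; comparing with `all.equal()` against the `alex()` approach on your own data is a sensible check before relying on it.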