(速度挑战)用欧几里得距离来计算两个矩阵的行之间的距离矩阵的更快方法? [英] (Speed Challenge) Any faster method to calculate distance matrix between rows of two matrices, in terms of Euclidean distance?
问题描述
首先,这不是不的问题,计算两个矩阵之间的欧几里得距离
First of all, this is NOT the problem of calculating Euclidean distance between two matrices.
假设我有两个矩阵x
和y
,例如
Assuming I have two matrices x
and y
, e.g.,
set.seed(1)
x <- matrix(rnorm(15), ncol=5)
y <- matrix(rnorm(20), ncol=5)
其中
> x
[,1] [,2] [,3] [,4] [,5]
[1,] -0.6264538 1.5952808 0.4874291 -0.3053884 -0.6212406
[2,] 0.1836433 0.3295078 0.7383247 1.5117812 -2.2146999
[3,] -0.8356286 -0.8204684 0.5757814 0.3898432 1.1249309
> y
[,1] [,2] [,3] [,4] [,5]
[1,] -0.04493361 0.59390132 -1.98935170 -1.4707524 -0.10278773
[2,] -0.01619026 0.91897737 0.61982575 -0.4781501 0.38767161
[3,] 0.94383621 0.78213630 -0.05612874 0.4179416 -0.05380504
[4,] 0.82122120 0.07456498 -0.15579551 1.3586796 -1.37705956
然后我要获取尺寸为3 x 4的距离矩阵distmat
,其中元素distmat[i,j]
是norm(x[1,]-y[2,],"2")
或dist(rbind(x[1,],y[2,]))
的值.
Then I want to get distance matrix distmat
of dimension 3-by-4, where the element distmat[i,j]
is the value from norm(x[1,]-y[2,],"2")
or dist(rbind(x[1,],y[2,]))
.
- 我的代码
distmat <- as.matrix(unname(unstack(within(idx<-expand.grid(seq(nrow(x)),seq(nrow(y))), d <-sqrt(rowSums((x[Var1,]-y[Var2,])**2))), d~Var2)))
给出
> distmat
[,1] [,2] [,3] [,4]
[1,] 3.016991 1.376622 2.065831 2.857002
[2,] 4.573625 3.336707 2.698124 1.412811
[3,] 3.764925 2.235186 2.743056 3.358577
但是当使用x
和y
大量行时,我认为我的代码不够优雅或高效.
but I don't think my code is elegant or efficient enough when with x
and y
of large number of rows.
- 目标
我期待使用 的 更快 和 更优雅 的代码此目标的基础R .预先感谢!
I am looking forward to a much faster and more elegant code with base R for this goal. Appreciated in advance!
- 基准模板(正在更新)
- Benchmark Template (in updating)
为方便起见,您可以使用以下代码进行基准测试,以查看代码是否更快:
For your convenience, you can use the following for benchmark to see if your code is faster:
set.seed(1)
x <- matrix(rnorm(15000), ncol=5)
y <- matrix(rnorm(20000), ncol=5)
# my customized approach
method_ThomasIsCoding_v1 <- function() {
as.matrix(unname(unstack(within(idx<-expand.grid(seq(nrow(x)),seq(nrow(y))), d <-sqrt(rowSums((x[Var1,]-y[Var2,])**2))), d~Var2)))
}
method_ThomasIsCoding_v2 <- function() {
`dim<-`(with(idx<-expand.grid(seq(nrow(x)),seq(nrow(y))), sqrt(rowSums((x[Var1,]-y[Var2,])**2))),c(nrow(x),nrow(y)))
}
method_ThomasIsCoding_v3 <- function() {
`dim<-`(with(idx1<-list(Var1 = rep(1:nrow(x), nrow(y)), Var2 = rep(1:nrow(y), each = nrow(x))), sqrt(rowSums((x[Var1,]-y[Var2,])**2))),c(nrow(x),nrow(y)))
}
# approach by AllanCameron
method_AllanCameron <- function()
{
`dim<-`(sqrt(rowSums((x[rep(1:nrow(x), nrow(y)),] - y[rep(1:nrow(y), each = nrow(x)),])^2)), c(nrow(x), nrow(y)))
}
# approach by F.Prive
method_F.Prive <- function() {
sqrt(outer(rowSums(x^2), rowSums(y^2), '+') - tcrossprod(x, 2 * y))
}
# an existing approach by A. Webb from https://stackoverflow.com/a/35107198/12158757
method_A.Webb <- function() {
euclidean_distance <- function(p,q) sqrt(sum((p - q)**2))
outer(
data.frame(t(x)),
data.frame(t(y)),
Vectorize(euclidean_distance)
)
}
bm <- microbenchmark::microbenchmark(
method_ThomasIsCoding_v1(),
method_ThomasIsCoding_v2(),
method_ThomasIsCoding_v3(),
method_AllanCameron(),
method_F.Prive(),
# method_A.Webb(),
unit = "relative",
check = "equivalent",
times = 10
)
bm
如此
Unit: relative
expr min lq mean median uq max neval
method_ThomasIsCoding_v1() 9.471806 8.838704 7.308433 7.567879 6.989114 5.429136 10
method_ThomasIsCoding_v2() 4.623405 4.469646 3.817199 4.024436 3.703473 2.854471 10
method_ThomasIsCoding_v3() 4.881620 4.832024 4.070866 4.134011 3.924366 3.367746 10
method_AllanCameron() 5.654533 5.279920 4.436071 4.772527 4.184927 3.157814 10
method_F.Prive() 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 10
推荐答案
method_XXX <- function() {
sqrt(outer(rowSums(x^2), rowSums(y^2), '+') - tcrossprod(x, 2 * y))
}
Unit: relative
expr min lq mean median uq max
method_ThomasIsCoding_v1() 12.151624 10.486417 9.213107 10.162740 10.235274 5.278517
method_ThomasIsCoding_v2() 6.923647 6.055417 5.549395 6.161603 6.140484 3.438976
method_ThomasIsCoding_v3() 7.133525 6.218283 5.709549 6.438797 6.382204 3.383227
method_AllanCameron() 7.093680 6.071482 5.776172 6.447973 6.497385 3.608604
method_XXX() 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000
这篇关于(速度挑战)用欧几里得距离来计算两个矩阵的行之间的距离矩阵的更快方法?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!