Julia vs.MATLAB-距离矩阵-运行时测试 [英] Julia vs. MATLAB - Distance Matrix - Run Time Test

查看：65 发布时间：2021/2/14 19:11:59 matlab performance julia linear-algebra

本文介绍了Julia vs.MATLAB-距离矩阵-运行时测试的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

不久前，我开始学习Julia，因此我决定做一个简单的一个简单的计算欧几里得代码的比较Julia和Matlab之间的比较一组高维点的距离矩阵.

任务很简单，可以分为两种情况:

情况1:给定两个以n x d矩阵形式表示的数据集，例如X1和X2，计算X1中每个点与X2中所有点之间的成对欧几里德距离.如果X1的大小为n1 x d，而X2的大小为n2 x d，则所得的欧几里得距离矩阵D的大小将为n1 x n2.通常情况下，矩阵D不对称，对角线元素不等于零.

情况2:给定一个数据集，其形式为nxd矩阵X，计算X中所有n个点之间的成对欧几里德距离.结果，欧几里德距离矩阵D的大小为nxn，对称，主对角线上有零个元素.

我在Matlab和Julia中对这些功能的实现如下.请注意，所有实现都不依赖于任何类型的循环，而是依赖于简单的线性代数运算.另外，请注意，使用这两种语言的实现非常相似.

在对这些实现进行任何测试之前，我的期望是Julia代码将比Matlab代码快得多，而且幅度要大得多.令我惊讶的是，事实并非如此！

我的实验参数在下面的代码中给出.我的机器是MacBook Pro. (2015年中的15英寸)，带有2.8 GHz英特尔酷睿i7(四核)和16 GB 1600 MHz DDR3.

Matlab版本:R2018a

Julia版本:0.6.3
BLAS:libopenblas(USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK:libopenblas64_
LIBM:libopenlibm
LLVM:libLLVM-3.9.1(ORCJIT，haswell)

结果在下表(1)中给出.

表1:在计算两个不同数据集之间的欧几里得距离矩阵的30次试验中，平均时间(以标准差为单位)以秒为单位(第1组)，以及一个数据集中所有成对点之间(第2组).

          Two Datasets  ||  One Dataset

Matlab: 2.68(0.12)秒 1.88(0.04)秒.

Julia V1: 5.38(0.17)秒 4.74(0.05)秒.

Julia V2: 5.2(0.1)秒

我没想到两种语言之间会有这种显着差异.我希望朱莉娅比Matlab更快，或者至少与Matlab一样快.看到Matlab在这个特定任务中的速度比Julia快2.5倍，真是令人惊讶.由于某些原因，我不想基于这些结果得出任何早期结论.

第一个 ，虽然我认为我的Matlab实现效果很好，但我想知道我的Julia实现是否是完成此任务的最佳选择.我仍在学习Julia，希望有一个更高效的Julia代码可以为该任务带来更快的计算时间.特别是，Julia在此任务中的主要瓶颈在哪里?或者，为什么Matlab在这种情况下有优势?

第二，我当前的Julia程序包基于适用于MacOS的通用和标准BLAS和LAPACK程序包.我想知道基于Intel MKL的具有BLAS和LAPACK的JuliaPro是否会比我正在使用的当前版本更快.这就是为什么我选择从更多有知识的人那里获得有关StackOverflow的反馈的原因.

第三的原因是我想知道Julia的编译时间是否为包括在表1(第二行和第三行)中所示的时间中，以及是否有更好的方法来评估函数的执行时间.

如果对我之前的三个问题有任何反馈，我将不胜感激.

谢谢！

提示:该问题已被识别为StackOverflow上另一个问题的可能重复.但是，这并非完全正确.该问题具有三个方面，如以下答案所示.首先，是的，问题的一部分与OpenBLAS与MKL的比较有关.其次，事实证明，如答案之一所示，实施方式也可以得到改进.最后，可以使用BenchmarkTools.jl来对julia代码本身进行基准标记.

MATLAB

num_trials = 30;
dim = 1000;
n1 = 10000;
n2 = 10000;

T = zeros(num_trials,1);

XX1 = randn(n1,dim);
XX2 = rand(n2,dim);

%%% DIFEERENT MATRICES
DD2ds = zeros(n1,n2);

for (i = 1:num_trials)
  tic;
  DD2ds = distmat_euc2ds(XX1,XX2);
  T(i) = toc;
end

mt = mean(T);
st = std(T);

fprintf(1,'\nDifferent Matrices:: dim: %d, n1 x n2: %d x %d -> Avg. Time %f (+- %f) \n',dim,n1,n2,mt,st);

%%% SAME Matrix 
T = zeros(num_trials,1);

DD1ds = zeros(n1,n1);

for (i = 1:num_trials)
  tic;
  DD1ds = distmat_euc1ds(XX1);
  T(i) = toc;
end

mt = mean(T);
st = std(T);

fprintf(1,'\nSame Matrix:: dim: %d, n1 x n1 : %d x %d -> Avg. Time %f (+- %f) \n\n',dim,n1,n1,mt,st);

distmat_euc2ds.m

function [DD] = distmat_euc2ds (XX1,XX2)
    n1 = size(XX1,1);
    n2 = size(XX2,1);
    DD = sqrt(ones(n1,1)*sum(XX2.^2.0,2)' + (ones(n2,1)*sum(XX1.^2.0,2)')' - 2.*XX1*XX2');
end

distmat_euc1ds.m

function [DD] = distmat_euc1ds (XX)
    n1 = size(XX,1);
    GG = XX*XX';
    DD = sqrt(ones(n1,1)*diag(GG)' + diag(GG)*ones(1,n1) - 2.*GG);
end

朱莉娅

include("distmat_euc.jl")

num_trials = 30;

dim = 1000;
n1 = 10000;
n2 = 10000;

T = zeros(num_trials);

XX1 = randn(n1,dim)
XX2 = rand(n2,dim)

DD = zeros(n1,n2)

# Euclidean Distance Matrix: Two Different Matrices V1
# ====================================================
for i = 1:num_trials
    tic()
    DD = distmat_eucv1(XX1,XX2)
    T[i] = toq();
end

mt = mean(T)
st = std(T)

println("Different Matrices V1:: dim:$dim, n1 x n2: $n1 x $n2 -> Avg. Time $mt (+- $st)")


# Euclidean Distance Matrix: Two Different Matrices V2
# ====================================================
for i = 1:num_trials
    tic()
    DD = distmat_eucv2(XX1,XX2)
    T[i] = toq();
end

mt = mean(T)
st = std(T)

println("Different Matrices V2:: dim:$dim, n1 x n2: $n1 x $n2 -> Avg. Time $mt (+- $st)")


# Euclidean Distance Matrix: Same Matrix V1
# =========================================
for i = 1:num_trials
    tic()
    DD = distmat_eucv1(XX1)
    T[i] = toq();
end

mt = mean(T)
st = std(T)

println("Same Matrix V1:: dim:$dim, n1 x n2: $n1 x $n2 -> Avg. Time $mt (+- $st)")

distmat_euc.jl

function distmat_eucv1(XX1::Array{Float64,2},XX2::Array{Float64,2})
    (num1,dim1) = size(XX1)
    (num2,dim2) = size(XX2)

    if (dim1 != dim2)
        error("Matrices' 2nd dimensions must agree!")
    end

    DD = sqrt.((ones(num1)*sum(XX2.^2.0,2)') +
         (ones(num2)*sum(XX1.^2.0,2)')' - 2.0.*XX1*XX2');
end


function distmat_eucv2(XX1::Array{Float64,2},XX2::Array{Float64,2})
    (num1,dim1) = size(XX1)
    (num2,dim2) = size(XX2)

    if (dim1 != dim2)
        error("Matrices' 2nd dimensions must agree!")
    end

    DD = (ones(num1)*sum(Base.FastMath.pow_fast.(XX2,2.0),2)') +
         (ones(num2)*sum(Base.FastMath.pow_fast.(XX1,2.0),2)')' -
         Base.LinAlg.BLAS.gemm('N','T',2.0,XX1,XX2);

    DD = Base.FastMath.sqrt_fast.(DD)
end


function distmat_eucv1(XX::Array{Float64,2})
    n = size(XX,1)
    GG = XX*XX';
    DD = sqrt.(ones(n)*diag(GG)' + diag(GG)*ones(1,n) - 2.0.*GG)
end

解决方案

第一个问题:如果我这样重写julia distance函数:

function dist2(X1::Matrix, X2::Matrix)
    size(X1, 2) != size(X2, 2) && error("Matrices' 2nd dimensions must agree!")
    return sqrt.(sum(abs2, X1, 2) .+ sum(abs2, X2, 2)' .- 2 .* (X1 * X2'))
end

我将执行时间减少了40％以上.

对于单个数据集，您可以节省更多，如下所示:

function dist2(X::Matrix)
    G = X * X'
    dG = diag(G)
    return sqrt.(dG .+ dG' .- 2 .* G)
end

第三个问题:您应该使用BenchmarkTools.jl进行基准测试，并执行这样的基准测试(对于变量插值，请记住$):

julia> using BenchmarkTools
julia> @btime dist2($XX1, $XX2);

此外，您不应该使用浮点数来执行电源操作，例如:X.^2.0.写入X.^2更快，也同样正确.

对于乘法，在2.0 .* X和2 .* X之间没有速度差异，但是您仍然应该首选使用整数，因为它更通用.例如，如果X具有Float32个元素，则与2.0相乘将把数组提升为Float64 s，而与2相乘将保留eltype.

最后，请注意，在新版本的Matlab中，您也可以通过简单地将Mx1数组与1xN数组相加来获得广播行为.不需要先将它们乘以ones(...)来进行扩展.

I started learning Julia not a long time ago and I decided to do a simple comparison between Julia and Matlab on a simple code for computing Euclidean distance matrices from a set of high dimensional points.

The task is simple and can be divided into two cases:

Case 1: Given two datasets in the form of n x d matrices, say X1 and X2, compute the pair wise Euclidean distance between each point in X1 and all the points in X2. If X1 is of size n1 x d, and X2 is of size n2 x d, then the resulting Euclidean distance matrix D will be of size n1 x n2. In the general setting, matrix D is not symmetric, and diagonal elements are not equal to zero.

Case 2: Given one dataset in the form of n x d matrix X, compute the pair wise Euclidean distance between all the n points in X. The resulting Euclidean distance matrix D will be of size n x n, symmetric, with zero elements on the main diagonal.

My implementation of these functions in Matlab and in Julia is given below. Note that none of the implementations rely on loops of any sort, but rather simple linear algebra operations. Also, note that the implementation using both languages is very similar.

My expectations before running any tests for these implementations is that the Julia code will be much faster than the Matlab code, and by a significant margin. To my surprise, this was not the case!

The parameters for my experiments are given below with the code. My machine is a MacBook Pro. (15" Mid 2015) with 2.8 GHz Intel Core i7 (Quad Core), and 16 GB 1600 MHz DDR3.

Matlab version: R2018a

Julia version: 0.6.3
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)

The results are given in Table (1) below.

Table 1: Average time in seconds (with standard deviation) over 30 trials for computing Euclidean distance matrices between two different datasets (Col. 1), and between all pairwise points in one dataset (Col. 2).

          Two Datasets  ||  One Dataset

Matlab: 2.68 (0.12) sec. 1.88 (0.04) sec.

Julia V1: 5.38 (0.17) sec. 4.74 (0.05) sec.

Julia V2: 5.2 (0.1) sec.

I was not expecting this significant difference between both languages. I expected Julia to be faster than Matlab, or at least, as fast as Matlab. It was really a surprise to see that Matlab is almost 2.5 times faster than Julia in this particular task. I didn't want to draw any early conclusions based on these results for few reasons.

First, while I think that my Matlab implementation is as good as it can be, I'm wondering whether my Julia implementation is the best one for this task. I'm still learning Julia and I hope there is a more efficient Julia code that can yield faster computation time for this task. In particular, where is the main bottleneck for Julia in this task? Or, why does Matlab have an edge in this case?

Second, my current Julia package is based on the generic and standard BLAS and LAPACK packages for MacOS. I'm wondering whether JuliaPro with BLAS and LAPACK based on Intel MKL will be faster than the current version I'm using. This is why I opted to get some feedback from more knowledgeable people on StackOverflow.

The third reason is that I'm wondering whether the compile time for Julia was included in the timings shown in Table 1 (2nd and 3rd rows), and whether there is a better way to assess the execution time for a function.

I will appreciate any feedback on my previous three questions.

Thank you!

Hint: This question has been identified as a possible duplicate of another question on StackOverflow. However, this is not entirely true. This question has three aspects as reflected by the answers below. First, yes, one part of the question is related to the comparison of OpenBLAS vs. MKL. Second, it turns out that the implementation as well can be improved as shown by one of the answers. And last, bench-marking the julia code itself can be improved by using BenchmarkTools.jl.

MATLAB

num_trials = 30;
dim = 1000;
n1 = 10000;
n2 = 10000;

T = zeros(num_trials,1);

XX1 = randn(n1,dim);
XX2 = rand(n2,dim);

%%% DIFEERENT MATRICES
DD2ds = zeros(n1,n2);

for (i = 1:num_trials)
  tic;
  DD2ds = distmat_euc2ds(XX1,XX2);
  T(i) = toc;
end

mt = mean(T);
st = std(T);

fprintf(1,'\nDifferent Matrices:: dim: %d, n1 x n2: %d x %d -> Avg. Time %f (+- %f) \n',dim,n1,n2,mt,st);

%%% SAME Matrix 
T = zeros(num_trials,1);

DD1ds = zeros(n1,n1);

for (i = 1:num_trials)
  tic;
  DD1ds = distmat_euc1ds(XX1);
  T(i) = toc;
end

mt = mean(T);
st = std(T);

fprintf(1,'\nSame Matrix:: dim: %d, n1 x n1 : %d x %d -> Avg. Time %f (+- %f) \n\n',dim,n1,n1,mt,st);

distmat_euc2ds.m

function [DD] = distmat_euc2ds (XX1,XX2)
    n1 = size(XX1,1);
    n2 = size(XX2,1);
    DD = sqrt(ones(n1,1)*sum(XX2.^2.0,2)' + (ones(n2,1)*sum(XX1.^2.0,2)')' - 2.*XX1*XX2');
end

distmat_euc1ds.m

function [DD] = distmat_euc1ds (XX)
    n1 = size(XX,1);
    GG = XX*XX';
    DD = sqrt(ones(n1,1)*diag(GG)' + diag(GG)*ones(1,n1) - 2.*GG);
end

JULIA

include("distmat_euc.jl")

num_trials = 30;

dim = 1000;
n1 = 10000;
n2 = 10000;

T = zeros(num_trials);

XX1 = randn(n1,dim)
XX2 = rand(n2,dim)

DD = zeros(n1,n2)

# Euclidean Distance Matrix: Two Different Matrices V1
# ====================================================
for i = 1:num_trials
    tic()
    DD = distmat_eucv1(XX1,XX2)
    T[i] = toq();
end

mt = mean(T)
st = std(T)

println("Different Matrices V1:: dim:$dim, n1 x n2: $n1 x $n2 -> Avg. Time $mt (+- $st)")


# Euclidean Distance Matrix: Two Different Matrices V2
# ====================================================
for i = 1:num_trials
    tic()
    DD = distmat_eucv2(XX1,XX2)
    T[i] = toq();
end

mt = mean(T)
st = std(T)

println("Different Matrices V2:: dim:$dim, n1 x n2: $n1 x $n2 -> Avg. Time $mt (+- $st)")


# Euclidean Distance Matrix: Same Matrix V1
# =========================================
for i = 1:num_trials
    tic()
    DD = distmat_eucv1(XX1)
    T[i] = toq();
end

mt = mean(T)
st = std(T)

println("Same Matrix V1:: dim:$dim, n1 x n2: $n1 x $n2 -> Avg. Time $mt (+- $st)")

distmat_euc.jl

function distmat_eucv1(XX1::Array{Float64,2},XX2::Array{Float64,2})
    (num1,dim1) = size(XX1)
    (num2,dim2) = size(XX2)

    if (dim1 != dim2)
        error("Matrices' 2nd dimensions must agree!")
    end

    DD = sqrt.((ones(num1)*sum(XX2.^2.0,2)') +
         (ones(num2)*sum(XX1.^2.0,2)')' - 2.0.*XX1*XX2');
end


function distmat_eucv2(XX1::Array{Float64,2},XX2::Array{Float64,2})
    (num1,dim1) = size(XX1)
    (num2,dim2) = size(XX2)

    if (dim1 != dim2)
        error("Matrices' 2nd dimensions must agree!")
    end

    DD = (ones(num1)*sum(Base.FastMath.pow_fast.(XX2,2.0),2)') +
         (ones(num2)*sum(Base.FastMath.pow_fast.(XX1,2.0),2)')' -
         Base.LinAlg.BLAS.gemm('N','T',2.0,XX1,XX2);

    DD = Base.FastMath.sqrt_fast.(DD)
end


function distmat_eucv1(XX::Array{Float64,2})
    n = size(XX,1)
    GG = XX*XX';
    DD = sqrt.(ones(n)*diag(GG)' + diag(GG)*ones(1,n) - 2.0.*GG)
end

解决方案

First question: If I re-write the julia distance function like so:

function dist2(X1::Matrix, X2::Matrix)
    size(X1, 2) != size(X2, 2) && error("Matrices' 2nd dimensions must agree!")
    return sqrt.(sum(abs2, X1, 2) .+ sum(abs2, X2, 2)' .- 2 .* (X1 * X2'))
end

I shave >40% off the execution time.

For a single dataset you can save a bit more, like this:

function dist2(X::Matrix)
    G = X * X'
    dG = diag(G)
    return sqrt.(dG .+ dG' .- 2 .* G)
end

Third question: You should do your benchmarking with BenchmarkTools.jl, and perform the benchmarking like this (remember $ for variable interpolation):

julia> using BenchmarkTools
julia> @btime dist2($XX1, $XX2);

Additionally, you should not do powers using floats, like this: X.^2.0. It is faster, and equally correct to write X.^2.

For multiplication there is no speed difference between 2.0 .* X and 2 .* X, but you should still prefer using an integer, because it is more generic. As an example, if X has Float32 elements, multiplying with 2.0 will promote the array to Float64s, while multiplying with 2 will preserve the eltype.

And finally, note that in new versions of Matlab, too, you can get broadcasting behaviour by simply adding Mx1 arrays with 1xN arrays. There is no need to first expand them by multiplying with ones(...).

这篇关于Julia vs.MATLAB-距离矩阵-运行时测试的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

Julia vs.MATLAB-距离矩阵-运行时测试 [英] Julia vs. MATLAB - Distance Matrix - Run Time Test

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

Julia vs.MATLAB-距离矩阵-运行时测试 [英] Julia vs. MATLAB - Distance Matrix - Run Time Test

问题描述

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭