Distance matrix for custom distance

Question

From what I understand, the scipy function scipy.spatial.distance_matrix returns the Minkowski distance for any pair of vectors from the provided matrices of vectors. Is there a way to get the same result for a different distance? Something that would look like distance_matrix(X, Y, distance_function)?

I assume that scipy does some sort of optimization under the hood. Since I am dealing with very large vectors, I would rather not lose the benefit of these optimizations by implementing my own distance_matrix function.

Recommended answer

It is quite straightforward to implement it yourself.

Also, the performance will very likely be better than that of the distance functions already implemented in scipy.

Most distance functions apply one function to every pair of corresponding elements and sum the results, e.g. abs(A_ik-B_jk)**n for the Minkowski distance, and then apply another function to the accumulated sum, e.g. acc**(1/n).
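To make the inner/outer decomposition concrete before any compilation is involved, here is a minimal pure-Python sketch. The helper name pairwise_distance and the small input lists are made up for illustration; it is far too slow for large inputs and is only meant to show the pattern.

```python
import math

def pairwise_distance(A, B, inner, outer):
    """Hypothetical reference helper: A and B are lists of equal-length rows."""
    out = []
    for a in A:
        row = []
        for b in B:
            acc = 0.0
            # Sum the per-element contributions of the inner kernel
            for x, y in zip(a, b):
                acc += inner(x, y)
            # Map the accumulated sum to the final distance
            row.append(outer(acc))
        out.append(row)
    return out

A = [[0.0, 0.0], [6.0, 8.0]]
B = [[3.0, 4.0]]

# Euclidean distance: squared differences, then a square root
euclidean = pairwise_distance(A, B, lambda x, y: (x - y) ** 2, math.sqrt)
# Manhattan distance: absolute differences, identity at the end
manhattan = pairwise_distance(A, B, lambda x, y: abs(x - y), lambda acc: acc)

print(euclidean)  # [[5.0], [5.0]]
print(manhattan)  # [[7.0], [7.0]]
```

Only the two small kernels change between distances; the loop nest stays the same.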

Template function

You don't have to change anything here to implement various distance functions.

import numpy as np
import numba as nb

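def gen_cust_dist_func(kernel_inner,kernel_outer,parallel=True):
    # Compile both kernels so they can be called from the jitted loop below
    kernel_inner_nb=nb.njit(kernel_inner,fastmath=True)
    kernel_outer_nb=nb.njit(kernel_outer,fastmath=True)

    def cust_dot_T(A,B):
        # A has shape (m,d), B has shape (n,d); both must share the dimension d
        assert B.shape[1]==A.shape[1]

        out=np.empty((A.shape[0],B.shape[0]),dtype=A.dtype)
        # prange parallelizes over the rows of A when parallel=True,
        # otherwise it behaves like range
        for i in nb.prange(A.shape[0]):
            for j in range(B.shape[0]):
                acc=0
                for k in range(A.shape[1]):
                    # Accumulate the per-element contributions
                    acc+=kernel_inner_nb(A[i,k],B[j,k])
                # Map the accumulated sum to the final distance
                out[i,j]=kernel_outer_nb(acc)
        return out

    # Compile the loop nest, with or without parallel execution
    return nb.njit(cust_dot_T,fastmath=True,parallel=parallel)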

Examples and timings

#Implement, for example, a Minkowski distance and a Euclidean distance
#Minkowski distance, p=20
inner=lambda A,B:(A-B)**20
outer=lambda acc:acc**(1./20)
my_minkowski_dist=gen_cust_dist_func(inner,outer,parallel=True)

#Euclidean distance
inner=lambda A,B:(A-B)**2
outer=lambda acc:np.sqrt(acc)
my_euclidian_dist=gen_cust_dist_func(inner,outer,parallel=True)

from scipy.spatial.distance import cdist

A=np.random.rand(1000,50)
B=np.random.rand(1000,50)

#Minkowski p=20
%timeit res_1=cdist(A,B,'minkowski',p=20)
#1.44 s ± 8.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit res_2=my_minkowski_dist(A,B)
#10.8 ms ± 105 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
res_1=cdist(A,B,'minkowski',p=20)
res_2=my_minkowski_dist(A,B)
print(np.allclose(res_1,res_2))
#True

#Euclidean
%timeit res_1=cdist(A,B,'euclidean')
#39.3 ms ± 307 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit res_2=my_euclidian_dist(A,B)
#3.61 ms ± 22.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
res_1=cdist(A,B,'euclidean')
res_2=my_euclidian_dist(A,B)
print(np.allclose(res_1,res_2))
#True
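
The question asks for something like distance_matrix(X, Y, distance_function). As a slow but simple baseline for checking a custom kernel, cdist also accepts a Python callable as its metric. The sketch below uses a hypothetical manhattan function and small random inputs chosen only for illustration. Because the callable is invoked once per pair from Python, this route is expected to be much slower than a compiled loop on large inputs.

```python
import numpy as np
from scipy.spatial.distance import cdist

def manhattan(u, v):
    # u and v are 1-D arrays holding one row of A and one row of B
    return np.abs(u - v).sum()

rng = np.random.default_rng(0)
A = rng.random((5, 3))
B = rng.random((4, 3))

# Custom callable metric versus the built-in 'cityblock' metric
res_callable = cdist(A, B, metric=manhattan)
res_builtin = cdist(A, B, 'cityblock')
print(np.allclose(res_callable, res_builtin))
```

The same comparison can be made against the output of a function generated by gen_cust_dist_func, using inner=lambda A,B:abs(A-B) and outer=lambda acc:acc.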
