Calculate weighted pairwise distance matrix in Python

Question

I am trying to find the fastest way to perform the following pairwise distance calculation in Python. I want to use the distances to rank a list_of_objects by their similarity.

Each item in the list_of_objects is characterised by four measurements a, b, c, d, which are made on very different scales e.g.:

object_1 = [0.2, 4.5, 198, 0.003]
object_2 = [0.3, 2.0, 999, 0.001]
object_3 = [0.1, 9.2, 321, 0.023]
list_of_objects = [object_1, object_2, object_3]

The aim is to get a pairwise distance matrix of the objects in list_of_objects. However, I want to be able to specify the 'relative importance' of each measurement in my distance calculation via a weights vector with one weight per measurement, e.g.:

weights = [1, 1, 1, 1]

would indicate that all measurements are equally weighted. In this case I want each measurement to contribute equally to the distance between objects, regardless of the measurement scale. Alternatively:

weights = [1, 1, 1, 10]

would indicate that I want measurement d to contribute 10x more than the other measurements to the distance between objects.

My current algorithm looks like this:

  1. Calculate a pairwise distance matrix for each measurement
  2. Normalise each distance matrix so that the maximum is 1
  3. Multiply each distance matrix by the appropriate weight from weights
  4. Sum the distance matrices to generate a single pairwise matrix
  5. Use the matrix from 4 to provide a ranked list of pairs of objects from list_of_objects

This works fine, and gives me a weighted version of the city-block distance between objects.
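The five steps above can be sketched in plain NumPy with broadcasting. This is only a minimal illustration under stated assumptions, not the asker's actual code: the function name `weighted_normalised_cityblock` and the variable names are hypothetical, and a measurement that is identical across all objects is assumed to contribute zero distance.

```python
import numpy as np

def weighted_normalised_cityblock(objects, weights):
    """Hypothetical helper: steps 1-4 of the algorithm described above."""
    X = np.asarray(objects, dtype=float)   # shape (n_objects, n_measurements)
    w = np.asarray(weights, dtype=float)   # one weight per measurement

    # Step 1: one pairwise |difference| matrix per measurement, shape (n, n, m)
    diffs = np.abs(X[:, None, :] - X[None, :, :])

    # Step 2: normalise each measurement's matrix so that its maximum is 1
    max_per_measurement = diffs.max(axis=(0, 1))
    max_per_measurement[max_per_measurement == 0] = 1.0  # avoid division by zero

    # Steps 3 and 4: apply the weights, then sum over the measurements
    return (diffs / max_per_measurement * w).sum(axis=2)

object_1 = [0.2, 4.5, 198, 0.003]
object_2 = [0.3, 2.0, 999, 0.001]
object_3 = [0.1, 9.2, 321, 0.023]
list_of_objects = [object_1, object_2, object_3]

D = weighted_normalised_cityblock(list_of_objects, [1, 1, 1, 1])

# Step 5: rank each unordered pair (i < j) from most to least similar
i, j = np.triu_indices(len(list_of_objects), k=1)
order = np.argsort(D[i, j])
ranked_pairs = [(int(i[k]), int(j[k]), float(D[i[k], j[k]])) for k in order]
print(ranked_pairs)
```

The intermediate array has shape (n, n, m), so memory grows quadratically with the number of objects; for large inputs a condensed representation such as the one returned by `scipy.spatial.distance.pdist` is likely the better choice.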

I have two questions:

  1. Without changing the algorithm, what's the fastest implementation in SciPy, NumPy or SciKit-Learn to perform the initial distance matrix calculations?
  2. Is there an existing multi-dimensional distance approach that does all of this for me?

For Q 2, I have looked, but couldn't find anything with a built-in step that does the 'relative importance' in the way that I want.

Other suggestions welcome. Happy to clarify if I've missed details.

Recommended answer

scipy.spatial.distance is the module you'll want to have a look at. It has a lot of different norms that can be easily applied.

I'd recommend using the weighted Minkowski metric.

You can do pairwise distance calculation by using the pdist method from this package.

For example:

import numpy as np
from scipy.spatial.distance import pdist, squareform

object_1 = [0.2, 4.5, 198, 0.003]
object_2 = [0.3, 2.0, 999, 0.001]
object_3 = [0.1, 9.2, 321, 0.023]
list_of_objects = [object_1, object_2, object_3]

# make a 3x4 array from the list of objects
X = np.array(list_of_objects)

# one weight per measurement
weights = np.array([1, 1, 1, 10])

# calculate pairwise distances, using the weighted Minkowski norm with p=2.
# wminkowski was removed in SciPy 1.8; minkowski's w multiplies |u - v|**p,
# so the weights are squared here to keep the old wminkowski scaling.
distances = pdist(X, 'minkowski', p=2, w=weights**2)

# make a square matrix from the condensed result
distances_as_2d_matrix = squareform(distances)

print(distances)
print(distances_as_2d_matrix)

This prints:

[ 801.00390786  123.0899671   678.0382942 ]
[[   0.          801.00390786  123.0899671 ]
 [ 801.00390786    0.          678.0382942 ]
 [ 123.0899671   678.0382942     0.        ]]
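Note that in the output above the unscaled measurement c dominates every distance. One possible way to combine `pdist` with the normalisation step from the question is to rescale each column before computing a city-block distance. The sketch below assumes that "normalise so that the maximum is 1" means dividing each measurement by its largest pairwise difference, which equals the column's range (max minus min). The variable names are illustrative only.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

object_1 = [0.2, 4.5, 198, 0.003]
object_2 = [0.3, 2.0, 999, 0.001]
object_3 = [0.1, 9.2, 321, 0.023]
X = np.array([object_1, object_2, object_3], dtype=float)

weights = np.array([1, 1, 1, 10], dtype=float)

# The largest pairwise |difference| of a measurement is its range (max - min)
ranges = X.max(axis=0) - X.min(axis=0)
ranges[ranges == 0] = 1.0  # assumption: a constant measurement adds no distance

# Scaling column k by weights[k] / ranges[k] and taking the city-block distance
# gives sum_k weights[k] * |u_k - v_k| / ranges[k] (valid for non-negative weights)
condensed = pdist(X * (weights / ranges), metric="cityblock")
weighted_distances = squareform(condensed)

print(weighted_distances)
```

Because the scaling is applied once to the data rather than to each per-measurement distance matrix, only a single call to `pdist` is needed, and the condensed result can be ranked directly with `np.argsort`.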
