Pairwise Wasserstein distance on 2 arrays
Problem description
I am trying to compare sports formations, so I need to measure how similar distributions of (x, y) points are in order to eventually cluster them. I am working with a 3D array of the following form:
import scipy.spatial.distance as distance
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import pairwise_distances
import numpy as np
data = np.array([[[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]],
[[5, 6], [7, 8], [5, 6], [7, 8], [5, 6], [7, 8], [5, 6], [7, 8], [5, 6], [7, 8]]])
I have implemented the following custom metric for the Wasserstein distance (l and k only rescale the data so that formations of different density can be compared):
def wasserstein_distance_function(f1, f2):
    min_cost = np.inf
    f1 = f1.reshape((10, 2))
    f2 = f2.reshape((10, 2))
    for l in np.linspace(0.8, 1.2, 3):
        for k in np.linspace(0.8, 1.2, 3):
            cost = distance.cdist(l * f1, k * f2, 'sqeuclidean')
            row_ind, col_ind = linear_sum_assignment(cost)
            curr_cost = cost[row_ind, col_ind].sum()
            if curr_cost < min_cost:
                min_cost = curr_cost
    return min_cost
My question is: how do I implement the pairwise comparison via sklearn? So far I have:
def pairwise_wasserstein(points):
    """
    Helper function to perform the pairwise distance function of all points within 'points' parameter
    """
    return pairwise_distances(points, metric=wasserstein_distance_function)
This raises either
ValueError: setting an array element with a sequence.
or
ValueError: Found array with dim 3. check_pairwise_arrays expected <= 2.
I understand that this has to do with my data being a 3D NumPy array, but a simple comparison such as
wasserstein_distance_function(data[0], data[1])
returns a valid value. Any clues on how to make this work for an array of around 1000 formations, so that I get pairwise distances I can feed into an AgglomerativeClustering algorithm? Thanks a lot!
Recommended answer
The problem is that your wasserstein_distance_function() is written for 2D inputs of shape (10, 2), while sklearn's pairwise_distances(), which pairwise_wasserstein() calls, itself requires a 2D input of shape (n_samples, n_features). Because pairwise_distances() splits its input row by row to compute the pairs, it hands the metric 1-dimensional rows rather than the 2D arrays your wasserstein_distance_function() expects. And when you pass 3D data to pairwise_distances(), it raises an error because it cannot work with more than two dimensions.
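One way to see this in practice: since pairwise_distances() calls a callable metric on each pair of rows, flattening each formation into a single row of 20 values gives it the 2D input it wants, and the reshape((10, 2)) already inside wasserstein_distance_function() restores the original layout. The following is only a minimal sketch, assuming every formation has exactly 10 points; the names flat and dist_matrix are illustrative and not part of the original code.

```python
import numpy as np
from scipy.spatial import distance
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import pairwise_distances


def wasserstein_distance_function(f1, f2):
    # Same metric as in the question; each input is reshaped back to (10, 2).
    min_cost = np.inf
    f1 = f1.reshape((10, 2))
    f2 = f2.reshape((10, 2))
    for l in np.linspace(0.8, 1.2, 3):
        for k in np.linspace(0.8, 1.2, 3):
            cost = distance.cdist(l * f1, k * f2, 'sqeuclidean')
            row_ind, col_ind = linear_sum_assignment(cost)
            min_cost = min(min_cost, cost[row_ind, col_ind].sum())
    return min_cost


data = np.array([[[1, 2], [3, 4]] * 5,
                 [[5, 6], [7, 8]] * 5])

# Assumption: flatten (n_formations, 10, 2) to (n_formations, 20) so that
# pairwise_distances() receives a 2D array and passes 1D rows to the metric.
flat = data.reshape(len(data), -1)
dist_matrix = pairwise_distances(flat, metric=wasserstein_distance_function)
print(dist_matrix)
```

A callable metric is evaluated pair by pair in Python, so this is not expected to be faster than an explicit loop; it only avoids the dimension error.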
I would suggest simply writing your own helper method, which iterates over all pairs of data points and computes the Wasserstein distance for each pair.
A possible solution:
def pairwise_wasserstein(points):
    """
    Helper function to perform the pairwise distance function of all points within 'points' parameter
    """
    for first_index in range(0, points.shape[0]):
        for second_index in range(first_index + 1, points.shape[0]):
            print("First index: ", first_index, ", Second index: ", second_index,
                  ", Distance: ", wasserstein_distance_function(points[first_index], points[second_index]))
Example input with 4 data points:
data = np.array([[[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]],
[[5, 6], [7, 8], [5, 6], [7, 8], [5, 6], [7, 8], [5, 6], [7, 8], [5, 6], [7, 8]],
[[1, 15], [3, 2], [1, 2], [5, 4], [1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]],
[[5, 1], [7, 8], [5, 6], [7, 1], [5, 6], [7, 8], [5, 1], [7, 8], [5, 6], [7, 8]]])
Example output:
First index: 0 , Second index: 1 , Distance: 100.80000000000005
First index: 0 , Second index: 2 , Distance: 76.4
First index: 0 , Second index: 3 , Distance: 96.32000000000002
First index: 1 , Second index: 2 , Distance: 215.00000000000003
First index: 1 , Second index: 3 , Distance: 55.68000000000002
First index: 2 , Second index: 3 , Distance: 186.88