Pairwise Wasserstein distance on 2 arrays

Question

I am trying to compare sports formations, so I need to measure how similar the distributions of their (x, y) point coordinates are in order to eventually cluster them. I am working with a 3D array of the following form:

import scipy.spatial.distance as distance
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import pairwise_distances
import numpy as np

data = np.array([[[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]],
                 [[5, 6], [7, 8], [5, 6], [7, 8], [5, 6], [7, 8], [5, 6], [7, 8], [5, 6], [7, 8]]])

I have implemented the following custom metric for the Wasserstein distance (l and k only rescale the data so that formations of varying density can be compared):

def wasserstein_distance_function(f1, f2):
    min_cost = np.inf
    f1 = f1.reshape((10, 2))
    f2 = f2.reshape((10, 2))
    for l in np.linspace(0.8, 1.2, 3):
        for k in np.linspace(0.8, 1.2, 3):
            cost = distance.cdist(l * f1, k * f2, 'sqeuclidean')
            row_ind, col_ind = linear_sum_assignment(cost)
            curr_cost = cost[row_ind, col_ind].sum()
            if curr_cost < min_cost:
                min_cost = curr_cost
    return min_cost

My question is: how do I implement the pairwise comparison via sklearn? So far I have:

def pairwise_wasserstein(points):
    """
    Helper function to perform the pairwise distance function of all points within 'points' parameter

    """
    # return pairwise_distances(points, metric=wasserstein_distance_function)
    # print(points)
    return pairwise_distances(points, metric=wasserstein_distance_function)

This raises either

ValueError: setting an array element with a sequence.

or

ValueError: Found array with dim 3. check_pairwise_arrays expected <= 2.

I understand that this has to do with my data being a 3D NumPy array, but a simple comparison such as

wasserstein_distance_function(data[0], data[1])

returns a valid value. Any clues on how to get this to work for an array of maybe 1000 formations, so that I get pairwise distances I can feed into an AgglomerativeClustering algorithm? Thanks a lot!

Recommended answer

The problem is that your wasserstein_distance_function() expects each input to be a 2D array of points, while sklearn's pairwise_distances(), which pairwise_wasserstein() calls, requires its whole input to be 2D. pairwise_distances() splits that input into rows to compare them pairwise, so with 2D data the metric receives 1-dimensional vectors instead of the 2D point arrays it was written for. And when you pass 3D data to pairwise_distances(), it raises an error because it cannot handle more than two dimensions.
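To illustrate the dimensionality constraint, here is a minimal sketch of one way around it, assuming every formation has exactly 10 points: flatten each formation into a 20-element row and let the metric reshape it back, which the function in the question already does. The sketch uses SciPy's pdist with a callable metric as a stand-in for pairwise_distances, since pdist also takes a 2D array and hands the metric one row at a time. N_POINTS and the variable names are illustrative assumptions; whether pairwise_distances accepts the same flattened array in your sklearn version is not verified here.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist, pdist, squareform

N_POINTS = 10  # assumption: every formation has exactly 10 (x, y) points


def wasserstein_distance_function(f1, f2):
    # Same metric as in the question; reshape restores the (10, 2) layout
    # when a flattened 20-element row is passed in.
    f1 = np.asarray(f1).reshape((N_POINTS, 2))
    f2 = np.asarray(f2).reshape((N_POINTS, 2))
    min_cost = np.inf
    for l in np.linspace(0.8, 1.2, 3):
        for k in np.linspace(0.8, 1.2, 3):
            cost = cdist(l * f1, k * f2, "sqeuclidean")
            row_ind, col_ind = linear_sum_assignment(cost)
            min_cost = min(min_cost, cost[row_ind, col_ind].sum())
    return min_cost


data = np.array([[[1, 2], [3, 4]] * 5,
                 [[5, 6], [7, 8]] * 5])

# 3D (n_formations, 10, 2) -> 2D (n_formations, 20)
flat = data.reshape(len(data), -1)

# pdist returns the condensed upper triangle; squareform expands it
condensed = pdist(flat, metric=wasserstein_distance_function)
dist_matrix = squareform(condensed)
print(dist_matrix)
```

The reshape inside the metric is what makes the flattened rows usable; without it, the metric would see two 20-element vectors rather than two sets of points.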

I would suggest writing your own helper method that iterates over all pairs of formations and computes the Wasserstein distance for each pair.

A possible solution:

def pairwise_wasserstein(points):
    """
    Helper function to perform the pairwise distance function of all points within 'points' parameter
    """
    n = points.shape[0]
    for first_index in range(0, n):
        # Start at first_index + 1 so each unordered pair is visited once
        for second_index in range(first_index + 1, n):
            dist = wasserstein_distance_function(points[first_index], points[second_index])
            print("First index: ", first_index, ", Second index: ", second_index, ", Distance: ", dist)

Example input with 4 data points:

data = np.array([[[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]],
                 [[5, 6], [7, 8], [5, 6], [7, 8], [5, 6], [7, 8], [5, 6], [7, 8], [5, 6], [7, 8]],
                 [[1, 15], [3, 2], [1, 2], [5, 4], [1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]],
                 [[5, 1], [7, 8], [5, 6], [7, 1], [5, 6], [7, 8], [5, 1], [7, 8], [5, 6], [7, 8]]])

Example output:

First index:  0 , Second index:  1 , Distance:  100.80000000000005
First index:  0 , Second index:  2 , Distance:  76.4
First index:  0 , Second index:  3 , Distance:  96.32000000000002
First index:  1 , Second index:  2 , Distance:  215.00000000000003
First index:  1 , Second index:  3 , Distance:  55.68000000000002
First index:  2 , Second index:  3 , Distance:  186.88
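The helper above only prints the distances, whereas clustering needs them collected into a symmetric matrix. The following is a rough sketch of that step, not part of the original answer. pairwise_wasserstein_matrix is a hypothetical helper name, and the clustering uses SciPy's hierarchical clustering (linkage and fcluster) as a stand-in for AgglomerativeClustering. Average linkage and the choice of two clusters are arbitrary, for illustration only.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist, squareform


def wasserstein_distance_function(f1, f2):
    # Metric from the question: best assignment cost over a small scale grid
    f1 = f1.reshape((10, 2))
    f2 = f2.reshape((10, 2))
    min_cost = np.inf
    for l in np.linspace(0.8, 1.2, 3):
        for k in np.linspace(0.8, 1.2, 3):
            cost = cdist(l * f1, k * f2, "sqeuclidean")
            row_ind, col_ind = linear_sum_assignment(cost)
            min_cost = min(min_cost, cost[row_ind, col_ind].sum())
    return min_cost


def pairwise_wasserstein_matrix(points):
    # Hypothetical helper: fill a symmetric (n, n) matrix, computing each
    # unordered pair once; the diagonal stays at zero.
    n = points.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = wasserstein_distance_function(points[i], points[j])
            dist[i, j] = dist[j, i] = d
    return dist


data = np.array([[[1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]],
                 [[5, 6], [7, 8], [5, 6], [7, 8], [5, 6], [7, 8], [5, 6], [7, 8], [5, 6], [7, 8]],
                 [[1, 15], [3, 2], [1, 2], [5, 4], [1, 2], [3, 4], [1, 2], [3, 4], [1, 2], [3, 4]],
                 [[5, 1], [7, 8], [5, 6], [7, 1], [5, 6], [7, 8], [5, 1], [7, 8], [5, 6], [7, 8]]])

dist = pairwise_wasserstein_matrix(data)

# linkage expects the condensed (upper-triangle) form of the matrix
Z = linkage(squareform(dist), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")  # at most 2 clusters

print(dist)
print(labels)
```

For around 1000 formations this loop evaluates roughly 500,000 pairs, each with nine assignment problems, so the matrix is worth computing once and caching before experimenting with clustering settings.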
