Python-如何生成成对汉明距离矩阵 [英] Python - How to generate the Pairwise Hamming Distance Matrix

查看:452
本文介绍了Python-如何生成成对汉明距离矩阵的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Python的入门者.所以我在尝试仅使用numpy库来计算输入矩阵的行之间的二进制成对汉明顿距离矩阵时遇到了麻烦.我应该避免循环并使用向量化.例如,如果我有类似的东西:

beginner with Python here. So I'm having trouble trying to calculate the resulting binary pairwise hammington distance matrix between the rows of an input matrix using only the numpy library. I'm supposed to avoid loops and use vectorization. If for instance I have something like:

   [ 1,  0,  0,  1,  1,  0]
   [ 1,  0,  0,  0,  0,  0]
   [ 1,  1,  1,  1,  0,  0]

矩阵应类似于:

   [ 0,  2,  3]
   [ 2,  0,  3]
   [ 3,  3,  0]

即,如果原始矩阵为A,汉明距离矩阵为B.B [0,1] =汉明距离(A [0]和A [1]).在这种情况下,答案是2,因为它们只有两个不同的元素.

ie if the original matrix was A and the hammingdistance matrix is B. B[0,1] = hammingdistance (A[0] and A[1]). In this case the answer is 2 as they only have two different elements.

所以对于我的代码来说,是这样的

So for my code is something like this

def compute_HammingDistance(X):

     hammingDistanceMatrix = np.zeros(shape = (len(X), len(X)))
     hammingDistanceMatrix = np.count_nonzero ((X[:,:,None] != X[:,:,None].T))
     return hammingDistanceMatrix

但是,似乎只是返回标量值而不是预期的矩阵.我知道阵列/向量广播可能做错了什么,但我不知道如何解决.我尝试使用np.sum而不是np.count_nonzero,但是它们几乎都给了我类似的东西.

However it seems to just be returning a scalar value instead of the intended matrix. I know I'm probably doing something wrong with the array/vector broadcasting but I can't figure out how to fix it. I've tried using np.sum instead of np.count_nonzero but they all pretty much gave me something similar.

推荐答案

尝试这种方法,沿着axis = 1创建一个新轴,然后进行广播并使用sum计算true或非零:

Try this approach, create a new axis along axis = 1, and then do broadcasting and count trues or non zero with sum:

(arr[:, None, :] != arr).sum(2)

# array([[0, 2, 3],
#        [2, 0, 3],
#        [3, 3, 0]])


def compute_HammingDistance(X):
    return (X[:, None, :] != X).sum(2)

说明:

1)创建一个形状为(3,1,6)的3d数组

1) Create a 3d array which has shape (3,1,6)

arr[:, None, :]
#array([[[1, 0, 0, 1, 1, 0]],
#       [[1, 0, 0, 0, 0, 0]],
#       [[1, 1, 1, 1, 0, 0]]])

2)这是一个二维数组,形状为(3,6)

2) this is a 2d array has shape (3, 6)

arr   
#array([[1, 0, 0, 1, 1, 0],
#       [1, 0, 0, 0, 0, 0],
#       [1, 1, 1, 1, 0, 0]])

3)由于形状不匹配而触发广播,并且首先沿3d数组 arr [:, None,:] <的0轴广播2d数组 arr /em>,然后针对(3,6)广播形状为(1,6)的数组.这两个广播步骤一起对原始数组进行笛卡尔比较.

3) This triggers broadcasting since their shape doesn't match, and the 2d array arr is firstly broadcasted along the 0 axis of 3d array arr[:, None, :], and then we have array of shape (1, 6) be broadcasted against (3, 6). The two broadcasting steps together make a cartesian comparison of the original array.

arr[:, None, :] != arr 
#array([[[False, False, False, False, False, False],
#        [False, False, False,  True,  True, False],
#        [False,  True,  True, False,  True, False]],
#       [[False, False, False,  True,  True, False],
#        [False, False, False, False, False, False],
#        [False,  True,  True,  True, False, False]],
#       [[False,  True,  True, False,  True, False],
#        [False,  True,  True,  True, False, False],
#        [False, False, False, False, False, False]]], dtype=bool)

4)沿着第三轴的sum计算有多少个元素不相等,即为true,它给出汉明距离.

4) the sum along the third axis count how many elements are not equal, i.e, trues which gives the hamming distance.

这篇关于Python-如何生成成对汉明距离矩阵的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆