Is there a specific use of the pdist function of scipy for some particular indexes?
Problem description
My question is about the pdist function in scipy.spatial.distance. I need to calculate the Hamming distances between one 1x64 vector and each of millions of other 1x64 vectors stored in a 2D array, but I cannot do that with pdist, because it returns the Hamming distances between every pair of vectors in the same 2D array. Is there a way to make it calculate the Hamming distances between the vector at one specific index and all the others?
Here is my current code. I use a 1000x64 array for now because larger arrays cause a memory error.
import numpy as np
from scipy.spatial.distance import pdist
ph = np.load('little.npy')
print(pdist(ph, 'hamming').shape)
The output is
(499500,)
little.npy contains a 1000x64 array. For example, if I only want the Hamming distances between the 31st vector and all the others, what should I do?
Recommended answer
You can use cdist. For example,
In [101]: from scipy.spatial.distance import cdist
In [102]: x
Out[102]:
array([[0, 1, 1, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 0, 1, 0],
[0, 0, 0, 1, 1, 1, 0, 0],
[1, 0, 1, 1, 0, 1, 1, 0],
[1, 0, 1, 1, 0, 1, 1, 1],
[0, 1, 0, 1, 0, 0, 0, 1],
[1, 0, 0, 0, 0, 1, 0, 0],
[1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 0, 0, 1, 1, 1, 0],
[1, 0, 0, 1, 1, 0, 0, 1]])
In [103]: index = 3
In [104]: cdist(x[index:index+1], x, 'hamming')
Out[104]:
array([[ 0.625, 0.375, 0.5 , 0. , 0.125, 0.75 , 0.375, 0.375,
0.5 , 0.625]])
That gives the Hamming distance between the row at index 3 and every row of x (including the row at index 3 itself, whose distance is 0). The result is a 2D array with a single row. You might want to pull out that row immediately so the result is 1D:
In [105]: cdist(x[index:index+1], x, 'hamming')[0]
Out[105]:
array([ 0.625, 0.375, 0.5 , 0. , 0.125, 0.75 , 0.375, 0.375,
0.5 , 0.625])
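As a rough sketch of how this might look at the size mentioned in the question, the snippet below uses a randomly generated 1000x64 array of 0s and 1s as a stand-in for little.npy (the real data is not available here) and assumes the vector of interest is the one at zero-based index 31. It also cross-checks the cdist result against a plain NumPy computation of the same quantity, the fraction of positions that differ.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Assumption: random binary data standing in for the contents of little.npy.
rng = np.random.default_rng(0)
ph = rng.integers(0, 2, size=(1000, 64))

# Assumption: "the 31st vector" is taken to be the row at zero-based index 31.
index = 31

# One row against all rows: 1000 distances instead of the 499500 from pdist.
d = cdist(ph[index:index+1], ph, 'hamming')[0]

# The same quantity with plain NumPy: fraction of differing components per row.
d_np = (ph != ph[index]).mean(axis=1)

print(d.shape)
print(np.allclose(d, d_np))
```

Because only one row of distances is computed, the result grows linearly with the number of vectors rather than quadratically, which is likely to help with the memory error described in the question.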
I used x[index:index+1] instead of just x[index] so that the input is a 2D array (with just a single row):
In [106]: x[index:index+1]
Out[106]: array([[1, 0, 1, 1, 0, 1, 1, 0]])
You'll get an error if you use x[index], because that is a 1D array and cdist expects 2D inputs.
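The following sketch illustrates that point on a small array made of three of the rows shown above. It assumes the error raised for a 1D input is a ValueError, and it shows two other common ways of getting a single-row 2D array (list indexing and np.newaxis), which are alternatives not mentioned in the original answer.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Three rows taken from the example array above.
x = np.array([[0, 1, 1, 1, 1, 0, 0, 0],
              [0, 0, 1, 0, 0, 0, 1, 0],
              [1, 0, 1, 1, 0, 1, 1, 0]])
index = 2

# x[index] is 1D, so cdist is expected to reject it.
try:
    cdist(x[index], x, 'hamming')
    raised = False
except ValueError as exc:
    raised = True
    print("cdist rejected the 1D input:", exc)

# Three equivalent ways to pass a single row as a 2D array.
a = cdist(x[index:index+1], x, 'hamming')        # slice, as in the answer
b = cdist(x[[index]], x, 'hamming')              # list indexing
c = cdist(x[index][np.newaxis, :], x, 'hamming') # add a leading axis

print(a)
```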