scipy cdist with sparse matrices

Views: 54 | Published: 2021/7/16 20:55:58 | Tags: python numpy scipy


Question

I need to calculate the distances between two sets of vectors, source_matrix and target_matrix.

I run the following line, where both source_matrix and target_matrix are of type scipy.sparse.csr.csr_matrix:

distances = sp.spatial.distance.cdist(source_matrix, target_matrix)

I end up with the following (partial) exception traceback:

  File "/usr/local/lib/python2.7/site-packages/scipy/spatial/distance.py", line 2060, in cdist
    [XA] = _copy_arrays_if_base_present([_convert_to_double(XA)])
  File "/usr/local/lib/python2.7/site-packages/scipy/spatial/distance.py", line 146, in _convert_to_double
    X = X.astype(np.double)
ValueError: setting an array element with a sequence.

This seems to indicate that the sparse matrices are being treated as dense numpy arrays, which both fails and defeats the purpose of using sparse matrices.

Any suggestions?

Recommended answer

I appreciate this post is quite old, but as one of the comments suggested, you could use the sklearn implementation, which accepts sparse vectors and matrices.

Take two random vectors as an example:

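import scipy.sparse
import sklearn.metrics.pairwise

# Two random sparse row vectors (1 x 100, about 20% non-zero entries).
a = scipy.sparse.rand(m=1, n=100, density=0.2, format='csr')
b = scipy.sparse.rand(m=1, n=100, density=0.2, format='csr')
sklearn.metrics.pairwise.pairwise_distances(X=a, Y=b, metric='euclidean')
# array([[ 3.14837228]])  (example output; the inputs are random, so values vary)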

This also works if a is a matrix and b is a vector:

# a is now a 500 x 100 sparse matrix; b is still a single 1 x 100 row vector.
a = scipy.sparse.rand(m=500, n=100, density=0.2, format='csr')
b = scipy.sparse.rand(m=1, n=100, density=0.2, format='csr')
sklearn.metrics.pairwise.pairwise_distances(X=a, Y=b, metric='euclidean')
# array([[ 2.9864606 ],  (example output, one row per row of a)
#        [ 3.33862248],
#        [ 3.45803465],
#        [ 3.15453179],
#        ...
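
To check that the sparse path gives the same numbers cdist would, one option is to densify small inputs and compare. The following is only a sketch: the matrix sizes, the random_state seeds, and the variable names are assumptions chosen for illustration, and calling .toarray() is only practical when the dense copies fit in memory.

```python
import numpy as np
import scipy.sparse
from scipy.spatial.distance import cdist
from sklearn.metrics import pairwise_distances

# Small illustrative inputs; shapes and seeds are assumptions, not from the question.
a = scipy.sparse.random(50, 100, density=0.2, format='csr', random_state=0)
b = scipy.sparse.random(5, 100, density=0.2, format='csr', random_state=1)

# sklearn operates on the sparse matrices directly.
sparse_result = pairwise_distances(a, b, metric='euclidean')

# cdist needs dense arrays, so convert explicitly (feasible only for small data).
dense_result = cdist(a.toarray(), b.toarray(), metric='euclidean')

# Both results have one row per row of a and one column per row of b.
print(sparse_result.shape)
print(np.allclose(sparse_result, dense_result))
```

If the two results agree on a small sample like this, the sparse call can be used on the full data without ever building the dense matrices.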

scipy.spatial.distance does not support sparse matrices, so sklearn would be the best choice here. You can also pass the n_jobs argument to sklearn.metrics.pairwise.pairwise_distances, which splits the computation across parallel jobs; this can help if your matrices are very large.
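
As a rough illustration of the n_jobs option, the sketch below runs the same computation serially and with two jobs and checks that the results agree. The shapes, seeds, and the choice of n_jobs=2 are assumptions for the example; whether parallelism actually speeds things up depends on the data size and the machine.

```python
import numpy as np
import scipy.sparse
from sklearn.metrics import pairwise_distances

# Illustrative sparse inputs; sizes and seeds are assumptions.
a = scipy.sparse.random(500, 100, density=0.2, format='csr', random_state=0)
b = scipy.sparse.random(20, 100, density=0.2, format='csr', random_state=1)

# Default: a single job.
serial = pairwise_distances(a, b, metric='euclidean')

# Same computation split across two parallel jobs.
parallel = pairwise_distances(a, b, metric='euclidean', n_jobs=2)

# The two results should match up to floating-point tolerance.
print(np.allclose(serial, parallel))
```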

Hope that helps.
