Get norm of numpy sparse matrix rows

Views: 1712 Published: 2016/6/1 20:20:44 Tags: python arrays numpy matrix norm

Question

I have a sparse matrix that I obtained using sklearn's TfidfVectorizer:

vect = TfidfVectorizer(sublinear_tf=True, max_df=0.5, analyzer='word', vocabulary=my_vocab, stop_words='english')
tfidf = vect.fit_transform([my_docs])

The sparse matrix is (with the numbers removed for generality):

<sparse matrix of type '<type 'numpy.float64'>'
with stored elements in Compressed Sparse Row format>]

I am trying to get a numeric value for each row that tells me how strongly a document matched the terms I am looking for. I don't really care which words it contained; I just want to know how many it contained. So I want the norm of each row, or row * row.T. However, I am having a very hard time getting this out of numpy.

My first approach was simply:

tfidf[i] * numpy.transpose(tfidf[i])

However, numpy apparently will not transpose an array with fewer than two dimensions, so that just squares the vector. So I tried:

tfidf[i] * numpy.transpose(numpy.atleast_2d(tfidf[0]))

But numpy.transpose(numpy.atleast_2d(tfidf[0])) still would not transpose the row.

I moved on to trying to get the norm of the row (that approach is probably better anyway). My initial attempt used numpy.linalg:

numpy.linalg.norm(tfidf[0])

But that gave me a "dimension mismatch" error. So I tried to calculate the norm manually. I started by setting a variable to a numpy array version of the sparse matrix and printing the len of the first row:

my_array = numpy.array(tfidf)
print my_array
print len(my_array[0])

It prints my_array correctly, but when I try to access the len it tells me:

IndexError: 0-d arrays can't be indexed

I simply want a numeric value for each row of the sparse matrix returned by fit_transform. Getting the norm would be best. Any help is much appreciated.

Recommended answer

Some simple fake data:

import numpy as np
from scipy import sparse

a = np.arange(9.).reshape(3,3)
s = sparse.csr_matrix(a)

To get the norm of each row of the sparse matrix, you can use:

np.sqrt(s.multiply(s).sum(1))

and the renormalized s is

s.multiply(1/np.sqrt(s.multiply(s).sum(1)))

or, to keep it sparse before renormalizing:

s.multiply(sparse.csr_matrix(1/np.sqrt(s.multiply(s).sum(1))))
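The expressions above return the norms as an (n, 1) matrix and divide by them directly, which would divide by zero for an all-zero row (for instance a document containing none of the vocabulary terms). The following is a minimal sketch, not part of the original answer, that returns the norms as a flat array and leaves empty rows untouched. The sample data and the names `row_norms`, `inv` and `s_normalized` are assumptions made for illustration; `scipy.sparse.linalg.norm` is used only as a cross-check.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import norm as sparse_norm

# Hypothetical data: row 0 is all zeros, like a document with no matching terms.
a = np.array([[0., 0., 0.],
              [3., 4., 0.],
              [0., 0., 2.]])
s = sparse.csr_matrix(a)

# Row norms as a flat 1-D array instead of an (n, 1) np.matrix.
row_norms = np.sqrt(np.asarray(s.multiply(s).sum(axis=1)).ravel())

# The same quantity computed with scipy's sparse norm helper.
row_norms_alt = np.asarray(sparse_norm(s, axis=1)).ravel()

# Invert only the non-zero norms so that empty rows stay all zeros.
inv = np.zeros_like(row_norms)
nonzero = row_norms > 0
inv[nonzero] = 1.0 / row_norms[nonzero]

# Left-multiplying by a diagonal matrix scales each row and keeps the result sparse.
s_normalized = (sparse.diags(inv) @ s).tocsr()

print(row_norms)
print(s_normalized.toarray())
```

Scaling through a diagonal matrix avoids building a dense intermediate, which matters when the tf-idf matrix is large.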

To get an ordinary matrix or array from it, use:

m = s.todense()
a = s.toarray()
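The two conversions return different types, which affects how rows behave afterwards. The sketch below is an illustration under the assumption that `s` is a `csr_matrix` as in the fake data above; the name `arr` is chosen here only to avoid clobbering `a`.

```python
import numpy as np
from scipy import sparse

s = sparse.csr_matrix(np.arange(9.).reshape(3, 3))

m = s.todense()    # np.matrix: indexing a row still gives a 2-D object
arr = s.toarray()  # np.ndarray: indexing a row gives a 1-D array

print(type(m).__name__, m[0].shape)
print(type(arr).__name__, arr[0].shape)

# With the ndarray, len() of a row is the number of columns.
print(len(arr[0]))
```

For the question's use case, `toarray()` is probably the more convenient choice, since the result can be indexed row by row like any other numpy array.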

If you have enough memory for the dense version, you can get the norm of each row with:

n = np.sqrt(np.einsum('ij,ij->i',a,a))

or

n = np.apply_along_axis(np.linalg.norm, 1, a)
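As a quick check that the two dense variants agree, the sketch below compares them with a third option, `np.linalg.norm` called with its `axis` argument. That third call is not in the original answer; it is added here as an assumption that a reasonably recent numpy is available. The names `n_einsum`, `n_apply` and `n_axis` are illustrative.

```python
import numpy as np

a = np.arange(9.).reshape(3, 3)

# Sum of squares per row via einsum, then the square root.
n_einsum = np.sqrt(np.einsum('ij,ij->i', a, a))

# Apply the vector norm to each row in turn (a Python-level loop, so slower).
n_apply = np.apply_along_axis(np.linalg.norm, 1, a)

# Let numpy compute the norm along axis 1 directly.
n_axis = np.linalg.norm(a, axis=1)

print(np.allclose(n_einsum, n_apply), np.allclose(n_einsum, n_axis))
```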

To normalize, you can do

an = a / n[:, None]

or, to normalize the original array in place:

a /= n[:, None]

The [:, None] indexing adds a second axis to n, turning it from shape (3,) into a vertical (3, 1) array, so the division broadcasts across each row.
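To make the effect of `[:, None]` concrete, here is a small sketch that prints the shapes involved. The `keepdims=True` variant is an alternative not mentioned in the original answer, and the names `col`, `n_keep` and `an_keep` are illustrative.

```python
import numpy as np

a = np.arange(9.).reshape(3, 3)
n = np.linalg.norm(a, axis=1)

# n is 1-D; indexing with None inserts a new axis, giving a column.
col = n[:, None]
print(n.shape, col.shape)

# Each row of a is divided by its own norm thanks to broadcasting.
an = a / col

# keepdims=True returns the norms already shaped as a column.
n_keep = np.linalg.norm(a, axis=1, keepdims=True)
an_keep = a / n_keep

print(np.linalg.norm(an, axis=1))
```

Dividing by plain `n` would instead broadcast along the last axis and scale columns rather than rows, which is why the extra axis is needed.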
