(Python Scipy) How to flatten a csr_matrix and append it to another csr_matrix?

Problem description

I am representing each XML document as a feature matrix in a csr_matrix format. Now that I have around 3000 XML documents, I got a list of csr_matrices. I want to flatten each of these matrices to become feature vectors, then I want to combine all of these feature vectors to form one csr_matrix representing all the XML documents as one, where each row is a document and each column is a feature.

One way to achieve this is with the following code:

from scipy.sparse import csr_matrix
X = csr_matrix([a.toarray().ravel().tolist() for a in ls])

where ls is the list of csr_matrices. However, this is highly inefficient: with 3000 documents, it simply crashes!

In other words, my question is: how can I flatten each csr_matrix in the list 'ls' without converting it to a dense array, and how can I append the flattened csr_matrices to another csr_matrix?

Please note that I am using Python with SciPy.

Thanks in advance!

Recommended answer

Why use csr_matrix for each XML document? It may be better to use the LIL format, because lil_matrix supports the reshape method. Here is an example:

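import numpy as np
from scipy import sparse

N, M, K = 100, 200, 300
# K random sparse N x M matrices standing in for the per-document feature matrices
matrixs = [sparse.rand(N, M, format="csr") for i in range(K)]
# LIL supports reshape: flatten each matrix into a single 1 x (N*M) row
matrixs2 = [m.tolil().reshape((1, N*M)) for m in matrixs]
# stack the rows and convert the result back to CSR
m1 = sparse.vstack(matrixs2).tocsr()

# test with dense array
#m2 = np.vstack([m.toarray().reshape(-1) for m in matrixs])
#np.allclose(m1.toarray(), m2)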
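
As an alternative to converting through LIL, the flattening can also be done directly on the coordinate (COO) indices, so no dense array is ever created. The sketch below is not part of the original answer. The helper name flatten_and_stack is hypothetical, and it assumes that every matrix in the list has the same shape. Each stored entry at (row, col) is mapped to column row * n_cols + col of the flattened vector, which matches the row-major order used by ravel().

```python
import numpy as np
from scipy import sparse

def flatten_and_stack(matrices):
    """Flatten each sparse matrix (row-major) into one row and stack the rows as CSR.

    Hypothetical helper; assumes a non-empty list of matrices with identical shapes.
    """
    n_rows, n_cols = matrices[0].shape
    data, rows, cols = [], [], []
    for i, m in enumerate(matrices):
        coo = m.tocoo()  # exposes the .row, .col and .data arrays
        data.append(coo.data)
        # every entry of document i lands in output row i
        rows.append(np.full(len(coo.data), i, dtype=np.int64))
        # row-major flat index; int64 avoids overflow for large shapes
        cols.append(coo.row.astype(np.int64) * n_cols + coo.col)
    shape = (len(matrices), n_rows * n_cols)
    return sparse.csr_matrix(
        (np.concatenate(data), (np.concatenate(rows), np.concatenate(cols))),
        shape=shape,
    )
```

With the list from the question, this would be called as X = flatten_and_stack(ls). Only the non-zero entries are copied, so memory use grows with the number of stored values rather than with N*M per document.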
