Converting python sparse matrix dict to scipy sparse matrix
Question
I am using python scikit-learn for document clustering, and I have a sparse matrix stored in a dict object:
For example:
doc_term_dict = { ('d1','t1'): 12, \
('d2','t3'): 10, \
('d3','t2'): 5 \
} # from mysql data table
<type 'dict'>
I want to use scikit-learn to do the clustering, where the input matrix type is scipy.sparse.csr.csr_matrix.
Example:
(0, 2164) 0.245793088885
(0, 2076) 0.205702177467
(0, 2037) 0.193810934784
(0, 2005) 0.14547028437
(0, 1953) 0.153720023365
...
<class 'scipy.sparse.csr.csr_matrix'>
I can't find a way to convert the dict to this CSR matrix (I have never used scipy).
Recommended answer
Pretty straightforward. First read the dictionary and convert the keys to the appropriate row and column indices. SciPy supports (and recommends for this purpose) the COO (coordinate) format for sparse matrices.
Pass it data, row, and column, where A[row[k], column[k]] = data[k] (for all k) defines the matrix. Then let SciPy do the conversion to CSR.
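As a quick illustration of that definition, here is a minimal sketch that builds a COO matrix from three hand-written triples and converts it to CSR. The values and the explicit shape=(3, 3) are assumptions chosen to match the example dictionary in the question; printing the dense array is just a convenient way to check which axis is which.

```python
from scipy.sparse import coo_matrix

# Three parallel lists: entry k says A[row[k], col[k]] = data[k].
data = [12, 10, 5]
row = [0, 1, 2]
col = [0, 2, 1]

# shape is given explicitly here (an assumption for this sketch);
# if it is omitted, SciPy infers it from the largest indices.
A = coo_matrix((data, (row, col)), shape=(3, 3)).tocsr()

# The dense view makes it easy to see whether rows and columns
# ended up in the intended orientation.
dense = A.toarray()
print(dense)
```

For anything larger than a toy example, skip the toarray() call, since it allocates the full dense matrix.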
Please check that I have the rows and columns the way you want them; I may have them transposed. I also assumed that the input is 1-indexed.
My code below prints:
(0, 0) 12
(1, 2) 10
(2, 1) 5
Code:
#!/usr/bin/env python3
#http://stackoverflow.com/questions/26335059/converting-python-sparse-matrix-dict-to-scipy-sparse-matrix
from scipy.sparse import csr_matrix, coo_matrix

def convert(term_dict):
    ''' Convert a dictionary with elements of form ('d1', 't1'): 12 to a CSR type matrix.
    The element ('d1', 't1'): 12 becomes entry (0, 0) = 12.
    * Conversion from 1-indexed to 0-indexed.
    * d is row
    * t is column.
    '''
    # Create the appropriate format for the COO format.
    data = []
    row = []
    col = []
    for k, v in term_dict.items():
        # Strip the leading letter ('d' or 't') and parse the number.
        r = int(k[0][1:])
        c = int(k[1][1:])
        data.append(v)
        row.append(r-1)
        col.append(c-1)
    # Create the COO-matrix
    coo = coo_matrix((data, (row, col)))
    # Let Scipy convert COO to CSR format and return
    return csr_matrix(coo)

if __name__ == '__main__':
    doc_term_dict = {('d1', 't1'): 12,
                     ('d2', 't3'): 10,
                     ('d3', 't2'): 5
                     }
    print(convert(doc_term_dict))
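
The code above relies on keys that look like 'd1' and 't3'. If the real document and term labels coming from the database are arbitrary strings, one possible variation is to assign indices from the labels themselves. The sketch below is only an illustration under that assumption: dict_to_csr, the sample labels, and the choice to sort labels alphabetically are all hypothetical and not part of the original answer.

```python
from scipy.sparse import coo_matrix


def dict_to_csr(term_dict):
    """Hypothetical helper: map arbitrary (doc, term) labels to indices.

    Returns the CSR matrix plus the label lists, so that row i
    corresponds to docs[i] and column j to terms[j].
    """
    # Sorted for a reproducible order. Note this is plain string order,
    # so 'd10' would sort before 'd2'.
    docs = sorted({d for d, _ in term_dict})
    terms = sorted({t for _, t in term_dict})
    doc_index = {d: i for i, d in enumerate(docs)}
    term_index = {t: j for j, t in enumerate(terms)}

    rows = [doc_index[d] for d, _ in term_dict]
    cols = [term_index[t] for _, t in term_dict]
    data = list(term_dict.values())

    # Give the shape explicitly so it does not depend on which
    # indices happen to be present.
    shape = (len(docs), len(terms))
    matrix = coo_matrix((data, (rows, cols)), shape=shape).tocsr()
    return matrix, docs, terms


counts = {
    ("doc_a", "apple"): 12,
    ("doc_b", "cherry"): 10,
    ("doc_c", "banana"): 5,
}
matrix, docs, terms = dict_to_csr(counts)
print(matrix)
```

Returning docs and terms alongside the matrix keeps the mapping from row and column numbers back to the original labels, which is usually needed to interpret the clustering result.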