Converting python sparse matrix dict to scipy sparse matrix

Question

I am using python scikit-learn for document clustering and I have a sparse matrix stored in a dict object:

For example:

doc_term_dict = { ('d1','t1'): 12,
                  ('d2','t3'): 10,
                  ('d3','t2'):  5
                  }                            # from mysql data table
<type 'dict'>

I want to use scikit-learn to do the clustering where the input matrix type is scipy.sparse.csr.csr_matrix

Example:

(0, 2164)   0.245793088885
(0, 2076)   0.205702177467
(0, 2037)   0.193810934784
(0, 2005)   0.14547028437
(0, 1953)   0.153720023365
...
<class 'scipy.sparse.csr.csr_matrix'>

I can't find a way to convert dict to this csr-matrix (I have never used scipy.)

Recommended answer

Pretty straightforward. First read the dictionary and convert the keys to the appropriate row and column indices. Scipy supports (and recommends for this purpose) the COOrdinate (COO) format for sparse matrices.

Pass it data, row, and column, where A[row[k], column[k]] = data[k] (for all k) defines the matrix. Then let Scipy do the conversion to CSR.
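
As a minimal sketch of that definition, here are the three entries from the question with the 0-based indices written out by hand. The explicit shape argument is an addition for illustration; Scipy can also infer the shape from the largest indices.

```python
from scipy.sparse import coo_matrix

# Triplets for the example: A[row[k], col[k]] = data[k]
data = [12, 10, 5]
row = [0, 1, 2]   # d1, d2, d3 -> rows 0, 1, 2
col = [0, 2, 1]   # t1, t3, t2 -> columns 0, 2, 1

# Build the COO matrix, then convert it to CSR
coo = coo_matrix((data, (row, col)), shape=(3, 3))
csr = coo.tocsr()

print(csr.toarray())
```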

Please check that I have the rows and columns the way you want them; I might have them transposed. I also assumed that the input is 1-indexed.

My code below prints:

(0, 0)        12
(1, 2)        10
(2, 1)        5

Code:

#!/usr/bin/env python3
#http://stackoverflow.com/questions/26335059/converting-python-sparse-matrix-dict-to-scipy-sparse-matrix

from scipy.sparse import csr_matrix, coo_matrix

def convert(term_dict):
    ''' Convert a dictionary with elements of form ('d1', 't1'): 12 to a CSR type matrix.
    The element ('d1', 't1'): 12 becomes entry (0, 0) = 12.
    * Conversion from 1-indexed to 0-indexed.
    * d is row
    * t is column.
    '''
    # Create the appropriate format for the COO format.
    data = []
    row = []
    col = []
    for k, v in term_dict.items():
        r = int(k[0][1:])
        c = int(k[1][1:])
        data.append(v)
        row.append(r-1)
        col.append(c-1)
    # Create the COO-matrix
    coo = coo_matrix((data,(row,col)))
    # Let Scipy convert COO to CSR format and return
    return csr_matrix(coo)

if __name__=='__main__':
    doc_term_dict = { ('d1','t1'): 12,
                      ('d2','t3'): 10,
                      ('d3','t2'):  5
                    }
    print(convert(doc_term_dict))
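
The conversion above assumes keys of the form 'd<number>' and 't<number>'. If the labels are arbitrary strings, one possible variant is to assign each distinct label an index and return the label lists alongside the matrix. This is only a sketch and not part of the original answer: convert_with_labels is a hypothetical helper name, and the sort order is plain lexicographic (so 'd10' sorts before 'd2').

```python
from scipy.sparse import coo_matrix

def convert_with_labels(term_dict):
    """Hypothetical helper: build a CSR matrix from {(doc, term): value}.

    Returns the matrix plus the sorted doc and term labels, so that
    row i corresponds to docs[i] and column j to terms[j].
    """
    # Collect the distinct labels and give each one an index
    docs = sorted({d for d, _ in term_dict})
    terms = sorted({t for _, t in term_dict})
    doc_index = {d: i for i, d in enumerate(docs)}
    term_index = {t: j for j, t in enumerate(terms)}

    # Build the COO triplets
    data, row, col = [], [], []
    for (d, t), v in term_dict.items():
        data.append(v)
        row.append(doc_index[d])
        col.append(term_index[t])

    coo = coo_matrix((data, (row, col)), shape=(len(docs), len(terms)))
    return coo.tocsr(), docs, terms

matrix, docs, terms = convert_with_labels(
    {('d1', 't1'): 12, ('d2', 't3'): 10, ('d3', 't2'): 5}
)
print(matrix.toarray())
```

Keeping docs and terms around makes it possible to map cluster results back to the original document labels.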
