Generating a dense matrix from a sparse matrix in numpy python

Views: 263  Published: 2020/5/18 19:36:38  Tags: python arrays numpy scipy sparse-matrix

Problem description

I have a SQLite database with the following schema:

termcount(doc_num, term , count)

This table contains terms with their respective counts in each document, for example:

(doc1, term1, 12)
(doc1, term22, 2)
.
.
(docn, term1, 10)
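
For illustration, rows of this shape could be read out of SQLite as plain (doc_num, term, count) triples. This is only a sketch: the in-memory database, the column types, and the sample rows are assumptions, not the asker's actual data.

```python
import sqlite3

# Hypothetical in-memory database mirroring the termcount schema above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE termcount (doc_num TEXT, term TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO termcount VALUES (?, ?, ?)",
    [("doc1", "term1", 12), ("doc1", "term22", 2), ("docn", "term1", 10)],
)

# Each fetched row is a (doc_num, term, count) triple.
triples = conn.execute(
    "SELECT doc_num, term, count FROM termcount ORDER BY doc_num, term"
).fetchall()
conn.close()

print(triples)
```

With a real database file, only the connect call would change; the triples are what the later steps work from.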

This table can be treated as a sparse matrix, since each document contains very few terms with a non-zero count.

How can I create a dense matrix from this sparse matrix using numpy? I need it to compute the similarity between documents using cosine similarity.

The dense matrix would look like a table with the doc ids in the first column and all the terms in the first row; the remaining cells would contain the counts.
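
The layout described above can be sketched with plain numpy by mapping each doc id to a row and each term to a column. The triples, and the helper names doc_index and term_index, are made up for illustration.

```python
import numpy as np

# Illustrative (doc, term, count) triples; the ids are made up.
triples = [("doc1", "term1", 12), ("doc1", "term2", 2), ("doc2", "term1", 10)]

# Sorted unique ids give a stable row and column order.
docs = sorted({d for d, _, _ in triples})
terms = sorted({t for _, t, _ in triples})
doc_index = {d: i for i, d in enumerate(docs)}
term_index = {t: j for j, t in enumerate(terms)}

# Start from all zeros and fill in only the non-zero counts.
dense = np.zeros((len(docs), len(terms)), dtype=np.int64)
for d, t, c in triples:
    dense[doc_index[d], term_index[t]] = c

print(docs)
print(terms)
print(dense)
```

The docs and terms lists play the role of the first column and first row of the table, since a numpy array itself holds only the counts.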

Recommended answer

I solved this problem using Pandas, because we want to keep the document ids and term ids.

from pandas import DataFrame

# A sparse matrix in dictionary form (this could come from a SQLite table).
# Each key is a (doc_id, term_id) tuple; each value is the count.
doc_term_dict = {('d1', 't1'): 12, ('d2', 't3'): 10, ('d3', 't2'): 5}

# Extract the unique document and term ids (sorted for a stable order)
# and initialize a zero-filled DataFrame.
rows = sorted({d for (d, t) in doc_term_dict})
cols = sorted({t for (d, t) in doc_term_dict})
df = DataFrame(0, index=rows, columns=cols)

# Assign every non-zero value; .loc avoids chained indexing.
for (doc_id, term_id), value in doc_term_dict.items():
    df.loc[doc_id, term_id] = value

print(df)

Output:

    t1  t2  t3
d1  12   0   0
d2   0   0  10
d3   0   5   0
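
Since the question's goal is cosine similarity between documents, here is a minimal numpy sketch of that step on a dense count matrix. The counts are illustrative rather than taken from the answer, and the zero-norm guard is an added assumption for documents with no terms.

```python
import numpy as np

# Illustrative dense count matrix: one row per document, one column per term.
counts = np.array([[12, 2, 0],
                   [10, 0, 0],
                   [0, 0, 5]], dtype=float)

# Scale each row to unit length; guard against all-zero rows.
norms = np.linalg.norm(counts, axis=1, keepdims=True)
norms[norms == 0] = 1.0
unit = counts / norms

# Entry (i, j) is the cosine similarity between documents i and j.
similarity = unit @ unit.T

print(np.round(similarity, 3))
```

With the DataFrame from the answer, the same computation should work on df.to_numpy(dtype=float), and the row labels in df.index identify which documents each entry compares.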
