Efficient transformation of gensim TransformedCorpus data to array

Question

Is there a more direct or efficient way to get the topic probabilities from a gensim.interfaces.TransformedCorpus object into a numpy array (or, alternatively, a pandas DataFrame) than the row-by-row method below?

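from gensim import models
import numpy as np

# `corpus` is a bag-of-words corpus built elsewhere (not shown here)
num_topics = 5
model = models.LdaMulticore(corpus, num_topics=num_topics, minimum_probability=0.0)

# TransformedCorpus: one list of (topic_id, probability) pairs per document
all_topics = model.get_document_topics(corpus)
num_docs = len(all_topics)

lda_scores = np.empty([num_docs, num_topics])

# Copy the probability column of each document into its row, one document at a time
for i in range(num_docs):
    lda_scores[i] = np.array(all_topics[i]).transpose()[1]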

Recommended answer

Might be too late, but gensim has a helper function for converting a corpus to and from numpy/scipy arrays.

What you are looking for is:

gensim.matutils.corpus2csc

It returns a scipy.sparse CSC matrix with one column per document, which you can then convert to a numpy array or pandas DataFrame as you wish.

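import gensim
import numpy as np

# corpus2csc returns a sparse CSC matrix of shape (num_topics, num_docs)
all_topics_csc = gensim.matutils.corpus2csc(all_topics)
# Transpose so that rows are documents and columns are topics, then densify
all_topics_numpy = all_topics_csc.T.toarray()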
