Sklearn: how to get the top 10 words from each topic


Question

I want to get the 10 highest-weighted words for each topic. After applying TfidfTransformer, I get the output shown below, whose type is scipy.sparse.csr.csr_matrix.

But I don't know how to get the top ten from each list. In the data, (0, ****) denotes list 0, and so on up to (5170, *****), which denotes list 5170.

I've tried converting it to a NumPy array, but that fails.

  (0, 19016)    0.024214182003181053
  (0, 28002)    0.03661443306612277
  (0, 6710)     0.02292100371816788
  (0, 27683)    0.013973969726506812
  (0, 27104)    0.02236713272585597
  (0, 6889)     0.0403281034949193
  ...
  (5169, 3236)  0.014432449220428715
  (5169, 19134) 0.014346823328868169
  (5169, 32915) 0.002047199186262409
  (5170, 35899) 0.49931779368675605
  (5170, 36444) 0.3479717717856863
  (5170, 15014) 0.5608169649159123

Recommended answer

You can use TfidfVectorizer, which exposes the get_feature_names method (named get_feature_names_out in recent scikit-learn releases). The transformer doesn't have this method, but the docs clearly state that TfidfVectorizer is equivalent to CountVectorizer followed by TfidfTransformer. If you don't want to use it, then I think you're going to be stuck building a lookup before you vectorize.

TfidfVectorizer in the docs: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html

To sort and slice the output of fit_transform from the TfidfVectorizer, normal sparse matrix operations should work.
