Find the most common term in a Scikit-learn classifier

Views: 59 | Published: 2020/5/18 19:45:09 | Tags: python, python-2.7, numpy, scipy, scikit-learn


Question

I'm following the example in the scikit-learn docs where CountVectorizer is used on some dataset.

Question: count_vect.vocabulary_.viewitems() lists all the terms and their frequencies. How do you sort them by the number of occurrences?

sorted(count_vect.vocabulary_.viewitems()) does not seem to work.

Recommended answer

vocabulary_.viewitems() does not in fact list the terms and their frequencies; it is a mapping from each term to its column index. The per-document frequencies are returned by the fit_transform method as a sparse matrix (COO format according to the original answer; the exact sparse format may differ between scikit-learn versions), where the rows are documents and the columns are words, with column indexes mapped to words via vocabulary_. You can get the total frequencies, for example, by summing each column:

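import numpy as np
matrix = count_vect.fit_transform(doc_list)
# matrix.sum(axis=0) is a 1 x n_terms matrix; flatten it so each term pairs with its own total
totals = np.asarray(matrix.sum(axis=0)).ravel()
# newer scikit-learn releases provide get_feature_names_out() in place of get_feature_names()
freqs = zip(count_vect.get_feature_names(), totals)
# sort from largest to smallest
print(sorted(freqs, key=lambda x: -x[1]))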
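
To make the difference between vocabulary_ (term to column index) and the actual counts concrete, here is a minimal self-contained sketch. The toy doc_list and the names totals and term_counts are made up for illustration and do not come from the original question. It reads the column index for each term from vocabulary_ directly, so it does not depend on which feature-name accessor the installed scikit-learn version offers.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, invented for this illustration
doc_list = ["the cat sat", "the cat ran", "the dog sat"]

count_vect = CountVectorizer()
matrix = count_vect.fit_transform(doc_list)  # rows: documents, columns: terms

# Sum each column to get the total occurrences of each term,
# then flatten the 1 x n_terms result into a 1-D array
totals = np.asarray(matrix.sum(axis=0)).ravel()

# vocabulary_ maps term -> column index, so use the index to look up the total.
# Sort by count (largest first), breaking ties alphabetically.
term_counts = sorted(
    ((term, int(totals[idx])) for term, idx in count_vect.vocabulary_.items()),
    key=lambda pair: (-pair[1], pair[0]),
)
print(term_counts)
```

The first element of term_counts is then the most common term together with its total count across all documents.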
