tf-idf feature weights using sklearn.feature_extraction.text.TfidfVectorizer

Views: 164 | Published: 2021/7/16 19:50:50 | Tags: python, scikit-learn, tf-idf


Question

This page, http://scikit-learn.org/stable/modules/feature_extraction.html, mentions:

As tf–idf is very often used for text features, there is also another class called TfidfVectorizer that combines all the options of CountVectorizer and TfidfTransformer in a single model.

I then followed the code there and called fit_transform() on my corpus. How can I get the weight of each feature computed by fit_transform()?

I tried:

    In [39]: vectorizer.idf_
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    <ipython-input-39-5475eefe04c0> in <module>()
    ----> 1 vectorizer.idf_

    AttributeError: 'TfidfVectorizer' object has no attribute 'idf_'

but this attribute is missing.

Thanks

Recommended answer

Since version 0.15, the idf weight of each feature can be retrieved via the idf_ attribute of the fitted TfidfVectorizer object. Note that idf_ holds one inverse document frequency value per feature; the per-document tf-idf weights are the entries of the matrix X returned by fit_transform().

    from sklearn.feature_extraction.text import TfidfVectorizer

    corpus = ["This is very strange", "This is very nice"]
    vectorizer = TfidfVectorizer(min_df=1)
    X = vectorizer.fit_transform(corpus)  # sparse matrix of tf-idf weights
    idf = vectorizer.idf_                 # one idf value per feature
    # On scikit-learn older than 1.0, use get_feature_names() instead.
    print(dict(zip(vectorizer.get_feature_names_out(), idf)))

Output (as shown in the original answer, which was run under Python 2):

    {u'is': 1.0, u'nice': 1.4054651081081644, u'strange': 1.4054651081081644, u'this': 1.0, u'very': 1.0}

As discussed in the comments, prior to version 0.15 a workaround is to access idf_ through the supposedly hidden _tfidf attribute (a TfidfTransformer instance) of the vectorizer:

    idf = vectorizer._tfidf.idf_
    print(dict(zip(vectorizer.get_feature_names(), idf)))

which should give the same output as above.
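To make the difference between idf_ and the weights in X concrete, here is a minimal pure-Python sketch of how both could be computed by hand for the same corpus. It is not scikit-learn's code. It assumes the default TfidfVectorizer settings (lowercasing, tokens of two or more word characters, smooth_idf=True, norm='l2', raw term counts), and the helper names tokenize and tfidf_row are made up for illustration.

```python
import math
import re
from collections import Counter

corpus = ["This is very strange", "This is very nice"]


def tokenize(text):
    # Assumption: approximate the default tokenizer (lowercase, 2+ word characters).
    return re.findall(r"\b\w\w+\b", text.lower())


# Term counts per document, and the sorted vocabulary across all documents.
docs = [Counter(tokenize(text)) for text in corpus]
vocabulary = sorted(set().union(*docs))
n_docs = len(docs)

# Smoothed idf: ln((1 + n) / (1 + df)) + 1, where df is the number of
# documents containing the term. This is the quantity idf_ exposes.
idf = {
    term: math.log((1 + n_docs) / (1 + sum(term in doc for doc in docs))) + 1
    for term in vocabulary
}


def tfidf_row(doc):
    # Multiply raw counts by idf, then scale the row to unit L2 norm.
    raw = {term: count * idf[term] for term, count in doc.items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {term: w / norm for term, w in raw.items()}


# Per-document weights: the hand-computed counterpart of the rows of X.
weights = [tfidf_row(doc) for doc in docs]

print(idf)
for text, row in zip(corpus, weights):
    print(text, row)
```

Under these assumptions the idf values should match the ones printed in the answer: terms that appear in both documents get 1.0, and terms that appear in only one get a higher value. The per-document weights are different again, because each row is also scaled to unit length.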
