tf-idf feature weights using sklearn.feature_extraction.text.TfidfVectorizer


Question

This page: http://scikit-learn.org/stable/modules/feature_extraction.html mentions:

As tf–idf is very often used for text features, there is also another class called TfidfVectorizer that combines all the options of CountVectorizer and TfidfTransformer in a single model.

I then followed the code and used fit_transform() on my corpus. How can I get the weight of each feature computed by fit_transform()?

I tried:

In [39]: vectorizer.idf_
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-39-5475eefe04c0> in <module>()
----> 1 vectorizer.idf_

AttributeError: 'TfidfVectorizer' object has no attribute 'idf_'

but this attribute is missing.

Thanks

Recommended answer

Since version 0.15, the idf weight of each feature (the learned, per-term part of its tf-idf score) can be retrieved via the attribute idf_ of the TfidfVectorizer object:

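from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["This is very strange",
          "This is very nice"]
vectorizer = TfidfVectorizer(min_df=1)
X = vectorizer.fit_transform(corpus)  # learns the vocabulary and the idf weights
idf = vectorizer.idf_                 # one idf weight per vocabulary term
# get_feature_names_out() needs scikit-learn >= 1.0;
# older releases use get_feature_names() instead.
terms = vectorizer.get_feature_names_out()
print({str(t): float(w) for t, w in zip(terms, idf)})

Output:

{'is': 1.0, 'nice': 1.4054651081081644, 'strange': 1.4054651081081644, 'this': 1.0, 'very': 1.0}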
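
The values above are consistent with the smoothed idf formula that scikit-learn documents for its default settings (smooth_idf=True): idf(t) = ln((1 + n) / (1 + df(t))) + 1, where n is the number of documents and df(t) is the number of documents containing term t. A minimal pure-Python sketch of that formula follows. It assumes those defaults and a simplified lowercase, whitespace-based tokenizer, which is enough for this corpus but is not the vectorizer's real tokenizer. The helper name smoothed_idf is hypothetical and not part of scikit-learn.

```python
import math

corpus = ["This is very strange",
          "This is very nice"]

def smoothed_idf(docs):
    """Return {term: ln((1 + n) / (1 + df(term))) + 1} for all terms in docs."""
    n = len(docs)
    # Simplified tokenization (an assumption): lowercase, split on whitespace.
    token_sets = [set(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*token_sets))
    return {
        # df(term) = number of documents whose token set contains the term
        term: math.log((1 + n) / (1 + sum(term in s for s in token_sets))) + 1
        for term in vocab
    }

idf = smoothed_idf(corpus)
print(idf)
```

Terms that occur in both documents ("this", "is", "very") get the minimum weight, while terms that occur in only one document ("strange", "nice") get a higher weight.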


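As discussed in the comments, prior to version 0.15 a workaround is to access the attribute idf_ via the supposedly hidden _tfidf (an instance of TfidfTransformer) of the vectorizer: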

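# _tfidf is a private attribute, so it may change between releases
idf = vectorizer._tfidf.idf_
# get_feature_names() is the method available in these older versions
print(dict(zip(vectorizer.get_feature_names(), idf)))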

which should give the same output as above.
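
Note that idf_ holds one weight per vocabulary term for the whole corpus. The per-document tf-idf weights computed by fit_transform() are the entries of the matrix it returns. A minimal sketch of reading them back, assuming the default TfidfVectorizer settings (which L2-normalize each row); the names terms and weights are illustrative only:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["This is very strange",
          "This is very nice"]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)  # sparse matrix, shape (n_docs, n_terms)

# vocabulary_ maps each term to its column index in X (and in idf_),
# so sorting by that index gives the terms in column order.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)

# Densify only because this corpus is tiny; keep X sparse for real data.
dense = X.toarray()
weights = [
    {term: float(value) for term, value in zip(terms, row) if value > 0}
    for row in dense
]

for doc_index, doc_weights in enumerate(weights):
    print(doc_index, doc_weights)
```

Within each document, the term that appears in only one document ("strange" or "nice") ends up with a larger weight than the terms shared by both.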
