TF*IDF for Search Queries

Published: 2020/5/18 0:37:22 · Tags: python, nlp, nltk, scikit-learn, tf-idf

Problem description

Okay, so I have been following these two posts on TF*IDF but am a little confused: http://css.dzone.com/articles/machine-learning-text-feature

Basically, I want to create a search query that searches through multiple documents. I would like to use the scikit-learn toolkit as well as the NLTK library for Python.

The problem is that I don't see where the two TF*IDF vectors come from. I need one search query and multiple documents to search. I figured that I should calculate the TF*IDF scores of each document against each query, find the cosine similarity between them, and then rank the documents by sorting the scores in descending order. However, the code doesn't seem to come up with the right vectors.
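
The approach described above (IDF learned from the documents, one TF*IDF vector per document plus one for the query, cosine similarity between them, then a descending sort) can be sketched from scratch in plain Python to show where each vector comes from. This is an illustration, not scikit-learn's code: it assumes a tiny hand-picked stop-word set in place of NLTK's list, borrows the smoothed IDF formula that scikit-learn documents as its default, and the helper names (tokenize, fit_idf, tfidf_vector, cosine, rank) are hypothetical.

```python
import math
import re
from collections import Counter

# Assumption: a tiny stand-in for NLTK's English stop-word list.
STOP_WORDS = {"the", "is", "in"}

def tokenize(text):
    """Lowercase, keep runs of letters, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def fit_idf(docs):
    """Smoothed IDF, as in scikit-learn's default: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf_vector(text, idf):
    """Raw term counts times IDF, L2-normalised; terms outside the vocabulary are ignored."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else vec

def cosine(a, b):
    # Both vectors are already unit length, so their dot product is the cosine.
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def rank(docs, query):
    idf = fit_idf(docs)              # IDF is learned from the documents only
    q = tfidf_vector(query, idf)     # vector 1: the query
    scores = [(cosine(tfidf_vector(d, idf), q), d) for d in docs]  # vector 2: each document
    return sorted(scores, reverse=True)  # highest similarity first

docs = ["The sky is blue.", "The sun is bright."]
for score, doc in rank(docs, "The sun in the sky is bright."):
    print(f"{score:.3f}  {doc}")
```

With these two documents the query shares two terms with "The sun is bright." and one with "The sky is blue.", so the second document ranks first (about 0.816 versus 0.408).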

Whenever I reduce the query to a single search, it returns a huge list of 0s, which is really strange.
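
One common cause of a long run of zeros like this is passing a bare string where scikit-learn expects an iterable of documents. In Python, parentheses alone do not make a tuple; the trailing comma does. scikit-learn releases from that era iterated over whatever they were given, so a string became a sequence of one-character documents, and since the default tokenizer keeps only tokens of two or more characters, every row came out as zeros (recent releases raise a ValueError for a bare string instead). The check below uses only the standard library; the variable names are illustrative.

```python
# Parentheses alone do not create a tuple; the trailing comma does.
query_str = ("The sun in the sky is bright.")
query_tuple = ("The sun in the sky is bright.",)

print(type(query_str).__name__)    # str
print(type(query_tuple).__name__)  # tuple

# Code that iterates over "documents" sees each character of a str as one document.
print(len(list(query_str)))    # 29: one "document" per character
print(len(list(query_tuple)))  # 1: a single document
```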

Here is the code:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfTransformer
from nltk.corpus import stopwords  # requires the corpus: nltk.download('stopwords')

train_set = ("The sky is blue.", "The sun is bright.")  # Documents
test_set = ("The sun in the sky is bright.")  # Query (a plain str: no trailing comma, so not a tuple)
stopWords = stopwords.words('english')

vectorizer = CountVectorizer(stop_words=stopWords)
transformer = TfidfTransformer()

# Term counts: learn the vocabulary from the documents, then map the query into it
trainVectorizerArray = vectorizer.fit_transform(train_set).toarray()
testVectorizerArray = vectorizer.transform(test_set).toarray()
print('Fit Vectorizer to train set', trainVectorizerArray)
print('Transform Vectorizer to test set', testVectorizerArray)

# TF*IDF weights for the documents, using IDF learned from the documents
transformer.fit(trainVectorizerArray)
print(transformer.transform(trainVectorizerArray).toarray())

# Refits the IDF on the query counts before weighting the query
transformer.fit(testVectorizerArray)

tfidf = transformer.transform(testVectorizerArray)
print(tfidf.todense())
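
A corrected version of this pipeline might look like the sketch below. It keeps the question's CountVectorizer + TfidfTransformer structure but makes three changes: the query is a one-element tuple, the transformer is fitted only on the document counts (refitting it on the query, as the code above does, makes it learn IDF weights from the query rather than from the documents), and the ranking uses scikit-learn's cosine_similarity. It assumes Python 3 with a recent scikit-learn and swaps in scikit-learn's built-in English stop-word list so it runs without downloading the NLTK corpus; passing stopwords.words('english') instead should work the same way once that corpus is installed.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.metrics.pairwise import cosine_similarity

train_set = ("The sky is blue.", "The sun is bright.")  # documents
test_set = ("The sun in the sky is bright.",)          # query: note the trailing comma

# Assumption: scikit-learn's built-in English list stands in for NLTK's stop words.
vectorizer = CountVectorizer(stop_words="english")
transformer = TfidfTransformer()

# Learn the vocabulary and the IDF weights from the documents only.
train_counts = vectorizer.fit_transform(train_set)
train_tfidf = transformer.fit_transform(train_counts)

# Map the query into the same vocabulary and weight it with the same IDF (no refitting).
test_tfidf = transformer.transform(vectorizer.transform(test_set))

# Result has one row per query and one column per document; take the single query's row.
scores = cosine_similarity(test_tfidf, train_tfidf)[0]
ranking = scores.argsort()[::-1]  # document indices, best match first

for i in ranking:
    print(f"{scores[i]:.3f}  {train_set[i]}")
```

For this input the second document ranks first: it shares two terms (sun, bright) with the query, while the first shares only one (sky).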
