How can IDF be different for several documents?

Problem description

I am using LETOR to build an information retrieval system. They use TF and IDF. I am sure TF is query-dependent, and I would expect IDF to be too, but then I read:

"Note that IDF is document independent, and so all the documents under a query have same IDF values."

But that does not make sense because IDF is part of the feature list. How will IDF for each document be calculated?

Solution

IDF is term specific. The IDF of any given term is document independent, but the TF is document specific.

To put it differently, let's say we have 3 documents:

doc id 1 "The quick brown fox jumps over the lazy dog"

doc id 2 "The Sly Fox Pub Annapolis is located on church circle"

doc id 3 "Located on Church Circle, in the heart of the Historic District"

Now if IDF is (number of documents) / (number of documents containing term t), then the IDF for the term "fox" is 3/2, because "fox" appears in 2 of the 3 documents. That holds regardless of what the search is or which document is being scored. So IDF is a function of t alone.

TF, on the other hand, is a function of both t and d. So the TF of "the" for doc id 1 is 2 (counting "The" and "the" as the same term).
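The distinction can be checked with a small sketch over the three example documents. It uses the plain ratio given above for IDF (practical systems usually apply a logarithm and smoothing), and the tokenizer, the function names, and the summed per-query features are illustrative assumptions, not LETOR's actual feature pipeline.

```python
import re
from collections import Counter

# The three example documents from the answer.
docs = {
    1: "The quick brown fox jumps over the lazy dog",
    2: "The Sly Fox Pub Annapolis is located on church circle",
    3: "Located on Church Circle, in the heart of the Historic District",
}


def tokenize(text):
    # Assumption: lowercase everything and keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}


def idf(term):
    # Unsmoothed ratio: N / (number of documents containing the term).
    # Depends only on the term and the collection, never on one document.
    df = sum(1 for toks in tokens.values() if term in toks)
    return len(tokens) / df if df else 0.0


def tf(term, doc_id):
    # Raw count of the term inside one specific document.
    return Counter(tokens[doc_id])[term]


def features(query, doc_id):
    # Hypothetical per (query, document) features: sums over query terms.
    terms = tokenize(query)
    return {
        "tf_sum": sum(tf(t, doc_id) for t in terms),
        "idf_sum": sum(idf(t) for t in terms),
    }


for doc_id in docs:
    print(doc_id, features("the fox", doc_id))
```

Here `idf("fox")` evaluates to 1.5 and `tf("the", 1)` to 2, matching the numbers above. For a fixed query, `idf_sum` comes out identical for every document while `tf_sum` varies, which is why an IDF feature can sit in a per-document feature list and still be the same for all documents under a query.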
