High-level explanation of the Similarity class in Lucene?

Question

Do you know where I can find a high-level explanation of the algorithm behind Lucene's Similarity class? I would like to understand it without having to decipher all the math and terminology involved in searching and indexing.

Answer

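Lucene's built-in Similarity is a fairly standard "Inverse Document Frequency" (TF-IDF) scoring algorithm. The Wikipedia article on tf-idf is brief, but covers the basics. The book Lucene in Action breaks down the Lucene formula in more detail; it doesn't mirror the current Lucene formula perfectly, but it explains all of the main concepts. (This describes the classic DefaultSimilarity, later renamed ClassicSimilarity. Since Lucene 6.0 the default is BM25Similarity, which builds on the same term-frequency and document-frequency ideas.)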
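For readers who do want to see the formula, the TFIDFSimilarity javadoc for the classic similarity describes a "practical scoring function" roughly like the one below. This is a paraphrase of that documentation, not an exact copy; details such as how boosts and field norms are encoded differ between Lucene versions.

```latex
\[
\operatorname{score}(q,d) = \operatorname{coord}(q,d)\cdot\operatorname{queryNorm}(q)\cdot
\sum_{t \in q}\Big(\operatorname{tf}(t \in d)\cdot\operatorname{idf}(t)^{2}\cdot\operatorname{boost}(t)\cdot\operatorname{norm}(t,d)\Big)
\]
\[
\operatorname{tf}(t \in d) = \sqrt{\operatorname{freq}(t,d)}, \qquad
\operatorname{idf}(t) = 1 + \ln\frac{\operatorname{numDocs}}{\operatorname{docFreq}(t) + 1}
\]
```

Only tf and idf depend on how often the term appears; coord, queryNorm, boost, and norm are the secondary adjustment factors.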

Primarily, the score increases with the number of times the term occurs in the current document (the term frequency) and decreases with the number of documents in the index that contain the term (the document frequency). The other factors in the formula are secondary; they adjust the score in an attempt to make scores from different queries fairly comparable to each other.
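To make the two main factors concrete, here is a minimal Python sketch. It is not Lucene code: it assumes the classic-style curves tf = sqrt(freq) and idf = 1 + ln(numDocs / (docFreq + 1)), and it deliberately drops the secondary factors (coord, queryNorm, length norms, boosts). The helper names `tf`, `idf`, `score` and the toy corpus are hypothetical.

```python
import math
from collections import Counter

# Hypothetical helpers illustrating classic-Lucene-style TF-IDF; not Lucene's API.


def tf(freq: int) -> float:
    """Term frequency: grows with occurrences in the document, but sub-linearly."""
    return math.sqrt(freq)


def idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency: terms found in fewer documents weigh more."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def score(query: list[str], doc: list[str], corpus: list[list[str]]) -> float:
    """Sum tf * idf^2 over the query terms present in `doc`.

    Secondary factors (coord, queryNorm, length norms, boosts) are omitted.
    """
    counts = Counter(doc)
    total = 0.0
    for term in query:
        freq = counts[term]
        if freq == 0:
            continue  # term absent from this document: contributes nothing
        doc_freq = sum(1 for d in corpus if term in d)
        total += tf(freq) * idf(doc_freq, len(corpus)) ** 2
    return total


corpus = [
    "the quick brown fox".split(),
    "the lazy dog".split(),
    "the fox jumps over the fox".split(),
    "a brown dog".split(),
]
query = ["the", "fox"]
scores = [score(query, doc, corpus) for doc in corpus]
ranking = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
print(ranking)  # → [2, 0, 1, 3]
```

Document 2 ranks first because both query terms occur twice in it. "the" appears in three of the four documents, so its idf is 1 + ln(4/4) = 1, while the rarer "fox" gets 1 + ln(4/3) ≈ 1.29, so a match on "fox" counts for more than a match on "the".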