What are norms in Lucene

Question

I don't understand what they are, and would really appreciate a simple explanation showing what value they bring to the world without too much implementation detail of how they work.

Recommended answer

A norm is one part of the calculation of a score. The norm could be calculated however you like, really. What sets the norm apart is that it is calculated at index time. Generally, the other factors that influence the score are calculated at query time, based on how well the document matches the query. Because the norm is stored along with the document instead, it saves work at query time.

The standard implementation can be found, and is well described, in Lucene's TFIDFSimilarity. There, the norm is the product of the field boost (or the product of all the field boosts, if several have been set on the field) and "lengthNorm", a calculated factor designed to weight matches on shorter documents more heavily. Neither of these depends on the makeup of the query, so both are good candidates to be calculated and stored at index time.
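To make that product concrete, here is a minimal sketch. It assumes the length normalization used by Lucene's classic TF-IDF similarity, 1/sqrt(number of terms in the field). The class and method names are made up for illustration and are not Lucene's API.

```java
// Illustrative only: NormSketch, lengthNorm and norm are hypothetical names,
// not part of Lucene.
final class NormSketch {

    // Assumed length normalization: 1 / sqrt(number of terms in the field).
    // Fewer terms gives a larger factor, so short fields weigh more.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // The norm is the index-time field boost multiplied by lengthNorm.
    // Nothing here depends on the query, so it can be computed once when
    // the document is indexed.
    static float norm(float fieldBoost, int numTerms) {
        return fieldBoost * lengthNorm(numTerms);
    }
}
```

With no boost (1.0f), a 4-term field gets a norm of 0.5 while a 100-term field gets roughly 0.1, so a match in the shorter field counts for more.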

The norms are then stored in a compressed, and highly lossy, single-byte format (with approximately 1 significant decimal digit of accuracy).
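To give a feel for how lossy one byte is, here is a sketch of a float-to-byte encoding with 3 mantissa bits and 5 exponent bits. It is modeled on my understanding of the scheme Lucene's SmallFloat utility has used for these norms, but the class name and the exact constants are assumptions here, so treat it as an illustration rather than the library's code.

```java
// Illustrative only: ByteNormSketch is a hypothetical class. The layout
// (3 mantissa bits, 5 exponent bits, exponent zero-point 15) is an assumption.
final class ByteNormSketch {

    // Lower bound of the representable range, in "small float" units.
    private static final int BASE = (63 - 15) << 3;

    static byte encode(float f) {
        int bits = Float.floatToRawIntBits(f);
        // Keep only the exponent and the top 3 mantissa bits.
        int small = bits >> (24 - 3);
        if (small <= BASE) {
            // Zero and negatives map to 0; tiny positives to the smallest step.
            return (bits <= 0) ? (byte) 0 : (byte) 1;
        }
        if (small >= BASE + 0x100) {
            return (byte) -1; // overflow: clamp to the largest value
        }
        return (byte) (small - BASE);
    }

    static float decode(byte b) {
        if (b == 0) {
            return 0.0f;
        }
        int bits = (b & 0xff) << (24 - 3);
        bits += (63 - 15) << 24;
        return Float.intBitsToFloat(bits);
    }
}
```

Under these assumptions 0.5f survives a round trip unchanged, but 0.1f comes back as 0.09375f, which is the "about one significant digit" mentioned above.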
