How is field length defined in Solr/Lucene?


Question

As I understand it, the field length of a given document is the number of terms indexed in that field of the document. However, the field length reported by Solr is often not an integer. For instance, I've seen a document with two terms in its content field, but the content field length as calculated by Solr is actually 2.56, not the 2 I expected. How is field length really calculated in Solr/Lucene?

I'm referring to the field length as it is used when calculating the score with the BM25 similarity function, but I think field lengths are calculated for other ranking schemes as well.

Recommended answer

To elaborate on the previous answer: "fieldLength" is not the raw term count. It comes out of a lossy normalization (encoding/decoding) step, which in essence compresses a 32-bit value to 8 bits to save disk space when the data is stored. The encoding and decoding are implemented in the class SmallFloat.java.

This is the description of the decodeNormValue() function, which calculates the fieldLength in BM25:

"Default scoring implementation which encodes norm values (encodeNormValue(float)) as a single byte before being stored. At search time, the norm byte value is read from the index directory (org.apache.lucene.store.Directory) and decoded (decodeNormValue(long)) back to a float norm value. This encoding/decoding, while reducing index size, comes with the price of precision loss: it is not guaranteed that decode(encode(x)) = x. For instance, decode(encode(0.89)) = 0.875."
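To see where a value like 2.56 can come from, here is a minimal Python sketch of the round trip. It assumes the older BM25Similarity scheme, in which the norm 1/sqrt(number of terms) is stored with SmallFloat.floatToByte315 and the field length is recovered as 1/(decoded norm)^2. The bit manipulation is a port of SmallFloat written from memory, and the Python function names are hypothetical, so treat it as an illustration rather than Lucene's actual code. Newer Lucene versions may encode the length differently.

```python
import math
import struct

# Assumption: offset used by the "315" variant of SmallFloat
# (zero exponent 15, shifted into the exponent position).
FZERO = (63 - 15) << 3


def float_to_byte315(f):
    """Lossy 8-bit encoding of a float (sketch of SmallFloat.floatToByte315)."""
    # Reinterpret the value as float32 and read its raw bits as a signed int.
    bits = struct.unpack(">i", struct.pack(">f", f))[0]
    small = bits >> 21  # keep sign, exponent and the top mantissa bits
    if small <= FZERO:
        return 0 if bits <= 0 else 1  # zero/negative -> 0, underflow -> 1
    if small >= FZERO + 0x100:
        return 255  # overflow: largest representable byte
    return small - FZERO


def byte315_to_float(b):
    """Inverse of float_to_byte315 (sketch of SmallFloat.byte315ToFloat)."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + ((63 - 15) << 24)
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def decoded_field_length(num_terms):
    """Field length as seen at search time (assumed scheme, boost = 1)."""
    norm = 1.0 / math.sqrt(num_terms)                   # computed at index time
    decoded = byte315_to_float(float_to_byte315(norm))  # one byte on disk
    return 1.0 / (decoded * decoded)                    # recovered at search time


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, decoded_field_length(n))
    print(byte315_to_float(float_to_byte315(0.89)))
```

Under these assumptions a field with two terms has a norm of about 0.7071, which is truncated to 0.625 by the encoding, and 1/0.625^2 = 2.56, the value from the question. A field with three terms decodes to 4.0, which shows how coarse the stored lengths are.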

Hope this helps.
