Document length in Lucene 4.0

Question

I've read in the Lucene 4.0 documentation that the library now stores some statistics in order to compute different scoring models, one of them BM25. Is there a way, besides fetching a document, to fetch its length too?

Recommended answer

You can store whatever you want from FieldInvertState into the 'norm', and it doesn't have to be an 8-bit float either.

The default is a lossy encoding of the length. If you want the exact length, you could store a short (16 bits) per document, or some other format, instead.
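
To make the lossy-versus-exact trade-off concrete, here is a minimal plain-Java sketch with no Lucene dependency. The class NormSketch and its methods are hypothetical names invented for illustration, and the 8-bit scheme shown is a simplified stand-in, not Lucene's actual norm encoding. In a real setup the length would presumably come from FieldInvertState.getLength() inside your Similarity.computeNorm implementation.

```java
// Illustrative only: this is NOT Lucene's actual norm encoding.
final class NormSketch {

    // Hypothetical lossy scheme: squeeze 1/sqrt(length) into one unsigned byte.
    static byte encodeLossy(int length) {
        double norm = 1.0 / Math.sqrt(Math.max(length, 1));
        return (byte) Math.round(norm * 255.0);
    }

    // Recover an approximate length from the single-byte value.
    static int decodeLossy(byte b) {
        int q = b & 0xFF;
        if (q == 0) {
            return Integer.MAX_VALUE; // length was too large to represent
        }
        double norm = q / 255.0;
        return (int) Math.round(1.0 / (norm * norm));
    }

    // Exact scheme: keep the length itself in 16 bits, saturating at Short.MAX_VALUE.
    static short encodeExact(int length) {
        return (short) Math.min(Math.max(length, 0), Short.MAX_VALUE);
    }

    public static void main(String[] args) {
        // Print the round-tripped lossy value next to the exact one.
        for (int length : new int[] {10, 1000, 1010}) {
            System.out.println(length
                + " lossy=" + decodeLossy(encodeLossy(length))
                + " exact=" + encodeExact(length));
        }
    }
}
```

In this sketch, lengths 1000 and 1010 collapse to the same byte, so the original length cannot be recovered from the lossy value, while the 16-bit form keeps any length up to 32767 intact.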

See Similarity.computeNorm.
