Boosting Lucene Terms When Building the Index

Question

Is it possible to specify that certain terms are more important than others when creating the index (not when querying it)?

Consider for example a synonym filter:
doc 1: "this is a nice car"
doc 2: "this is a nice vehicle"

I want to add the term vehicle to the first doc and the term car to the second doc. However, if the index is later queried with the word car, the first document should score higher than the second; if it is queried with vehicle, it should be the other way around.

Will calling setBoost on the fields before adding them to their respective documents do the trick?

Or maybe I should add the synonyms to a different field name?

Or am I looking at this from the wrong point of view?

Thanks

Recommended Answer

Setting a boost on a field affects all terms in that field, so this won't work in your case.
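
To see why, here is the classic TF-IDF scoring formula as documented for the Lucene 3.x Similarity class. It is reproduced from memory, so check it against the javadoc for your Lucene version. The per-field boost enters only through norm(t,d), and that value is the same for every term in the same field of the same document:

```latex
\text{score}(q,d) = \text{coord}(q,d)\cdot\text{queryNorm}(q)\cdot
  \sum_{t \in q} \text{tf}(t \in d)\cdot\text{idf}(t)^{2}\cdot t.\text{getBoost}()\cdot\text{norm}(t,d)

\text{norm}(t,d) = \text{doc.getBoost}()\cdot\text{lengthNorm}(\text{field})\cdot
  \prod_{f \in d \text{ named as } t} f.\text{getBoost}()
```

In doc 1, car and vehicle sit in the same field, so they share a single norm(t,d). A field boost on doc 1 therefore raises its score for both queries by the same factor. It cannot make doc 1 win for car and lose for vehicle.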

But it should be possible using Lucene payloads: a byte array that can be set for every term occurrence. You would use them to set term-specific boosts (for example, vehicle at 0.5 for doc 1). Then implement your own Similarity and override its scorePayload() method to decode that boost. Finally, query with PayloadTermQuery, which lets the boost stored in each term's payload contribute to the score.
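
Below is a minimal, Lucene-free sketch of the payload idea. It assumes a toy in-memory "index" (a list of tokens per document) instead of the real Lucene API, so it runs without the Lucene jar. The following are all hypothetical illustrations:

- the class PayloadBoostSketch
- the SYNONYMS table
- the 0.5 synonym boost
- the simplified sum-of-boosts score

In Lucene 3.x the matching pieces would be a TokenFilter that sets a PayloadAttribute, a Similarity subclass that overrides scorePayload(), and a PayloadTermQuery. The float encoding here is similar in spirit to Lucene's PayloadHelper, but it is not that class.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PayloadBoostSketch {

    // One indexed token: the term text plus the payload bytes stored with it.
    static final class Token {
        final String term;
        final byte[] payload;

        Token(String term, byte[] payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    // Hypothetical one-word synonym table standing in for a real synonym filter.
    static final Map<String, String> SYNONYMS = new HashMap<>();
    static {
        SYNONYMS.put("car", "vehicle");
        SYNONYMS.put("vehicle", "car");
    }

    // Index time: store a float boost as 4 big-endian bytes.
    static byte[] encodeBoost(float boost) {
        return ByteBuffer.allocate(4).putFloat(boost).array();
    }

    // Query time: decode the bytes written by encodeBoost
    // (the job a custom Similarity.scorePayload() would do in Lucene).
    static float decodeBoost(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    // Analysis: keep every original word with boost 1.0 and inject its
    // synonym (if any) with the lower synonymBoost in its payload.
    static List<Token> analyze(String text, float synonymBoost) {
        List<Token> tokens = new ArrayList<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            tokens.add(new Token(word, encodeBoost(1.0f)));
            String synonym = SYNONYMS.get(word);
            if (synonym != null) {
                tokens.add(new Token(synonym, encodeBoost(synonymBoost)));
            }
        }
        return tokens;
    }

    // Simplified scoring: sum the decoded payload boosts of every token
    // matching the query term (real Lucene would fold this into tf-idf).
    static float score(List<Token> doc, String queryTerm) {
        float total = 0f;
        for (Token token : doc) {
            if (token.term.equals(queryTerm)) {
                total += decodeBoost(token.payload);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Token> doc1 = analyze("this is a nice car", 0.5f);
        List<Token> doc2 = analyze("this is a nice vehicle", 0.5f);
        for (String query : new String[] {"car", "vehicle"}) {
            System.out.println(query + ": doc1=" + score(doc1, query)
                    + " doc2=" + score(doc2, query));
        }
    }
}
```

Running main prints `car: doc1=1.0 doc2=0.5` and then `vehicle: doc1=0.5 doc2=1.0`. Each document wins for its original word and loses for its injected synonym, which is the ordering the question asks for.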
