Boosting Lucene Terms When Building the Index


Question


Is it possible to mark specific terms as more important than others when creating the index (not when querying it)?

Consider for example a synonym filter:
doc 1: "this is a nice car"
doc 2: "this is a nice vehicle"

I want to add the term vehicle to the first doc and the term car to the second doc. However, if the index is later queried with the word car, the first document should score higher than the second one, and if it is queried for vehicle, it should be the other way around.

Will calling setBoost on the fields before adding them to their respective documents do the trick?

Or maybe I should add the synonyms to a different field name?

Or am I looking at this from the wrong point of view?

Thanks

Answer

Setting a boost on a field affects all terms in that field, so this wouldn't work in your case.

But it should be possible using Lucene payloads (a byte array that can be stored with each term occurrence). You would use them to set term-specific boosts (vehicle to 0.5 for doc 1, for example). Then you'd implement your own Similarity, override its scorePayload() method to decode that boost, and query with PayloadTermQuery, which lets the boost you stored in the payload for that term contribute to the score.
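To make the payload part concrete, here is a minimal sketch of just the encode/decode step, in plain Java with no Lucene dependency. The class name TermBoostPayload, its two methods, and the fallback boost of 1.0 are made up for illustration; in a real setup the encoding would run in your analysis chain at index time and the decoding would sit inside your scorePayload() override. Lucene also ships a PayloadHelper class with comparable float helpers, so you may not need to hand-roll this.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper, not part of Lucene: packs a per-term boost into a payload. */
final class TermBoostPayload {

    /** Assumed neutral boost for terms that carry no usable payload. */
    static final float DEFAULT_BOOST = 1.0f;

    /** Index time: turn a boost such as 0.5f into a 4-byte big-endian payload. */
    static byte[] encodeBoost(float boost) {
        return ByteBuffer.allocate(Float.BYTES).putFloat(boost).array();
    }

    /** Query time: read the boost back from the payload slice, or fall back to the default. */
    static float decodeBoost(byte[] payload, int offset, int length) {
        if (payload == null || length < Float.BYTES) {
            return DEFAULT_BOOST;
        }
        return ByteBuffer.wrap(payload, offset, length).getFloat();
    }

    public static void main(String[] args) {
        // Doc 1: the injected synonym "vehicle" gets a lower weight than the original "car".
        byte[] vehiclePayload = encodeBoost(0.5f);

        // A payload-aware query would multiply this factor into the term's score.
        float baseScore = 2.0f;
        float boosted = baseScore * decodeBoost(vehiclePayload, 0, vehiclePayload.length);
        System.out.println("boosted score for the synonym: " + boosted);
    }
}
```

With this in place, the original word in each document keeps the default boost while the injected synonym carries a smaller one, which is what makes doc 1 rank above doc 2 for car and the reverse for vehicle.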
