ElasticSearch: boosting relevance based on field value

Question

Need to find a way in ElasticSearch to boost the relevance of a document based on a particular value of a field. Specifically, there is a special field in all my documents where the higher the field value is, the more relevant the doc that contains it should be, regardless of the search.

Consider the following document structure (mapping):

{
    "_all" : {"enabled" : "true"},
    "properties" : {
        "_id":            {"type" : "string",  "store" : "yes", "index" : "not_analyzed"},
        "first_name":     {"type" : "string",  "store" : "yes", "index" : "yes"},
        "last_name":      {"type" : "string",  "store" : "yes", "index" : "yes"},
        "boosting_field": {"type" : "integer", "store" : "yes", "index" : "yes"}
    }
}

I'd like documents with a higher boosting_field value to be inherently more relevant than those with a lower boosting_field value. This is just a starting point: the match between the query and the other fields will also be taken into account when determining the final relevance score of each doc in the search. But, all else being equal, the higher the boosting_field value, the more relevant the document.

Anyone have an idea on how to do this?

Many thanks!

Answer

You can boost either at index time or at query time. I usually prefer query-time boosting even though it makes queries a little slower; otherwise I'd need to reindex every time I want to change my boosting factors, which usually need fine-tuning and have to stay flexible.

There are different ways to apply query-time boosting with the Elasticsearch query DSL:

  • Boosting Query
  • Custom Filters Score Query
  • Custom Boost Factor Query
  • Custom Score Query

The first three queries are useful if you want to give a specific boost to the documents that match specific queries or filters, for example if you want to boost only the documents published during the last month. You could use this approach with your boosting_field, but you'd need to manually define some boosting_field intervals and give each of them a different boost, which isn't ideal.
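To make the drawback of the interval approach concrete, here is a minimal Python sketch of what a set of per-interval boosts amounts to: a step function from boosting_field ranges to fixed boosts. The interval boundaries and boost values are hypothetical, chosen only for illustration; this is not Elasticsearch code.

```python
from bisect import bisect_right

# Hypothetical interval boundaries for boosting_field, in ascending order.
BOUNDARIES = [1000, 10000, 50000]
# Hypothetical boost for each interval: one more entry than BOUNDARIES.
BOOSTS = [1.0, 1.5, 2.0, 3.0]


def interval_boost(value: float) -> float:
    """Return the fixed boost of the interval that contains value."""
    # bisect_right returns how many boundaries are <= value,
    # which is the index of the interval that value falls into.
    return BOOSTS[bisect_right(BOUNDARIES, value)]


if __name__ == "__main__":
    for v in (999, 1000, 9999, 10000, 75000):
        print(v, interval_boost(v))
```

Here 1000 and 9999 receive the same boost while 999 and 1000 do not: every distinction between values has to be encoded by hand in the intervals.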

The best solution would be to use a Custom Score Query, which lets you run a query and customize its score with a script. It's quite powerful: with the script you can directly modify the score itself. First of all, I'd scale the boosting_field values to, for example, a value between 0 and 1, so that your final score doesn't become a big number. To do that you need to estimate roughly the minimum and maximum values the field can contain; let's say a minimum of 0 and a maximum of 100000. Since the minimum is 0, dividing by the maximum is enough to scale the boosting_field value to a number between 0 and 1, and you can then add the result to the actual score like this:

{
    "query" : {
        "custom_score" : {
            "query" : {
                "match_all" : {}
            },
            "script" : "_score + (1 * doc.boosting_field.doubleValue / 100000)"
        }
    }
}
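The arithmetic the script performs can be checked outside Elasticsearch. The following Python sketch mirrors _score + (weight * boosting_field / max), generalized to min-max scaling. The helper names, the sample scores, and the clamping are assumptions of this sketch; the 0 to 100000 range comes from the example above.

```python
# Assumed range of boosting_field, taken from the example above.
FIELD_MIN = 0.0
FIELD_MAX = 100000.0


def scale(value: float, lo: float = FIELD_MIN, hi: float = FIELD_MAX) -> float:
    """Min-max scale value into [0, 1], clamping values outside [lo, hi]."""
    scaled = (value - lo) / (hi - lo)
    return min(max(scaled, 0.0), 1.0)


def additive_score(score: float, boosting_value: float, weight: float = 1.0) -> float:
    """Python mirror of the script: _score + weight * scaled boosting_field."""
    return score + weight * scale(boosting_value)


if __name__ == "__main__":
    # Hypothetical (query score, boosting_field) pairs for three documents.
    docs = {"a": (1.0, 0), "b": (1.0, 50000), "c": (0.8, 100000)}
    ranked = sorted(docs, key=lambda name: additive_score(*docs[name]), reverse=True)
    print(ranked)  # ['c', 'b', 'a']
```

With weight=1.0 the bonus is at most 1, so boosting_field can outweigh a small difference in query score (doc c overtakes a and b here) but not a large one. The clamp is an addition of this sketch: the original script does not clamp, so a value above 100000 would produce a bonus above 1.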

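You could also use the boosting_field as a boost factor (_score * rather than _score +), but then you'd need to scale it to an interval with a minimum value of 1 (just add 1 to the scaled value), so that a low boosting_field value doesn't shrink the query score or multiply it by zero.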
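The multiplicative variant can be sketched the same way. This minimal Python sketch assumes the same 0 to 100000 range and computes score * (1 + weight * scaled boosting_field); the function name and the clamping are assumptions for illustration, not part of any Elasticsearch API.

```python
# Assumed range of boosting_field, as in the additive example.
FIELD_MIN = 0.0
FIELD_MAX = 100000.0


def multiplicative_score(score: float, boosting_value: float, weight: float = 1.0) -> float:
    """Return score * (1 + weight * scaled boosting_field)."""
    scaled = (boosting_value - FIELD_MIN) / (FIELD_MAX - FIELD_MIN)
    scaled = min(max(scaled, 0.0), 1.0)  # clamp into [0, 1]
    # The +1 keeps the factor >= 1, so a zero boosting_field leaves the
    # query score unchanged instead of multiplying it by zero.
    return score * (1.0 + weight * scaled)


if __name__ == "__main__":
    print(multiplicative_score(2.0, 0))       # 2.0
    print(multiplicative_score(2.0, 50000))   # 3.0
    print(multiplicative_score(0.5, 100000))  # 1.0
```

Unlike the additive form, the bonus here grows with the query score: a document that matches the query poorly gains little even with a high boosting_field.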

You can even tune the result to change its importance by adding a weight to the value you use to influence the score. You'll need this even more if you combine multiple boosting factors and want to give each of them a different weight.
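Combining several weighted boosting factors could look like the following minimal Python sketch. The factor names (popularity, recency) and their weights are hypothetical, and each factor is assumed to be already scaled to [0, 1]; in Elasticsearch the same weighted sum would have to be written into the score script.

```python
from typing import Mapping


def combined_score(
    score: float,
    factors: Mapping[str, float],
    weights: Mapping[str, float],
) -> float:
    """Add a weighted sum of pre-scaled factors (each in [0, 1]) to score.

    Raises KeyError if a factor has no weight, so a forgotten weight
    is noticed instead of silently ignored.
    """
    bonus = sum(weights[name] * value for name, value in factors.items())
    return score + bonus


if __name__ == "__main__":
    # Hypothetical factors, already scaled to [0, 1].
    factors = {"popularity": 0.5, "recency": 1.0}
    weights = {"popularity": 2.0, "recency": 0.5}
    print(combined_score(1.0, factors, weights))  # 2.5
```

Raising one factor's weight increases its influence relative to the others without touching the rest of the formula.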
