Elasticsearch: search/order by overall tag weight


Problem description

I have to solve a problem that exceeds my very basic know-how of Elasticsearch.

I have a set of objects - each one has a set of tags. Like:

obj_1 = ["a", "b", "c"]
obj_2 = ["a", "b"]
obj_3 = ["c", "b"]

I want to search the objects using weighted tags. For example:

search_tags = {'a': 1.0, 'c': 1.5}

I want the search tags to be an OR query. That is, I don't want to exclude documents that don't have all of the queried tags. But I want the results ordered by total weight, highest first (roughly: the sum of the weights of the matched tags).

Using the example above, the order of the documents returned would be:

  • obj_1 (score: 1.0+1.5)
  • obj_3 (score: 1.5)
  • obj_2 (score: 1.0)
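
The intended ranking can be written down as a small standalone sketch, independent of Elasticsearch. It only models the rule described above (OR matching, score equal to the sum of the weights of the matched tags); the helper names `weighted_score` and `rank` are made up for illustration.

```python
# Toy model of the desired ranking, not an Elasticsearch query.
objects = {
    "obj_1": ["a", "b", "c"],
    "obj_2": ["a", "b"],
    "obj_3": ["c", "b"],
}
search_tags = {"a": 1.0, "c": 1.5}


def weighted_score(tags, weights):
    """Sum the weight of every queried tag that the object carries."""
    return sum(weight for tag, weight in weights.items() if tag in tags)


def rank(objs, weights):
    """OR semantics: keep objects matching at least one queried tag,
    then order them by total weight, highest first."""
    matched = [
        (name, weighted_score(tags, weights))
        for name, tags in objs.items()
        if any(tag in weights for tag in tags)
    ]
    return sorted(matched, key=lambda pair: pair[1], reverse=True)


ranking = rank(objects, search_tags)
print(ranking)
# [('obj_1', 2.5), ('obj_3', 1.5), ('obj_2', 1.0)]
```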

What would be the best approach to this regarding the document's structure and the correct way to query ES?

There is a similar question here: "Elastic search - tagging strength (nested/child document boosting)". The difference is that I do not want to specify the weight when indexing; I want it done when searching.

My current setup is as follows.

The objects:

[
   {"title": "1", "tags": ["a", "b", "c"]},
   {"title": "2", "tags": ["a", "b"]},
   {"title": "3", "tags": ["c", "b"]},
   {"title": "4", "tags": ["b"]}
]

And my query:

{ 
    "query": {
        "custom_filters_score": {
            "query": { 
                "terms": {
                    "tags": ["a", "c"],
                    "minimum_match": 1
                }
            },
            "filters": [
                {"filter":{"term":{"tags":"a"}}, "boost":1.0},    
                {"filter":{"term":{"tags":"c"}}, "boost":1.5}    
            ],
            "score_mode": "total"
        }
    }
}

The problem is that it only returns object 1 and 3. It should match object 2 (has tag "a") as well, or am I doing something wrong?

UPDATE AS SUGGESTED

OK. I changed "boost" to "script" to calculate the minimum, and removed "minimum_match". My request:

{
    "query": {
        "custom_filters_score": {
            "query": {
                "terms": {
                    "tags": ["a", "c"]
                }
            },
            "filters": [
                {"filter":{"term":{"tags":"a"}}, "script":"1.0"},
                {"filter":{"term":{"tags":"c"}}, "script":"1.5"}
            ],
            "score_mode": "total"
        }
    }
}

Response:

{
    "_shards": {
        "failed": 0,
        "successful": 5,
        "total": 5
    },
    "hits": {
        "hits": [
            {
                "_id": "3",
                "_index": "test",
                "_score": 0.23837921,
                "_source": {
                    "tags": [
                        "c",
                        "b"
                    ],
                    "title": "3"
                },
                "_type": "bit"
            },
            {
                "_id": "1",
                "_index": "test",
                "_score": 0.042195037,
                "_source": {
                    "tags": [
                        "a",
                        "b",
                        "c"
                    ],
                    "title": "1"
                },
                "_type": "bit"
            }
        ],
        "max_score": 0.23837921,
        "total": 2
    },
    "timed_out": false,
    "took": 3
}

I am still getting the wrong order, and one result is missing. obj_1 should come before obj_3 (because it has both tags), and obj_2 is still missing completely. How can this be?

Solution

There were two problems with my example.

  1. The "a" term is a stopword, so it was discarded and only the "c" term was being used.
  2. The custom_filters_score query has to include a "constant_score" query so that all matched terms carry the same weight before boosting.
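
The effect of the first problem can be reproduced with a toy inverted index. This is only a sketch of the mechanism, assuming the analyzer drops stopwords at index time; `STOPWORDS` is a small illustrative subset rather than the analyzer's real list, and the helper names are hypothetical.

```python
# Toy index-time analysis: stopwords are dropped before terms are indexed.
# STOPWORDS is an illustrative subset, not the real analyzer configuration.
STOPWORDS = {"a", "an", "the"}

docs = {
    "1": ["a", "b", "c"],
    "2": ["a", "b"],
    "3": ["c", "b"],
    "4": ["b"],
}


def analyze(tags, stopwords):
    """Return the terms that survive stopword removal."""
    return [tag for tag in tags if tag not in stopwords]


def build_index(documents, stopwords):
    """Build an inverted index: term -> set of document ids."""
    index = {}
    for doc_id, tags in documents.items():
        for term in analyze(tags, stopwords):
            index.setdefault(term, set()).add(doc_id)
    return index


def terms_query(index, terms):
    """OR over the given terms: union of their posting sets."""
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits


with_stopwords = terms_query(build_index(docs, STOPWORDS), ["a", "c"])
without_stopwords = terms_query(build_index(docs, set()), ["a", "c"])
print(sorted(with_stopwords))     # ['1', '3']
print(sorted(without_stopwords))  # ['1', '2', '3']
```

With the stopword filter in place, "a" never reaches the index, so only documents containing "c" can match. That is consistent with the response above, which returned only documents 1 and 3.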

Now it works!
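
The answer does not show the final request, so the following is only a guess at its shape: the original `terms` query wrapped in `constant_score`, with the per-tag `boost` filters and `score_mode` taken from the first query in the question. The helper `build_weighted_tag_query` is hypothetical, and `custom_filters_score` is a legacy query type, so this sketch applies to old Elasticsearch releases only. It does not address the stopword problem, which has to be solved in the index mapping or analyzer settings (for example by not analyzing the `tags` field, which is an assumption, not something stated in the answer).

```python
import json


def build_weighted_tag_query(weights, field="tags"):
    """Build a legacy custom_filters_score request body from a
    {tag: weight} mapping. The structure is an assumption based on
    the query in the question plus the constant_score hint."""
    return {
        "query": {
            "custom_filters_score": {
                # constant_score gives every matching document the same
                # base score, so only the filter boosts decide the order.
                "query": {
                    "constant_score": {
                        "query": {"terms": {field: sorted(weights)}}
                    }
                },
                # One boosted filter per queried tag.
                "filters": [
                    {"filter": {"term": {field: tag}}, "boost": weight}
                    for tag, weight in sorted(weights.items())
                ],
                # "total" adds up the boosts of all matching filters.
                "score_mode": "total",
            }
        }
    }


body = build_weighted_tag_query({"a": 1.0, "c": 1.5})
print(json.dumps(body, indent=2))
```

Because every document starts from the same base score, a document matching both filters should rank above one matching only "c", which in turn should rank above one matching only "a".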
