Boost results by integer field


Question

I'm trying to build an autocomplete for destinations, and I want to boost results by an integer popularity field.

I'm trying this function_score query:

'query' => [
                'function_score' => [
                    'query' => [
                        "bool" => [
                            "should" => [   
                                 [
                                    "multi_match"=>[
                                        "query"=>$text,
                                        "fields"=>[
                                           "destination_name_*"
                                        ],
                                        "type"=>"most_fields",
                                        "boost" => 2
                                    ]
                                ],
                                [
                                    "multi_match"=>[
                                        "query"=>$text,
                                        "fields"=>[
                                           "destination_name_*"
                                        ],
                                        "fuzziness" => "1",
                                        "prefix_length"=> 2                                   
                                    ]
                                ],
                                [
                                    "multi_match"=>[
                                        "query"=>$text,
                                        "fields"=>[
                                           "destination_name_*.exact"
                                        ],
                                        "boost" => 2                                   
                                    ]
                                ]
                            ]
                        ]
                    ],
                    'field_value_factor' => [
                        'field'=>'popularity'
                    ]
                ],
            ],

Mappings and settings:

'settings' => [ 
                'analysis' => [     
                    'filter' =>  [
                        'ngram_filter' => [
                            'type' => 'edge_ngram',
                            'min_gram' => 2,
                            'max_gram' => 20,
                        ]
                    ],
                    'analyzer' => [
                        'ngram_analyzer' => [
                            'type'      => 'custom',
                            "tokenizer" => "standard",
                            'filter'    => ['lowercase', 'ngram_filter'],
                        ]

                    ]
                ],   
            ],
            'mappings' =>[
                'doc' => [
                    "properties"=> [
                        "destination_name_en"=> [
                           "type"=> "text",
                           "term_vector"=> "yes",
                           "analyzer"=> "ngram_analyzer",
                           "search_analyzer"=> "standard",
                           "fields" => [
                                "exact" => [
                                    "type" => "text",
                                    "analyzer" => "standard"
                                ]
                           ]
                        ],
                        "destination_name_es"=> [
                           "type"=> "text",
                           "term_vector"=> "yes",
                           "analyzer"=> "ngram_analyzer",
                           "search_analyzer"=> "standard",
                           "fields" => [
                                "exact" => [
                                    "type" => "text",
                                    "analyzer" => "standard"
                                ]
                           ]
                        ],
                        "destination_name_pt"=> [
                           "type"=> "text",
                           "term_vector"=> "yes",
                           "analyzer"=> "ngram_analyzer",
                           "search_analyzer"=> "standard",
                           "fields" => [
                                "exact" => [
                                    "type" => "text",
                                    "analyzer" => "standard"
                                ]
                           ]
                        ],
                        "popularity"=> [
                           "type"=> "integer",
                        ]
                    ]
                ]
            ] 

I set the popularity of Cancún to 10, and when I start typing "ca" the first suggestion is Cancún. This works as expected.

The problem comes when I search for a city whose popularity is 0, such as Puerto Vallarta. When I type "Puerto Va" I get the following results:

1.- Val d´Aosta
2.- Puerto Lopez
3.- Bristol - VA

and many others, but not Puerto Vallarta.

It is important to emphasize that without the function_score and field_value_factor this query works as expected (it returns Puerto Vallarta in first position).

I want to be able to boost popular cities using an integer value.

Any suggestions?

Thanks!

Recommended answer

By default, field_value_factor multiplies the natural score by the value of the popularity field. So if the value is 0 for Puerto Vallarta, its score will always be 0. It will still match, but it will never appear among the first results.

In addition, the boost is linear, which is almost certainly not what you want, since popular cities will completely overwhelm the results list.
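The arithmetic behind both points can be imitated in plain PHP. This is only a sketch, not Elasticsearch code: the text scores and the popularity values for the cities other than Puerto Vallarta are made up for illustration, and `defaultBoost` is a hypothetical helper.

```php
<?php
// Hypothetical helper: mimics field_value_factor with no modifier and the
// default boost_mode "multiply" (final score = text score * popularity).
function defaultBoost(float $textScore, int $popularity): float
{
    return $textScore * $popularity;
}

// Made-up text relevance scores and popularity values, for illustration only.
$docs = [
    'Puerto Vallarta' => ['score' => 9.0, 'popularity' => 0],
    'Puerto Lopez'    => ['score' => 4.0, 'popularity' => 1],
    'Bristol - VA'    => ['score' => 1.5, 'popularity' => 3],
];

$final = [];
foreach ($docs as $name => $doc) {
    $final[$name] = defaultBoost($doc['score'], $doc['popularity']);
}
arsort($final); // highest final score first

foreach ($final as $name => $score) {
    printf("%-16s %.2f\n", $name, $score);
}
```

Even though Puerto Vallarta has the best text score in this toy data, multiplying by 0 pushes it to the bottom of the list.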

You should therefore use the modifier property of field_value_factor, which is described in the Elasticsearch documentation for the function_score query.

If you set it to log2p, it should work as expected. The log2p modifier adds 2 to the value of the popularity field before taking the base-10 logarithm, so a popularity of 0 yields a factor of log(2) ≈ 0.30 instead of 0. The difference in boost between a city with popularity 2 and one with popularity 4 remains noticeable, but the difference shrinks as popularity rises.

Example (base-10 logarithm):

popularity 2 => log(4) => 0.60
popularity 4 => log(6) => 0.78
popularity 20 => log(22) => 1.34
popularity 22 => log(24) => 1.38
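These factors can be reproduced with a few lines of PHP. The `log2p` function below is a hypothetical local helper that imitates the modifier (add 2, then take the base-10 logarithm); it is not part of any Elasticsearch client.

```php
<?php
// Hypothetical helper imitating the log2p modifier:
// add 2 to the field value, then take the base-10 logarithm.
function log2p(float $value): float
{
    return log10($value + 2);
}

// Print the boost factor for a few popularity values, including 0.
foreach ([0, 2, 4, 20, 22] as $popularity) {
    printf("popularity %2d => factor %.2f\n", $popularity, log2p($popularity));
}
```

Note that a popularity of 0 still produces a positive factor, so a document such as Puerto Vallarta keeps a non-zero score.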

Add this to your query:

                'field_value_factor' => [
                    'field'=>'popularity',
                    'modifier' => 'log2p' // add this
                ]
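For context, here is a sketch of how the whole function_score block might look with the modifier in place. The inner query is shortened to a single multi_match clause for brevity, and the sample value of `$text` is an assumption. The `missing` and `boost_mode` entries are optional additions that are not part of the original answer: `missing` supplies a value for documents without a popularity field, and `boost_mode` is written out with its default value.

```php
<?php
// Sample input (assumption); in the question this is what the user has typed.
$text = 'Puerto Va';

$body = [
    'query' => [
        'function_score' => [
            // Shortened to one clause; the three "should" clauses from the
            // question would go here unchanged.
            'query' => [
                'multi_match' => [
                    'query'  => $text,
                    'fields' => ['destination_name_*'],
                    'type'   => 'most_fields',
                ],
            ],
            'field_value_factor' => [
                'field'    => 'popularity',
                'modifier' => 'log2p', // log10(popularity + 2)
                'missing'  => 0,       // optional: treat absent values as 0
            ],
            // "multiply" is already the default; written out for clarity.
            'boost_mode' => 'multiply',
        ],
    ],
];

// Print the request body as JSON to inspect what would be sent.
echo json_encode($body, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE), "\n";
```

If the popularity boost still feels too strong or too weak, the `factor` option of field_value_factor, which multiplies the field value before the modifier is applied, is another setting worth experimenting with.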
