How to handle wildcards in Elasticsearch structured queries


Problem description



My use case requires querying our Elasticsearch domain with trailing wildcards. I'd like your opinion on best practices for handling such wildcards in queries.

Do you think adding the following clauses to our queries is good practice?

{
    "query": {
        "query_string": {
            "query": "attribute:postfix*",
            "analyze_wildcard": true,
            "allow_leading_wildcard": false,
            "use_dis_max": false
        }
    }
}

I've disallowed leading wildcards because they are expensive. However, I wanted to know how sensible it is to enable analyze_wildcard for every query request in the long run. My understanding is that analyze_wildcard has no effect if the query doesn't actually contain any wildcards. Is that correct?

Solution

If you can change your mappings and index settings, the right approach is to create a custom analyzer with an edge n-gram token filter that indexes every prefix of the attribute field.

# Syntax for Elasticsearch 7.x and later: mapping types are gone, "string" is now "text",
# and the JSON body must be sent with an explicit Content-Type header.
curl -XPUT "http://localhost:9200/your_index" -H 'Content-Type: application/json' -d '{
    "settings": {
        "analysis": {
            "filter": {
                "edge_filter": {
                    "type": "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 15
                }
            },
            "analyzer": {
                "attr_analyzer": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "edge_filter"]
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "attribute": {
                "type": "text",
                "analyzer": "attr_analyzer",
                "search_analyzer": "standard"
            }
        }
    }
}'

Then, when you index a document, an attribute value such as postfixing is indexed as the following tokens: p, po, pos, post, postf, postfi, postfix, postfixi, postfixin, postfixing.
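To see which tokens the index-time analyzer produces, here is a minimal Python simulation of the lowercase plus edge n-gram chain. It is only an approximation, not Elasticsearch code: the real standard tokenizer follows Unicode word-segmentation rules, whereas this sketch simply splits on word characters, and the function names are hypothetical.

```python
import re


def edge_ngrams(token: str, min_gram: int = 1, max_gram: int = 15) -> list[str]:
    """Return the leading n-grams of a token, as an edge n-gram filter would."""
    upper = min(max_gram, len(token))
    # A token shorter than min_gram yields no n-grams at all.
    return [token[:n] for n in range(min_gram, upper + 1)]


def attr_analyzer(text: str, min_gram: int = 1, max_gram: int = 15) -> list[str]:
    """Approximate attr_analyzer: crude tokenizer, lowercase, then edge n-grams."""
    # Simplification: the real standard tokenizer uses Unicode word boundaries.
    words = re.findall(r"\w+", text)
    tokens = []
    for word in words:
        tokens.extend(edge_ngrams(word.lower(), min_gram, max_gram))
    return tokens


print(attr_analyzer("postfixing"))
# ['p', 'po', 'pos', 'post', 'postf', 'postfi', 'postfix', 'postfixi', 'postfixin', 'postfixing']
```

In a real cluster, the `_analyze` API run against the index with `attr_analyzer` shows the authoritative token list.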

Finally, you can query the attribute field for the prefix postfix with a simple match query like this. There is no need for a poorly performing wildcard inside a query_string query.

{
  "query": {
    "match": {
      "attribute": "postfix"
    }
  }
}
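The `"search_analyzer": "standard"` setting is easy to overlook. The rough Python simulation below (not Elasticsearch code; the helpers are hypothetical and the tokenizer is simplified) compares analyzing the query text with a plain standard-style analyzer against analyzing it with the edge n-gram analyzer as well. Under the default OR semantics of a match query, the second variant turns postfix into p, po, pos, ... and matches almost every document.

```python
import re


def edge_ngrams(token, min_gram=1, max_gram=15):
    """Leading n-grams of a token, mimicking an edge n-gram filter."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def standard_like(text):
    """Rough stand-in for the standard analyzer: split into words, lowercase."""
    return [w.lower() for w in re.findall(r"\w+", text)]


def attr_like(text):
    """Rough stand-in for attr_analyzer: standard_like followed by edge n-grams."""
    return [g for w in standard_like(text) for g in edge_ngrams(w)]


def matches(doc_value, query, search_analyzer=standard_like):
    """A match query with the default OR operator hits if any query term is indexed."""
    indexed = set(attr_like(doc_value))
    return any(term in indexed for term in search_analyzer(query))


docs = ["postfixing", "postfix", "prefix", "pizza"]
# Query analyzed with the standard analyzer, as in the mapping.
print([d for d in docs if matches(d, "postfix")])
# Query analyzed with the edge n-gram analyzer: "p" alone matches everything.
print([d for d in docs if matches(d, "postfix", search_analyzer=attr_like)])
```

Also note that only prefixes up to max_gram characters are indexed: with max_gram set to 15, searching for a complete 16-character value finds nothing, so size max_gram to your data.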

