How to handle wildcards in Elasticsearch structured queries
Problem description
My use case requires querying our Elasticsearch domain with trailing wildcards. I'd like your opinion on the best practices for handling such wildcards in queries.
Do you think adding the following clauses is good practice for such queries:
"query" : {
  "query_string" : {
    "query" : "attribute:postfix*",
    "analyze_wildcard" : true,
    "allow_leading_wildcard" : false,
    "use_dis_max" : false
  }
}
I've disallowed leading wildcards since they are a heavy operation. However, I'd like to know how well analyzing wildcards on every query request holds up in the long run. My understanding is that analyze_wildcard has no impact if the query doesn't actually contain any wildcards. Is that correct?
If you are able to change your mapping type and index settings, the right way to go is to create a custom analyzer with an edge-n-gram token filter that indexes all prefixes of the attribute field.
curl -XPUT http://localhost:9200/your_index -d '{
  "settings": {
    "analysis": {
      "filter": {
        "edge_filter": {
          "type": "edgeNGram",
          "min_gram": 1,
          "max_gram": 15
        }
      },
      "analyzer": {
        "attr_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "edge_filter"]
        }
      }
    }
  },
  "mappings": {
    "your_type": {
      "properties": {
        "attribute": {
          "type": "string",
          "analyzer": "attr_analyzer",
          "search_analyzer": "standard"
        }
      }
    }
  }
}'
Then, when you index a document, an attribute field value such as postfixing will be indexed as the following tokens: p, po, pos, post, postf, postfi, postfix, postfixi, postfixin, postfixing.
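To make the token expansion above concrete, here is a minimal Python sketch that mimics what the edge-n-gram filter produces. It is only an illustration, not Elasticsearch code: the helper names `edge_ngrams` and `index_analyze` are hypothetical, and splitting on whitespace is a simplifying stand-in for the standard tokenizer.

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the prefixes of `token` with lengths from min_gram to max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_analyze(text, min_gram=1, max_gram=15):
    """Rough stand-in for attr_analyzer: tokenize, lowercase, then edge n-grams.

    Assumption: whitespace splitting approximates the standard tokenizer.
    """
    tokens = []
    for word in text.lower().split():
        tokens.extend(edge_ngrams(word, min_gram, max_gram))
    return tokens


if __name__ == "__main__":
    # Prints the ten prefixes of "postfixing", from "p" up to the full word.
    print(index_analyze("Postfixing"))
```

With min_gram set to 1 and max_gram set to 15, a ten-character value yields ten tokens, one per prefix length.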
Finally, you can easily query the attribute field for the postfix value using a simple match query like this. There is no need to use an under-performing wildcard in a query string query.
{
  "query": {
    "match" : {
      "attribute" : "postfix"
    }
  }
}
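The reason a plain match query works here can be sketched in Python as follows. This is a simplified simulation under stated assumptions, not how Elasticsearch is implemented: the function names are hypothetical, whitespace splitting stands in for the standard tokenizer, and the match is treated as "any query term equals an indexed token". The key point is that prefixes are generated only at index time, while the search side keeps the query term whole.

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the prefixes of `token` with lengths from min_gram to max_gram."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_tokens(text):
    """Index-time analysis (attr_analyzer): lowercase words expanded to prefixes."""
    tokens = set()
    for word in text.lower().split():
        tokens.update(edge_ngrams(word))
    return tokens


def search_tokens(text):
    """Search-time analysis (standard): lowercase words, no prefix expansion."""
    return text.lower().split()


def matches(doc_value, query_text):
    """True if any search token is exactly one of the indexed tokens."""
    indexed = index_tokens(doc_value)
    return any(term in indexed for term in search_tokens(query_text))


if __name__ == "__main__":
    print(matches("postfixing", "postfix"))  # prefix of the indexed value
    print(matches("postfixing", "fixing"))   # suffix, not a prefix
```

In this sketch the first call matches because postfix is one of the indexed prefixes, while the second does not because fixing is not a prefix of postfixing. The lookup is an exact term comparison, which is why it avoids the cost of expanding a wildcard at query time.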