Wildcard search or partial matching in Elasticsearch
Question
I am trying to give end users search-as-you-type that behaves like a LIKE query in SQL Server. I was able to implement an ES query for this SQL scenario:
select * from table where name like '%pete%' and type != 'xyz' and type != 'abc'
But the ES query doesn't work for this SQL query:
select * from table where name like '%peter tom%' and type != 'xyz' and type != 'abc'
In Elasticsearch, along with the wildcard query, I also need to apply some boolean filters:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            {
              "query": {
                "wildcard": {
                  "name": { "value": "*pete*" }
                }
              }
            }
          ],
          "must_not": [
            { "match": { "type": "xyz" } },
            { "match": { "type": "abc" } }
          ]
        }
      }
    }
  }
}
The query above works fine and returns all documents that match pete and are not of type xyz or abc. But when I run the wildcard with two words separated by a space, the same query returns nothing. For example:
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "should": [
            {
              "query": {
                "wildcard": {
                  "name": { "value": "*peter tom*" }
                }
              }
            }
          ],
          "must_not": [
            { "match": { "type": "xyz" } },
            { "match": { "type": "abc" } }
          ]
        }
      }
    }
  }
}
My mapping is as follows:
{
  "properties": {
    "name": {
      "type": "string"
    },
    "type": {
      "type": "string"
    }
  }
}
What query should I use to make wildcard search work for words separated by spaces?
Recommended answer
The most efficient solution is to use an ngram tokenizer to index substrings of your name field. For instance, if you have a name like peter tomson, the ngram tokenizer will produce and index tokens such as these:
- pe
- pet
- pete
- peter
- peter t
- peter to
- peter tom
- peter toms
- peter tomso
- eter tomson
- ter tomson
- er tomson
- r tomson
- tomson
- tomson
- omson
- mson
- son
- on
So, once this has been indexed, searching for any of those tokens will retrieve your document containing peter tomson.
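To make the tokenization concrete, here is a minimal Python sketch of what an ngram tokenizer with min_gram 2 and max_gram 15 emits. It is a simplified model and not Elasticsearch's implementation; the function name `ngrams` is hypothetical. It shows that the two-word phrase, space included, ends up as a single indexed token.

```python
def ngrams(text, min_gram=2, max_gram=15):
    """Return every substring of `text` with length in [min_gram, max_gram]."""
    tokens = []
    for start in range(len(text)):
        for size in range(min_gram, max_gram + 1):
            end = start + size
            if end > len(text):
                break  # longer grams would run past the end of the string
            tokens.append(text[start:end])
    return tokens


tokens = ngrams("peter tomson")
print("peter tom" in tokens)  # True
print("omso" in tokens)       # True
```

Because every substring in that length range is indexed, a query only needs an exact token lookup rather than a pattern scan.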
Let's create the index:
PUT likequery
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_ngram_analyzer": {
          "tokenizer": "my_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "my_ngram_tokenizer": {
          "type": "nGram",
          "min_gram": "2",
          "max_gram": "15"
        }
      }
    }
  },
  "mappings": {
    "typename": {
      "properties": {
        "name": {
          "type": "string",
          "fields": {
            "search": {
              "type": "string",
              "analyzer": "my_ngram_analyzer"
            }
          }
        },
        "type": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}
You'll then be able to search like this with a simple and very efficient term query:
POST likequery/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "name.search": "peter tom"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "type": "xyz"
          }
        },
        {
          "match": {
            "type": "abc"
          }
        }
      ]
    }
  }
}
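The following Python sketch illustrates why the original wildcard query came back empty while the term query on name.search matches. It assumes the name field was analyzed with the default analyzer, which lowercases and splits on whitespace, and that a wildcard pattern is compared against each indexed term individually. The helpers `standard_tokens`, `wildcard_hits`, and `substring_terms` are hypothetical stand-ins for illustration, not Elasticsearch APIs.

```python
import fnmatch
import re


def standard_tokens(text):
    """Rough stand-in for the default analyzer: lowercase, split on non-word characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def wildcard_hits(pattern, terms):
    """Test the wildcard pattern against each indexed term on its own."""
    return any(fnmatch.fnmatchcase(term, pattern) for term in terms)


def substring_terms(text, min_gram=2, max_gram=15):
    """Simplified ngram tokenizer: all substrings within the length bounds."""
    return {
        text[i:i + n]
        for i in range(len(text))
        for n in range(min_gram, max_gram + 1)
        if i + n <= len(text)
    }


name = "peter tomson"
analyzed_terms = standard_tokens(name)   # ['peter', 'tomson']
ngram_terms = substring_terms(name)

print(wildcard_hits("*pete*", analyzed_terms))       # True
print(wildcard_hits("*peter tom*", analyzed_terms))  # False: no single term contains a space
print("peter tom" in ngram_terms)                    # True: exact token lookup succeeds
```

Note that the analyzer defined above has no lowercase filter, so under these assumptions the term query is case-sensitive.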