Elasticsearch: Find substring match

Problem description

I want to perform both exact word match and partial word/substring match. For example, if I search for "men's shaver", I should find "men's shaver" in the results. But if I search for "en's shaver", I should also find "men's shaver" in the results. I am using the following settings and mappings:

Index settings:

PUT /my_index
{
    "settings": {
        "number_of_shards": 1, 
        "analysis": {
            "filter": {
                "autocomplete_filter": { 
                    "type":     "edge_ngram",
                    "min_gram": 1,
                    "max_gram": 20
                }
            },
            "analyzer": {
                "autocomplete": {
                    "type":      "custom",
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "autocomplete_filter" 
                    ]
                }
            }
        }
    }
}

Mappings:

PUT /my_index/my_type/_mapping
{
    "my_type": {
        "properties": {
            "name": {
                "type":            "string",
                "index_analyzer":  "autocomplete", 
                "search_analyzer": "standard" 
            }
        }
    }
}

Insert records:

POST /my_index/my_type/_bulk
{ "index": { "_id": 1 } }
{ "name": "men's shaver" }
{ "index": { "_id": 2 } }
{ "name": "women's shaver" }

Query:

1. To search by exact phrase match --> "men's"

POST /my_index/my_type/_search
{
    "query": {
        "match": {
            "name": "men's"
        }
    }
}

The above query returns "men's shaver" in the results.

2. To search by Partial word match --> "en's"

POST /my_index/my_type/_search
{
    "query": {
        "match": {
            "name": "en's"
        }
    }
}

The above query does NOT return anything.

I have also tried the following query:

POST /my_index/my_type/_search
{
    "query": {
        "wildcard": {
           "name": {
              "value": "%en's%"
           }
        }
    }
}

I still get no results. I suspect the cause is the "edge_ngram" filter on the index, which cannot find a partial word/substring match. I also tried an "n-gram" filter, but it slows the search down a lot.

Please suggest how to achieve both exact phrase match and partial phrase match using the same index settings.

Solution

To search for partial field matches and exact matches, it will work better if you define the fields as "not analyzed" or as keywords (rather than text), then use a wildcard query.

See also this.
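To see why the edge_ngram setup in the question cannot find "en's", here is a minimal pure-Python sketch of what that analyzer indexes. It is a simulation, not Elasticsearch code: `edge_ngrams` and `index_terms` are hypothetical helper names, and splitting on whitespace is an assumption that only approximates the standard tokenizer for these sample names.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the prefixes of `token`, as an edge_ngram filter would emit them."""
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def index_terms(text):
    """Simplified stand-in for the question's `autocomplete` analyzer.

    Assumption: lowercasing and splitting on whitespace approximates the
    standard tokenizer plus lowercase filter for the sample names used here.
    """
    terms = set()
    for token in text.lower().split():
        terms.update(edge_ngrams(token))
    return terms


terms = index_terms("men's shaver")
print(sorted(terms))
print("men's" in terms)  # the full word is one of the indexed prefixes
print("en's" in terms)   # a mid-word substring is never a prefix
```

Every indexed term starts at the first character of a word, so a match query for a term that starts mid-word has nothing to hit.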

To use a wildcard query, append * on both ends of the string you are searching for:

POST /my_index/my_type/_search
{
    "query": {
        "wildcard": {
            "name": {
                "value": "*en's*"
            }
        }
    }
}
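Note that Elasticsearch wildcards are * and ?, not the SQL-style % used in the question's attempt. The sketch below imitates wildcard matching with Python's `fnmatch` module to show why the field should be stored as a single term. It is only an approximation under stated assumptions: `wildcard_hits` is a hypothetical helper, and `fnmatch` additionally treats `[...]` as a character class, which the wildcard query does not.

```python
from fnmatch import fnmatchcase


def wildcard_hits(pattern, terms):
    """Return the terms that a *-style pattern matches, case-sensitively."""
    return [term for term in terms if fnmatchcase(term, pattern)]


# A not-analyzed / keyword field keeps each value as one term.
whole_values = ["men's shaver", "women's shaver"]
# An analyzed field is split into separate terms (simplified here).
analyzed_tokens = ["men's", "shaver", "women's"]

print(wildcard_hits("*en's*", whole_values))          # both names contain "en's"
print(wildcard_hits("%en's%", whole_values))          # % is a literal character
print(wildcard_hits("*en's shaver*", whole_values))   # phrase inside one term
print(wildcard_hits("*en's shaver*", analyzed_tokens))  # no single token has it
```

Both sample documents contain "en's", so a `*en's*` pattern is expected to return "women's shaver" as well as "men's shaver".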

For case-insensitive matching, use a custom analyzer with a lowercase filter and the keyword tokenizer.

Custom Analyzer:

"custom_analyzer": {
    "tokenizer": "keyword",
    "filter": ["lowercase"]
}
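The fragment above does not show where the analyzer lives. As a rough sketch, the Python dict below assembles one possible index-creation body that places it under the analysis settings and attaches it to the `name` field. It assumes the same older mapping syntax the question uses (`"type": "string"` under a `my_type` mapping); newer Elasticsearch versions use different field types, so treat the layout as illustrative rather than as the answer author's exact setup.

```python
import json

# Hypothetical index-creation body; the layout follows the question's
# older "string"-type mapping syntax and is an assumption, not taken
# from the answer.
index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "custom_analyzer": {
                    "type": "custom",
                    "tokenizer": "keyword",   # keep the whole value as one term
                    "filter": ["lowercase"],  # index it in lowercase
                }
            }
        }
    },
    "mappings": {
        "my_type": {
            "properties": {
                "name": {"type": "string", "analyzer": "custom_analyzer"}
            }
        }
    },
}

# Print the JSON that would be sent as the body of PUT /my_index.
print(json.dumps(index_body, indent=4))
```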

Make the search string lowercase as well. For example, if the search string arrives as AsD, change it to *asd*.
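The lowercasing step can be done in client code before the request is sent. The helper below is a minimal sketch: `build_wildcard_query` is a hypothetical name, and it assumes the user input contains no * or ? characters of its own, since those would be interpreted as wildcards.

```python
import json


def build_wildcard_query(field, user_input):
    """Build a wildcard query body that matches `user_input` as a substring.

    Assumption: `user_input` has no wildcard characters (* or ?) of its own.
    """
    pattern = "*" + user_input.lower() + "*"
    return {"query": {"wildcard": {field: {"value": pattern}}}}


# Body for POST /my_index/my_type/_search
print(json.dumps(build_wildcard_query("name", "AsD"), indent=4))
```

Because the field is indexed in lowercase by the custom analyzer, lowercasing the pattern lets "AsD", "asd", and "ASD" all produce the same query.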
