Elasticsearch case-insensitive search with a prefix query


Problem description

I am new to Elasticsearch. I have the query below:

GET deals2/_search
{
  "size": 200,
  "_source": ["acquireInfo"],
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "fields": ["acquireInfo.company_name.keyword"],
            "query": "az*"
          }
        }
      ]
    }
  }
}

I want Elasticsearch to return case-insensitive results, that is, strings that start with any of the following:

"Az"
"AZ"
"az"
"aZ"

But I am not getting all of these results. Can anyone help me with this?

Example: I have 4 documents:

1)Aziia Avto Ust-Kamenogorsk OOO 
2)AZ Infotech Inc 
3)AZURE Midstream Partners LP 
4)State Oil Fund of the Republic of Azerbaijan

Searching for az should return only the first 3 documents, since they start with az (ignoring case), and not the 4th one, which also contains az but not at the beginning.

Recommended answer

This happens because you are using a keyword field to index company_name in your application.

The keyword analyzer is a "noop" analyzer that returns the entire input string as a single token. For example, company names such as foo, Foo, and fOo are stored with their original case, and searching for foo matches only foo, because Elasticsearch ultimately works on token matching, which is case sensitive.
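The effect can be illustrated with a small Python simulation. This is not Elasticsearch code: `keyword_token` and `prefix_match` are hypothetical helpers that only mimic how a keyword field keeps one unmodified token and how a term-level prefix lookup compares it character by character.

```python
# Simulation only: mimics a keyword field WITHOUT a normalizer.
def keyword_token(value: str) -> str:
    """A keyword field stores the whole input as one token, unchanged."""
    return value


def prefix_match(token: str, prefix: str) -> bool:
    """Term-level prefix matching compares characters exactly (case sensitive)."""
    return token.startswith(prefix)


names = [
    "Aziia Avto Ust-Kamenogorsk OOO",
    "AZ Infotech Inc",
    "AZURE Midstream Partners LP",
    "State Oil Fund of the Republic of Azerbaijan",
]

# No stored token starts with lowercase "az", so nothing matches.
matches = [n for n in names if prefix_match(keyword_token(n), "az")]
print(matches)  # []
```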

What you need is the standard analyzer, or a custom analyzer that also covers your other use cases, with a lowercase token filter on the field. Then use the match query, which is analyzed and applies the same analyzer that was used to index the field. This way the search query generates the same tokens that are stored in the index, and the search becomes case-insensitive.
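A rough sketch of why running both sides through the same analyzer makes matching case-insensitive. The `analyze` function is a simplified stand-in (split on whitespace, then lowercase) and not the real standard analyzer; `match` is a hypothetical helper, not an Elasticsearch API.

```python
from typing import List


def analyze(text: str) -> List[str]:
    """Simplified analyzer with a lowercase filter (assumption: the real
    standard analyzer tokenizes on Unicode word boundaries, not whitespace)."""
    return [token.lower() for token in text.split()]


def match(indexed_text: str, query_text: str) -> bool:
    """Match when any query token equals any indexed token. Both sides go
    through the same analyzer, as with an analyzed match query."""
    indexed_tokens = set(analyze(indexed_text))
    return any(token in indexed_tokens for token in analyze(query_text))


# All three stored variants produce the token "foo", so each one matches.
for stored in ["foo", "Foo", "fOo"]:
    print(stored, match(stored, "foo"))
```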

Edit: After a discussion with the user in chat, I am updating the answer to suit his requirements, as follows:

Step 1: Define the settings and mappings of the index.

Endpoint: http://{{hostname}}:{{port}}/{{index}}

{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "char_filter": [],
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "company_name": {
        "type": "keyword",
        "normalizer": "my_normalizer"
      }
    }
  }
}

Step 2: Index all the documents.

Endpoint: http://{{hostname}}:{{port}}/{{index}}/_doc/ followed by the document ID (1, 2, 3, 4, etc.)

{
    "company_name" : "State Oil Fund of the Republic of Azerbaijan"
}

Step 3: Search query

Endpoint: http://{{hostname}}:{{port}}/{{index}}/_search

{
  "query": {
    "prefix": { "company_name": "az" }
  }
}
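For readers who prefer a script, the three steps could be sent from Python roughly as follows. This is a sketch under assumptions, not a tested client: the base URL `http://localhost:9200`, the index name `prefixsearch`, an unsecured cluster, and the helpers `build_request`, `send`, and `run` are all hypothetical. The `refresh=true` parameter is added so the documents are searchable immediately.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local cluster, no authentication
INDEX = "prefixsearch"              # assumption: hypothetical index name

# Step 1 body: keyword field with a lowercase normalizer.
INDEX_BODY = {
    "settings": {"analysis": {"normalizer": {"my_normalizer": {
        "type": "custom", "char_filter": [], "filter": ["lowercase"]}}}},
    "mappings": {"properties": {"company_name": {
        "type": "keyword", "normalizer": "my_normalizer"}}},
}

NAMES = [
    "Aziia Avto Ust-Kamenogorsk OOO",
    "AZ Infotech Inc",
    "AZURE Midstream Partners LP",
    "State Oil Fund of the Republic of Azerbaijan",
]


def prefix_query(prefix: str) -> dict:
    """Step 3 body: prefix query on the normalized keyword field."""
    return {"query": {"prefix": {"company_name": prefix}}}


def build_request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Build a JSON request without sending it."""
    return urllib.request.Request(
        url=f"{BASE_URL}/{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def run() -> None:
    """Run steps 1 to 3 against the assumed cluster and print the hits."""
    send(build_request("PUT", INDEX, INDEX_BODY))
    for doc_id, name in enumerate(NAMES, start=1):
        path = f"{INDEX}/_doc/{doc_id}?refresh=true"
        send(build_request("PUT", path, {"company_name": name}))
    result = send(build_request("POST", f"{INDEX}/_search", prefix_query("az")))
    for hit in result["hits"]["hits"]:
        print(hit["_source"]["company_name"])
```

Calling `run()` requires a reachable test cluster; nothing is sent until it is called.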

This returns the expected results below:

{
    "took": 870,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 3,
            "relation": "eq"
        },
        "max_score": 1,
        "hits": [
            {
                "_index": "prerfixsearch",
                "_type": "_doc",
                "_id": "2ec9df0fc-dc04-47bb-914f-91a9f20d09efd15f2506-293f-4fb2-bdc3-925684a930b5",
                "_score": 1,
                "_source": {
                    "company_name": "AZ Infotech Inc"
                }
            },
            {
                "_index": "prerfixsearch",
                "_type": "_doc",
                "_id": "160d01183-a308-4408-8ac1-a85da950f285edefaca2-0b68-41c6-ba34-21bbef57f84f",
                "_score": 1,
                "_source": {
                    "company_name": "Aziia Avto Ust-Kamenogorsk OOO"
                }
            },
            {
                "_index": "prerfixsearch",
                "_type": "_doc",
                "_id": "1da878175-7db5-4332-baa7-ac47bd39b646f81c1770-7ae1-4536-baed-0a4f6b20fa38",
                "_score": 1,
                "_source": {
                    "company_name": "AZURE Midstream Partners LP"
                }
            }
        ]
    }
}

Explanation: Since the OP did not originally mention excluding the 4th document from the search results, I suggested creating a text field so that individual tokens are generated. Now that the requirement is only a prefix search, we do not need individual tokens. We want a single token, but it should be lowercased to support case-insensitive search. That is why I applied the custom normalizer to the company_name field.
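The difference between the two approaches can be sketched in plain Python. This is a simulation, not Elasticsearch code: `normalize` mimics a keyword field with a lowercase normalizer, and `text_tokens` is a simplified stand-in for text-field analysis (whitespace split plus lowercase). Both helper names are hypothetical.

```python
from typing import List


def normalize(value: str) -> str:
    """Keyword field with a lowercase normalizer: one lowercased token."""
    return value.lower()


def text_tokens(value: str) -> List[str]:
    """Simplified text-field analysis: one lowercased token per word."""
    return [word.lower() for word in value.split()]


names = [
    "Aziia Avto Ust-Kamenogorsk OOO",
    "AZ Infotech Inc",
    "AZURE Midstream Partners LP",
    "State Oil Fund of the Republic of Azerbaijan",
]
prefix = "az"

# Single normalized token: only names that START with the prefix match.
keyword_hits = [n for n in names if normalize(n).startswith(prefix)]

# Per-word tokens: "azerbaijan" also starts with the prefix, so doc 4 matches.
text_hits = [n for n in names if any(t.startswith(prefix) for t in text_tokens(n))]

print(len(keyword_hits))  # 3
print(len(text_hits))     # 4
```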
