How to search both singular and plural forms of a word in Elasticsearch?


Question

I am building an Elasticsearch query with the Q object. One of my indexed documents contains "jbl speakers are great", but my query contains "speaker" instead of "speakers". How can I find this document with a query string?

I tried match_phrase, but it does not find this document. When I tried query_string, it threw an error saying "query_string does not support for some key". I also tried wildcard, but that does not work either, with a query like:

{
  "query": {
    "bool": {
      "must": [
        {
          "match_phrase": {
            "prod_group": "06"
          }
        },
        {
          "match_phrase": {
            "prod_group": "apparel"
          }
        },
        {
          "wildcard": {
            "prod_cat_for_search": "+speaker*"
          }
        },
        {
          "range": {
            "date": {
              "gte": "2018-04-07"
            }
          }
        }
      ]
    }
  }
}

Q('match_phrase', prod_cat_for_search='speaker')

I expect the output to include the document containing "speakers", but no such document is returned.

Solution

The type of search you are looking for can be achieved by applying a stemmer token filter at index time.

Let's see how it works using the example mapping below:

PUT test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "filter": [
            "lowercase",
            "my_stemmer"
          ],
          "tokenizer": "whitespace"
        }
      },
      "filter": {
        "my_stemmer": {
          "type": "stemmer",
          "name": "english"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "description": {
          "type": "text",
          "analyzer": "my_analyzer",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

In the mapping above, the description field uses my_analyzer as its analyzer. This analyzer applies the token filters lowercase and my_stemmer, and my_stemmer applies English stemming to the input value.

For example, if we index the following document:

{
   "description": "JBL speakers build with perfection"
}

The tokens that will get indexed are:

jbl
speaker
build
with
perfect

Notice that speakers is indexed as speaker and perfection as perfect.

Now if you search for speakers or speaker, both will match. Similarly, if you search for perfect, the above document will match.
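As a rough sketch of what such a search could look like from Python, the snippet below builds a request body that uses a match query on the stemmed field. The helper name build_search_body and its parameters are made up for illustration; the field name description comes from the example mapping, and the date clause mirrors the range filter in the question. With elasticsearch-dsl, Q('match', description='speaker') should produce the same match clause.

```python
import json


def build_search_body(text, field="description", since=None):
    """Build a bool query whose text clause is a `match` query.

    `match` analyzes `text` with the field's analyzer before the lookup,
    so "speakers" and "speaker" are reduced to the same stem.
    (Hypothetical helper, not part of any Elasticsearch client.)
    """
    must = [{"match": {field: text}}]
    if since is not None:
        # Optional date filter, mirroring the range clause in the question.
        must.append({"range": {"date": {"gte": since}}})
    return {"query": {"bool": {"must": must}}}


body = build_search_body("speaker", since="2018-04-07")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request against the index. Unlike the wildcard query in the question, which is not analyzed, the match query lets the stemmer do the singular/plural normalization.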

You might wonder why searching for speakers or perfection also matches. The reason is that, by default, Elasticsearch applies the analyzer used at index time to the query text at search time as well. So a search for perfection actually searches for perfect, hence the match.
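The effect of analyzing both the document and the query can be illustrated with a small pure-Python toy. This is only a sketch: toy_stem is a crude suffix stripper standing in for the real english stemmer, and analyze, build_index and search are hypothetical helpers, not Elasticsearch APIs. It only shows why the stemmed query term lines up with the stemmed index term.

```python
def toy_stem(token):
    # Crude stand-in for the english stemmer: strips two suffixes only.
    for suffix in ("ion", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text):
    # whitespace tokenizer -> lowercase filter -> (toy) stemmer filter
    return [toy_stem(tok.lower()) for tok in text.split()]


def build_index(docs):
    # Inverted index: stemmed term -> set of document ids.
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query):
    # The query text goes through the same analyzer before the term lookup.
    hits = set()
    for term in analyze(query):
        hits |= index.get(term, set())
    return hits


docs = {1: "JBL speakers build with perfection"}
index = build_index(docs)
for q in ("speaker", "speakers", "perfect", "perfection"):
    print(q, "->", sorted(search(index, q)))
```

All four queries find document 1, because each query term is reduced to the same stem that was stored in the index.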

More on stemming.
