ElasticSearch - Searching with hyphens


Question

ElasticSearch 1.6

I want to index text that contains hyphens, for example U-12, U-17, WU-12, t-shirt, and be able to search it with a "Simple Query String" query.

Data sample (simplified):

{"title":"U-12 Soccer",
 "comment": "the t-shirts are dirty"}

As there are already quite a few questions about hyphens, I first tried the following solution:

Use a char filter: ElasticSearch - Searching with hyphens in name

So I went with this mapping:

{
  "settings":{
    "analysis":{
      "char_filter":{
        "myHyphenRemoval":{
          "type":"mapping",
          "mappings":[
            "-=>"
          ]
        }
      },
      "analyzer":{
        "default":{
          "type":"custom",
          "char_filter":  [ "myHyphenRemoval" ],
          "tokenizer":"standard",
          "filter":[
            "standard",
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings":{
    "test":{
      "properties":{
        "title":{
          "type":"string"
        },
        "comment":{
          "type":"string"
        }
      }
    }
  }
}

Searching is done with the following query:

{"_source":true,
  "query":{
    "simple_query_string":{
      "query":"<Text>",
      "default_operator":"AND"
    }
  }
}




What works:

"U-12", "U*", "t*", "ts*"

What doesn't work:

"U-*", "u-1*", "t-*", "t-sh*", ...

So it seems the char filter is not applied to search strings? What can I do to make this work?

Recommended answer

The answer is simple:

Quote from Igor Motov: Configuring the standard tokenizer


By default the simple_query_string query doesn't analyze the words with wildcards. As a result it searches for all tokens that start with i-ma. The word i-mac doesn't match this request because during analysis it's split into two tokens i and mac and neither of these tokens starts with i-ma. In order to make this query find i-mac you need to make it analyze wildcards:



{
  "_source":true,
  "query":{
    "simple_query_string":{
      "query":"u-1*",
      "analyze_wildcard":true,
      "default_operator":"AND"
    }
  }
}
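To see why analyze_wildcard makes the difference with the mapping from the question, here is a minimal Python sketch. It is a toy model under stated assumptions, not how Elasticsearch is actually implemented. The helper names `analyze` and `prefix_matches` are hypothetical, the regex only approximates the standard tokenizer, and the un-analyzed prefix is lowercased on the assumption that this explains why "U*" matches in the question.

```python
import re


def analyze(text):
    """Toy stand-in for the question's analyzer (an approximation):
    remove hyphens (the myHyphenRemoval char filter), split into
    alphanumeric runs (roughly the standard tokenizer), lowercase."""
    stripped = text.replace("-", "")
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", stripped)]


def prefix_matches(term, indexed_tokens, analyze_wildcard):
    """Model one wildcard term such as 'u-1*' against indexed tokens."""
    prefix = term.rstrip("*")
    if analyze_wildcard:
        # The prefix goes through the same analysis chain as the
        # indexed text, so 'u-1' becomes 'u1'.
        analyzed = analyze(prefix)
        if not analyzed:
            return False
        prefix = analyzed[0]  # simplification: keep only the first token
    else:
        # Assumption: the prefix is only lowercased, never char-filtered,
        # so the hyphen stays in it.
        prefix = prefix.lower()
    return any(tok.startswith(prefix) for tok in indexed_tokens)


title_tokens = analyze("U-12 Soccer")
comment_tokens = analyze("the t-shirts are dirty")

for term, tokens in [("U*", title_tokens), ("u-1*", title_tokens),
                     ("t-sh*", comment_tokens)]:
    print(term,
          prefix_matches(term, tokens, analyze_wildcard=False),
          prefix_matches(term, tokens, analyze_wildcard=True))
```

In this model "U-12" is indexed as the single token u12, so the raw prefix u-1 can never match it, while the analyzed prefix u1 does.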
