Multiple tokenizers inside one Custom Analyzer in Elasticsearch


Problem description

I am using a custom NGram analyzer that has an ngram tokenizer, and I have also applied a lowercase filter. The query works fine for searches without special characters, but when I search for certain symbols it fails: since I limited the tokenizer to letters and digits and only added a lowercase filter, Elasticsearch doesn't analyze the symbols at all. I know the whitespace tokenizer could help me solve the issue. How can I use two tokenizers in a single analyzer? Below is the mapping:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_tokenizer",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "my_analyzer"
        }
      }
    }
  }
}
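
For example, the behaviour can be reproduced with the _analyze API (assuming the index above was created as my_index, which is a made-up name here): because token_chars is limited to letter and digit, the symbols never make it into a token:

POST my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "abc@#$def"
}

This returns only the grams abc and def; the @#$ part is dropped before any token is produced.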

Is there any way to solve this issue?

Answer

As per the Elasticsearch documentation,

An analyzer must have exactly one tokenizer.

However, you can have multiple analyzers defined in the settings, and you can configure a separate analyzer for each field.

If you want a single field itself to be searchable with different analyzers, one option is to make that field a multi-field, as per this link:

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "whitespace",
          "fields": {
            "ngram": {
              "type": "text",
              "analyzer": "my_analyzer"
            }
          }
        }
      }
    }
  }
}
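
Note that my_analyzer is not a built-in analyzer, so the analysis settings from the question have to be part of the same index creation request. A combined sketch (keeping the _doc mapping type, which assumes Elasticsearch 6.x):

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "my_tokenizer",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": ["letter", "digit"]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "analyzer": "whitespace",
          "fields": {
            "ngram": {
              "type": "text",
              "analyzer": "my_analyzer"
            }
          }
        }
      }
    }
  }
}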

So if you configure it as above, your query needs to make use of both the title and title.ngram fields.

GET my_index/_search
{
  "query": {
    "multi_match": {
      "query": "search @#$ whatever",
      "fields": [ 
        "title",
        "title.ngram"
      ],
      "type": "most_fields" 
    }
  }
}
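
With most_fields, the match scores from the two fields are combined, so documents that match on both the plain whitespace tokens and the n-grams rank higher than documents that match on only one of them.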

As another option, here is what you can do:


  • Create two indexes.
  • The first index has the field title with the analyzer my_analyzer.
  • The second index has the field title with the analyzer whitespace.
  • Create the same alias for both of them, as below:

POST _aliases
{
  "actions": [
    {
      "add": {
        "index": "index-a",
        "alias": "index"
      }
    },
    {
      "add": {
        "index": "index-b",
        "alias": "index"
      }
    }
  ]
}

So when you eventually write a query, it must point to this alias, which in turn will query both indexes.
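
For example, a minimal search against the alias (named index in the _aliases call above) fans out to both indexes:

GET index/_search
{
  "query": {
    "match": {
      "title": "search @#$ whatever"
    }
  }
}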

Hope this helps!

