Custom analyzer which breaks the tokens on special characters and lowercase/uppercase

Views: 167 Published: 2017/8/7 0:06:13 Tags: elasticsearch elasticsearch-plugin analyzer


Problem description

I am trying to write a custom analyzer that splits tokens on special characters and converts them to uppercase before indexing. I should also be able to get results when I search in lowercase.

For example, if I provide data@source, it should replace @ (or any other special character) with whitespace and give me a result like data source.

Here is how I tried to implement it:

PUT sound
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "standard",
          "char_filter": [
            "my_char_filter"
          ],
          "filter": [
            "uppercase"
          ]
        }
      },
      "char_filter": {
        "my_char_filter": {
          "type": "pattern_replace",
          "pattern": "(\\d+)-(?=\\d)",
          "replacement": "$1 "
        }
      }
    }
  }
}
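
As a side note on the settings above, the `pattern_replace` char filter as written only rewrites a hyphen that sits between digits; it does not touch `@`, `&`, or a hyphen between letters. The split seen in the `_analyze` output below comes from the `standard` tokenizer. The following minimal Python sketch emulates just the char filter to illustrate this. The function name `apply_char_filter` is hypothetical, and Python's `re` module stands in for the Java regex engine that Elasticsearch uses (the replacement `$1 ` becomes `\1 ` in Python syntax).

```python
import re

# Python equivalent of the Java pattern "(\\d+)-(?=\\d)" from the settings above.
# It matches one or more digits followed by a hyphen, but only when another
# digit comes right after the hyphen (lookahead).
PATTERN = re.compile(r"(\d+)-(?=\d)")


def apply_char_filter(text: str) -> str:
    """Emulate my_char_filter: keep the digits, replace the hyphen with a space."""
    return PATTERN.sub(r"\1 ", text)


# Hyphens between digits are replaced by spaces.
print(apply_char_filter("123-456-789"))
# Text without digit-hyphen-digit sequences passes through unchanged.
print(apply_char_filter("data-source&abc"))
print(apply_char_filter("data@source"))
```

If the goal is to replace every special character with whitespace inside the char filter itself, the pattern would need to target those characters rather than digits; that change is left out here because the tokenizer already splits on them in the example.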


POST sound/_analyze
{
  "analyzer": "my_analyzer",
  "text": "data-source&abc"
}

It splits the tokens well, like this:

{
   "tokens": [
      {
         "token": "DATA",
         "start_offset": 0,
         "end_offset": 4,
         "type": "<ALPHANUM>",
         "position": 0
      },
      {
         "token": "SOURCE",
         "start_offset": 5,
         "end_offset": 11,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "ABC",
         "start_offset": 12,
         "end_offset": 15,
         "type": "<ALPHANUM>",
         "position": 2
      }
   ]
}

But if I search in lowercase, or even in uppercase, it does not work. For example:

GET sound/_search?text="data"

GET sound/_search?text="data"

GET /sound/_search
{
  "query": {
    "match": {
      "text": "data"
    }
  }
}

None of the queries above return any results.

Solution

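You just need to use slightly different syntax for your searches. A URI search takes the query string in the q parameter rather than text, and a match query has to name the field that holds your data: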

GET sound/_search?q=data

GET sound/_search?q=data

POST sound/_search
{
  "query": {
    "match": {
      "NAME_OF_YOUR_FIELD": "data"
    }
  }
}

NAME_OF_YOUR_FIELD needs to be the name of the field you are storing your data in. More information is available in the match query documentation.
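
To see why a lowercase search can find tokens that were indexed in uppercase, it helps to remember that a match query analyzes the query text before looking up terms. The sketch below is a toy model of that idea in Python, not Elasticsearch itself. It assumes the field is mapped with `my_analyzer` (the question does not show a mapping), and `analyze`, `build_index`, and `match` are hypothetical helper names. The tokenizer here is a crude approximation that splits on any non-alphanumeric character, whereas the real `standard` tokenizer follows Unicode text segmentation rules.

```python
import re
from collections import defaultdict


def analyze(text):
    """Rough stand-in for my_analyzer: char filter, tokenizer, uppercase filter."""
    # Char filter from the question (hyphen between digits becomes a space).
    text = re.sub(r"(\d+)-(?=\d)", r"\1 ", text)
    # Approximate tokenizer: runs of ASCII letters and digits.
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    # Uppercase token filter.
    return [token.upper() for token in tokens]


def build_index(docs):
    """Build a tiny inverted index: term -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def match(index, query_text):
    """Analyze the query the same way, then return docs containing any term."""
    hits = set()
    for term in analyze(query_text):
        hits |= index.get(term, set())
    return hits


docs = {1: "data-source&abc", 2: "other text"}
index = build_index(docs)

# Both spellings are uppercased at query time, so both find document 1.
print(match(index, "data"))
print(match(index, "DATA"))
```

Because the same analysis runs at index time and at query time, the case of the search text does not matter in this model. This only holds when the query targets the field that actually uses the analyzer.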
