Custom analyzer which breaks the tokens on special characters and lowercase/uppercase
Problem description
I am trying to write a custom analyzer that splits tokens on special characters and converts them to uppercase before indexing. A search with lowercase text should still return results.
For example, given data@source, it should replace @ (or any other special character) with whitespace and produce tokens like data source.
Here is how I tried implementing it.
PUT sound
{
"settings": {
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "standard",
"char_filter": [
"my_char_filter"
],
"filter": [
"uppercase"
]
}
},
"char_filter": {
"my_char_filter": {
"type": "pattern_replace",
"pattern": "(\\d+)-(?=\\d)",
"replacement": "$1 "
}
}
}
}
}
POST sound/_analyze
{
"analyzer": "my_analyzer",
"text": "data-source&abc"
}
It splits the tokens well, like this:
{
"tokens": [
{
"token": "DATA",
"start_offset": 0,
"end_offset": 4,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "SOURCE",
"start_offset": 5,
"end_offset": 11,
"type": "<ALPHANUM>",
"position": 1
},
{
"token": "ABC",
"start_offset": 12,
"end_offset": 15,
"type": "<ALPHANUM>",
"position": 2
}
]
}
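A quick check outside Elasticsearch suggests that the split above comes from the standard tokenizer rather than from my_char_filter, because the pattern only matches a hyphen that sits between digits. The sketch below uses Python's re module as a stand-in for the Java regex engine that Elasticsearch uses (an assumption; the two behave the same for this simple pattern), with `\1` written in place of Java's `$1`. The helper name apply_char_filter is hypothetical.

```python
import re

# Same pattern as my_char_filter; Python's re is used here only as a
# stand-in for the Java regex engine that Elasticsearch runs.
PATTERN = re.compile(r"(\d+)-(?=\d)")

def apply_char_filter(text):
    # Java's "$1 " replacement string is written r"\1 " in Python.
    return PATTERN.sub(r"\1 ", text)

# No digits around the hyphen, so the text passes through unchanged.
print(apply_char_filter("data-source&abc"))
# Hyphens between digits are replaced with a space.
print(apply_char_filter("123-456-789"))
```

For the sample text the char filter is effectively a no-op, so replacing every special character with whitespace would need a broader pattern.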
But if I search with lowercase or even uppercase text, it does not work, for example:
GET sound/_search?text="data"
GET /sound/_search
{
"query": {
"match": {
"text": "data"
}
}
}
None of the queries above returns any results.
You just need to use some slightly different syntax for your searches:
GET sound/_search?q=data
POST sound/_search
{
"query": {
"match": {
"NAME_OF_YOUR_FIELD": "data"
}
}
}
NAME_OF_YOUR_FIELD needs to be the name of the field you are storing your data in. See the Elasticsearch documentation on the match query for more information.
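To illustrate why case should not matter once the query targets the right field, here is a toy Python model of what a match query does: it runs the query text through the same analyzer as the indexed field and then compares tokens. It assumes the field is actually mapped with my_analyzer (the mapping is not shown in the question), and it approximates the intended analyzer with a regex that treats every non-alphanumeric character as a separator. The names analyze and matches are hypothetical helpers, not Elasticsearch APIs.

```python
import re

def analyze(text):
    # Rough stand-in for the intended analyzer: turn special characters
    # into spaces, split on whitespace, then uppercase each token.
    cleaned = re.sub(r"[^A-Za-z0-9]+", " ", text)
    return [token.upper() for token in cleaned.split()]

def matches(query, document):
    # A match query analyzes the query text too, so both sides are compared
    # as uppercase tokens; any shared token counts as a hit (OR semantics).
    return bool(set(analyze(query)) & set(analyze(document)))

print(analyze("data@source"))               # ['DATA', 'SOURCE']
print(matches("data", "data@source"))       # True
print(matches("DATA", "data-source&abc"))   # True
print(matches("sound", "data@source"))      # False
```

Because both the indexed text and the query end up as uppercase tokens, "data" and "DATA" find the same documents in this model.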