elasticsearch tokenize "H&R Blocks" as "H", "R", "H&R", "Blocks"
Question
I want to preserve the special character in the token while still splitting on special characters. Say I have the phrase
"H&R Blocks"
I want to tokenize it as
"H", "R", "H&R", "Blocks"
I read this post: http://www.fullscale.co/blog/2013/03/04/preserving_specific_characters_during_tokenizing_in_elasticsearch.html It explains how to preserve special characters.
Solution

Try using the word_delimiter token filter. Reading the docs on its use, you can set the parameter preserve_original: true to do exactly what you want (i.e. "H&R" => "H&R", "H", "R"). I would set it up like this:
"settings" : { "analysis" : { "filter" : { "special_character_spliter" : { "type" : "word_delimiter", "preserve_original": "true" } }, "analyzer" : { "your_analyzer" : { "type" : "custom", "tokenizer" : "whitespace", "filter" : ["lowercase", "special_character_spliter"] } } } }
Good luck!