Elasticsearch: tokenize "H&R Blocks" as "H", "R", "H&R", "Blocks"


Problem description


I want to preserve special characters inside a token while still splitting on those characters. Say I have the text

"H&R Blocks"

I want to tokenize it as

"H", "R", "H&R", "Blocks"

I read this post: http://www.fullscale.co/blog/2013/03/04/preserving_specific_characters_during_tokenizing_in_elasticsearch.html. It explains how to preserve the special character.

Solution

Try using the word_delimiter token filter.

According to the docs, you can set the parameter preserve_original: true to do exactly what you want (i.e. "H&R" => H&R H R).
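To make the expected token stream concrete, here is a rough Python sketch of what a whitespace tokenizer followed by lowercase and a word_delimiter-style filter with preserve_original would produce. It is only an approximation written for illustration, not Elasticsearch's implementation: the function names word_delimiter and analyze are made up here, and the sketch only splits on non-alphanumeric characters (the real filter also handles case changes, letter/digit transitions, and more).

```python
import re


def word_delimiter(token, preserve_original=True):
    """Approximate word_delimiter: split on runs of non-alphanumeric characters."""
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    if parts == [token]:
        # Nothing to split, so the token passes through once.
        return [token]
    if preserve_original:
        # Emit the untouched token first, then its sub-parts.
        return [token] + parts
    return parts


def analyze(text):
    """Whitespace tokenizer -> lowercase -> word_delimiter (simplified)."""
    tokens = []
    for raw in text.split():
        tokens.extend(word_delimiter(raw.lower()))
    return tokens


print(analyze("H&R Blocks"))  # ['h&r', 'h', 'r', 'blocks']
```

Note that because a lowercase filter is part of the chain, the tokens come out lowercased. If the original casing must survive in the index, leaving lowercase out of the filter list is probably what you want.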

I would set it up like this:

{
    "settings" : {
        "analysis" : {
            "filter" : {
                "special_character_spliter" : {
                    "type" : "word_delimiter",
                    "preserve_original" : true
                }
            },
            "analyzer" : {
                "your_analyzer" : {
                    "type" : "custom",
                    "tokenizer" : "whitespace",
                    "filter" : ["lowercase", "special_character_spliter"]
                }
            }
        }
    }
}
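
One way to check the result is to create an index with these settings and run the text through the _analyze API. The sketch below only builds and prints the two request bodies; it does not talk to a cluster. The index name my_index is a placeholder chosen for this example, and how you send the requests (curl, Kibana Dev Tools, a client library) is up to you.

```python
import json

INDEX = "my_index"  # hypothetical index name, pick your own

# Body for creating the index: PUT /my_index
create_index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "special_character_spliter": {
                    "type": "word_delimiter",
                    "preserve_original": True,  # serialized as JSON true
                }
            },
            "analyzer": {
                "your_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "special_character_spliter"],
                }
            },
        }
    }
}

# Body for testing the analyzer: GET /my_index/_analyze
analyze_body = {"analyzer": "your_analyzer", "text": "H&R Blocks"}

print(f"PUT /{INDEX}")
print(json.dumps(create_index_body, indent=2))
print(f"GET /{INDEX}/_analyze")
print(json.dumps(analyze_body, indent=2))
```

The _analyze response lists each emitted token with its position and offsets, which makes it easy to confirm that both the original "h&r" and its parts are being indexed.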

Good luck!
