Elasticsearch: tokenize "H&R Blocks" as "H", "R", "H&R", "Blocks"

Views: 155. Published: 2017/8/7 3:03:31. Tags: elasticsearch, token, tokenize


Question


I want to preserve special characters inside a token while still splitting the token on those characters. Say I have the text

"H&R Blocks"

I want to tokenize it as

"H", "R", "H&R", "Blocks"

I have read this post, which explains how to preserve special characters: http://www.fullscale.co/blog/2013/03/04/preserving_specific_characters_during_tokenizing_in_elasticsearch.html

Solution

Try using the word_delimiter token filter.

According to the docs, you can set the parameter preserve_original: true to do exactly what you want (i.e. "H&R" => H&R, H, R).

I would set it up like this:

{
    "settings" : {
        "analysis" : {
            "filter" : {
                "special_character_spliter" : {
                    "type" : "word_delimiter",
                    "preserve_original" : true
                }
            },
            "analyzer" : {
                "your_analyzer" : {
                    "type" : "custom",
                    "tokenizer" : "whitespace",
                    "filter" : ["lowercase", "special_character_spliter"]
                }
            }
        }
    }
}
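
To make the order of operations in this analyzer easier to follow, here is a rough Python sketch of the pipeline: whitespace tokenizer, then lowercase, then a delimiter split that keeps the original token. It is only an illustration and not how Elasticsearch implements it. The function names word_delimiter and analyze are made up for this example, and the split rule is deliberately simplified: it only breaks on non-alphanumeric characters, whereas the real word_delimiter filter also splits on case changes and letter/digit boundaries.

```python
import re


def word_delimiter(token, preserve_original=True):
    """Simplified stand-in for the word_delimiter filter (assumption:
    splits only on runs of non-alphanumeric characters)."""
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    if parts == [token]:
        # Nothing to split, so emit the token once.
        return [token]
    # Emit the unsplit token first when preserve_original is on.
    return ([token] if preserve_original else []) + parts


def analyze(text):
    """Mimic the analyzer chain: whitespace -> lowercase -> word_delimiter."""
    tokens = text.split()                   # whitespace tokenizer
    tokens = [t.lower() for t in tokens]    # lowercase filter
    result = []
    for token in tokens:
        result.extend(word_delimiter(token))
    return result


print(analyze("H&R Blocks"))  # ['h&r', 'h', 'r', 'blocks']
```

In this sketch the tokens come out lowercased because the lowercase step runs before the split. If the original casing matters, as in the "Blocks" example from the question, the lowercase step is the part to reconsider.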

Good luck!
