Mapping token_chars to an NGram filter in Elasticsearch NEST


Question

I'm trying to replicate the mapping below using NEST, and I'm running into a problem mapping the token chars to the tokenizer.

{
   "settings": {
      "analysis": {
         "filter": {
            "nGram_filter": {
               "type": "nGram",
               "min_gram": 2,
               "max_gram": 20,
               "token_chars": [
                  "letter",
                  "digit",
                  "punctuation",
                  "symbol"
               ]
            }
         },
         "analyzer": {
            "nGram_analyzer": {
               "type": "custom",
               "tokenizer": "whitespace",
               "filter": [
                  "lowercase",
                  "asciifolding",
                  "nGram_filter"
               ]
            }
         }
      }
   }
}

I was able to replicate everything except the token chars part. Can someone help with this? Below is my code replicating the mapping above (except for the token chars part).

var nGramFilters1 = new List<string> { "lowercase", "asciifolding", "nGram_filter" };
var tChars = new List<string> { "letter", "digit", "punctuation", "symbol" };

var createIndexResponse = client.CreateIndex(defaultIndex, c => c
    .Settings(st => st
        .Analysis(an => an
            .Analyzers(anz => anz
                .Custom("nGram_analyzer", cc => cc
                    .Tokenizer("whitespace")
                    .Filters(nGramFilters1)))
            .TokenFilters(tf => tf
                .NGram("nGram_filter", ng => ng.MinGram(2).MaxGram(20))))));


Answer

The NGram Tokenizer supports token characters (token_chars): it uses them to decide which characters are kept in tokens, and it splits on any character whose class is not in the list.

The NGram Token Filter, on the other hand, operates on the tokens produced by a tokenizer, so it only has options for the minimum and maximum gram lengths to produce.
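To see the tokenizer behaviour described above in isolation, you can send a request body like the following to the Analyze API (POST _analyze), which on recent Elasticsearch versions accepts an inline tokenizer definition. The sample text and the 2 to 3 gram range are illustrative choices for this sketch, not taken from the question.

```json
{
  "tokenizer": {
    "type": "ngram",
    "min_gram": 2,
    "max_gram": 3,
    "token_chars": ["letter", "digit", "punctuation", "symbol"]
  },
  "text": "Foo-Bar 42"
}
```

With these token_chars, grams such as "o-B" should appear (the hyphen is punctuation, so it is kept inside tokens), but no gram should contain the space between "Foo-Bar" and "42", because whitespace is not in the list.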

Based on your current analysis chain, you most likely want something like the following:

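var createIndexResponse = client.CreateIndex(defaultIndex, c => c
    .Settings(st => st
        .Analysis(an => an
            .Analyzers(anz => anz
                .Custom("ngram_analyzer", cc => cc
                    // The tokenizer now produces the n-grams, so the separate
                    // "nGram_filter" token filter is dropped (it is also not
                    // defined in these settings, so referencing it would fail).
                    .Tokenizer("ngram_tokenizer")
                    .Filters("lowercase", "asciifolding"))
                )
            .Tokenizers(tz => tz
                // token_chars is an option of the NGram tokenizer,
                // not of the NGram token filter.
                .NGram("ngram_tokenizer", td => td
                    .MinGram(2)
                    .MaxGram(20)
                    .TokenChars(
                        TokenChar.Letter,
                        TokenChar.Digit,
                        TokenChar.Punctuation,
                        TokenChar.Symbol
                    )
                )
            )
        )
    )
);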
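For comparison with the JSON in the question, the NEST settings above should correspond roughly to the following index settings. This is a sketch, assuming the analyzer keeps only the lowercase and asciifolding filters; the exact JSON that NEST serializes may differ in small details such as the spelling of the tokenizer type.

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "ngram_tokenizer",
          "filter": ["lowercase", "asciifolding"]
        }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 20,
          "token_chars": ["letter", "digit", "punctuation", "symbol"]
        }
      }
    }
  }
}
```

Note that on Elasticsearch 7.0 and later the difference between max_gram and min_gram (here 18) may not exceed the index.max_ngram_diff setting, which defaults to 1, so that index setting would also need to be raised for these values to be accepted.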
