How do I configure Elasticsearch to find substrings at the beginning OR at the end of a word (but not in the middle)?
Problem Description
I'm starting to learn Elasticsearch and now I am trying to write my first analyzer configuration. What I want to achieve is that substrings are found if they are at the beginning or end of a word. If I have the word "stackoverflow" and I search for "stack", I want to find it, and when I search for "flow", I want to find it, but I do not want to find it when searching for "ackov" (in my use case this would not make sense).
I know there is the edge n-gram tokenizer, but an analyzer can only have one tokenizer, and the edge n-gram can be anchored either at the front or at the back (but not both at the same time).
And if I understood correctly, applying both versions of the edge n-gram filter (front and back) to the analyzer would not find either, because both filters need to match, wouldn't it? Since "stack" is not at the end of the word, the back edge n-gram filter would not produce it, and the word "stackoverflow" would not be found.
So, how do I configure my analyzer to find substrings either at the beginning or at the end of a word, but not in the middle?
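To make the problem concrete, here is a small Python sketch (an illustration only, not Elasticsearch code) of what front and back edge n-grams of "stackoverflow" look like, and why neither set alone covers both cases:

```python
# Illustrative sketch: emulates edge n-grams the way an edge n-gram
# tokenizer would emit them, anchored at the front or at the back.
def front_edge_ngrams(token, min_gram=2, max_gram=25):
    return {token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)}

def back_edge_ngrams(token, min_gram=2, max_gram=25):
    return {token[-n:] for n in range(min_gram, min(len(token), max_gram) + 1)}

front = front_edge_ngrams("stackoverflow")
back = back_edge_ngrams("stackoverflow")

print("stack" in front, "stack" in back)    # True False
print("flow" in front, "flow" in back)      # False True
print("ackov" in front or "ackov" in back)  # False
```

Only the front set contains "stack" and only the back set contains "flow", so requiring both anchors at once would match neither query.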
What can be done is to define two analyzers: one for matching at the start of a string and another for matching at the end. In the index settings below, I named the former prefix_edge_ngram_analyzer and the latter suffix_edge_ngram_analyzer. These two analyzers can be applied to a multi-field string field: the former to the text.prefix sub-field and the latter to the text.suffix sub-field.
{
"settings": {
"analysis": {
"analyzer": {
"prefix_edge_ngram_analyzer": {
"tokenizer": "prefix_edge_ngram_tokenizer",
"filter": ["lowercase"]
},
"suffix_edge_ngram_analyzer": {
"tokenizer": "keyword",
"filter" : ["lowercase","reverse","suffix_edge_ngram_filter","reverse"]
}
},
"tokenizer": {
"prefix_edge_ngram_tokenizer": {
"type": "edgeNGram",
"min_gram": "2",
"max_gram": "25"
}
},
"filter": {
"suffix_edge_ngram_filter": {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 25
}
}
}
},
"mappings": {
"test_type": {
"properties": {
"text": {
"type": "string",
"fields": {
"prefix": {
"type": "string",
"analyzer": "prefix_edge_ngram_analyzer"
},
"suffix": {
"type": "string",
"analyzer": "suffix_edge_ngram_analyzer"
}
}
}
}
}
}
}
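To see why the suffix analyzer works, here is a Python sketch simulating the two token chains defined above: front edge n-grams plus lowercase for the prefix field, and keyword + lowercase + reverse + edge n-gram + reverse for the suffix field. This is an illustration of the mechanism, not Elasticsearch code:

```python
def edge_ngrams(token, min_gram=2, max_gram=25):
    # Front-anchored edge n-grams, as the edgeNGram tokenizer/filter emits them.
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def prefix_analyze(text):
    # prefix_edge_ngram_analyzer: edge n-gram tokenizer, then lowercase.
    return edge_ngrams(text.lower())

def suffix_analyze(text):
    # suffix_edge_ngram_analyzer: keyword tokenizer, then
    # lowercase -> reverse -> edge n-gram -> reverse.
    reversed_token = text.lower()[::-1]
    return [gram[::-1] for gram in edge_ngrams(reversed_token)]

print(prefix_analyze("stackoverflow")[:4])  # ['st', 'sta', 'stac', 'stack']
print(suffix_analyze("stackoverflow")[:4])  # ['ow', 'low', 'flow', 'rflow']
```

Reversing the token before and after the edge n-gram filter turns front-anchored grams into suffixes of the original word, which is the whole trick.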
Then let's say we index the following test document:
PUT test_index/test_type/1
{ "text": "stackoverflow" }
We can then search either by prefix or suffix using the following queries:
# input is "stack" => 1 result
GET test_index/test_type/_search?q=text.prefix:stack OR text.suffix:stack
# input is "flow" => 1 result
GET test_index/test_type/_search?q=text.prefix:flow OR text.suffix:flow
# input is "ackov" => 0 results
GET test_index/test_type/_search?q=text.prefix:ackov OR text.suffix:ackov
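The three results can be reproduced with a toy in-memory model: collect the tokens both sub-fields would index, then treat the query as an exact term lookup. This is a sketch of the mechanism only; real query parsing and scoring are more involved:

```python
def edge_ngrams(token, min_gram=2, max_gram=25):
    return {token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)}

def index_doc(text):
    token = text.lower()
    prefix_tokens = edge_ngrams(token)                           # text.prefix
    suffix_tokens = {g[::-1] for g in edge_ngrams(token[::-1])}  # text.suffix
    return prefix_tokens | suffix_tokens

def hits(indexed_tokens, query):
    # OR query over text.prefix and text.suffix: the doc matches if either
    # field produced the query string as a token at index time.
    return 1 if query.lower() in indexed_tokens else 0

tokens = index_doc("stackoverflow")
print(hits(tokens, "stack"), hits(tokens, "flow"), hits(tokens, "ackov"))
# 1 1 0
```

"stack" comes from the prefix tokens, "flow" from the suffix tokens, and "ackov" from neither, matching the three query results above.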
Another way to query with the query DSL:
POST test_index/test_type/_search
{
"query": {
"multi_match": {
"query": "stack",
"fields": [ "text.*" ]
}
}
}
UPDATE
If you already have a string field, you can "upgrade" it to a multi-field and create the two required sub-fields with their analyzers. The way to do this is to perform the following steps in order:
Close your index in order to create the analyzers
POST test_index/_close
Update the index settings
PUT test_index/_settings
{
  "analysis": {
    "analyzer": {
      "prefix_edge_ngram_analyzer": {
        "tokenizer": "prefix_edge_ngram_tokenizer",
        "filter": ["lowercase"]
      },
      "suffix_edge_ngram_analyzer": {
        "tokenizer": "keyword",
        "filter": ["lowercase", "reverse", "suffix_edge_ngram_filter", "reverse"]
      }
    },
    "tokenizer": {
      "prefix_edge_ngram_tokenizer": {
        "type": "edgeNGram",
        "min_gram": "2",
        "max_gram": "25"
      }
    },
    "filter": {
      "suffix_edge_ngram_filter": {
        "type": "edgeNGram",
        "min_gram": 2,
        "max_gram": 25
      }
    }
  }
}
Re-open your index
POST test_index/_open
Finally, update the mapping of your text field
PUT test_index/_mapping/test_type
{
  "properties": {
    "text": {
      "type": "string",
      "fields": {
        "prefix": {
          "type": "string",
          "analyzer": "prefix_edge_ngram_analyzer"
        },
        "suffix": {
          "type": "string",
          "analyzer": "suffix_edge_ngram_analyzer"
        }
      }
    }
  }
}
You still need to re-index all your documents so that the new sub-fields text.prefix and text.suffix get populated and analyzed.