Elasticsearch using NEST: How to configure analyzers to find partial words?
Problem Description
I am trying to search by partial word, ignoring casing and ignoring accents on some letters. Is it possible? I think nGram with the default tokenizer should do the trick, but I don't understand how to do it with NEST.
Example: "musiic" should match records that have "music"
The Elasticsearch version I am using is 1.9.
I am doing it like this, but it doesn't work...
var ix = new IndexSettings();
ix.Add("analysis",
    @"{
        'index_analyzer' : {
            'my_index_analyzer' : {
                'type' : 'custom',
                'tokenizer' : 'standard',
                'filter' : ['lowercase', 'mynGram']
            }
        },
        'search_analyzer' : {
            'my_search_analyzer' : {
                'type' : 'custom',
                'tokenizer' : 'standard',
                'filter' : ['standard', 'lowercase', 'mynGram']
            }
        },
        'filter' : {
            'mynGram' : {
                'type' : 'nGram',
                'min_gram' : 2,
                'max_gram' : 50
            }
        }
    }");
client.CreateIndex("sample", ix);
Thanks,
David
Recommended Answer
Short answer

I think what you're looking for is a fuzzy query (http://www.elasticsearch.org/guide/reference/query-dsl/fuzzy-query.html), which uses the Levenshtein distance algorithm to match similar words.
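To see why a fuzzy query would match "musiic" to "music", here is a minimal sketch of the Levenshtein edit distance in Python. This is my own illustration, not part of the original answer; Elasticsearch implements this internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

# "musiic" is one edit (delete an 'i') away from "music",
# so a fuzzy query allowing at least one edit would match.
print(levenshtein("musiic", "music"))  # → 1
```

A fuzzy query compares the query term against indexed terms within a maximum edit distance, which is exactly what the nGram setup below cannot do for this typo.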
Long answer on nGrams
The nGram filter splits the text into many smaller tokens based on the defined min/max range.
For example, from your 'music' query the filter will generate:
'mu', 'us', 'si', 'ic', 'mus', 'usi', 'sic', 'musi', 'usic', and 'music'
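A quick way to see these tokens is to generate them directly. This short sketch (my own illustration, not Elasticsearch's actual tokenizer code) mimics what the nGram filter produces for min_gram=2, max_gram=50:

```python
def ngrams(text: str, min_gram: int = 2, max_gram: int = 50) -> list[str]:
    """Generate all substrings of length min_gram..max_gram, shortest first."""
    return [text[i:i + n]
            for n in range(min_gram, min(max_gram, len(text)) + 1)
            for i in range(len(text) - n + 1)]

print(ngrams("music"))
# → ['mu', 'us', 'si', 'ic', 'mus', 'usi', 'sic', 'musi', 'usic', 'music']
```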
As you can see, "musiic" does not match any of these nGram tokens.
Why nGrams?
One benefit of nGrams is that they make wildcard queries significantly faster, because all potential substrings are pre-generated and indexed at insert time (I have seen queries speed up from multiple seconds to 15 milliseconds using nGrams).
Without nGrams, each string must be scanned at query time for a match [O(n^2)] instead of looked up directly in the index [O(1)]. As pseudocode:
hits = []
foreach string in index:
    if string.substring(query):
        hits.add(string)
return hits

vs.

return index[query]
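The contrast above can be made concrete with a toy example. This is my own sketch, assuming a plain Python dict stands in for the inverted index: the slow path scans every document at query time, while the fast path precomputes nGrams at insert time so the query becomes a single lookup.

```python
def ngrams(text, min_gram=2, max_gram=50):
    """All substrings of length min_gram..max_gram, as a set."""
    return {text[i:i + n]
            for n in range(min_gram, min(max_gram, len(text)) + 1)
            for i in range(len(text) - n + 1)}

docs = ["music", "museum", "mask"]

# Slow path: scan every document at query time (the loop above).
def scan(query):
    return [d for d in docs if query in d]

# Fast path: build an inverted index of nGrams once, at insert time.
index = {}
for d in docs:
    for gram in ngrams(d):
        index.setdefault(gram, set()).add(d)

def lookup(query):
    return sorted(index.get(query, set()))

print(scan("mus"))    # → ['music', 'museum']
print(lookup("mus"))  # → ['museum', 'music']
```

The trade-off is visible even here: the index dict holds every nGram of every document, which is why the note below about slower inserts and heavier storage applies.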
Note that this comes at the cost of slower inserts, more storage, and heavier memory usage.