Can I prioritize more exact matches when using the ngram filter in search results?

Problem description

When using the ngram filter with Elasticsearch, a search for something like "test" returns the documents "latest", "tests" and "test". Is there a way to make the document that exactly matches the query "test" always appear higher up in the search results?

Recommended answer

That is a known issue with ngrams: they produce a lot of false positives in your ranking. One solution is to combine ngrams with shingles: in addition to the ngrams, you also index each full word as a separate term, or even combinations of words. Shingles are like ngrams, but built from words rather than characters.

That way, an exact match against the shingle terms scores higher than something that only matches the ngrams.
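
To make the difference concrete, here is a minimal Python sketch of the two tokenization ideas. It is not how Elasticsearch implements its filters; the function names `char_ngrams` and `word_shingles` and the gram sizes are assumptions chosen for illustration. It shows why "test" matches "latest" on ngrams alone, and what word-level shingles look like.

```python
def char_ngrams(word, min_gram=2, max_gram=3):
    """Return the character n-grams of `word` with lengths min_gram..max_gram."""
    return [
        word[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(word) - n + 1)
    ]


def word_shingles(text, min_size=2, max_size=3):
    """Return word-level shingles: runs of min_size..max_size adjacent words."""
    words = text.split()
    return [
        " ".join(words[i:i + n])
        for n in range(min_size, max_size + 1)
        for i in range(len(words) - n + 1)
    ]


# Every n-gram of "test" also occurs in "latest", so an ngram-only index
# cannot tell the exact match apart from the false positive.
query_grams = set(char_ngrams("test"))
print(query_grams <= set(char_ngrams("latest")))

# Shingles work on whole words instead of characters.
print(word_shingles("quick brown fox"))
```

If the whole word "test" is also indexed as its own term, only the document that really contains that word matches it, which gives that document an extra scoring signal on top of the shared ngrams.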

Update: here is an example of a custom analyzer. After you define it, you can use it in your mappings. In this case I use the icu_tokenizer with the icu_normalizer and icu_folding filters (all provided by the ICU analysis plugin) plus my own suggestions_shingle filter. All of this is set as the default analyzer, so all my strings are handled this way.

{
    "analyzer": {
        "default": {
            "tokenizer": "icu_tokenizer",
            "filter": ["icu_normalizer", "icu_folding", "suggestions_shingle"]
        }
    },
    "filter": {
        "suggestions_shingle": {
            "type": "shingle",
            "min_shingle_size": 2,
            "max_shingle_size": 5
        }
    }
}
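
As a related sketch, and not the answer's own setup, the same goal is often approached at query time by keeping the ngram terms and the whole-word terms in separate subfields and boosting the whole-word clause. The Python snippet below only builds the request body as a dictionary; the field names `title.ngram` and `title.exact`, the helper name `build_query`, and the boost value are hypothetical and would have to match your own mapping.

```python
import json


def build_query(text, ngram_field="title.ngram", exact_field="title.exact",
                exact_boost=5.0):
    """Build a bool query: ngram match is required, exact match adds score.

    Field names and the boost value are illustrative assumptions.
    """
    return {
        "query": {
            "bool": {
                # Any document sharing ngrams with the query text qualifies.
                "must": [{"match": {ngram_field: {"query": text}}}],
                # Documents that also match the whole-word field score higher.
                "should": [
                    {"match": {exact_field: {"query": text, "boost": exact_boost}}}
                ],
            }
        }
    }


print(json.dumps(build_query("test"), indent=2))
```

With a body like this, "latest" and "tests" can still be returned through the ngram clause, while the document containing the exact word "test" picks up the boosted `should` clause as well.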
