Ignore accents in Elasticsearch with Haystack


Question

I am using Elasticsearch along with Haystack to provide search. I want users to be able to search in languages other than English; for example, I am currently trying Greek.

How can I ignore accents when searching? For example, if I enter Ανδρέας (with the accent), it returns the matching results.

But when I enter Ανδρεας, it returns no results. The search engine should return any results that contain "Ανδρέας" as well as "Ανδρεας" (the second one is unaccented).

Can someone point out how to resolve this issue?

Please let me know if I need to post my settings for Elasticsearch, search_indexes, etc.

Edit:

Here are my index settings:

ELASTICSEARCH_INDEX_SETTINGS = {
     'settings': {
         "analysis": {
             "analyzer": {
                 "myanalyzer_search": {
                     "type": "custom",
                     "tokenizer": "standard",
                     "filter": [
                         "greek_lowercase_filter",
                         "my_stop_filter",
                         "greek_stem_filter",
                         "english_stem_filter",
                         "my_edge_ngram_filter",
                         "asciifolding"
                     ]
                 },
                 "myanalyzer_index": {
                     "type": "custom",
                     "tokenizer": "edgeNGram",
                     "filter": [
                         "greek_lowercase_filter",
                         "my_stop_filter",
                         "greek_stem_filter",
                         "english_stem_filter",
                         "my_edge_ngram_filter",
                         "asciifolding"
                     ]
                 },
             },
             "tokenizer": {
                 "my_edge_ngram_tokenizer": {
                     "type": "edgeNGram",
                     "min_gram": "2",
                     "max_gram": "18",
                     "token_chars": ["letter"]
                 }
             },
             "filter": {
                 "my_edge_ngram_filter": {
                     "type": "edgeNGram",
                     "min_gram": 3,
                     "max_gram": 18
                 },
                 "greek_stem_filter": {
                     "type": "stemmer",
                     "name": "greek"
                 },
                 "greek_lowercase_filter": {
                     "type": "lowercase",
                     "language": "greek"
                 },
                 "english_stem_filter": {
                     "type": "stemmer",
                     "name": "english"
                 },
                 "my_stop_filter": {
                     "type": "stop",
                     "stopwords": ["_greek_", "_english_"]
                 }
             }
         }
     }
}

search_index.py

class ProfileIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.EdgeNgramField(document=True, use_template=True)
    title = indexes.CharField(model_attr='title')
    sorted_title = indexes.CharField(model_attr='title', indexed=False, stored=True)
    employment_history = indexes.EdgeNgramField(model_attr='employment_history', null=True)

    def get_model(self):
        return SellerProfile

    def index_queryset(self, using=None):
        return self.get_model().objects.all()


   .........

Here is the template:

{{ object.user.get_full_name }}
{{ object.title }}
{{ object.bio }}
{{ object.employment_history }}
{{ object.education }}

I am running the following queries:

results = SearchQuerySet().model(Profile).autocomplete(text='Ανδρεας')

results = SearchQuerySet().model(Profile).autocomplete(text='Ανδρέας')

Thanks.

Recommended answer

You need to add the asciifolding token filter to your analysis/query pipeline: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-asciifolding-tokenfilter.html
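As a rough sketch of what that could look like, the dict below defines one index-time and one search-time analyzer that both apply asciifolding. The names FOLDING_ANALYSIS_SKETCH, folding_index and folding_search are hypothetical, the filter list is deliberately reduced, and the sketch assumes your Haystack backend actually passes such settings to Elasticsearch when it creates the index.

```python
# Hypothetical, reduced analysis settings; all names here are illustrative
# and not taken from the question's configuration.
FOLDING_ANALYSIS_SKETCH = {
    "settings": {
        "analysis": {
            "analyzer": {
                # Analyzer applied to documents at index time.
                "folding_index": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Filters run in the order listed.
                    "filter": ["lowercase", "asciifolding"],
                },
                # Analyzer applied to the user's query at search time.
                "folding_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "asciifolding"],
                },
            }
        }
    }
}
```

Token filters run in the order they are listed, so placing the folding step ahead of any stemming or n-gram filters means those later filters see the already-folded token. Applying the same folding at index time and at query time keeps both sides comparable.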

It strips the accents from your words, so you can find them later whether or not the search term is accented.
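To illustrate the idea of accent folding itself (this is not Elasticsearch's implementation), here is a minimal Python sketch using the standard unicodedata module. The helper name strip_accents is hypothetical. Because asciifolding is documented as mapping characters to their ASCII equivalents where one exists, it may be worth checking with the _analyze API that your analyzer really folds Greek tonos marks.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining marks (accents) removed."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (general category Mn), keep everything else.
    stripped = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    # Recompose whatever is left into the usual NFC form.
    return unicodedata.normalize("NFC", stripped)


accented = "Ανδρέας"
plain = "Ανδρεας"
# Both spellings fold to the same string, which is what lets them match.
print(strip_accents(accented) == strip_accents(plain))  # prints True
```

The same folding has to be applied to both the indexed text and the query; folding only one side would still leave the two spellings unequal.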
