Elasticsearch accent-insensitive search

Question

I'm using Elasticsearch with Python. I can't find a way to make searches accent-insensitive.

For example: I have two words, "Camión" and "Camion". When a user searches for "camion", I'd like both results to show up.

Creating the index:

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])

es.indices.create(index='name', ignore=400)

es.index(
    index="name",
    doc_type="producto",
    id=p.pk,
    body={
        'title': p.titulo,
        'slug': p.slug,
        'summary': p.summary,
        'description': p.description,
        'image': foto,
        'price': p.price,
        'wholesale_price': p.wholesale_price,
        'reference': p.reference,
        'ean13': p.ean13,
        'rating': p.rating,
        'quantity': p.quantity,
        'discount': p.discount,
        'sales': p.sales,
        'active': p.active,
        'encilleria': p.encilleria,
        'brand': marca,
        'brand_title': marca_titulo,
        'sellos': sellos_str,
        'certificados': certificados_str,
        'attr_naturales': attr_naturales_str,
        'soluciones': soluciones_str,
        'categories': categories_str,
        'delivery': p.delivery,
        'stock': p.stock,
        'consejos': p.consejos,
        'ingredientes': p.ingredientes,
        'es_pack': p.es_pack,
        'temp': p.temp,
        'relevancia': p.relevancia,
        'descontinuado': p.descontinuado,
    }
)

Searching:

    from elasticsearch import Elasticsearch
    es = Elasticsearch([{'host': '127.0.0.1', 'port': 9200}])

    resul = es.search(
        index="name",
        body={
            "query": {
                "query_string": {
                    "query": "(title:" + search + " OR description:" + search + " OR summary:" + search + ") AND (active:true)",
                    "analyze_wildcard": False
                }
            },
            "size": "9999",
        }
    )
    print(resul)

I've searched Google, Stack Overflow, and elastic.co, but I haven't found anything that works.

Recommended answer

You need to change the mapping of the fields used in your query. Changing the mapping requires re-indexing, so that the fields are analyzed differently and the query can match.

Basically, you need something like the mapping below. The field called text is just an example; apply the same settings to your other fields as well. Note that I used fields (multi-fields) so that the root field keeps the original text, analyzed with the default analyzer, while text.folded has its accented characters folded to plain ASCII, which is what makes your query work. I have also changed the query a bit so that it searches the folded sub-fields: a search for camion will match Camion, but also Camión.

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "properties": {
        "text": {
          "type": "string",
          "fields": {
            "folded": {
              "type": "string",
              "analyzer": "folding"
            }
          }
        }
      }
    }
  }
}
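
Because the same settings have to be repeated for every searched field, the request body can also be generated from Python. The following is a minimal sketch and not part of the original answer: the helper name build_index_body is hypothetical, the field names and the producto type name are taken from the question, and the "string" type mirrors the mapping above (Elasticsearch 5 and later use "text" instead, and 7 and later drop the type-name level of the mapping).

```python
def build_index_body(fields, doc_type="producto", text_type="string"):
    """Build index settings and mappings with a folded sub-field per field.

    Hypothetical helper for illustration; text_type="string" mirrors the
    mapping above, newer Elasticsearch versions expect "text".
    """
    properties = {}
    for name in fields:
        properties[name] = {
            "type": text_type,
            "fields": {
                # Sub-field analyzed with the accent-folding analyzer.
                "folded": {"type": text_type, "analyzer": "folding"},
            },
        }
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "folding": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "asciifolding"],
                    }
                }
            }
        },
        "mappings": {doc_type: {"properties": properties}},
    }


body = build_index_body(["title", "description", "summary"])
# Assumed usage with the client from the question, on a fresh index:
# es.indices.create(index="name", body=body)
```

The index has to be created with this body before any documents are indexed; documents already in an old index need to be re-indexed.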

Query:

  "query": {
    "query_string": {
      "query": "\\*.folded:camion"
    }
  }
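
Applied to the query from the question, this could look like the sketch below. It is only an illustration under stated assumptions: build_search_body is a hypothetical helper, it assumes title, description and summary were each mapped with a folded sub-field, and it inserts the search term verbatim, exactly as the question's code does.

```python
def build_search_body(search, fields=("title", "description", "summary"), size=9999):
    """Build a query_string search over the folded sub-fields (hypothetical helper)."""
    # Produces e.g. "title.folded:camion OR description.folded:camion OR ..."
    clauses = " OR ".join("{0}.folded:{1}".format(f, search) for f in fields)
    return {
        "query": {
            "query_string": {
                "query": "({0}) AND (active:true)".format(clauses),
                "analyze_wildcard": False,
            }
        },
        "size": size,
    }


body = build_search_body("camion")
print(body["query"]["query_string"]["query"])
# Assumed usage with the client from the question:
# resul = es.search(index="name", body=body)
```

Naming the folded sub-fields explicitly keeps the active:true condition from the original query, whereas the \*.folded wildcard form above matches every folded sub-field in the index.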

Also, I strongly suggest reading this section of the documentation: https://www.elastic.co/guide/en/elasticsearch/guide/current/asciifolding-token-filter.html
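
To get a feel for what the folding analyzer does to a single term, its effect can be approximated in plain Python. This is only a rough sketch using the standard unicodedata module, not the actual asciifolding filter, which also maps characters that have no accent to strip (for example ß or ø); the function name fold is made up for illustration.

```python
import unicodedata


def fold(term):
    """Approximate lowercase + accent folding for one term (illustrative only)."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", term.lower())
    # Dropping the combining marks leaves the unaccented base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


for word in ("Camión", "Camion", "camion"):
    print(word, "->", fold(word))
```

All three inputs end up as the same term, which is why a search for camion against the folded sub-field finds both documents.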
