Elasticsearch using shingle filter with synonym


Problem description

I have the following documents:

  • south africa
  • north africa

I want to retrieve my "south africa" document with any of these queries:

  • s africa (a)
  • southafrica (b)
  • safrica (c)

I defined the following filters and analyzers:

POST test_index
{
  "settings": {
    "analysis": {
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms": [
            "south,s",
            "north,n"
          ]
        },
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3,
          "token_separator": ""
        }
      },
      "analyzer": {
        "my_shingle": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    ["shingle_filter"]
        },
        "my_shingle_synonym": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    ["shingle_filter", "synonym_filter"]
        },
        "my_synonym_shingle": {
          "type":      "custom",
          "tokenizer": "standard",
          "filter":    ["synonym_filter", "shingle_filter"]
        }
      }
    }
  },
  "mappings": {}
}

1) With my_shingle, south africa will be indexed as south, southafrica, africa

2) With my_shingle_synonym, south africa will be indexed as south, s, southafrica, africa

3) With my_synonym_shingle, south africa will be indexed as south, souths, southsafrica, s, safrica, africa
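
The three token lists above follow from the order in which the two filters run. A minimal Python sketch can make that visible. It is only a simplified model and not Lucene's implementation: it treats the token stream as a flat list, ignores token positions, and stands in for the standard tokenizer with `str.split`. The function names `synonym_filter` and `shingle_filter` are hypothetical helpers that mirror the filter names from the settings.

```python
def synonym_filter(tokens, groups):
    """Emit each token followed by the other members of its synonym group."""
    out = []
    for tok in tokens:
        out.append(tok)
        for group in groups:
            if tok in group:
                out.extend(t for t in group if t != tok)
    return out


def shingle_filter(tokens, min_size=2, max_size=3, sep=""):
    """Emit each unigram followed by the shingles that start at that token."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(sep.join(tokens[i:i + size]))
    return out


# Synonym groups copied from the "synonyms" setting in the question.
GROUPS = [["south", "s"], ["north", "n"]]
text = "south africa".split()  # assumption: stand-in for the standard tokenizer

my_shingle = shingle_filter(text)
my_shingle_synonym = synonym_filter(shingle_filter(text), GROUPS)
my_synonym_shingle = shingle_filter(synonym_filter(text, GROUPS))

print(my_shingle)          # ['south', 'southafrica', 'africa']
print(my_shingle_synonym)  # ['south', 's', 'southafrica', 'africa']
print(my_synonym_shingle)  # ['south', 'souths', 'southsafrica', 's', 'safrica', 'africa']
```

In this model, running the synonym filter first puts s between south and africa, so the shingles become souths, safrica and southsafrica, and southafrica is never produced. Running the shingle filter first produces southafrica, but then s is only added as a synonym of the single token south, so safrica is never produced.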

So:

  • (1) I will find b
  • (2) I will find a, b
  • (3) I will find a, c

I want south africa to be indexed as: south, s, southafrica, safrica, africa

Recommended answer

You do not need a single analyzer that outputs every token you listed. Your problem can be solved by using different analyzers on multi-fields.

You would define the mapping of your desired field like this:

"mappings": {
    "your_mapping": {
      "properties": {
        "name": {
          "type": "string",
          "analyzer": "my_shingle",
          "fields": {
            "synonym": {
              "type": "string",
              "analyzer": "my_synonym_shingle"
            }
          }
        }
      }
    }
  }
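
To see why two sub-fields are enough, the sketch below models the terms each field would hold for "south africa" and checks that together they cover the tokens the question asks for. It is a simplified flat-list model of the two filters, not Lucene's implementation, and the helper names `expand_synonyms` and `shingles` are hypothetical.

```python
def expand_synonyms(tokens, groups):
    """Emit each token followed by the other members of its synonym group."""
    out = []
    for tok in tokens:
        out.append(tok)
        for group in groups:
            if tok in group:
                out.extend(t for t in group if t != tok)
    return out


def shingles(tokens, min_size=2, max_size=3, sep=""):
    """Emit each unigram followed by the shingles that start at that token."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        for size in range(min_size, max_size + 1):
            if i + size <= len(tokens):
                out.append(sep.join(tokens[i:i + size]))
    return out


GROUPS = [["south", "s"], ["north", "n"]]
text = "south africa".split()  # assumption: stand-in for the standard tokenizer

# "name" uses my_shingle; "name.synonym" uses my_synonym_shingle.
name_terms = set(shingles(text))
name_synonym_terms = set(shingles(expand_synonyms(text, GROUPS)))

wanted = {"south", "s", "southafrica", "safrica", "africa"}
combined = name_terms | name_synonym_terms

print(sorted(wanted - combined))   # [] : nothing the question asked for is missing
print("southafrica" in name_terms, "safrica" in name_synonym_terms)  # True True
```

Under this model, southafrica comes from the name field and s and safrica come from name.synonym, so neither analyzer has to emit the full list on its own. The combined set also contains souths and southsafrica, which the question did not ask for.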

Sample document to index:

PUT test_index/your_mapping/1
{
  "name" : "south africa"
}

Then, you would use
