Elasticsearch fielddata - should I use it?


Question

Given an index with documents that have a brand property, we need to create a terms aggregation that is case-insensitive.

Index definition

Note the use of fielddata on the brand field:

PUT demo_products
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "keyword",
          "filter": [
            "lowercase"
          ]
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "brand": {
          "type": "text",
          "analyzer": "my_custom_analyzer",
          "fielddata": true
        }
      }
    }
  }
}

Data

POST demo_products/product
{
  "brand": "New York Jets"
}

POST demo_products/product
{
  "brand": "new york jets"
}

POST demo_products/product
{
  "brand": "Washington Redskins"
}

Query

GET demo_products/product/_search
{
  "size": 0,
  "aggs": {
    "brand_facet": {
      "terms": {
        "field": "brand"
      }
    }
  }
}

Result

"aggregations": {
    "brand_facet": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "new york jets",
          "doc_count": 2
        },
        {
          "key": "washington redskins",
          "doc_count": 1
        }
      ]
    }
  }

If we use keyword instead of text, we end up with 2 buckets for New York Jets because of the differences in casing.
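To make the casing problem concrete, here is a minimal Python sketch that imitates how a terms aggregation groups documents by exact term. It is only a simulation, not Elasticsearch code: the `terms_buckets` helper and its `normalize` parameter are hypothetical names introduced for illustration, and the sample values are the three brands indexed above.

```python
from collections import Counter

# Sample brand values taken from the three documents indexed above.
brands = ["New York Jets", "new york jets", "Washington Redskins"]


def terms_buckets(values, normalize=None):
    """Count documents per term, roughly as a terms aggregation would.

    `normalize` is a hypothetical stand-in for whatever is applied to the
    value at index time: nothing for a plain keyword field, lowercasing
    for the custom analyzer in the mapping above.
    """
    counts = Counter(normalize(v) if normalize else v for v in values)
    # Sort by descending count, then by key, for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Plain keyword behaviour: terms are compared byte for byte.
keyword_buckets = terms_buckets(brands)

# Lowercased behaviour: both casings collapse into one term.
lowercased_buckets = terms_buckets(brands, normalize=str.lower)

print(keyword_buckets)
print(lowercased_buckets)
```

Without normalization, the two spellings of New York Jets land in separate buckets. With lowercasing, they merge into a single bucket with a count of 2, matching the result shown above.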

We're concerned about the performance implications of using fielddata. However, if fielddata is disabled, we get the dreaded "Fielddata is disabled on text fields by default."

Any other tips to resolve this, or should we not be so concerned about fielddata?

Recommended answer

Starting with ES 5.2 (out today), you can use normalizers with keyword fields in order to (e.g.) lowercase the value.

The role of normalizers is a bit like that of analyzers for text fields, though what you can do with them is more restricted. That should still help with the issue you're facing.

You'd create your index like this:

PUT demo_products
{
  "settings": {
    "analysis": {
      "normalizer": {
        "my_normalizer": {
          "type": "custom",
          "filter": [ "lowercase" ]
        }
      }
    }
  },
  "mappings": {
    "product": {
      "properties": {
        "brand": {
          "type": "keyword",
          "normalizer": "my_normalizer"
        }
      }
    }
  }
}

And your query would return this:

  "aggregations" : {
    "brand_facet" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "new york jets",
          "doc_count" : 2
        },
        {
          "key" : "washington redskins",
          "doc_count" : 1
        }
      ]
    }
  }

Best of both worlds!
