Dynamic Template not working for short, byte & float

Question

I am trying to create an index template that uses dynamic mapping.

Here is what I wrote. In 6.2.1 only boolean, date, double, long, object, and string are detected automatically, so I am having trouble mapping float, short, and byte.

If I index 127, the field is mapped to short by short_fields, which is fine. But when I index a value such as 325566, I get the exception "Numeric value (325566) out of range of Java short". I want to suppress this and let long_fields handle the value, so that the field is mapped to long. I tried coerce: false and ignore_malformed: true, but neither worked as expected.

"dynamic_templates": [
  {
    "short_fields": {
      "match": "*",
      "match_mapping_type": "long",
      "mapping": {
        "type": "short",
        "doc_values": true
      }
    }
  },
  {
    "long_fields": {
      "match": "*",
      "match_mapping_type": "long",
      "mapping": {
        "type": "long",
        "doc_values": true
      }
    }
  },
  {
    "byte_fields": {
      "match": "*",
      "match_mapping_type": "byte",
      "mapping": {
        "type": "byte",
        "doc_values": true
      }
    }
  }
]

Recommended answer

Unfortunately, it is not possible to make Elasticsearch choose the smallest possible data type for you. There are plenty of workarounds, but let me first explain why it does not work.

Dynamic mapping templates let you override the default dynamic type matching in three ways:

  • by matching the field name,
  • by matching the type Elasticsearch has guessed for you,
  • and by matching the path in the document.

Elasticsearch applies the first rule that matches. In your case the first rule, short_fields, matches every integer, because it accepts any field name and the guessed type long. The long_fields rule is therefore never reached.

That is why it works for 127 but fails for 325566: both values are mapped to short, and 325566 does not fit.
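The first-match behaviour can be sketched in a few lines of Python. This is only a simulation under stated assumptions, not Elasticsearch's actual implementation: `pick_template` and `some_field` are hypothetical names, and `fnmatchcase` stands in for Elasticsearch's simple wildcard matching.

```python
from fnmatch import fnmatchcase

# The first two rules from the question, in their original order.
DYNAMIC_TEMPLATES = [
    {"short_fields": {"match": "*", "match_mapping_type": "long",
                      "mapping": {"type": "short"}}},
    {"long_fields": {"match": "*", "match_mapping_type": "long",
                     "mapping": {"type": "long"}}},
]


def pick_template(templates, field_name, detected_type):
    """Hypothetical helper: return (rule name, mapped type) of the first matching rule."""
    for template in templates:
        for name, rule in template.items():
            name_ok = fnmatchcase(field_name, rule.get("match", "*"))
            type_ok = rule.get("match_mapping_type", detected_type) == detected_type
            if name_ok and type_ok:
                return name, rule["mapping"]["type"]
    # No rule matched: fall back to the detected type.
    return None, detected_type


# Every JSON integer is detected as "long", whatever its size, so the
# value itself never takes part in choosing the rule.
for value in (127, 325566):
    print(value, pick_template(DYNAMIC_TEMPLATES, "some_field", "long"))
```

Both values land on short_fields, which is why long_fields never gets a chance to handle 325566.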

To illustrate this point, let's change "match_mapping_type" in the first rule like this:

"match_mapping_type": "short",

Elasticsearch does not accept it and returns an error:

  {
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [doc]: No field type matched on [short], \
possible values are [object, string, long, double, boolean, date, binary]"
  }
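As a rough pre-flight check, the templates from the question can be compared against the values listed in that error message. This is a sketch only: `invalid_rules` is a hypothetical helper, and the allowed set is copied from the message above rather than from any authoritative list.

```python
# Values copied from the error message above (an assumption, not an official list).
ALLOWED_MATCH_TYPES = {"object", "string", "long", "double", "boolean", "date", "binary"}

# The three rules from the question, reduced to the relevant keys.
TEMPLATES = [
    {"short_fields": {"match": "*", "match_mapping_type": "long",
                      "mapping": {"type": "short"}}},
    {"long_fields": {"match": "*", "match_mapping_type": "long",
                     "mapping": {"type": "long"}}},
    {"byte_fields": {"match": "*", "match_mapping_type": "byte",
                     "mapping": {"type": "byte"}}},
]


def invalid_rules(templates):
    """Hypothetical helper: list (rule name, match_mapping_type) pairs not in the allowed set."""
    bad = []
    for template in templates:
        for name, rule in template.items():
            match_type = rule.get("match_mapping_type")
            if match_type is not None and match_type not in ALLOWED_MATCH_TYPES:
                bad.append((name, match_type))
    return bad


print(invalid_rules(TEMPLATES))
```

By this check, the byte_fields rule in the question has the same problem as the "short" experiment: "byte" is not a type that dynamic detection can produce.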

But how can we make Elasticsearch pick the right types?

Here are some of the options.

One option is to define the mapping explicitly yourself. This gives you full control over the selection of types.
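As a minimal sketch of full control over types, an explicit mapping can declare each field's type up front. The helper `explicit_mapping` and the field names are hypothetical; the mapping type name `doc` is taken from the error message earlier in this answer, and the body follows the 6.x layout with a mapping type level.

```python
import json


def explicit_mapping(field_types, doc_type="doc"):
    """Hypothetical helper: build an index body that fixes every field's type."""
    properties = {name: {"type": es_type} for name, es_type in field_types.items()}
    return {
        "mappings": {
            doc_type: {
                # "strict" makes Elasticsearch reject documents with unmapped fields.
                "dynamic": "strict",
                "properties": properties,
            }
        }
    }


# Field names are illustrative assumptions only.
body = explicit_mapping({"customerAge": "byte", "revenue": "long"})
print(json.dumps(body, indent=2))
```

The printed JSON could be used as the body when creating the index. Setting "dynamic" to "strict" is a design choice here; leaving it out keeps dynamic mapping for any field that is not listed.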

Alternatively, postpone "shrinking" the data until it actually becomes a performance problem.

In fact, using smaller data types affects only search and indexing performance, not the storage required. As long as you are happy with dynamic mappings, Elasticsearch manages them quite well for you.

Since Elasticsearch cannot tell a byte from a long, you can determine the type beforehand and encode it in the field name, for example customerAge_byte or revenue_long.

You can then use a prefix/suffix match like this:

    {
      "bytes_as_longs": {
        "match_mapping_type": "long",
        "match":   "*_byte",
        "mapping": {
          "type": "byte"
        }
      }
    }
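The same pattern extends to other types. The sketch below generates one rule per suffix; `suffix_templates` and the `<type>_by_suffix` rule names are hypothetical. It relies on the fact that JSON integers are detected as long and JSON floating-point numbers as double, so a float rule has to match on "double".

```python
import json

# Which detected type each target type has to match on.
DETECTED_AS = {
    "byte": "long", "short": "long", "integer": "long", "long": "long",
    "float": "double", "double": "double",
}


def suffix_templates(types):
    """Hypothetical helper: build one dynamic template per '_<type>' field-name suffix."""
    templates = []
    for es_type in types:
        templates.append({
            f"{es_type}_by_suffix": {
                "match_mapping_type": DETECTED_AS[es_type],
                "match": f"*_{es_type}",
                "mapping": {"type": es_type},
            }
        })
    return templates


print(json.dumps({"dynamic_templates": suffix_templates(["byte", "short", "float"])},
                 indent=2))
```

Because each rule matches a different suffix, the order of the rules no longer matters, unlike the two catch-all rules in the question.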

Please choose the approach that fits your needs best.

The reason Elasticsearch treats every integer input as long probably lies in the JSON definition of the number type (see json.org): JSON has a single number type and carries no information about integer width.

From a single value such as 0 or 1, it is not possible to tell whether the field holds integers or longs across the entire dataset. Elasticsearch has to guess the type from the first example it sees, so it takes the safest shot possible.
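A short Python sketch makes the problem concrete. The helper `smallest_integer_type` is hypothetical and only compares a value against the ranges of the Elasticsearch integer types; the documents and the field name `count` are made up for illustration.

```python
import json

# Signed ranges of the Elasticsearch integer types, smallest first.
RANGES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def smallest_integer_type(value):
    """Hypothetical helper: name the narrowest integer type that can hold value."""
    for name, low, high in RANGES:
        if low <= value <= high:
            return name
    raise ValueError(f"{value} does not fit in a 64-bit signed integer")


# JSON itself only says "number"; both values parse to the same Python type.
first_doc = json.loads('{"count": 127}')
later_doc = json.loads('{"count": 325566}')

print(smallest_integer_type(first_doc["count"]))
print(smallest_integer_type(later_doc["count"]))
```

A type chosen from the first document alone would be too narrow for the later one, which is why guessing long for every integer is the safe default.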

Hope that helps!
