UTF8 encoding is longer than the max length 32766


Question

I've upgraded my Elasticsearch cluster from 1.1 to 1.2, and now I get errors when indexing a somewhat big string.

{
  "error": "IllegalArgumentException[Document contains at least one immense term in field=\"response_body\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[7b 22 58 48 49 5f 48 6f 74 65 6c 41 76 61 69 6c 52 53 22 3a 7b 22 6d 73 67 56 65 72 73 69]...']",
  "status": 500
}

The mapping of the index:

{
  "template": "partner_requests-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "request": {
      "properties": {
        "asn_id": { "index": "not_analyzed", "type": "string" },
        "search_id": { "index": "not_analyzed", "type": "string" },
        "partner": { "index": "not_analyzed", "type": "string" },
        "start": { "type": "date" },
        "duration": { "type": "float" },
        "request_method": { "index": "not_analyzed", "type": "string" },
        "request_url": { "index": "not_analyzed", "type": "string" },
        "request_body": { "index": "not_analyzed", "type": "string" },
        "response_status": { "type": "integer" },
        "response_body": { "index": "not_analyzed", "type": "string" }
      }
    }
  }
}

I've searched the documentation and didn't find anything related to a maximum field size. According to the core types section, I don't understand why I should "correct the analyzer" for a not_analyzed field.

Answer

You are running into the maximum size limit for a single term. When you set a field to not_analyzed, the entire value is treated as one single term. The maximum size of a single term in the underlying Lucene index is 32766 bytes, which I believe is hard-coded.
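To make the limit concrete: it is measured in bytes of the term's UTF-8 encoding, not in characters, so multi-byte characters hit it sooner. A minimal sketch (not Elasticsearch code, just arithmetic) of checking whether a string would exceed the limit when indexed as a single not_analyzed term:

```python
# Lucene's per-term limit: 32766 bytes of the UTF-8 encoding.
MAX_TERM_BYTES = 32766

def exceeds_term_limit(value: str) -> bool:
    """True if the string, indexed as one not_analyzed term,
    would exceed Lucene's byte-length limit."""
    return len(value.encode("utf-8")) > MAX_TERM_BYTES

# Byte count, not character count, is what matters:
print(exceeds_term_limit("a" * 32766))   # False: exactly at the limit
print(exceeds_term_limit("é" * 20000))   # True: 40000 bytes in UTF-8
```

Note that 20000 characters of "é" already exceed the limit, because each one encodes to two bytes.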

Your two primary options are to either change the type to binary or to continue to use string but set the index type to "no".
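Applied to the template above, the second option would look like this (a sketch: with "index": "no" the field is still kept in _source and returned with the document, but it is not indexed and cannot be searched or aggregated on):

```json
{
  "mappings": {
    "request": {
      "properties": {
        "response_body": { "index": "no", "type": "string" }
      }
    }
  }
}
```

The first option, `"response_body": { "type": "binary" }`, stores the value as base64 and likewise makes it non-searchable; pick whichever matches how you retrieve the field later.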

