UTF8 encoding is longer than the max length 32766


Question

I've upgraded my Elasticsearch cluster from 1.1 to 1.2, and now I get errors when indexing a somewhat big string.

{
  "error": "IllegalArgumentException[Document contains at least one immense term in field=\"response_body\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[7b 22 58 48 49 5f 48 6f 74 65 6c 41 76 61 69 6c 52 53 22 3a 7b 22 6d 73 67 56 65 72 73 69]...']",
  "status": 500
}

The mapping of the index:

{
  "template": "partner_requests-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "request": {
      "properties": {
        "asn_id": { "index": "not_analyzed", "type": "string" },
        "search_id": { "index": "not_analyzed", "type": "string" },
        "partner": { "index": "not_analyzed", "type": "string" },
        "start": { "type": "date" },
        "duration": { "type": "float" },
        "request_method": { "index": "not_analyzed", "type": "string" },
        "request_url": { "index": "not_analyzed", "type": "string" },
        "request_body": { "index": "not_analyzed", "type": "string" },
        "response_status": { "type": "integer" },
        "response_body": { "index": "not_analyzed", "type": "string" }
      }
    }
  }
}

I've searched the documentation and didn't find anything related to a maximum field size. According to the core types section, I don't understand why I should "correct the analyzer" for a not_analyzed field.

Answer

So you are running into an issue with the maximum size of a single term. When you set a field to not_analyzed, it is treated as one single term, and the maximum size for a single term in the underlying Lucene index is 32766 bytes, which I believe is hard coded.

Your two primary options are to either change the type to binary or to continue to use string but set the index type to "no".
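
As a rough sketch, using the same Elasticsearch 1.x mapping syntax as the template in the question, the response_body entry could be changed to either of the following:

"response_body": { "type": "binary" }

or

"response_body": { "index": "no", "type": "string" }

A binary field is not analyzed or indexed at all (and expects base64-encoded content), while index: "no" keeps the string type but skips indexing entirely; in both cases the raw value remains available via _source, but the field is no longer searchable.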
