UTF8 encoding is longer than the max length 32766
Question
I've upgraded my Elasticsearch cluster from 1.1 to 1.2, and I now get errors when indexing a fairly large string.
{
  "error": "IllegalArgumentException[Document contains at least one immense term in field=\"response_body\" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped. Please correct the analyzer to not produce such terms. The prefix of the first immense term is: '[7b 22 58 48 49 5f 48 6f 74 65 6c 41 76 61 69 6c 52 53 22 3a 7b 22 6d 73 67 56 65 72 73 69]...']",
  "status": 500
}
The mapping for the index:
{
  "template": "partner_requests-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  },
  "mappings": {
    "request": {
      "properties": {
        "asn_id": { "index": "not_analyzed", "type": "string" },
        "search_id": { "index": "not_analyzed", "type": "string" },
        "partner": { "index": "not_analyzed", "type": "string" },
        "start": { "type": "date" },
        "duration": { "type": "float" },
        "request_method": { "index": "not_analyzed", "type": "string" },
        "request_url": { "index": "not_analyzed", "type": "string" },
        "request_body": { "index": "not_analyzed", "type": "string" },
        "response_status": { "type": "integer" },
        "response_body": { "index": "not_analyzed", "type": "string" }
      }
    }
  }
}
I've searched the documentation and didn't find anything related to a maximum field size.
Going by the core types section, I don't understand why I should "correct the analyzer" for a not_analyzed field.
Recommended answer
You are running into the maximum size for a single term. When you set a field to not_analyzed, its entire value is treated as one single term. The maximum size for a single term in the underlying Lucene index is 32766 bytes, which I believe is hard-coded.
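To see which documents would trip this limit before sending them, a client-side check can help. The following is a minimal Python sketch, not part of Elasticsearch: the names MAX_TERM_BYTES, utf8_length, and exceeds_term_limit are hypothetical, and the threshold is simply the 32766 figure quoted in the error message. Note that the limit counts UTF-8 bytes, not characters, so non-ASCII text reaches it sooner.

```python
# Hypothetical helper names; the threshold is the value quoted in the error message.
MAX_TERM_BYTES = 32766


def utf8_length(value: str) -> int:
    """Return the size of the string in bytes once encoded as UTF-8."""
    return len(value.encode("utf-8"))


def exceeds_term_limit(value: str, limit: int = MAX_TERM_BYTES) -> bool:
    """True if the value, indexed as one single term, would be longer than the limit."""
    return utf8_length(value) > limit


if __name__ == "__main__":
    ascii_body = "a" * 32767       # 1 byte per character
    accented_body = "é" * 16384    # 2 bytes per character in UTF-8
    print(exceeds_term_limit(ascii_body))
    print(exceeds_term_limit(accented_body))
```

A check like this only detects oversized values; what to do with them (skip, truncate, or change the mapping) is a separate decision.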
Your two primary options are either to change the type to binary, or to keep using string but set the "index" option to "no".
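As a rough illustration of those two options, the sketch below builds the replacement definition for the response_body field as a Python dict and prints it as JSON. The function name response_body_mapping and its option labels are hypothetical; only the "type" and "index" values come from the answer above. Either way the field is no longer indexed as a term, so it cannot be searched on directly.

```python
import json


def response_body_mapping(option: str) -> dict:
    """Build a field definition for response_body (hypothetical helper).

    option is a label invented for this sketch:
      "binary"           -> change the field type to binary
      "unindexed_string" -> keep type string but set index to "no"
    """
    if option == "binary":
        return {"type": "binary"}
    if option == "unindexed_string":
        return {"type": "string", "index": "no"}
    raise ValueError("unknown option: " + option)


if __name__ == "__main__":
    for option in ("binary", "unindexed_string"):
        properties = {"response_body": response_body_mapping(option)}
        print(json.dumps(properties, indent=2))
```

The printed fragment is meant to replace the "response_body" entry under "properties" in the index template shown in the question; request_body may need the same treatment if it can also grow large.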