Max limit on the number of values I can specify in the ids filter or, more generally, in a query clause?
Problem description
In Elasticsearch, what is the maximum number of values that a match can be performed against? I read somewhere that it is 1024 but that it is also configurable. Is that true? And how does it affect performance?
curl -XPOST 'localhost:9200/my_index/_search?pretty' -d '{
  "query": {
    "filtered": {
      "filter": {
        "not": {
          "ids": {
            "type": "my_type",
            "values": ["1", "2", "3"]
          }
        }
      }
    }
  }
}'
How many values can I specify in this array? What is the limit? If it is configurable, what is the performance impact of increasing the limit?
Recommended answer
I don't think Elasticsearch or Lucene explicitly sets any limit. The limit you might hit, though, is the one imposed by the JDK.
To back up the statement above, I looked at the Elasticsearch source code:
When the request comes in, a parser reads the array of ids. All it uses is an ArrayList. This is then passed along to Lucene, which in turn uses a List.
The Lucene TermsFilter class (line #84) is what receives the list of ids from Elasticsearch, as a List.
Source code of the ArrayList class from Oracle JDK 1.7.0_67:
/**
 * The maximum size of array to allocate.
 * Some VMs reserve some header words in an array.
 * Attempts to allocate larger arrays may result in
 * OutOfMemoryError: Requested array size exceeds VM limit
 */
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

/**
 * Increases the capacity to ensure that it can hold at least the
 * number of elements specified by the minimum capacity argument.
 *
 * @param minCapacity the desired minimum capacity
 */
private void grow(int minCapacity) {
    ...
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    ...
}

private static int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // overflow
        throw new OutOfMemoryError();
    return (minCapacity > MAX_ARRAY_SIZE) ?
        Integer.MAX_VALUE :
        MAX_ARRAY_SIZE;
}
That number (Integer.MAX_VALUE - 8) is 2147483639. So this would be the theoretical maximum size of that array.
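The arithmetic can be checked with a few lines of Java. This is only a sketch: the class name ArrayListLimit is hypothetical, and hugeCapacity is copied from the JDK excerpt above with the private modifier dropped so it can be called directly.

```java
class ArrayListLimit {
    // Same constant as in the ArrayList excerpt quoted above.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Copy of the quoted hugeCapacity method (made non-private for illustration).
    static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) // overflow
            throw new OutOfMemoryError();
        return (minCapacity > MAX_ARRAY_SIZE) ?
            Integer.MAX_VALUE :
            MAX_ARRAY_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);               // 2147483647
        System.out.println(MAX_ARRAY_SIZE);                  // 2147483639
        System.out.println(hugeCapacity(Integer.MAX_VALUE)); // 2147483647
    }
}
```

In practice the available heap will be exhausted long before a list of ids gets anywhere near this size, which is why the figure is only a theoretical ceiling.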
I tested an array of 150000 elements locally on my ES instance. As for the performance implications: as you would expect, performance degrades as the array gets larger. In my simple test with 150k ids I got an 800 ms execution time. But it all depends on CPU, memory, load, data size, data mapping, and so on. The best approach is for you to actually test this yourself.
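To repeat this kind of test you need a request body with a large ids array. Below is a minimal sketch, not the code used for the measurement above. It assumes sequential numeric ids, a type name that needs no JSON escaping, and the 1.x filtered/not/ids syntax from the question. The class IdsQueryBody and its build method are hypothetical names.

```java
class IdsQueryBody {
    // Builds the request body from the question with ids "1".."n".
    // Assumes `type` contains no characters that need JSON escaping.
    static String build(String type, int n) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"query\":{\"filtered\":{\"filter\":{\"not\":{\"ids\":{");
        sb.append("\"type\":\"").append(type).append("\",\"values\":[");
        for (int i = 1; i <= n; i++) {
            if (i > 1) sb.append(',');
            sb.append('"').append(i).append('"');
        }
        // Close the values array and the six nested objects opened above.
        sb.append("]}}}}}}");
        return sb.toString();
    }

    public static void main(String[] args) {
        // 150000 matches the size used in the test described above.
        System.out.println(build("my_type", 150000));
    }
}
```

Redirect the output to a file and send it with curl using -d @file instead of an inline body. The "took" field in the search response reports the execution time in milliseconds.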
UPDATED Dec. 2016: this answer applies to the Elasticsearch versions that existed at the end of 2014, i.e. the 1.x branch. The latest version available at that time was 1.4.x.