Max limit on the number of values I can specify in the ids filter, or in a query clause generally?


Question

In Elasticsearch, what is the maximum number of values that a match can be performed on? I read somewhere that it is 1024 but that it is also configurable. Is that true? And how does it affect performance?

curl -XPOST 'localhost:9200/my_index/_search?pretty' -d '{
  "query": {
    "filtered": {
      "filter": {
        "not": {
          "ids": {
            "type": "my_type",
            "values": ["1", "2", "3"]
          }
        }
      }
    }
  }
}'

How many values can I specify in this array? What is the limit? If it is configurable, what is the performance impact of increasing the limit?

Recommended answer

I don't think Elasticsearch or Lucene explicitly sets any limit. The limit you might hit, though, is the one imposed by the JDK.

To support the statement above, I looked at the Elasticsearch source code:

  • When the request comes in, a parser reads the array of ids. All it uses is an ArrayList. That list is then passed along to Lucene, which in turn uses a List.

This is the Lucene TermsFilter class (line #84) that receives the list of ids from Elasticsearch as a List.

Source code of the ArrayList class from Oracle JDK 1.7.0_67:

/**
 * The maximum size of array to allocate.
 * Some VMs reserve some header words in an array.
 * Attempts to allocate larger arrays may result in
 * OutOfMemoryError: Requested array size exceeds VM limit
 */
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;   

/**
 * Increases the capacity to ensure that it can hold at least the
 * number of elements specified by the minimum capacity argument.
 *
 * @param minCapacity the desired minimum capacity
 */
private void grow(int minCapacity) {
    ...
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    ...
}

private static int hugeCapacity(int minCapacity) {
    if (minCapacity < 0) // overflow
        throw new OutOfMemoryError();
    return (minCapacity > MAX_ARRAY_SIZE) ?
        Integer.MAX_VALUE :
        MAX_ARRAY_SIZE;
}

That number (Integer.MAX_VALUE - 8) is 2147483639, so this would be the theoretical maximum size of that array.
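As a quick check of that arithmetic, here is a minimal sketch that copies the constant and the hugeCapacity logic from the excerpt into a standalone class. The class name ArrayLimitSketch is made up for illustration and is not part of the JDK or Elasticsearch.

```java
public class ArrayLimitSketch {
    // Same constant as in the JDK 1.7 ArrayList excerpt.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Copy of the hugeCapacity logic from the excerpt, exposed here for illustration only.
    static int hugeCapacity(int minCapacity) {
        if (minCapacity < 0) // the requested capacity overflowed int
            throw new OutOfMemoryError();
        return (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(MAX_ARRAY_SIZE);                  // 2147483639
        System.out.println(hugeCapacity(1_000_000));         // 2147483639
        System.out.println(hugeCapacity(Integer.MAX_VALUE)); // 2147483647
    }
}
```

In practice the heap size will be exhausted long before a list of ids reaches this ceiling, so the number is only an upper bound.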

I tested an array of 150000 elements locally in my ES instance. As for the performance implications: as you would expect, performance degrades as the array grows. In my simple test with 150k ids, the query took 800 ms to execute. That said, it all depends on CPU, memory, load, data size, data mapping and so on, so the best approach is to test this yourself.
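To run a similar test yourself, you need a request body with many ids. The sketch below is one possible way to generate it and is not the code used for the 150k test. The class IdsQueryBodySketch and the method buildBody are hypothetical names, the ids are simply the numbers 1 to N, and the type name is assumed to contain no characters that need JSON escaping.

```java
public class IdsQueryBodySketch {
    // Builds the same request body as the curl example in the question,
    // with the ids "1" .. "count" in the values array.
    // Assumption: 'type' needs no JSON escaping.
    static String buildBody(String type, int count) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"query\":{\"filtered\":{\"filter\":{\"not\":{\"ids\":{");
        sb.append("\"type\":\"").append(type).append("\",\"values\":[");
        for (int i = 1; i <= count; i++) {
            if (i > 1) sb.append(',');
            sb.append('"').append(i).append('"');
        }
        sb.append("]}}}}}}");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print the body so it can be redirected to a file.
        System.out.println(buildBody("my_type", 150000));
    }
}
```

The output can be redirected to a file, for example body.json, and sent with curl using -d @body.json in place of the inline body. The "took" field in the search response then gives the execution time in milliseconds.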

UPDATED Dec. 2016: this answer applies to the Elasticsearch versions that existed at the end of 2014, i.e. the 1.x branch. The latest version available at that time was 1.4.x.
