Elasticsearch query with nested aggregations causing out of memory


Problem description

I have Elasticsearch installed with 16 GB of memory. I started using aggregations, but ran into a "java.lang.OutOfMemoryError: Java heap space" error when I attempted to issue the following query:

POST /test-index-syslog3/type-syslog/_search
{
    "query": {
        "query_string": {
           "default_field": "DstCountry",
           "query": "CN"
        }
    },
    "aggs": {
        "whatever": {
            "terms": {
                "field" : "SrcIP"
            },
            "aggs": {
                "destination_ip": {
                    "terms": {
                        "field" : "DstIP"
                    },
                    "aggs": {
                        "port" : {
                            "terms": {
                                "field" : "DstPort"
                            }
                        }
                    }
                }
            }
        }
    }
}

The query_string itself only returns 1266 hits, so I'm a bit confused by the OOM error.

Am I using aggregations incorrectly? If not, what can I do to troubleshoot this issue? Thanks!

Solution

You are loading the entire SrcIP, DstIP, and DstPort fields into memory in order to aggregate on them. This is because Elasticsearch un-inverts the entire field, for every document in the index and not just the 1266 hits, to be able to rapidly look up a document's value for a field given its ID.
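
To check how much heap the un-inverted fields are taking, the fielddata statistics can be queried per field. This is a minimal sketch, assuming an Elasticsearch version that exposes these stats endpoints; the fields=* wildcard simply asks for every field:

```
GET /_nodes/stats/indices/fielddata?fields=*

GET /_cat/fielddata?v
```

If SrcIP, DstIP, or DstPort account for most of the reported size, that is consistent with the explanation above.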

If you're going to largely be aggregating on a very small set of data, you should look into using doc values (the doc_values mapping setting). With doc values, each document's value is stored on disk in a way that makes it easy to look up given the document's ID. There's a bit more overhead to it, but that way you leave it to the operating system's file system cache to keep the relevant pages in memory, instead of having to load the entire field onto the heap.
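
As a rough sketch of what enabling doc values could look like, the request below creates a new index with doc values turned on for the three aggregated fields. The index name test-index-syslog4, the field types, and the not_analyzed setting are assumptions for illustration, since the original mapping is not shown. The syntax assumes an Elasticsearch 1.x release that accepts "doc_values": true; newer releases enable doc values by default for most field types and use different type names.

```
PUT /test-index-syslog4
{
    "mappings": {
        "type-syslog": {
            "properties": {
                "SrcIP": {
                    "type": "string",
                    "index": "not_analyzed",
                    "doc_values": true
                },
                "DstIP": {
                    "type": "string",
                    "index": "not_analyzed",
                    "doc_values": true
                },
                "DstPort": {
                    "type": "integer",
                    "doc_values": true
                }
            }
        }
    }
}
```

Doc values are written at index time, so existing documents would need to be reindexed into the new index before the aggregation can benefit from them.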
