Understanding Elasticsearch JVM heap usage

Views: 148 | Published: 2017/8/7 0:33:49 | elasticsearch


Question

Folks,

I am trying to reduce the memory usage of my Elasticsearch deployment (single-node cluster).

I can see 3GB of JVM heap space being used. To optimize it, I first need to understand the bottleneck, but I have a limited understanding of how JVM heap usage is split.

Field data seems to consume 1.5GB, and the filter cache and query cache combined consume less than 0.5GB, which adds up to 2GB at most.

Can someone help me understand where Elasticsearch uses the remaining 1GB?

Answer

I can't speak to your exact setup, but to find out what's going on in your heap, you can use the jvisualvm tool (bundled with the JDK) together with Marvel or the bigdesk plugin (my preference) and the _cat APIs.
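If you want to script the _cat calls instead of typing curl commands, a small helper is enough. This is a minimal sketch assuming a single unauthenticated node at `http://localhost:9200`; the function names `cat_url` and `fetch_cat` are hypothetical, and only the Python standard library is used.

```python
import urllib.parse
import urllib.request

# Assumption: a single node on localhost:9200 without authentication.
ES_URL = "http://localhost:9200"


def cat_url(endpoint: str, base: str = ES_URL, **params: str) -> str:
    """Build a _cat API URL such as /_cat/fielddata?v=true&bytes=b."""
    # "v" adds the header row; "bytes=b" reports sizes as raw byte counts.
    query = {"v": "true", "bytes": "b", **params}
    return f"{base}/_cat/{endpoint}?{urllib.parse.urlencode(query)}"


def fetch_cat(endpoint: str, **params: str) -> str:
    """Fetch the plain-text table returned by a _cat endpoint."""
    with urllib.request.urlopen(cat_url(endpoint, **params), timeout=10) as resp:
        return resp.read().decode("utf-8")
```

For example, `print(fetch_cat("fielddata"))` prints the same table as the curl commands used further down.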

As you've rightly noticed, the heap hosts three main caches, namely:

the fielddata cache: unbounded by default, but can be controlled with indices.fielddata.cache.size (in your case it seems to be around 50% of the heap, probably due to the fielddata circuit breaker)
the node query/filter cache: 10% of the heap
the shard request cache: 1% of the heap but disabled by default
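To see what those percentages mean in bytes, the list above can be turned into a small calculation. This sketch assumes the 3GB from the question is the maximum heap (the question only says 3GB is in use), treats GB as GiB, and uses the hypothetical names `cache_ceilings` and `describe`; the fielddata fraction is an optional stand-in for a value you might give `indices.fielddata.cache.size`.

```python
from typing import Dict, Optional

MIB = 1024 * 1024

# Default ceilings from the list above, as fractions of the heap.
DEFAULT_CACHE_FRACTIONS = {
    "query/filter cache": 0.10,
    "shard request cache": 0.01,  # only relevant once the request cache is enabled
}


def cache_ceilings(heap_bytes: int,
                   fielddata_fraction: Optional[float] = None) -> Dict[str, Optional[int]]:
    """Upper bound in bytes for each heap cache; None means unbounded."""
    ceilings: Dict[str, Optional[int]] = {
        name: int(heap_bytes * fraction)
        for name, fraction in DEFAULT_CACHE_FRACTIONS.items()
    }
    # Fielddata has no default bound; fielddata_fraction is a hypothetical
    # stand-in for an indices.fielddata.cache.size percentage.
    ceilings["fielddata cache"] = (
        None if fielddata_fraction is None else int(heap_bytes * fielddata_fraction)
    )
    return ceilings


def describe(limit: Optional[int]) -> str:
    return "unbounded" if limit is None else f"{limit / MIB:.0f} MiB"


for name, limit in cache_ceilings(3 * 1024 * MIB).items():  # 3GB heap from the question
    print(f"{name:<20} {describe(limit)}")
```

With a 3 GiB heap this gives about 307 MiB for the query/filter cache and about 31 MiB for the shard request cache, while fielddata stays unbounded unless you configure it.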

There is a nice mind map available here (kudos to Igor Kupczyński) that summarizes the roles of the caches. That leaves roughly 30% of the heap (1GB in your case) for all the other object instances that ES needs to create in order to function properly (more about this later).
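The arithmetic behind "1GB in your case" can be written out with the figures from the question. This sketch treats GB as GiB (an assumption) and uses a hypothetical helper `non_cache_heap`; because the filter + query figure is an upper bound, the result is a lower bound on the heap used by everything else.

```python
GIB = 1024 ** 3


def non_cache_heap(heap_used: float, *cache_sizes: float) -> float:
    """Heap left over once the observed cache sizes are subtracted."""
    return heap_used - sum(cache_sizes)


heap_used = 3.0 * GIB      # heap in use, from the question
fielddata = 1.5 * GIB      # observed fielddata
query_filter = 0.5 * GIB   # filter + query caches (an upper bound in the question)

other = non_cache_heap(heap_used, fielddata, query_filter)
print(f"everything else: {other / GIB:.1f} GiB ({other / heap_used:.0%} of the heap)")
```

This prints `everything else: 1.0 GiB (33% of the heap)`, in line with the "roughly 30%" estimate above.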

Here is how I proceeded in my local environment. First, I started my node fresh (with -Xmx1g) and waited for green status. Then I started jvisualvm and attached it to my Elasticsearch process. I took a heap dump from the Sampler tab so that I could compare it later with another dump. Initially, my heap looked like this (only 1/3 of the max heap allocated so far):
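The answer takes its snapshots from jvisualvm's GUI. If you would rather capture the "before" and "after" dumps from a script, the JDK's `jmap` tool can write an HPROF heap dump that jvisualvm can open later via File > Load. This is a swapped-in alternative, not what the author did; the PID and file names are placeholders, and the helper names are hypothetical.

```python
import subprocess
from typing import List


def heap_dump_command(pid: int, path: str, live_only: bool = True) -> List[str]:
    """Build a jmap command that writes an HPROF heap dump of the given JVM."""
    # "live" forces a full GC first, so only reachable objects are dumped.
    options = "live,format=b" if live_only else "format=b"
    return ["jmap", f"-dump:{options},file={path}", str(pid)]


def take_heap_dump(pid: int, path: str) -> None:
    """Run jmap; needs a JDK on PATH and the same OS user as the Elasticsearch JVM."""
    subprocess.run(heap_dump_command(pid, path), check=True)
```

Calling `take_heap_dump(pid, "before.hprof")` right after startup and `take_heap_dump(pid, "after.hprof")` under load gives two files to compare.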

I also checked that my field data and filter caches were empty:

Just to make sure, I also ran /_cat/fielddata, and as you can see no heap is used by field data yet, since the node has just started.

$ curl 'localhost:9200/_cat/fielddata?bytes=b&v'
id                     host       ip            node    total 
TMVa3S2oTUWOElsBrgFhuw iMac.local 192.168.1.100 Tumbler     0
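To track this number programmatically, the whitespace-aligned table can be parsed into rows. This is a minimal sketch that assumes the column layout shown above (other Elasticsearch versions print different columns, such as per-field rows) and that no cell contains spaces; `parse_cat_table` and `total_fielddata_bytes` are hypothetical names.

```python
from typing import Dict, List


def parse_cat_table(text: str) -> List[Dict[str, str]]:
    """Parse the whitespace-aligned table a _cat API prints when called with ?v."""
    # Assumes no cell contains spaces (e.g. node names without blanks).
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in rows]


def total_fielddata_bytes(text: str) -> int:
    """Sum the 'total' column of /_cat/fielddata?bytes=b&v output."""
    return sum(int(row["total"]) for row in parse_cat_table(text))
```

Fed the output above, `total_fielddata_bytes` returns 0.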

This is the initial situation. Now, we need to warm this all up a bit, so I started my back- and front-end apps to put some pressure on the local ES node.

After a while, my heap looked like this: its size had increased by roughly 300MB (139MB -> 452MB; not much, but I ran this experiment on a small dataset).

My caches have also grown a bit to a few megabytes:

$ curl 'localhost:9200/_cat/fielddata?bytes=b&v'
id                     host       ip            node      total 
TMVa3S2oTUWOElsBrgFhuw iMac.local 192.168.1.100 Tumbler 9066424
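Putting the two measurements side by side shows why the caches alone do not explain the growth. This sketch uses the figures from this experiment and assumes jvisualvm's "MB" means MiB; it is only a rough comparison, since the heap size shown by jvisualvm also includes garbage that has not been collected yet.

```python
MIB = 1024 * 1024

# Figures from the experiment above (MB treated as MiB, an assumption).
heap_before, heap_after = 139 * MIB, 452 * MIB
fielddata_before, fielddata_after = 0, 9_066_424  # bytes, from /_cat/fielddata

heap_growth = heap_after - heap_before
fielddata_growth = fielddata_after - fielddata_before
share = fielddata_growth / heap_growth

print(f"heap grew by {heap_growth / MIB:.0f} MiB")
print(f"fielddata grew by {fielddata_growth / MIB:.1f} MiB ({share:.1%} of the heap growth)")
```

Fielddata accounts for under 3% of the roughly 313 MiB of heap growth here, so most of the new heap is held by other objects.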

At this point I took another heap dump to see how the heap had evolved. I computed the retained size of the objects and compared it with the first dump, taken just after starting the node. The comparison looks like this:
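A rough, scriptable version of this comparison can be made from two `jmap -histo` class histograms. Note the swap: `jmap -histo` reports shallow sizes per class, not the retained sizes jvisualvm computed, so the ranking can differ. This is a sketch under that assumption; `parse_histo` and `top_growth` are hypothetical helpers, and the regex targets the usual `num: #instances #bytes class-name` rows.

```python
import re
from typing import Dict, List, Tuple

# Data rows of `jmap -histo` output look like "  1:  12345  678901  [C ..."
HISTO_ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histo(text: str) -> Dict[str, int]:
    """Map class name -> total shallow bytes from jmap -histo output."""
    sizes: Dict[str, int] = {}
    for line in text.splitlines():
        match = HISTO_ROW.match(line)
        if match:
            _instances, nbytes, cls = match.groups()
            sizes[cls] = sizes.get(cls, 0) + int(nbytes)
    return sizes


def top_growth(before: str, after: str, n: int = 10) -> List[Tuple[str, int]]:
    """Return the n classes whose shallow size grew the most between two histograms."""
    old, new = parse_histo(before), parse_histo(after)
    deltas = {cls: size - old.get(cls, 0) for cls, size in new.items()}
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)[:n]
```

Header, separator and `Total` lines do not match the regex, so the raw command output can be passed in unchanged.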

Among the objects whose retained size increased, the usual suspects are, of course, maps and any cache-related entities. But we can also find the following classes:

NIOFSDirectory instances, used to read Lucene segment files from the filesystem
A lot of interned strings in the form of char arrays or byte arrays
Doc values-related classes
Bit sets
etc.

As you can see, the heap hosts the three main caches, but it is also where all the other Java objects needed by the Elasticsearch process reside, and those are not necessarily cache-related.

So if you want to control your heap usage, you obviously have no control over the internal objects that ES needs to function properly, but you can definitely influence the sizing of your caches. If you follow the links in the first bullet list, you'll get a precise idea of which settings you can tune.
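As one concrete example of such a setting: `indices.fielddata.cache.size` is a static node setting placed in `elasticsearch.yml`, whereas the fielddata circuit breaker mentioned above can be changed at runtime through the cluster settings API. This sketch assumes an unauthenticated node at `http://localhost:9200`; the `"40%"` value is illustrative only, and the function names are hypothetical. Keeping the fielddata cache size below the breaker limit lets entries be evicted before the breaker starts rejecting requests.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unauthenticated node


def breaker_settings_body(fielddata_limit: str = "40%") -> bytes:
    """JSON body that sets the fielddata circuit breaker (a dynamic setting)."""
    # "40%" is an illustrative value, not a recommendation.
    body = {"persistent": {"indices.breaker.fielddata.limit": fielddata_limit}}
    return json.dumps(body).encode("utf-8")


def apply_breaker_limit(fielddata_limit: str = "40%") -> str:
    """PUT the setting to /_cluster/settings and return the raw JSON response."""
    request = urllib.request.Request(
        f"{ES_URL}/_cluster/settings",
        data=breaker_settings_body(fielddata_limit),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return resp.read().decode("utf-8")
```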

Also, tuning caches might not be the only option: maybe you need to rewrite some of your queries to be more memory-friendly, or change your analyzers or some field types in your mapping, etc. It's hard to tell in your case without more information, but this should give you some leads.

Go ahead and launch jvisualvm the same way I did here, watch how your heap grows while your app (searching + indexing) is hitting ES, and you should quickly gain some insight into what's going on in there.
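Alongside jvisualvm, you can log heap usage over time from the node stats API while your application runs. This sketch assumes an unauthenticated node at `http://localhost:9200` and reads `jvm.mem.heap_used_in_bytes` and `jvm.mem.heap_max_in_bytes` from `_nodes/stats/jvm`; the interval, sample count, and function names are hypothetical choices.

```python
import json
import time
import urllib.request
from typing import Dict, Tuple

ES_URL = "http://localhost:9200"  # assumption: local, unauthenticated node


def heap_usage(stats: Dict) -> Dict[str, Tuple[int, int]]:
    """Extract (heap used, heap max) in bytes per node from a _nodes/stats/jvm response."""
    return {
        node.get("name", node_id): (
            node["jvm"]["mem"]["heap_used_in_bytes"],
            node["jvm"]["mem"]["heap_max_in_bytes"],
        )
        for node_id, node in stats["nodes"].items()
    }


def watch_heap(interval_s: float = 5.0, samples: int = 12) -> None:
    """Print heap usage periodically while your application exercises the node."""
    for _ in range(samples):
        with urllib.request.urlopen(f"{ES_URL}/_nodes/stats/jvm", timeout=10) as resp:
            stats = json.load(resp)
        for name, (used, maximum) in heap_usage(stats).items():
            print(f"{name}: {used / 2**20:.0f} / {maximum / 2**20:.0f} MiB")
        time.sleep(interval_s)
```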
