What exactly does -1 refresh_interval in Elasticsearch mean?

Question

I have read a lot of articles about index refreshing in Elasticsearch. I understand what intervals greater than 0 mean: the time that elapses between consecutive segment flushes, which make the segments available for search. However, I am not sure what exactly refresh_interval: -1 does. As I understand it, it is a way to disable automatic index refreshing, but not completely: Elasticsearch still flushes segments from time to time even when refresh_interval is set to -1. I wonder which mechanism governs this flushing activity if automatic refresh is disabled.

Sorry, I know I don't have a lot of code to post, so I will give a bit of background on what I am after. My application doesn't need near real-time search; it only needs eventual consistency. However, this eventuality should be reasonable, i.e. within a few seconds to less than a minute, not half an hour. I was wondering if I can leave it to Elasticsearch to decide when best to refresh at its convenience, rather than refreshing at a regular interval. The reason is that disabling automatic refreshing does bring some performance benefits to my application, e.g. JVM heap usage rises less aggressively between garbage collections (see graph below).

Recommended answer

There is a bit of confusion in your understanding. Refreshing the index and writing to disk are two different processes and are not necessarily related, which explains your observation that segments are still being written even when refresh_interval is -1.

When a document is indexed, it is added to the in-memory buffer and appended to the translog file. When a refresh takes place, the docs in the buffer are written to a new segment without an fsync, the segment is opened to make it visible to search, and the buffer is cleared. The translog is not yet cleared, and nothing is actually persisted to disk (as there was no fsync).
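
To make the sequence above concrete, here is a toy model in Python. It is only a conceptual sketch, not how Elasticsearch or Lucene is actually implemented; the ToyShard class and its method names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Conceptual model only; names here are hypothetical, not Elasticsearch APIs."""
    buffer: list = field(default_factory=list)    # in-memory indexing buffer
    translog: list = field(default_factory=list)  # append-only operation log
    segments: list = field(default_factory=list)  # segments opened for search

    def index(self, doc: str) -> None:
        # An indexed doc goes to the buffer and the translog; it is not searchable yet.
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self) -> None:
        # Write buffered docs to a new segment (no fsync), open it, clear the buffer.
        if self.buffer:
            self.segments.append(list(self.buffer))
            self.buffer.clear()
        # The translog is deliberately left untouched by a refresh.

    def search(self, doc: str) -> bool:
        # Only docs that live in opened segments are visible to search.
        return any(doc in segment for segment in self.segments)


shard = ToyShard()
shard.index("doc-1")
visible_before = shard.search("doc-1")  # doc is still only in the buffer
shard.refresh()
visible_after = shard.search("doc-1")   # segment is now open for search
print(visible_before, visible_after, shard.translog)
```

In this model the document only becomes searchable after refresh() runs, while the translog still holds the operation afterwards.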

Now imagine the refresh is not happening: without an index refresh, you cannot search your documents, and no segments are created in the filesystem cache.

The translog flush settings dictate when the flush (writing to disk) happens: by default, when the translog reaches 512mb in size or after 30 minutes. The flush is what actually persists data on disk; everything else is in the filesystem cache (if the node dies or the machine is rebooted, the cache is lost and the translog is the only salvation).
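
As a rough sketch of how the two knobs discussed here are set independently, the Python snippet below builds a JSON body for the update index settings API (PUT /<index>/_settings). The helper name build_settings_body is hypothetical, and the 30s value is only an illustrative choice based on the "less than a minute" requirement in the question; index.refresh_interval and index.translog.flush_threshold_size are the actual setting names.

```python
import json


def build_settings_body(refresh_interval: str, flush_threshold_size: str = "512mb") -> str:
    """Return a JSON body for PUT /<index>/_settings (hypothetical helper)."""
    body = {
        "index": {
            # Controls how often new documents become visible to search; "-1" disables it.
            "refresh_interval": refresh_interval,
            # Controls when the translog is flushed, i.e. when data is persisted to disk.
            "translog": {"flush_threshold_size": flush_threshold_size},
        }
    }
    return json.dumps(body, indent=2)


# Periodic refresh disabled; flushes are still governed by the translog threshold.
disable_refresh = build_settings_body("-1")

# Assumed compromise: refresh every 30 seconds instead of the default 1 second.
bounded_refresh = build_settings_body("30s")
print(bounded_refresh)
```

Changing refresh_interval in the body leaves the flush threshold as it was, which mirrors the point that refreshing and flushing are configured separately.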
