How to speed up Elasticsearch recovery?

Views: 509 | Published: 2020/5/4 7:34:36 | Tags: performance, elasticsearch, lucene


Question

I'm working on an ES cluster holding 6 billion small documents, organized in 6.5K indexes, for a total of 6 TB. The indexes are replicated and sharded across 7 servers. Index sizes vary from a few KB to hundreds of GB.

Before using ES, I used Lucene with the same document organization.

Recovery of the Lucene-based application was almost immediate. The indexes were lazily loaded when a query arrived, and the IndexReader instances were then cached to speed up later replies.

Now, with Elasticsearch, recovery is very slow (tens of minutes). Note that usually, before a crash, all the indexes are open and most of them receive documents to index quite often.

Is there a good pattern for reducing ES recovery time? I'm also interested in anything related to index management, not only configuration. For example, I would like to recover the most important indexes first and then load all the others; that way, I can reduce the perceived downtime for most users.

I'm using the following configuration:

#Max number of indices concurrently loaded at startup
indices.recovery.concurrent_streams: 80

#Max number of bytes concurrently read at startup for loading the indices
indices.recovery.max_bytes_per_sec: 250mb

#Allow to control specifically the number of initial recoveries of primaries that are allowed per node
cluster.routing.allocation.node_initial_primaries_recoveries: 20

#Max number of indices concurrently loaded at startup
cluster.routing.allocation.node_concurrent_recoveries: 80

#the number of streams to open (on a node level) for small files (under 5mb) to recover a shard from a peer shard
indices.recovery.concurrent_small_file_streams: 30

PS: Right now I'm using ES 2.4.1, but I will move to ES 5.2 in a few weeks. PPS: One scenario could be recovery after a blackout.

Thanks!

Recommended answer

Edit: To prioritize recovery of certain indices, you can use the priority setting on the index like this:

PUT some_index
{
  "settings": {
    "index.priority": 10
  }
}

The index with the highest priority is recovered first; otherwise, recovery is ordered by the creation time of the index (see the Elasticsearch documentation on index recovery prioritization).
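With thousands of indices, setting priorities by hand is impractical. The following is a minimal sketch of how the priority requests could be generated, assuming you already have a list of index names ordered from most to least important. The function name `priority_requests`, the `base_priority` parameter, and the example index names are hypothetical; the sketch only builds the requests and does not send them.

```python
import json


def priority_requests(indices_by_importance, base_priority=1):
    """Build one settings-update request per index, most important first.

    `indices_by_importance` is a hypothetical list of index names ordered
    from most to least important. Returns (method, path, body) tuples that
    mirror the console request shown above, using the update-settings path.
    """
    total = len(indices_by_importance)
    requests = []
    for rank, name in enumerate(indices_by_importance):
        # The first index gets the highest priority; the last gets base_priority.
        priority = base_priority + (total - 1 - rank)
        body = json.dumps({"index": {"priority": priority}})
        requests.append(("PUT", "/" + name + "/_settings", body))
    return requests


# Example with made-up index names.
for method, path, body in priority_requests(["customers", "orders", "logs"]):
    print(method, path, body)
```

Each tuple can then be sent with whatever HTTP client you already use against the cluster.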

Second edit: To change the number of replicas, you only need a single HTTP request:

PUT index_name/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}

Regarding snapshot recovery, I would suggest the following points (some might not apply to your case):

Set the number of replicas to 0 before the recovery, then restore it to its previous value afterwards (less writing).
If you use spinning disks, add index.merge.scheduler.max_thread_count: 1 to elasticsearch.yml to increase indexing speed (see the documentation).
Before the recovery, update your index settings with "refresh_interval": "-1", and put it back to its default value afterwards (see the documentation).
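The replica and refresh-interval steps above come as a pair: one settings update before the restore and one after. The sketch below builds both request bodies. The function names are hypothetical, and the "normal" values (1 replica, a 1s refresh interval) are the Elasticsearch defaults, assumed here only for illustration; substitute whatever your indices actually used.

```python
import json


def bulk_load_settings():
    """Settings to apply before the restore: no replicas, no periodic refresh."""
    return {"index": {"number_of_replicas": 0, "refresh_interval": "-1"}}


def normal_settings(replicas=1, refresh_interval="1s"):
    """Settings to apply after the restore (assumed defaults, adjust as needed)."""
    return {"index": {"number_of_replicas": replicas,
                      "refresh_interval": refresh_interval}}


def settings_request(index_name, settings):
    """Return a (method, path, body) tuple for the update-settings endpoint."""
    return ("PUT", "/%s/_settings" % index_name, json.dumps(settings))


# Example with a made-up index name: one request before, one after the restore.
print(settings_request("some_index", bulk_load_settings()))
print(settings_request("some_index", normal_settings()))
```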

If you don't care about searching yet, the following could also help (on versions that still support store throttling; the setting was removed in ES 5.0):

PUT /_cluster/settings
{
    "transient" : {
        "indices.store.throttle.type" : "none" 
    }
}

A few articles that could help:

https://www.elastic.co/guide/en/elasticsearch/reference/5.x/tune-for-indexing-speed.html
https://www.elastic.co/guide/en/elasticsearch/reference/5.x/tune-for-disk-usage.html

A few general tips: make sure swapping is disabled. How much memory is allocated to the nodes in your ES cluster? (You should give the heap about half of a node's total available memory, and stay below roughly 32 GB, the point at which the JVM can no longer use compressed object pointers.)
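As a quick illustration of the heap sizing rule, here is a small sketch. The function name is hypothetical, and the default cap of 31 GB is an assumption chosen to stay just under the 32 GB threshold mentioned above; the exact safe limit depends on the JVM.

```python
def recommended_heap_gb(total_ram_gb, cap_gb=31):
    """Return a heap size in whole GB: half of the node's RAM, capped.

    cap_gb=31 is an assumed value just below the 32 GB threshold.
    """
    return min(total_ram_gb // 2, cap_gb)


# Example: a node with 16 GB of RAM and one with 128 GB.
print(recommended_heap_gb(16))
print(recommended_heap_gb(128))
```

The remaining memory is left to the operating system, which uses it for the filesystem cache that Lucene relies on.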
