Elasticsearch failed to recover after crash

Problem description

We ran out of disk space, and that screwed up the Elasticsearch shards. Three nodes are now red; two have recovered and are yellow. ES is running at 150% CPU and high memory usage trying to recover them, but it looks like there is some version match conflict.

I cleared up disk space and deleted the translog for a shard to stop it loading from the translog. But surprisingly, the translog gets created again!

Please share how I can stop this attempt to recover from the translog and resume normal indexing operations. I do not want to delete the shard data.

[2014-10-31 03:11:43,742][WARN ][cluster.action.shard     ] [Angela Cairn] [western_europe][4] sending failed shard for [western_europe][4], node[x5M73qVXS5eZIBdz40boEg], [P], s[INITIALIZING], indexUUID [wy-tIJqdQiynz5SGQ2IrGA], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[western_europe][4] failed to recover shard]; nested: ElasticsearchException[failed to read [tweet][527924645014818817]]; nested: ElasticsearchIllegalArgumentException[No version type match [101]]; ]]
[2014-10-31 03:11:43,742][WARN ][cluster.action.shard     ] [Angela Cairn] [western_europe][4] received shard failed for [western_europe][4], node[x5M73qVXS5eZIBdz40boEg], [P], s[INITIALIZING], indexUUID [wy-tIJqdQiynz5SGQ2IrGA], reason [Failed to start shard, message [IndexShardGatewayRecoveryException[[western_europe][4] failed to recover shard]; nested: ElasticsearchException[failed to read [tweet][527924645014818817]]; nested: ElasticsearchIllegalArgumentException[No version type match [101]]; ]]
[2014-10-31 03:11:43,859][WARN ][indices.cluster          ] [Angela Cairn] [western_europe][2] failed to start shard
org.elasticsearch.index.gateway.IndexShardGatewayRecoveryException: [western_europe][2] failed to recover shard
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:269)
    at org.elasticsearch.index.gateway.IndexShardGatewayService$1.run(IndexShardGatewayService.java:132)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.ElasticsearchException: failed to read [tweet][527936245440065536]
    at org.elasticsearch.index.translog.Translog$Index.readFrom(Translog.java:511)
    at org.elasticsearch.index.translog.TranslogStreams.readTranslogOperation(TranslogStreams.java:52)
    at org.elasticsearch.index.gateway.local.LocalIndexShardGateway.recover(LocalIndexShardGateway.java:241)
    ... 4 more
Caused by: org.elasticsearch.ElasticsearchIllegalArgumentException: No version type match [116]
    at org.elasticsearch.index.VersionType.fromValue(VersionType.java:307)
    at org.elasticsearch.index.translog.Translog$Index.readFrom(Translog.java:508)

Solution

First, check that there really are no issues with the shards themselves. cd to your /usr/share/elasticsearch/lib directory (or equivalent) and use Lucene's CheckIndex like so:

java -cp "*" -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /var/lib/elasticsearch/<ES-NAME>/nodes/<NODE-NUMBER>/indices/<INDEX-NAME>/<SHARD-NUMBER>/index/

This will check a shard for problems, and will take a while if your shards are large.

Be aware that if you get the Java classpath wrong, some required jar files will be missing and CheckIndex may throw errors and wrongly claim all of the segments in the shard are broken, so read the output carefully.
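If an index has many shards, running that command by hand for each one gets tedious. The bash sketch below loops over the shard directories of one index on the current node and runs CheckIndex (without -fix) on each, saving the output to a log file per shard. `ES_LIB`, `DATA_DIR`, `DRY_RUN`, the function names and the log file names are hypothetical choices for illustration, and by default it only prints the commands it would run. Checking is safest with Elasticsearch stopped on that node, so nothing writes to the index during the check.

```shell
#!/usr/bin/env bash
# Sketch: run Lucene's CheckIndex (read-only, no -fix) on every shard of one
# index on this node. ES_LIB, DATA_DIR and DRY_RUN are assumptions; adjust them.
set -euo pipefail

ES_LIB="${ES_LIB:-/usr/share/elasticsearch/lib}"
DATA_DIR="${DATA_DIR:-/var/lib/elasticsearch}"

# Build the on-disk Lucene index path for one shard.
shard_index_dir() {
  local cluster="$1" node="$2" index="$3" shard="$4"
  printf '%s/%s/nodes/%s/indices/%s/%s/index\n' \
    "$DATA_DIR" "$cluster" "$node" "$index" "$shard"
}

# Check every shard directory of <index>; DRY_RUN=1 (default) only prints.
check_all_shards() {
  local cluster="$1" node="$2" index="$3"
  local shard_dir shard dir
  for shard_dir in "$DATA_DIR/$cluster/nodes/$node/indices/$index"/*/; do
    shard="$(basename "$shard_dir")"
    # Skip the index-level _state directory: only numeric names are shards.
    [[ "$shard" =~ ^[0-9]+$ ]] || continue
    dir="$(shard_index_dir "$cluster" "$node" "$index" "$shard")"
    echo "== shard $shard: $dir"
    if [[ "${DRY_RUN:-1}" == 1 ]]; then
      echo "java -cp \"$ES_LIB/*\" -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex $dir"
    else
      # Quoted "$ES_LIB/*" is expanded by the JVM's classpath wildcard, not the shell.
      # "|| true" keeps the loop going when a shard is reported as broken.
      java -cp "$ES_LIB/*" -ea:org.apache.lucene... \
        org.apache.lucene.index.CheckIndex "$dir" > "checkindex-$index-$shard.log" 2>&1 || true
    fi
  done
}
```

For example, `DRY_RUN=0 check_all_shards <ES-NAME> 0 western_europe` would check every shard of `western_europe` stored under node directory `0`.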

If there are problems with a shard and you have no other way to restore it, running the same command with the -fix argument will fix the shard, but you will lose data. CheckIndex will warn you how many documents (if any) you stand to lose from the shard.
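Before deciding on -fix, it helps to see at a glance which shards are actually clean. This bash sketch summarises a saved CheckIndex output file. It assumes the Lucene 4.x wording "No problems were detected with this index." for a clean shard and "... documents would be lost, if -fix were specified" for a broken one; `summarise_checkindex_log` is a hypothetical name, and you should compare the matched text against your own output before relying on it.

```shell
# Sketch: classify one saved CheckIndex log as CLEAN or PROBLEMS.
# The message texts matched below are assumptions based on Lucene 4.x output.
summarise_checkindex_log() {
  local log="$1"
  if grep -q 'No problems were detected with this index' "$log"; then
    echo "CLEAN $log"
  else
    echo "PROBLEMS $log"
    # Surface how many documents -fix would drop, if CheckIndex reported it.
    grep -o '[0-9][0-9]* documents would be lost' "$log" || true
  fi
}
```

Running it over every `checkindex-*.log` file gives a quick list of the shards that would actually need -fix.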

If CheckIndex reports that all is well with the shard, then hopefully your problem is only in the translog. The transaction log is a write-ahead log that Elasticsearch uses for atomicity. After a crash, ES will attempt to restore a shard, including writes that had not yet been flushed to the shard index itself. Those writes are in the translog, so you will lose them if you delete it. That, however, is much better than losing the shard. In your case, the translog already appears to be corrupt, and I don't know of any way to recover it.
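The recovery behaviour described above is ordinary write-ahead logging. The toy bash model below uses hypothetical function names and plain text files, nothing like the real translog format, to show why deleting the log costs exactly the writes that had not been flushed yet.

```shell
# Toy write-ahead log: NOT Elasticsearch's translog format, just the idea.
# index.txt stands for the committed shard index, translog.txt for the log.
wal_init() {
  mkdir -p "$1"
  : > "$1/index.txt"
  : > "$1/translog.txt"
}

# A write is recorded in the log first; the committed index is untouched.
wal_write() {
  printf '%s\n' "$2" >> "$1/translog.txt"
}

# Flush: copy logged writes into the committed index, then truncate the log.
wal_flush() {
  cat "$1/translog.txt" >> "$1/index.txt"
  : > "$1/translog.txt"
}

# Recovery after a crash replays whatever is still in the log.
wal_recover() {
  wal_flush "$1"
}
```

Emptying `translog.txt` before calling `wal_recover` leaves only the flushed writes in `index.txt`, which is the trade-off described above: the shard survives, the unflushed writes do not.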

To remove the corrupted transaction log being used for recovery, delete the translog files in /var/lib/elasticsearch/<ES-NAME>/nodes/<NODE-NUMBER>/indices/<INDEX-NAME>/<SHARD-NUMBER>/translog/ for each relevant shard on each affected node. The last part is important: after you delete a shard's translog on one node, you may see the cluster regenerate it from another node.
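Rather than deleting the files outright, you may prefer to move them aside so they can be inspected or put back later. The bash sketch below does that for a list of shards of one index on the current node. `DATA_DIR`, `BACKUP_DIR` and `move_translogs` are hypothetical names, and it assumes Elasticsearch is stopped on the node while it runs. Repeat it on every node that holds a copy of the affected shards.

```shell
#!/usr/bin/env bash
# Sketch: move (not delete) the translog files of the given shards on THIS
# node into a backup directory. DATA_DIR and BACKUP_DIR are assumptions.
set -euo pipefail

DATA_DIR="${DATA_DIR:-/var/lib/elasticsearch}"
BACKUP_DIR="${BACKUP_DIR:-/root/translog-backup}"

# Usage: move_translogs <cluster> <node> <index> <shard> [<shard>...]
move_translogs() {
  local cluster="$1" node="$2" index="$3"
  shift 3
  local shard src dest
  for shard in "$@"; do
    src="$DATA_DIR/$cluster/nodes/$node/indices/$index/$shard/translog"
    if [[ ! -d "$src" ]]; then
      echo "skip shard $shard: no $src" >&2
      continue
    fi
    dest="$BACKUP_DIR/$cluster/$node/$index/$shard"
    mkdir -p "$dest"
    # Move every file out of translog/ but keep the (now empty) directory.
    find "$src" -mindepth 1 -maxdepth 1 -exec mv {} "$dest"/ \;
    echo "moved translog of [$index][$shard] to $dest"
  done
}
```

For instance, `move_translogs <ES-NAME> 0 western_europe 2 4` would handle shards 2 and 4 of `western_europe`, the shards that fail in the log above, under node directory `0`.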

The shards should then initialise correctly, although as usual that may take a while to complete.
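To follow that progress, the `_cat/shards` API (available since Elasticsearch 1.0) lists each shard's state. The bash sketch below polls it and prints the shards that are not yet STARTED. It assumes the default column order (index, shard, prirep, state, ...) and a node reachable at `ES_URL`, defaulting to `http://localhost:9200`; the function names are hypothetical.

```shell
# Sketch: print "index[shard] prirep state" for every shard not yet STARTED,
# reading `_cat/shards` output on stdin (state is assumed to be column 4).
unstarted_shards() {
  awk '$4 != "STARTED" { print $1 "[" $2 "] " $3 " " $4 }'
}

# Poll the cluster every 30 seconds until all shards are STARTED.
wait_for_shards() {
  local url="${ES_URL:-http://localhost:9200}" listing pending
  while true; do
    # -f makes curl fail on HTTP errors, so an unreachable node is not
    # mistaken for "nothing pending".
    if ! listing="$(curl -sf "$url/_cat/shards")"; then
      echo "cannot reach $url, retrying" >&2
    else
      pending="$(printf '%s\n' "$listing" | unstarted_shards)"
      if [[ -z "$pending" ]]; then
        echo "all shards STARTED"
        return 0
      fi
      printf '%s\n' "$pending"
    fi
    sleep 30
  done
}
```

Filtering on the state column rather than on cluster health shows which specific shards are still INITIALIZING or UNASSIGNED.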
