elasticsearch: How to reinitialize a node?

Problem Description

elasticsearch 1.7.2 on CentOS

We have a 3 node cluster that has been running fine. A networking problem caused the "B" node to lose network access. (It then turns out that the C node had the "minimum_master_nodes" as 1, not 2.)

So we are now poking along with just the A node.

We fixed the issues on the B and C nodes, but they refuse to come up and join the cluster. On B and C:

# curl -XGET http://localhost:9200/_cluster/health?pretty=true
{
  "error" : "MasterNotDiscoveredException[waited for [30s]]",
  "status" : 503
}

The elasticsearch.yml is as follows (the node name on the "b" and "c" nodes reflects those systems, and the IP addresses on each node point to the other two nodes; however, on the "c" node, index.number_of_replicas was mistakenly set to 1):

cluster.name: elasticsearch-prod
node.name: "PROD-node-3a"
node.master: true
index.number_of_replicas: 2
discovery.zen.minimum_master_nodes: 2
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["192.168.3.100", "192.168.3.101"]
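
Since part of the problem turned out to be one node running with minimum_master_nodes set to 1, it is worth noting that this setting is dynamically updatable, so a node that is already part of a healthy cluster can in principle be corrected without editing the file and restarting. A minimal sketch, assuming default ports:

# curl -XPUT http://localhost:9200/_cluster/settings -d '{
    "persistent": { "discovery.zen.minimum_master_nodes": 2 }
  }'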

We have no idea why they won't join. They have network visibility to A, and A can see them. Each node correctly has the other two defined in "discovery.zen.ping.unicast.hosts:"
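
One way to double-check that visibility in both directions is to probe not only the HTTP port but also the transport port 9300, since that is the port zen discovery and cluster joins actually use; a rough sketch, using the IPs from the yml above:

# curl -XGET http://192.168.3.100:9200/
# nc -zv 192.168.3.100 9300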

On B and C, the log is very sparse, and tells us nothing:

# cat elasticsearch.log
[2015-09-24 20:07:46,686][INFO ][node                     ] [The Profile] version[1.7.2], pid[866], build[e43676b/2015-09-14T09:49:53Z]
[2015-09-24 20:07:46,688][INFO ][node                     ] [The Profile] initializing ...
[2015-09-24 20:07:46,931][INFO ][plugins                  ] [The Profile] loaded [], sites []
[2015-09-24 20:07:47,054][INFO ][env                      ] [The Profile] using [1] data paths, mounts [[/ (rootfs)]], net usable_space [148.7gb], net total_space [157.3gb], types [rootfs]
[2015-09-24 20:07:50,696][INFO ][node                     ] [The Profile] initialized
[2015-09-24 20:07:50,697][INFO ][node                     ] [The Profile] starting ...
[2015-09-24 20:07:50,942][INFO ][transport                ] [The Profile] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/10.181.3.138:9300]}
[2015-09-24 20:07:50,983][INFO ][discovery                ] [The Profile] elasticsearch/PojoIp-ZTXufX_Lxlwvdew
[2015-09-24 20:07:54,772][INFO ][cluster.service          ] [The Profile] new_master [The Profile][PojoIp-ZTXufX_Lxlwvdew][elastic-search-3c-prod-centos-case-48307][inet[/10.181.3.138:9300]], reason: zen-disco-join (elected_as_master)
[2015-09-24 20:07:54,801][INFO ][http                     ] [The Profile] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/10.181.3.138:9200]}
[2015-09-24 20:07:54,802][INFO ][node                     ] [The Profile] started
[2015-09-24 20:07:54,880][INFO ][gateway                  ] [The Profile] recovered [0] indices into cluster_state
[2015-09-24 20:42:45,691][INFO ][node                     ] [The Profile] stopping ...
[2015-09-24 20:42:45,727][INFO ][node                     ] [The Profile] stopped
[2015-09-24 20:42:45,727][INFO ][node                     ] [The Profile] closing ...
[2015-09-24 20:42:45,735][INFO ][node                     ] [The Profile] closed
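
One thing that can help when the log is this sparse (we present it only as a suggestion) is to raise the discovery logger on B and C, which makes the zen ping and join attempts, and their failures, show up in elasticsearch.log. In 1.x this can be set per node in config/logging.yml; a restart is required for it to take effect:

# in config/logging.yml, under the existing "logger:" section
logger:
  discovery: TRACE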

How do we bring the whole beast to life?

  • Restarting B and C makes absolutely no difference
  • I am reluctant to cycle A, because that is the node our application is currently hitting...

Answer

Well, we do not know what brought it to life, but it kind of magically came back up.

I believe that the shard reroute (shown here: elasticsearch: Did I lose data when two of my three nodes went down?) caused the nodes to rejoin the cluster. Our theory is that node A, the only surviving node, was not a "healthy" master, because it knew that one shard (the "p", i.e. primary, copy of shard 1, as spelled out in that same question) was not allocated.

Since the master knew it was not intact, the other nodes declined to join the cluster, throwing the "MasterNotDiscoveredException".
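
A quick way to see that state from the surviving A node, assuming default ports, is to list the shards and look for unassigned primaries:

# curl -XGET 'http://localhost:9200/_cat/shards?v' | grep -i unassigned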

Once we got all the "p" shards assigned to the surviving A node, the other nodes joined up, and did the whole replicating dance.
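
For reference, that kind of allocation is done with the cluster reroute API; the sketch below is only an illustration ("my_index" is a placeholder, the node name comes from the yml above), and allow_primary is precisely the dangerous part, as the next paragraph explains:

# curl -XPOST http://localhost:9200/_cluster/reroute -d '{
    "commands": [ {
      "allocate": {
        "index": "my_index",
        "shard": 1,
        "node": "PROD-node-3a",
        "allow_primary": true
      }
    } ]
  }'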

HOWEVER, data was lost by allocating the shard like that. We ultimately set up a new cluster and are rebuilding the index (which takes several days).
