Adding a node to a running Elasticsearch cluster causes master_not_discovered_exception
Question
I have a running cluster and I would like to add a data node to it. The running cluster is x.x.x.246 and the data node is x.x.x.99. The two servers can reach each other by ping. Machine OS: CentOS 7. Elasticsearch: 7.6.1.
Here is the elasticsearch.yml of x.x.x.246:
cluster.name: elasticsearch
node.master: true
node.name: Node_master
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: x.x.x.246
http.port: 9200
discovery.seed_hosts: ["x.x.x.99:9300"]
cluster.initial_master_nodes: ["x.x.x.246:9300"]
Here is the elasticsearch.yml of x.x.x.99:
cluster.name: elasticsearch
node.name: Node_master
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: x.x.x.99
http.port: 9200
discovery.seed_hosts: ["x.x.x.245:9300"]
cluster.initial_master_nodes: ["x.x.x.246:9300"]
Testing Elasticsearch on the machines
When I run systemctl start elasticsearch on each machine, it starts without errors.
curl -X GET "x.x.x.246:9200/_cluster/health?pretty"
shows that the number of nodes does not change, while
curl -X GET "x.x.x.99:9200/_cluster/health?pretty"
shows:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "master_not_discovered_exception",
        "reason" : null
      }
    ],
    "type" : "master_not_discovered_exception",
    "reason" : null
  },
  "status" : 503
}
Edited
Here is the elasticsearch.yml of x.x.x.246:
cluster.name: elasticsearch
node.name: master
node.master: true
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.99","x.x.x.246"]
cluster.initial_master_nodes: ["x.x.x.246"]
logger.org.elasticsearch.discovery: TRACE
Here is the elasticsearch.yml of x.x.x.99:
cluster.name: elasticsearch
node.name: node
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.246","x.x.x.99"]
cluster.initial_master_nodes: ["x.x.x.246"]
logger.org.elasticsearch.discovery: TRACE
Log on x.x.x.99:
[root@dev ~]# tail -30 /var/log/elasticsearch/elasticsearch.log
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:830) ~[?:?]
[2020-03-19T12:12:04,462][INFO ][o.e.c.c.JoinHelper ] [node-1] failed to join {master}{0UHYehfNQ2-WCadTC_VVkA}{1FNy5AJrTpKOCAejBLKR2w}{10.64.2.246}{10.64.2.246:9300}{dilm}{ml.machine_memory=1907810304, ml.max_open_jobs=20, xpack.installed=true} with JoinRequest{sourceNode={node-1}{jb_3lJq1R5-BZtxlPs_NyQ}{a4TYDhG7SWqL3CSG4tusEg}{10.64.2.99}{10.64.2.99:9300}{d}{xpack.installed=true}, optionalJoin=Optional[Join{term=178, lastAcceptedTerm=8, lastAcceptedVersion=100, sourceNode={node-1}{jb_3lJq1R5-BZtxlPs_NyQ}{a4TYDhG7SWqL3CSG4tusEg}{10.64.2.99}{10.64.2.99:9300}{d}{xpack.installed=true}, targetNode={master}{0UHYehfNQ2-WCadTC_VVkA}{1FNy5AJrTpKOCAejBLKR2w}{10.64.2.246}{10.64.2.246:9300}{dilm}{ml.machine_memory=1907810304, ml.max_open_jobs=20, xpack.installed=true}}]}
org.elasticsearch.transport.RemoteTransportException: [master][10.64.2.246:9300][internal:cluster/coordination/join]
Caused by: java.lang.IllegalStateException: failure when sending a validation request to node
at org.elasticsearch.cluster.coordination.Coordinator$2.onFailure(Coordinator.java:514) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.action.ActionListenerResponseHandler.handleException(ActionListenerResponseHandler.java:59) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1118) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.TransportService$ContextRestoreResponseHandler.handleException(TransportService.java:1118) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.InboundHandler.lambda$handleException$2(InboundHandler.java:244) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:633) ~[elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:830) [?:?]
Caused by: org.elasticsearch.transport.RemoteTransportException: [node-1][10.64.2.99:9300][internal:cluster/coordination/join/validate]
Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid P4QlwvuRRGSmlT77RroSjA than local cluster uuid oUoIe2-bSbS2UPg722ud9Q, rejecting
at org.elasticsearch.cluster.coordination.JoinHelper.lambda$new$4(JoinHelper.java:148) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler$1.doRun(SecurityServerTransportInterceptor.java:257) ~[?:?]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.xpack.security.transport.SecurityServerTransportInterceptor$ProfileSecuredRequestHandler.messageReceived(SecurityServerTransportInterceptor.java:315) ~[?:?]
at org.elasticsearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:63) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.transport.InboundHandler$RequestHandler.doRun(InboundHandler.java:264) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:692) ~[elasticsearch-7.6.1.jar:7.6.1]
at org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:37) ~[elasticsearch-7.6.1.jar:7.6.1]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?]
at java.lang.Thread.run(Thread.java:830) ~[?:?]
Recommended answer
For node x.x.x.99, the seed host entry is wrong. It should be as below:
discovery.seed_hosts: ["x.x.x.246:9300"]
The discovery.seed_hosts list is used to find the master node: it holds the addresses of master-eligible nodes, which also know who the current master is. Because the configuration of x.x.x.99 points to x.x.x.245 instead of x.x.x.246, node x.x.x.99 is unable to discover the master.
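A mismatch like this is easy to miss by eye, so it can help to check it mechanically. The following is a minimal Python sketch, not part of the original answer: the helper name `seed_hosts_without_master` is made up for illustration, the list of master-eligible addresses is assumed to be supplied by hand, and addresses are assumed to be IPv4 addresses or hostnames with an optional `:port` suffix.

```python
def seed_hosts_without_master(seed_hosts, master_addresses):
    """Return the seed hosts that match no known master-eligible address.

    Hypothetical helper: both arguments are plain lists of strings such as
    "x.x.x.246" or "x.x.x.246:9300". The optional port is ignored.
    """
    def host_only(address):
        # Strip an optional ":port" suffix (IPv4 address or hostname only).
        return address.rsplit(":", 1)[0] if ":" in address else address

    masters = {host_only(a) for a in master_addresses}
    return [h for h in seed_hosts if host_only(h) not in masters]


# Values taken from the question: the data node pointed at .245, the master is .246.
broken = seed_hosts_without_master(["x.x.x.245:9300"], ["x.x.x.246:9300"])
fixed = seed_hosts_without_master(["x.x.x.246:9300"], ["x.x.x.246"])
print(broken)  # ['x.x.x.245:9300']
print(fixed)   # []
```

A non-empty result means at least one seed host entry does not lead to a master-eligible node, which is the situation in the first configuration of x.x.x.99.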
After the discussion in the comments, the correct configuration should be:
Master node:
cluster.name: elasticsearch
node.name: master
node.master: true
node.data: true
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.246"]
cluster.initial_master_nodes: ["master"]
Note that if you want the above node to be master-only and not hold data, then set
node.data: false
Data node:
cluster.name: elasticsearch
node.name: data-node-1
node.data: true
node.master: false
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: ["x.x.x.246"]
Also, since node x.x.x.99 could not join the cluster, it holds stale cluster state: the log shows the join being rejected because its local cluster UUID differs from the master's. So delete the data folder on x.x.x.99 (path.data, here /var/lib/elasticsearch) and restart this node.
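Before deleting anything, one way to confirm the stale state is to compare the cluster UUID each node reports. The following is a minimal Python sketch under stated assumptions: both nodes answer plain HTTP on port 9200 without authentication, and the root endpoint (`GET /`) returns a `cluster_uuid` field, as Elasticsearch 7.x does. The helper names `fetch_node_info` and `same_cluster` are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_node_info(base_url, timeout=5):
    """Fetch the JSON document served at the node's root endpoint (GET /)."""
    with urlopen(base_url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def same_cluster(info_a, info_b):
    """True only if both nodes report the same, already assigned cluster UUID."""
    uuid_a = info_a.get("cluster_uuid")
    uuid_b = info_b.get("cluster_uuid")
    # "_na_" is the placeholder a node reports before it has a cluster UUID.
    if uuid_a in (None, "_na_") or uuid_b in (None, "_na_"):
        return False
    return uuid_a == uuid_b


# UUIDs copied from the log in the question, used here as offline sample data.
master_info = {"cluster_uuid": "P4QlwvuRRGSmlT77RroSjA"}
stale_info = {"cluster_uuid": "oUoIe2-bSbS2UPg722ud9Q"}
print(same_cluster(master_info, stale_info))  # False
```

Against live nodes the call would be `same_cluster(fetch_node_info("http://x.x.x.246:9200"), fetch_node_info("http://x.x.x.99:9200"))`. A `False` result matches the "different cluster uuid" rejection in the log; after the data folder is removed and the node restarts with the corrected seed hosts, both nodes should report the same UUID.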