Setting up Causal Cluster Fails

Problem Description

I am trying to set up a Neo4J Causal Cluster with 3 cores (core only). I have three Debian servers, all Debian 8.5. I have installed Java 8 and Neo4J Enterprise 3.4.0 (package source deb https://debian.neo4j.org/repo stable/) on each server.

My hosts are 192.168.20.163, 192.168.20.164, and 192.168.20.165. The config is the same on each host, with the obvious change for the IP address. The following is for the .163 host:

dbms.connectors.default_listen_address=0.0.0.0
dbms.connectors.default_advertised_address=192.168.20.163
dbms.mode=CORE
causal_clustering.expected_core_cluster_size=3
causal_clustering.minimum_core_cluster_size_at_formation=3
causal_clustering.minimum_core_cluster_size_at_runtime=3
causal_clustering.initial_discovery_members=192.168.20.163:5000,192.168.20.164:5000,192.168.20.165:5000
causal_clustering.discovery_type=LIST
causal_clustering.discovery_listen_address=192.168.20.163:5000
causal_clustering.transaction_listen_address=192.168.20.163:6000
causal_clustering.raft_listen_address=192.168.20.163:7000
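
With Neo4j started on all three hosts, each host should be able to reach the other two on the discovery (5000), transaction (6000), and raft (7000) ports; a failure here points at firewalls or bind addresses rather than Raft itself. A minimal sanity-check sketch, assuming the addresses above and that nc (netcat) is available on the hosts:

# Run on each host while neo4j is up on all three.
# Assumes the hosts and the 5000/6000/7000 ports from the config above.
for host in 192.168.20.163 192.168.20.164 192.168.20.165; do
  for port in 5000 6000 7000; do
    # -z: scan without sending data; -w 2: two-second timeout
    nc -z -w 2 "$host" "$port" \
      && echo "OK   $host:$port" \
      || echo "FAIL $host:$port"
  done
done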

The servers go through the election process but the LEADER continues to switch back to FOLLOWER and trigger a new election.

The non-leader servers or 'members' each get the following error:

ERROR [o.n.c.c.s.s.CoreStateDownloader] Store copy failed due to store ID mismatch

The server that was started first becomes the LEADER but, as indicated, switches back to FOLLOWER:

2018-05-30 14:58:22.808+0000 INFO [o.n.c.c.c.RaftMachine] Moving to CANDIDATE state after successfully starting election
2018-05-30 14:58:22.825+0000 INFO [o.n.c.m.SenderService] Creating channel to: [192.168.20.165:7000]
2018-05-30 14:58:22.827+0000 INFO [o.n.c.m.SenderService] Creating channel to: [192.168.20.164:7000]
2018-05-30 14:58:22.838+0000 INFO [o.n.c.p.h.HandshakeClientInitializer] Scheduling handshake (and timeout) local null remote null
2018-05-30 14:58:22.848+0000 INFO [o.n.c.p.h.HandshakeClientInitializer] Scheduling handshake (and timeout) local null remote null
2018-05-30 14:58:22.861+0000 INFO [o.n.c.m.SenderService] Connected: [id: 0x2ee2e930, L:/192.168.20.163:50169 - R:/192.168.20.165:7000]
2018-05-30 14:58:22.862+0000 INFO [o.n.c.p.h.HandshakeClientInitializer] Initiating handshake local /192.168.20.163:50169 remote /192.168.20.165:7000
2018-05-30 14:58:22.863+0000 INFO [o.n.c.m.SenderService] Connected: [id: 0x3d670ef3, L:/192.168.20.163:38239 - R:/192.168.20.164:7000]
2018-05-30 14:58:22.863+0000 INFO [o.n.c.p.h.HandshakeClientInitializer] Initiating handshake local /192.168.20.163:38239 remote /192.168.20.164:7000
2018-05-30 14:58:22.928+0000 INFO [o.n.c.p.h.HandshakeClientInitializer] Installing: ProtocolStack{applicationProtocol=RAFT_1, modifierProtocols=[]}
2018-05-30 14:58:22.929+0000 INFO [o.n.c.p.h.HandshakeClientInitializer] Installing: ProtocolStack{applicationProtocol=RAFT_1, modifierProtocols=[]}
2018-05-30 14:58:22.965+0000 INFO [o.n.c.p.h.HandshakeServerInitializer] Installing handshake server local /192.168.20.163:7000 remote /192.168.20.164:41725
2018-05-30 14:58:23.036+0000 INFO [o.n.c.c.c.RaftMachine] Moving to LEADER state at term 111 (I am MemberId{fbdff840}), voted for by [MemberId{4fe121e0}]
2018-05-30 14:58:23.036+0000 INFO [o.n.c.c.c.s.RaftState] First leader elected: MemberId{fbdff840}
2018-05-30 14:58:23.044+0000 INFO [o.n.c.c.c.s.RaftLogShipper] Starting log shipper: MemberId{f202d023}[matchIndex: -1, lastSentIndex: 0, localAppendIndex: 3, mode: MISMATCH]
2018-05-30 14:58:23.045+0000 INFO [o.n.c.c.c.s.RaftLogShipper] Starting log shipper: MemberId{4fe121e0}[matchIndex: -1, lastSentIndex: 0, localAppendIndex: 3, mode: MISMATCH]
2018-05-30 14:58:23.045+0000 INFO [o.n.c.c.c.m.RaftMembershipChanger] Idle{}
2018-05-30 14:58:23.046+0000 INFO [c.n.c.d.SslHazelcastCoreTopologyService] Leader MemberId{fbdff840} updating leader info for database default and term 111
2018-05-30 14:58:24.105+0000 INFO [o.n.c.p.h.HandshakeServerInitializer] Installing handshake server local /192.168.20.163:6000 remote /192.168.20.164:58041
2018-05-30 14:58:26.841+0000 INFO [o.n.c.p.h.HandshakeServerInitializer] Installing handshake server local /192.168.20.163:7000 remote /192.168.20.165:48317
2018-05-30 14:58:30.881+0000 INFO [o.n.c.p.h.HandshakeServerInitializer] Installing handshake server local /192.168.20.163:6000 remote /192.168.20.165:47015
2018-05-30 14:58:38.462+0000 INFO [o.n.c.c.c.m.MembershipWaiter] Leader commit unknown
2018-05-30 14:58:40.411+0000 INFO [o.n.c.c.c.RaftMachine] Moving to FOLLOWER state after not receiving heartbeat responses in this election timeout period. Heartbeats received: []
2018-05-30 14:58:40.411+0000 INFO [o.n.c.c.c.s.RaftState] Leader changed from MemberId{fbdff840} to null
2018-05-30 14:58:40.412+0000 INFO [o.n.c.c.c.s.RaftLogShipper] Stopping log shipper MemberId{f202d023}[matchIndex: -1, lastSentIndex: 3, localAppendIndex: 3, mode: MISMATCH]
2018-05-30 14:58:40.413+0000 INFO [o.n.c.c.c.s.RaftLogShipper] Stopping log shipper MemberId{4fe121e0}[matchIndex: -1, lastSentIndex: 3, localAppendIndex: 3, mode: MISMATCH]
2018-05-30 14:58:40.413+0000 INFO [o.n.c.c.c.m.RaftMembershipChanger] Inactive{}
2018-05-30 14:58:40.413+0000 INFO [c.n.c.d.SslHazelcastCoreTopologyService] Step down event detected. This topology member, with MemberId MemberId{fbdff840}, was leader in term 111, now moving to follower.
2018-05-30 14:58:48.342+0000 INFO [o.n.c.c.c.RaftMachine] Election timeout triggered

Eventually the server fails with:

ERROR [o.n.c.c.c.m.MembershipWaiterLifecycle] Server failed to join cluster within catchup time limit [600000 ms]
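
The 600000 ms here appears to correspond to the ten-minute default of causal_clustering.join_catch_up_timeout in Neo4j 3.x, the window a joining core has to catch up with the leader. A sketch of raising it, assuming the Debian package config path; this is a workaround for genuinely slow catch-up only and will not cure a store ID mismatch:

# Workaround sketch only: a longer catch-up window will not fix a
# store ID mismatch. Assumes the Debian package config location.
echo "causal_clustering.join_catch_up_timeout=20m" >> /etc/neo4j/neo4j.conf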

Answer

Based on the messages you have, I assume you are trying to seed the cluster with a backup from somewhere? Here's what you should do:

  1. Check whether the cluster forms correctly with no seeding (so with an empty database). That way you verify that all your settings are correct.
  2. When seeding the cluster with a backup, you need to neo4j-admin unbind the database on each of the instances before starting (see the sketch after this list). Check https://neo4j.com/docs/operations-manual/current/clustering/causal-clustering/seed-cluster/ to find the specific instructions for your case. The store ID mismatch is what you get if you don't unbind.
  3. If 1. and 2. don't solve your problem, check with Neo4j support (since you are using EE, I assume you do have support).
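
A minimal sketch of step 2 followed by the step 1 topology check, assuming the Debian package layout (systemd service name neo4j) and placeholder credentials; repeat the unbind on every instance before restarting:

# Step 2 sketch: clear cluster state so a restored backup can seed the core.
# Assumes the Debian package's "neo4j" systemd service.
systemctl stop neo4j    # neo4j-admin unbind requires a stopped database
neo4j-admin unbind      # removes cluster state; avoids the store ID mismatch
systemctl start neo4j

# Step 1 check: each core should report a role (LEADER/FOLLOWER).
# The password is a placeholder.
cypher-shell -u neo4j -p '<password>' 'CALL dbms.cluster.overview();'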

Hope this helps.

Regards, Tom

