How to ensure a Kafka cluster is fully up?

Views: 31 · Published: 2021/11/12 2:06:18 · Tag: apache-kafka

Problem description

We have a five-node Kafka cluster running in production with three ZooKeeper nodes, all on VMs. We have to restart the cluster often for hardware patching.

We have written an Ansible script that shuts down the cluster in the following order:

1. Stop Kafka Connect by killing the process (nodes 1, 2, 3 in sequence)
2. Stop Kafka using kafka-server-stop.sh (nodes 1, 2, 3, 4, 5 in sequence)
3. Stop ZooKeeper using zookeeper-server-stop.sh (nodes 1, 2, 3 in sequence)

After patching, the start script does the following:

1. Start ZooKeeper using zookeeper-server-start.sh (nodes 1, 2, 3 in sequence)
2. Start Kafka using kafka-server-start.sh (nodes 1, 2, 3, 4, 5 in sequence)
3. Start Kafka Connect using connect-distributed.sh (nodes 1, 2, 3 in sequence)

The issue is with step #3 of the start script. We keep a hard-coded delay of about 10 minutes before executing step #3 (starting Kafka Connect) so that the Kafka cluster is fully up and running. But sometimes some nodes in the cluster take longer to start, so Kafka Connect fails to start even after the delay. In that case we have to wait 30 minutes and then restart Connect manually.

Is there any way to make sure that all nodes in the cluster are up and running before I start the other processes?

Thanks in advance.

A hard-coded delay does not work; we can't keep changing the delay based on guesses.
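
To illustrate what replacing the fixed delay with a readiness gate could look like, here is a minimal sketch that polls each broker's listener port until all of them accept TCP connections or a deadline passes. The function names, the host names in the usage comment, and the timing defaults are assumptions made up for this example; 9092 is only Kafka's default listener port. An open port shows that the broker process is listening, not necessarily that the broker has fully rejoined the cluster, so treat this as a lower bound on readiness.

```python
import socket
import time


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def wait_for_brokers(brokers, deadline_s=1800.0, interval_s=10.0):
    """Poll until every (host, port) pair in brokers accepts connections.

    Returns an empty list on success, or the brokers that are still
    unreachable once deadline_s seconds have elapsed.
    """
    end = time.monotonic() + deadline_s
    pending = list(brokers)
    while True:
        # Keep only the brokers that still refuse connections.
        pending = [b for b in pending if not port_open(*b)]
        if not pending or time.monotonic() >= end:
            return pending
        time.sleep(interval_s)


# Example use (hypothetical host names, not from the question):
#   brokers = [(f"kafka{i}.example.internal", 9092) for i in range(1, 6)]
#   missing = wait_for_brokers(brokers)
#   if missing:
#       raise SystemExit(f"brokers not reachable: {missing}")
```

Since the scripts are already in Ansible, the same gate can probably be expressed with the built-in `wait_for` module (`host`, `port`, `timeout`) as a task before the Kafka Connect start, which avoids a separate script.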
