How to ensure a Kafka cluster is fully up?

Views: 29 | Published: 2021/4/8 18:46:28 | Tag: apache-kafka


Problem description

We have a five-node Kafka cluster running in production with 3 Zookeeper nodes; all are VMs. We have to restart the cluster often for hardware patching.

We have written an Ansible script to shut down the cluster in the following order:

1. Stop Kafka Connect by killing the process (nodes 1, 2, 3 in sequence)
2. Stop Kafka using kafka-server-stop.sh (nodes 1, 2, 3, 4, 5 in sequence)
3. Stop Zookeeper using zookeeper-server-stop.sh (nodes 1, 2, 3 in sequence)

After patching, the start script does the following:

1. Start Zookeeper using zookeeper-server-start.sh (nodes 1, 2, 3 in sequence)
2. Start Kafka using kafka-server-start.sh (nodes 1, 2, 3, 4, 5 in sequence)
3. Start Kafka Connect using connect-distributed.sh (nodes 1, 2, 3 in sequence)

The issue is with step #3 of the start script. We keep a hard-coded delay of about 10 minutes before executing #3 (starting Kafka Connect) to make sure the Kafka cluster is fully up and running. But sometimes some of the nodes in the cluster take more time to start, so Kafka Connect startup fails even after the delay. In that case we have to wait for 30 minutes and then try restarting Connect manually.

Is there any way to make sure that all nodes in the cluster are up and running before I start the other processes?

Thanks in advance.

A hard-coded delay does not work; we can't keep changing the delay based on assumptions.
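
For illustration, one way the fixed delay might be replaced is by polling each broker until it accepts TCP connections, with an overall deadline. The following is a minimal sketch under stated assumptions, not a definitive readiness check: the function names `port_open` and `wait_for_brokers` are hypothetical, the broker host list and listener port (commonly 9092) must be supplied by the caller, and an open port only shows that the broker process is listening, not that it has finished joining the cluster.

```python
import socket
import time


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def wait_for_brokers(brokers, deadline_s=1800.0, poll_s=10.0,
                     check=port_open, sleep=time.sleep, clock=time.monotonic):
    """Poll until every (host, port) pair in brokers passes check.

    All brokers are re-checked on every round, so a broker that came up
    and went down again is still reported. Returns the list of brokers
    that were unreachable on the last round; an empty list means all
    brokers accepted a connection in the same round.
    The check, sleep, and clock parameters exist only to make the
    function easy to test; the defaults use real sockets and real time.
    """
    end = clock() + deadline_s
    while True:
        pending = [b for b in brokers if not check(b[0], b[1])]
        if not pending or clock() >= end:
            return pending
        sleep(poll_s)
```

A start script could call `wait_for_brokers` with its five (host, port) pairs after step 2 and run step 3 only when the result is an empty list, failing loudly otherwise. The 30-minute default deadline and 10-second poll interval are arbitrary placeholders chosen to match the waiting times mentioned in the question.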
