Monitoring the number of consumers for a Kafka topic


Question

We are using Prometheus and Grafana to monitor our Kafka cluster.

In our application we use Kafka Streams, and a stream can stop because of an exception. We log the event through setUncaughtExceptionHandler, but we also need some kind of alert when the stream stops.

What we currently have is jmx_exporter running as an agent and exposing Kafka metrics through an endpoint, which Prometheus scrapes.

We don't see any metric that gives the count of active consumers per topic. Are we missing something? Any suggestions on how to get the number of active consumers and send an alert when a consumer stops?

Recommended answer

We had similar needs and added Kafka consumer lag per partition to Grafana, along with alerts that fire when the lag exceeds a specified threshold. The threshold should be set per topic depending on load: for some topics it could be 10, and for highly loaded ones 100000. So if the threshold is, say, 1000 and more than 1000 messages are unprocessed, you get an alert.
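As a rough illustration of the per-topic threshold idea, here is a minimal sketch in plain Java. It is not the answerer's setup (they alert from Grafana); the class name LagAlertSketch, both method names, and the "topic-partition" key format are assumptions made for this example. In a real application the offsets would come from Kafka itself, for instance from the Admin client's listOffsets and listConsumerGroupOffsets calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: computes consumer lag and applies a per-topic threshold. */
final class LagAlertSketch {

    /**
     * Lag = log end offset minus committed offset, per partition.
     * Keys are assumed to look like "topic-partition", e.g. "orders-0".
     * Assumption: a partition with no committed offset counts as fully unprocessed.
     */
    static Map<String, Long> lagPerPartition(Map<String, Long> endOffsets,
                                             Map<String, Long> committedOffsets) {
        Map<String, Long> lag = new TreeMap<>();
        for (Map.Entry<String, Long> e : endOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            lag.put(e.getKey(), Math.max(0L, e.getValue() - committed));
        }
        return lag;
    }

    /** Returns the partitions whose lag is above the threshold configured for their topic. */
    static List<String> partitionsOverThreshold(Map<String, Long> lag,
                                                Map<String, Long> thresholdByTopic,
                                                long defaultThreshold) {
        List<String> breaches = new ArrayList<>();
        for (Map.Entry<String, Long> e : lag.entrySet()) {
            String partition = e.getKey();
            // Everything before the last '-' is the topic name.
            String topic = partition.substring(0, partition.lastIndexOf('-'));
            long threshold = thresholdByTopic.getOrDefault(topic, defaultThreshold);
            if (e.getValue() > threshold) {
                breaches.add(partition);
            }
        }
        return breaches;
    }
}
```

A lag that keeps growing on every partition of a topic is also a reasonable proxy for "the consumer has stopped", which is what the question is really after.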

You could also add a state listener to each Kafka stream and, when the stream enters an error state, log an error or send an email:

kafkaStream.setStateListener((newState, oldState) -> {
    log.info("Kafka stream state changed [{}] >>>>> [{}]", oldState, newState);
    if (newState == KafkaStreams.State.ERROR || newState == KafkaStreams.State.PENDING_SHUTDOWN) {
        log.error("Kafka Stream is in [{}] state. Application should be restarted", newState);
    }
});
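Since the question already scrapes JMX through jmx_exporter, one possible extension is to mirror the stream state into a custom MBean so that Prometheus can alert on it. The sketch below uses only the JDK's javax.management API; the class names, the object name, and the "Running" attribute are hypothetical, and the state is passed in as a plain string so the example compiles without the Kafka dependency. Whether jmx_exporter picks the bean up depends on its own rule configuration, which is not shown here.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicReference;
import javax.management.JMException;
import javax.management.ObjectName;

/** Hypothetical wrapper that publishes the last seen stream state over JMX. */
final class StreamStatusJmx {

    /** Management interface; JMX requires it to be public. */
    public interface StreamStatusMXBean {
        String getState();
        int getRunning();
    }

    static final class StreamStatus implements StreamStatusMXBean {
        private final AtomicReference<String> state = new AtomicReference<>("CREATED");

        /** Call this from the state listener, e.g. status.onStateChange(newState.name()). */
        void onStateChange(String newState) {
            state.set(newState);
        }

        @Override
        public String getState() {
            return state.get();
        }

        /** Numeric gauge for alerting: 1 while RUNNING or REBALANCING, otherwise 0. */
        @Override
        public int getRunning() {
            String s = state.get();
            return ("RUNNING".equals(s) || "REBALANCING".equals(s)) ? 1 : 0;
        }
    }

    /** Registers a new status bean under the given (hypothetical) object name. */
    static StreamStatus register(String objectName) throws JMException {
        StreamStatus status = new StreamStatus();
        ManagementFactory.getPlatformMBeanServer()
                .registerMBean(status, new ObjectName(objectName));
        return status;
    }
}
```

With something like this in place, the listener body would only need one extra line that forwards the new state, and an alert rule could fire when the gauge stays at 0.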

You could also add a health check indicator (e.g. via a REST endpoint or a spring-boot HealthIndicator) that reports whether the stream is running:

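KafkaStreams.State streamState = kafkaStream.state();
// Newer Kafka Streams releases replace isRunning() with isRunningOrRebalancing().
boolean running = streamState.isRunning();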
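For the REST variant, a minimal sketch using the JDK's built-in com.sun.net.httpserver package might look like the following. The class name, the /health path, and the 200/503 mapping are assumptions for illustration; the running flag is supplied by the caller (for example a lambda that reads the stream state), so the sketch has no Kafka dependency.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.BooleanSupplier;

/** Hypothetical health endpoint: 200 "UP" while the stream runs, 503 "DOWN" otherwise. */
final class StreamHealthEndpoint {

    static int statusCode(boolean running) {
        return running ? 200 : 503;
    }

    /** Starts an HTTP server on the given port; pass 0 to let the OS pick a free port. */
    static HttpServer start(int port, BooleanSupplier running) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/health", exchange -> {
            // Ask the caller-supplied check on every request so the answer is current.
            boolean up = running.getAsBoolean();
            byte[] body = (up ? "UP" : "DOWN").getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(statusCode(up), body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

An external probe, or a blackbox-style Prometheus check, could then poll the endpoint and alert on any non-200 response.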

I also haven't found any Kafka Streams metrics that provide information about active consumers or connected partitions. It would be nice if Kafka Streams provided such data, and I hope it becomes available in future releases.
