Apache Storm Kafka Spout 滞后问题 [英] Apache Storm Kafka Spout Lag Issue

查看:40
本文介绍了Apache Storm Kafka Spout 滞后问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用 Storm 1.1.2 和 Kafka 0.11 构建要在 Docker 容器中启动的 Java Spring 应用程序.

I am building a Java Spring application using Storm 1.1.2 and Kafka 0.11 to be launched in a Docker container.

我的拓扑中的一切都按计划工作,但在 Kafka 的高负载下,Kafka 滞后随着时间的推移越来越多.

Everything in my topology works as planned but under a high load from Kafka, the Kafka lag increases more and more over time.

我的 KafkaSpoutConfig:

My KafkaSpoutConfig:

 KafkaSpoutConfig<String,String> spoutConf = 
     KafkaSpoutConfig.builder("kafkaContainerName:9092", "myTopic")
     .setProp(ConsumerConfig.GROUP_ID_CONFIG, "myGroup")
     .setProp(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, MyObjectDeserializer.class)
     .build()

那么我的拓扑如下

TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("stormKafkaSpout", new KafkaSpout<String,String>(spoutConf), 25);

builder.setBolt("routerBolt", new RouterBolt(),25).shuffleGrouping("stormKafkaSpout");

Config conf = new Config();
conf.setNumWorkers(10);
conf.put(Config.STORM_ZOOKEEPER_SERVERS, ImmutableList.of("zookeeper"));
conf.put(Config.STORM_ZOOKEEPER_PORT, 2181);

conf.put(Config.NIMBUS_SEEDS, ImmutableList.of("nimbus"));
conf.put(Config.NIMBUS_THRIFT_PORT, 6627);

System.setProperty("storm.jar", "/opt/storm.jar");

StormSubmitter.submitTopology("topologyId", conf, builder.createTopology());

RouterBolt(扩展 BaseRichBolt)执行一个非常简单的 switch 语句,然后使用本地 KafkaProducer 对象向另一个主题发送新消息.就像我说的那样,一切都会编译并且拓扑按预期运行,但是在高负载(3000 条消息/秒)下,Kafka 滞后只会堆积起来,相当于拓扑的低吞吐量.

The RouterBolt (which extends BaseRichBolt) does one very simple switch statement and then uses a local KafkaProducer object to send a new message to another topic. Like I said, everything compiles and the topology runs as expected but under a high load (3000 messages/s), the Kafka lag just piles up equating to low throughput for the topology.

我试过用

conf.setNumAckers(0);

conf.put(Config.TOPOLGY_ACKER_EXECUTORS, 0);

但我想这不是一个确认问题.

but I guess it's not an acking issue.

我在 Storm UI 上看到,RouterBolt 在高负载下的执行延迟为 1.2 毫秒,进程延迟为 0.03 毫秒,这让我相信 Spout 是瓶颈.此外,并行度提示为 25,因为有myTopic"的 25 个分区.谢谢!

I've seen on the Storm UI that the RouterBolt has execution latency of 1.2ms and process latency of .03ms under the high load which leads me to believe the Spout is the bottleneck.Also the parallelism hint is 25 because there are 25 partitions of "myTopic". Thanks!

推荐答案

您可能会受到 https://issues.apache.org/jira/browse/STORM-3102,这会导致 spout 在每次发出时都进行非常昂贵的调用.请尝试升级到固定版本之一.

You may be affected by https://issues.apache.org/jira/browse/STORM-3102, which causes the spout to do a pretty expensive call on every emit. Please try upgrading to one of the fixed versions.

该修复程序实际上尚未发布.您可能仍然想通过使用例如从源代码构建 spout 来尝试修复.https://github.com/apache/storm/tree/1.1.x-分支以构建 1.1.4 快照.

The fix isn't actually released yet. You might still want to try out the fix by building the spout from source using e.g. https://github.com/apache/storm/tree/1.1.x-branch to build a 1.1.4 snapshot.

这篇关于Apache Storm Kafka Spout 滞后问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆