Apache Storm Kafka喷口滞后问题 [英] Apache Storm Kafka Spout Lag Issue

查看:167
本文介绍了Apache Storm Kafka喷口滞后问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用Storm 1.1.2和Kafka 0.11构建一个Java Spring应用程序,该应用程序将在Docker容器中启动.

I am building a Java Spring application using Storm 1.1.2 and Kafka 0.11 to be launched in a Docker container.

我的拓扑中的所有内容都按计划工作,但是在卡夫卡(Kafka)的高负载下,卡夫卡的滞后性随着时间的推移而越来越大.

Everything in my topology works as planned but under a high load from Kafka, the Kafka lag increases more and more over time.

我的KafkaSpoutConfig:

My KafkaSpoutConfig:

 KafkaSpoutConfig<String,String> spoutConf = 
     KafkaSpoutConfig.builder("kafkaContainerName:9092", "myTopic")
     .setProp(ConsumerConfig.GROUP_ID_CONFIG, "myGroup")
     .setProp(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, MyObjectDeserializer.class)
     .build()

然后我的拓扑如下

TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("stormKafkaSpout", new KafkaSpout<String,String>(spoutConf), 25);

builder.setBolt("routerBolt", new RouterBolt(),25).shuffleGrouping("stormKafkaSpout");

Config conf = new Config();
conf.setNumWorkers(10);
conf.put(Config.STORM_ZOOKEEPER_SERVERS, ImmutableList.of("zookeeper"));
conf.put(Config.STORM_ZOOKEEPER_PORT, 2181);

conf.put(Config.NIMBUS_SEEDS, ImmutableList.of("nimbus"));
conf.put(Config.NIMBUS_THRIFT_PORT, 6627);

System.setProperty("storm.jar", "/opt/storm.jar");

StormSubmitter.submitTopology("topologyId", conf, builder.createTopology());

RouterBolt(扩展了BaseRichBolt)执行一个非常简单的switch语句,然后使用本地KafkaProducer对象将新消息发送到另一个主题.就像我说的那样,所有内容都可以编译并且拓扑按预期运行,但是在高负载(3000消息/秒)下,Kafka滞后正好堆积起来,等同于拓扑的低吞吐量.

The RouterBolt (which extends BaseRichBolt) does one very simple switch statement and then uses a local KafkaProducer object to send a new message to another topic. Like I said, everything compiles and the topology runs as expected but under a high load (3000 messages/s), the Kafka lag just piles up equating to low throughput for the topology.

我尝试禁用acking

I've tried disabling acking with

conf.setNumAckers(0);

conf.put(Config.TOPOLGY_ACKER_EXECUTORS, 0);

但是我想这不是一个令人担忧的问题.

but I guess it's not an acking issue.

我在Storm UI上看到RouterBolt在高负载下的执行延迟为1.2毫秒,处理延迟为.03毫秒,这让我相信Spout是瓶颈,而且并行提示是25,因为有"myTopic"的25个分区.谢谢!

I've seen on the Storm UI that the RouterBolt has execution latency of 1.2ms and process latency of .03ms under the high load which leads me to believe the Spout is the bottleneck.Also the parallelism hint is 25 because there are 25 partitions of "myTopic". Thanks!

推荐答案

您可能会受到

You may be affected by https://issues.apache.org/jira/browse/STORM-3102, which causes the spout to do a pretty expensive call on every emit. Please try upgrading to one of the fixed versions.

该修复程序尚未真正发布.您可能仍想通过使用以下方法从源代码构建喷口来尝试解决此问题. https://github.com/apache/storm/tree/1.1.x-分支以构建1.1.4快照.

The fix isn't actually released yet. You might still want to try out the fix by building the spout from source using e.g. https://github.com/apache/storm/tree/1.1.x-branch to build a 1.1.4 snapshot.

这篇关于Apache Storm Kafka喷口滞后问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆