Storm, huge discrepancy between bolt latency and total latency?

Problem description

Below is a screenshot of my topology's Storm UI. This was taken after the topology finished processing 10k messages.

(The topology is configured with 4 workers and uses a KafkaSpout).

The sum of the "process latency" of my bolts is about 8100ms and the complete latency of the topology is a much longer 115881ms.

I'm aware that this sort of discrepancy can occur due to resource contention or something related to Storm internals. I believe resource contention is not an issue here; the GC didn't run at all during this test, and profiling shows that I have plenty of available CPU resources.

So I assume the issue is that I am abusing Storm internals in some way. Any suggestions where to look?

Tuples must be waiting somewhere, possibly in the spouts; either waiting to be emitted into the topology or waiting to be acked after the messages have been processed?

Possibly I should adjust the number of ackers (I have set ackers to 4, the same as the number of workers)?

Any other general advice for how I should troubleshoot this?

*Note that the one bolt that has a large discrepancy between its process and execute latencies implements the tick-tuple batching pattern, so that discrepancy is expected.

*Edit. I suspect the discrepancy might involve the messages only being ack-ed by the Spout after being fully processed. If I refresh the Storm UI while it is processing, the ack-ed count for my final Bolt increases very quickly compared to the ack-ed count for the Spouts. Though this may be because the Spout acks far fewer messages than the final Bolt; a few hundred messages ack-ed by the final bolt may correspond to a single message in the Spout. But I thought I should mention this suspicion to get opinions on whether it's possible that the Spout's acker tasks are overflowing.

Solution

There can be multiple reasons. First of all, you need to understand how the numbers are measured.

  1. Spout Complete Latency: the time from when a tuple is emitted until Spout.ack() is called for it.
  2. Bolt Execution Latency: the time it takes to run Bolt.execute().
  3. Bolt Processing Latency: the time from when Bolt.execute() is called until the bolt acks the given input tuple.

If you do not ack each incoming input tuple immediately in Bolt.execute (which is absolutely fine), the processing latency can be much higher than the execution latency.
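As a rough illustration (not the exact code from the question), a tick-tuple batching bolt along the lines the asker describes might look like the sketch below; the class and member names are made up, and it assumes the Storm 2.x org.apache.storm Java API. Tuples are buffered in execute() and only acked when the next tick tuple arrives, which is exactly why the process latency of such a bolt dwarfs its execute latency:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    import org.apache.storm.Config;
    import org.apache.storm.Constants;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Tuple;

    // Buffers incoming tuples and only acks them when a tick tuple arrives.
    // Each execute() call returns quickly (low execute latency), but every
    // buffered tuple waits for the next tick before it is acked (high process latency).
    public class TickBatchingBolt extends BaseRichBolt {

        private OutputCollector collector;
        private final List<Tuple> pending = new ArrayList<>();

        @Override
        public void prepare(Map<String, Object> conf, TopologyContext context,
                            OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple tuple) {
            if (isTickTuple(tuple)) {
                for (Tuple t : pending) {
                    // ... process the whole batch here ...
                    collector.ack(t);          // acks only happen now
                }
                pending.clear();
            } else {
                pending.add(tuple);            // no ack yet, so process latency grows
            }
        }

        private boolean isTickTuple(Tuple tuple) {
            return Constants.SYSTEM_COMPONENT_ID.equals(tuple.getSourceComponent())
                    && Constants.SYSTEM_TICK_STREAM_ID.equals(tuple.getSourceStreamId());
        }

        @Override
        public Map<String, Object> getComponentConfiguration() {
            Config conf = new Config();
            conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 5); // tick every 5 seconds
            return conf;
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // nothing emitted in this sketch
        }
    }

Note that every buffered-but-unacked tuple keeps its originating spout tuple's tuple tree incomplete, so this waiting time also flows directly into the spout's complete latency.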

Furthermore, the processing latencies do not necessarily add up to the complete latency, because tuples can sit in internal input/output buffers. This adds additional time until the last ack is done, thus increasing the complete latency. In addition, the ackers need to process all incoming acks and notify the Spout about fully processed tuples. This also adds to the complete latency.

The problem could be overly large internal buffers between operators. This could be resolved either by increasing the dop (degree of parallelism) or by setting the parameter TOPOLOGY_MAX_SPOUT_PENDING, which limits the number of tuples in flight within the topology. If too many tuples are in flight, the spout stops emitting tuples until it receives acks; tuples therefore do not queue up in internal buffers and the complete latency goes down. If this does not help, you might need to increase the number of ackers. If the acks are not processed fast enough, they buffer up, increasing the complete latency, too.
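For orientation, here is a minimal sketch of where these settings live when submitting a topology. The spout/bolt wiring is elided, and the concrete numbers (a pending cap of 1000, 8 ackers) and component names are placeholders, not tuned recommendations for this particular topology:

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.topology.TopologyBuilder;

    public class TopologyRunner {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            // builder.setSpout("kafka-spout", ...);   // KafkaSpout wiring omitted
            // builder.setBolt("final-bolt", ...)
            //        .shuffleGrouping("kafka-spout");

            Config conf = new Config();
            conf.setNumWorkers(4);

            // Cap the number of pending (un-acked) tuples so they cannot pile up
            // in internal buffers (this sets topology.max.spout.pending).
            conf.setMaxSpoutPending(1000);

            // Give ack handling more parallelism if acks become the bottleneck.
            conf.setNumAckers(8);

            StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
        }
    }

Note that the pending cap applies per spout task, so the effective limit scales with the spout's parallelism.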
