如何测量 Storm 拓扑中的延迟和吞吐量 [英] How to measure latency and throughput in a Storm topology

查看:35
本文介绍了如何测量 Storm 拓扑中的延迟和吞吐量的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在通过示例学习 Storm ExclamationTopology.我想测量一个bolt和吞吐量(例如,每秒有多少个单词通过bolt)的延迟(将!!!添加到一个单词中所需的时间).

I'm learning Storm with the example ExclamationTopology. I want measure the latency (the time it takes to add !!! to a word) of a bolt and throughput (say, how many words pass through a bolt per second).

这里,我可以数出字数和字数执行螺栓的次数:

From here, I can count the number of words and how many times a bolt is executed:

_countMetric = new CountMetric();
_wordCountMetric = new MultiCountMetric();


context.registerMetric("execute_count", _countMetric, 5);
context.registerMetric("word_count", _wordCountMetric, 60);

我知道 Storm UI 提供了 Process LatencyExecute Latency 和这个

I know that the Storm UI gives Process Latency and Execute Latency and this post gives a good explanation of what they are.

但是,我想记录每个 bolt 每次执行的延迟,并使用此信息和 word_count 来计算吞吐量.

However, I want to log the latency of every execution of each bolt, and use this information along with the word_count to calculate the throughput.

我如何使用 Storm Metrics 来完成此任务?

How can I use Storm Metrics to accomplish this?

推荐答案

虽然您的问题很直接,肯定会引起很多人的兴趣,但它的答案并不像应有的那么琐碎.首先,我们需要澄清,我们究竟想要测量什么.吞吐量和延迟是很容易理解的术语,但在 Storms 分布式环境中,事情变得更加复杂.

While your question is straight forward and will be surely for interest for many people, it`s answer is not as trivial as it should be. First of all, we need to clarify, what exactly we really want to measure. Throughput and Latency are terms, that can be easily be understood but in things get more complicated in Storms distributed environment.

正如这篇优秀的博客文章,每个Storm主管至少有3个线程来完成不同的任务.当 Worker Receiver Thread 等待传入的数据元组并将它们聚合成一个块时,它们被发送到 Worker Executor Thread.这包含用户逻辑(在你的例子中是 ExclamationBolt 和一个处理传出消息的发送者.最后,在每个主管节点上,有一个 Worker Send Thread聚合来自所有执行者的消息,聚合它们并将它们发送到网络.

As depicted in this excellent blog post, each Storm supervisor has at least 3 threads which fulfill different tasks. While the Worker Receiver Thread waits for incoming data tuples and aggregates them to a bulk, they are send to the Worker Executor Thread. This contains the user logic (in your case the ExclamationBolt and a sender that takes care of the outgoing messages. Finally, on every Supervisor Node, there is a Worker Send Thread that aggregates messages coming from all executors, aggregates them and send them to the network.

当然,每个线程都有自己的延迟和吞吐量.对于 Sender 和 Receiver Thread,它们在很大程度上取决于缓冲区大小,您可以调整这些大小.在您的情况下,您只想测量一个(执行)螺栓的延迟和吞吐量 - 这是可能的,但请记住,其他线程会对这个螺栓产生影响.

For sure, each of those threads has its own latency and throughput. For the Sender and Receiver Thread, they are largely depending on the buffer sizes, that you can adjust. In your case, you want just to measure latency and throughput of one (execution) bolt - this is possible, but keep in mind that those other threads have their effects on this bolt.

我的方法:为了获得延迟和吞吐量,我使用了旧的 Storm 内置指标.因为我发现文档不是很清楚,我在这里划了一条线:我们使用新的Storm Metric API v2,我们使用集群指标.

My approach: To obtain latency and throughput, I used the old Storm Builtin Metrics. Because I found the documentation not very clear, I o draw a line here: we are not using the new Storm Metric API v2 and we are not using Cluster Metrics.

  1. 通过在您的 storm.yaml 中放置以下内容来激活 Storm Logging:
  1. Activate the Storm Logging with placing the following in your storm.yaml:

topology.metrics.consumer.register:
  - class: "org.apache.storm.metric.LoggingMetricsConsumer"
    parallelism.hint: 1

  1. 您可以设置报告间隔:topology.builtin.metrics.bucket.size.secs: 10

运行您的查询.所有指标每 10 秒记录在特定的指标日志文件中.找到这个日志文件并非易事.Storm 创建了一个 LoggingMetricsConsumer-Bolt 并将其分发到集群中.在这个节点上,你应该在 Storm 日志中找到相应的度量文件.

Run your Query. All metrics are logged every 10 Seconds in a specific metrics-logfile. It is not trivial to find this logfile. Storm creates a LoggingMetricsConsumer-Bolt and distributes it among the cluster. On this node, you should find in the Storm logs the corresponding metric file.

这个指标文件包含每个执行者正在寻找的指标,例如:complete-latencyexecute-latency 等等.对于吞吐量,我将使用包含例如:arrival_rate_secs 的队列指标作为每秒插入多少元组的估计.照顾好在每个主管上执行的多个线程.

This metric file contains for each executor the metrics, you are looking for, like: complete-latency, execute-latency and so on. For throughput, I would use the Queue Metrics that contains e.g.: arrival_rate_secs as an estimate of how many tuples are inserted per second. Take care of the multiple threads that are executed on every supervisor.

祝你好运!

这篇关于如何测量 Storm 拓扑中的延迟和吞吐量的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆