What is the "task" in Storm parallelism?

Question

I'm trying to learn Twitter Storm by following the great article "Understanding the parallelism of a Storm topology".

However, I'm a bit confused by the concept of "task". Is a task a running instance of the component (spout or bolt)? An executor having multiple tasks is actually saying that the same component is executed multiple times by the executor, am I correct?

Moreover, in a general parallelism sense, Storm will spawn a dedicated thread (executor) for a spout or bolt, but what does an executor (thread) having multiple tasks contribute to the parallelism? I think that, since a thread executes sequentially, having multiple tasks in a thread only makes the thread a kind of "cached" resource, which avoids spawning a new thread for the next task run. Am I correct?

I may clear up this confusion by myself after taking more time to investigate, but you know, we both love Stack Overflow ;-)

Thanks in advance.

Recommended answer

Disclaimer: I wrote the article you referenced in your question above.

However, I'm a bit confused by the concept of "task". Is a task a running instance of the component (spout or bolt)? An executor having multiple tasks is actually saying that the same component is executed multiple times by the executor, am I correct?

Yes and yes.

Moreover, in a general parallelism sense, Storm will spawn a dedicated thread (executor) for a spout or bolt, but what does an executor (thread) having multiple tasks contribute to the parallelism?

Running more than one task per executor does not increase the level of parallelism -- an executor always has one thread that it uses for all of its tasks, which means that tasks run serially on an executor.
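
To make the "tasks run serially on an executor" point concrete, here is a minimal sketch in plain Java. It is only a model and not Storm's executor code: `ExecutorSketch`, `Task`, and `runOnSingleExecutor` are hypothetical names, and handing tuples to tasks round-robin is an assumption made purely for illustration. It shows three tasks being served by one thread, so no two of them ever run at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Illustrative model only: this is NOT Storm's executor implementation.
final class ExecutorSketch {

    // Hypothetical stand-in for one task, i.e. one instance of a spout or bolt.
    static final class Task {
        final int id;
        final List<String> threadsSeen = new ArrayList<>();

        Task(int id) {
            this.id = id;
        }

        // Records which thread processed the tuple; the tuple itself is ignored here.
        void process(String tuple) {
            threadsSeen.add(Thread.currentThread().getName());
        }
    }

    // Runs all tasks on ONE thread and returns the names of every thread that did work.
    static Set<String> runOnSingleExecutor(List<Task> tasks, List<String> tuples) {
        Thread executor = new Thread(() -> {
            // Assumption: tuples are handed to the tasks round-robin, one after another.
            for (int i = 0; i < tuples.size(); i++) {
                tasks.get(i % tasks.size()).process(tuples.get(i));
            }
        }, "executor-1");
        executor.start();
        try {
            executor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for executor", e);
        }
        Set<String> names = new TreeSet<>();
        for (Task task : tasks) {
            names.addAll(task.threadsSeen);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Task> tasks = List.of(new Task(0), new Task(1), new Task(2));
        Set<String> threads = runOnSingleExecutor(tasks, List.of("a", "b", "c", "d"));
        System.out.println("threads used: " + threads); // prints "threads used: [executor-1]"
    }
}
```

In Storm's own Java API the two knobs are, as far as I recall, the parallelism hint passed to `setSpout`/`setBolt` (number of executors) and `setNumTasks` (number of tasks); the sketch above deliberately avoids those calls so that it runs without any Storm dependency.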

As I wrote in the article, please note that:

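  • The number of executor threads can be changed after the topology has been started (see the storm rebalance command).
  • The number of tasks of a topology is static.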

And by definition there is the invariant of #executors <= #tasks.

So one reason for having 2+ tasks per executor thread is to give you the flexibility to expand/scale up the topology through the storm rebalance command in the future without taking the topology offline. For instance, imagine you start out with a Storm cluster of 15 machines but already know that next week another 10 boxes will be added. Here you could opt for running the topology at the anticipated parallelism level of 25 machines already on the 15 initial boxes (which is of course slower than 25 boxes). Once the additional 10 boxes are integrated you can then storm rebalance the topology to make full use of all 25 boxes without any downtime.
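
The effect of such a rebalance can be sketched in plain Java. This is a toy model under stated assumptions, not Storm's scheduler: `RebalanceSketch` and its methods are hypothetical names, the round-robin placement is my own choice, one executor per machine is assumed to match the 15/25 example, and rejecting a request with more executors than tasks is simply how this sketch enforces the invariant. The task count stays fixed at 25 while the executor count changes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: the round-robin placement is an assumption,
// not Storm's actual scheduling algorithm.
final class RebalanceSketch {

    // Spreads task IDs 0..numTasks-1 over numExecutors executors.
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        // This sketch enforces the invariant #executors <= #tasks by rejecting bad input.
        if (numExecutors < 1 || numExecutors > numTasks) {
            throw new IllegalArgumentException("need 1 <= #executors <= #tasks");
        }
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) {
            executors.add(new ArrayList<>());
        }
        for (int t = 0; t < numTasks; t++) {
            executors.get(t % numExecutors).add(t);
        }
        return executors;
    }

    // Largest number of tasks that any single executor has to run serially.
    static int maxTasksPerExecutor(List<List<Integer>> executors) {
        int max = 0;
        for (List<Integer> tasks : executors) {
            max = Math.max(max, tasks.size());
        }
        return max;
    }

    public static void main(String[] args) {
        int numTasks = 25; // fixed when the topology is submitted

        // Before the extra boxes arrive: 15 executors share the 25 tasks.
        System.out.println(maxTasksPerExecutor(assign(numTasks, 15))); // prints 2

        // After the rebalance: 25 executors, one task each.
        System.out.println(maxTasksPerExecutor(assign(numTasks, 25))); // prints 1
    }
}
```

Because the 25 tasks exist from the start, going from 15 to 25 executors only changes how the tasks are spread out; nothing has to be re-created, which is what makes the change possible without downtime.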

Another reason to run 2+ tasks per executor is for (primarily functional) testing. For instance, if your dev machine or CI server is only powerful enough to run, say, 2 executors alongside all the other stuff running on the machine, you can still run 30 tasks (here: 15 per executor) to see whether code such as your custom Storm grouping is working as expected.
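
As a rough illustration of that kind of functional check, the sketch below exercises a routing rule against 30 target tasks without caring how many executors would host them. Everything in it is hypothetical: `GroupingSketch`, `chooseTask`, and `taskIds` are made-up names, and the hash-modulo rule is just an example. A real custom grouping would, to my knowledge, implement Storm's `CustomStreamGrouping` interface, which this dependency-free sketch does not do.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a stand-in for the routing logic of a custom grouping.
final class GroupingSketch {

    // Hypothetical rule: the same key is always routed to the same target task.
    static int chooseTask(Object key, List<Integer> targetTasks) {
        // floorMod keeps the index non-negative even for negative hash codes.
        int index = Math.floorMod(key.hashCode(), targetTasks.size());
        return targetTasks.get(index);
    }

    // Builds the task IDs first, first+1, ..., first+count-1.
    static List<Integer> taskIds(int first, int count) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            ids.add(first + i);
        }
        return ids;
    }

    public static void main(String[] args) {
        // 30 tasks, as in the example above; the executor count plays no role here.
        List<Integer> targetTasks = taskIds(1, 30);
        Set<Integer> used = new HashSet<>();
        for (int user = 0; user < 1000; user++) {
            used.add(chooseTask("user-" + user, targetTasks));
        }
        System.out.println("distinct tasks hit: " + used.size() + " of " + targetTasks.size());
    }
}
```

The routing decision depends only on the list of target task IDs, which is why running 30 tasks on 2 executors is enough to exercise it.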

In practice we normally run 1 task per executor.

PS: Note that Storm will actually spawn a few more threads behind the scenes. For instance, each executor has its own "send thread" that is responsible for handling outgoing tuples. There are also "system-level" background threads for e.g. acking tuples that run alongside "your" threads. IIRC the Storm UI counts those acking threads in addition to "your" threads.
