What is the "task" in Storm parallelism?


Question

I'm trying to learn Twitter Storm by following the great article "Understanding the parallelism of a Storm topology".

However, I'm a bit confused by the concept of a "task". Is a task a running instance of a component (spout or bolt)? Does an executor having multiple tasks mean that the same component is executed multiple times by that executor?

Moreover, in a general parallelism sense, Storm will spawn a dedicated thread (executor) for a spout or bolt, but what does an executor (thread) having multiple tasks contribute to parallelism? I think that, since a thread executes sequentially, having multiple tasks in one thread only makes the thread a kind of "cached" resource, which avoids spawning a new thread for the next task run. Am I correct?

I may clear up this confusion myself after taking more time to investigate, but you know, we both love stackoverflow ;-)

Thanks.

Recommended answer

Disclaimer: I wrote the article referenced in the question.

However I'm a bit confused by the concept of "task". Is a task a running instance of the component (spout or bolt)? An executor having multiple tasks is actually saying the same component is executed multiple times by the executor, am I correct?

Yes, and yes.

Moreover in a general parallelism sense, Storm will spawn a dedicated thread (executor) for a spout or bolt, but what is contributed to the parallelism by an executor (thread) having multiple tasks?

Running more than one task per executor does not increase the level of parallelism. An executor always has one thread that it uses for all of its tasks, which means that tasks run serially on an executor.
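The serial behaviour described above can be illustrated with a toy model. This is only a sketch in plain Python, not Storm code: the `Executor` class, its queue-based inbox, and `run_demo` are hypothetical names invented for illustration, and the real executor internals are more involved.

```python
import queue
import threading


class Executor:
    """Toy model of an executor: one thread that serves all of its tasks."""

    def __init__(self, task_ids):
        self.inbox = queue.Queue()
        # Records which thread handled each task: task_id -> set of thread ids.
        self.seen_threads = {task_id: set() for task_id in task_ids}
        # Order in which (task_id, value) pairs were processed.
        self.processed = []
        self.thread = threading.Thread(target=self._run)

    def _run(self):
        # A single loop handles every task, one tuple at a time.
        while True:
            item = self.inbox.get()
            if item is None:  # sentinel: stop the loop
                break
            task_id, value = item
            self.seen_threads[task_id].add(threading.get_ident())
            self.processed.append((task_id, value))

    def start(self):
        self.thread.start()

    def emit_to(self, task_id, value):
        self.inbox.put((task_id, value))

    def shutdown(self):
        self.inbox.put(None)
        self.thread.join()


def run_demo():
    # Three tasks share one executor, and therefore one thread.
    executor = Executor(task_ids=[0, 1, 2])
    executor.start()
    for i in range(6):
        executor.emit_to(task_id=i % 3, value=f"tuple-{i}")
    executor.shutdown()
    return executor
```

In this model, adding more task ids to the executor adds more work for the same loop but never a second thread, which is why the extra tasks do not add parallelism.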

As I wrote in the article, please note that:

  • The number of executor threads can be changed after the topology has been started (see the storm rebalance command).
  • The number of tasks of a topology is static.

And by definition there is the invariant #executors <= #tasks.

So one reason for having 2+ tasks per executor thread is to give you the flexibility to expand/scale up the topology through the storm rebalance command in the future without taking the topology offline. For instance, imagine you start out with a Storm cluster of 15 machines but already know that next week another 10 boxes will be added. Here you could opt to run the topology on the 15 initial boxes at the parallelism level anticipated for 25 machines (which is of course slower than running on 25 boxes). Once the additional 10 boxes are integrated, you can then storm rebalance the topology to make full use of all 25 boxes without any downtime.
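The 15-to-25 scenario can be sketched numerically. The following is a plain-Python illustration, not Storm's scheduler: `assign_tasks` is a hypothetical helper, and the even split of consecutive task ids is an assumption made for the example. It also assumes one executor per machine and rejects requests that would violate #executors <= #tasks.

```python
def assign_tasks(num_tasks, num_executors):
    """Split task ids 0..num_tasks-1 into num_executors consecutive groups."""
    if not 1 <= num_executors <= num_tasks:
        raise ValueError("need 1 <= #executors <= #tasks")
    base, extra = divmod(num_tasks, num_executors)
    groups, start = [], 0
    for i in range(num_executors):
        # The first `extra` executors take one additional task each.
        size = base + (1 if i < extra else 0)
        groups.append(list(range(start, start + size)))
        start += size
    return groups


NUM_TASKS = 25  # fixed for the lifetime of the topology

# Week 1: 15 executors, so some executors serve two tasks serially.
before = assign_tasks(NUM_TASKS, 15)

# After the rebalance: 25 executors, one task each, same 25 tasks.
after = assign_tasks(NUM_TASKS, 25)
```

The task count stays at 25 in both assignments; only the number of executors that share those tasks changes, which is the part a rebalance is able to adjust.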

Another reason to run 2+ tasks per executor is for (primarily functional) testing. For instance, if your dev machine or CI server is only powerful enough to run, say, 2 executors alongside all the other stuff running on the machine, you can still run 30 tasks (here: 15 per executor) to see whether code such as your custom Storm grouping is working as expected.
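As a rough illustration of what such a functional test checks, here is a plain-Python sketch. It does not use Storm's grouping interface: `choose_task` and `route` are hypothetical stand-ins for a custom grouping, and the CRC32-based key hashing is an assumption chosen only because it is deterministic across runs.

```python
import zlib


def choose_task(key, num_tasks):
    """Hypothetical custom grouping: map a key to one target task index."""
    return zlib.crc32(key.encode("utf-8")) % num_tasks


def route(keys, num_tasks):
    """Return {task_index: [keys...]} for the given stream of keys."""
    routed = {}
    for key in keys:
        routed.setdefault(choose_task(key, num_tasks), []).append(key)
    return routed


NUM_TASKS = 30  # many tasks, even if only 2 executors would run them
keys = [f"user-{i % 100}" for i in range(1000)]
routed = route(keys, NUM_TASKS)
```

The point of the many-task setup is that routing logic is exercised against the full range of 30 target tasks, regardless of how few threads actually execute them.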

In practice we normally run 1 task per executor.

PS: Note that Storm will actually spawn a few more threads behind the scenes. For instance, each executor has its own "send thread" that is responsible for handling outgoing tuples. There are also "system-level" background threads, e.g. for acking tuples, that run alongside "your" threads. IIRC the Storm UI counts those acking threads in addition to "your" threads.
