How to tune the parallelism hint in Storm

Question

The "parallelism hint" is used in Storm to parallelise a running topology. I know there are concepts like worker processes, executors and tasks. Would it make sense to make the parallelism hint as big as possible so that my topologies are parallelised as much as possible?

My question is how to find the right parallelism hint for my Storm topologies. Does it depend on the scale of my Storm cluster, or is it a topology/job-specific setting that varies from one topology to another? Or does it depend on both?

Recommended answer

Adding to what @Chiron explained:

"parallelism hint" is used in storm to parallelise a running storm topology

Actually, in Storm the term parallelism hint specifies the initial number of executors (threads) of a component (spout or bolt), e.g.

    topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2)

The above statement tells Storm to allot 2 executor threads initially (this can be changed at run time). Again,

    topologyBuilder.setBolt("green-bolt", new GreenBolt(), 2).setNumTasks(4) 

Here setNumTasks(4) tells Storm to run 4 associated tasks (this number stays the same throughout the lifetime of a topology). So in this case Storm will run two tasks per executor. By default, the number of tasks is set to be the same as the number of executors, i.e. Storm will run one task per thread.
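
The task-per-executor arithmetic can be sketched in plain Java with no Storm dependency. The class TaskLayout and its method assign are hypothetical names invented for this illustration. Storm's own scheduler decides which task lands on which executor, so only the group sizes here should be read as meaningful, not the specific task ids.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model only: not part of the Storm API. */
final class TaskLayout {
    private TaskLayout() {}

    /**
     * Splits task ids 0..numTasks-1 into numExecutors contiguous groups,
     * as evenly as possible. Assumes 1 <= numExecutors <= numTasks.
     */
    static List<List<Integer>> assign(int numTasks, int numExecutors) {
        if (numExecutors < 1 || numExecutors > numTasks) {
            throw new IllegalArgumentException("need 1 <= executors <= tasks");
        }
        List<List<Integer>> executors = new ArrayList<>();
        int base = numTasks / numExecutors;   // tasks every executor gets
        int extra = numTasks % numExecutors;  // first 'extra' executors get one more
        int next = 0;
        for (int e = 0; e < numExecutors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> tasks = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                tasks.add(next++);
            }
            executors.add(tasks);
        }
        return executors;
    }
}
```

Calling TaskLayout.assign(4, 2) models the green-bolt example (two executors with two tasks each), while TaskLayout.assign(2, 2) models the default of one task per executor.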

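Would it make sense to make the parallelism hint as big as possible so that your topologies are parallelised as much as possible?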

One key thing to note is that if you intend to run more than one task per executor, it does not increase the level of parallelism, because an executor uses a single thread to process all of its tasks, i.e. tasks run serially on an executor.
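
To make the "one thread, many tasks" point concrete, here is a minimal sketch in which a plain java.util.concurrent single-thread executor stands in for a Storm executor. This is an analogy, not Storm's implementation, and the class name SerialExecutorDemo and method threadsUsed are hypothetical. However many units of work are queued, they all run on the same thread, one after another.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Illustrative analogy only: a plain Java thread stands in for a Storm executor. */
final class SerialExecutorDemo {
    private SerialExecutorDemo() {}

    /** Runs numTasks units of work on ONE thread and returns the names of the threads that ran them. */
    static Set<String> threadsUsed(int numTasks) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        ExecutorService executor = Executors.newSingleThreadExecutor();
        for (int t = 0; t < numTasks; t++) {
            // Each "task" just records which thread executed it.
            executor.execute(() -> names.add(Thread.currentThread().getName()));
        }
        executor.shutdown();
        try {
            if (!executor.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", ex);
        }
        return names;
    }
}
```

SerialExecutorDemo.threadsUsed(4) returns a set containing a single thread name, which is the sense in which adding tasks without adding executors buys no extra parallelism.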

The purpose of configuring more than one task per executor is that it makes it possible to change the number of executors (threads) with the rebalancing mechanism while the topology is still running (remember that the number of tasks stays the same throughout the life cycle of a topology).
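
A small model shows why the fixed task count bounds what rebalancing can achieve. It assumes, as the Storm documentation on topology parallelism describes, that a component never has more executors than tasks. The class RebalanceModel and both method names are hypothetical and exist only for this sketch.

```java
/** Illustrative model only: not part of the Storm API. */
final class RebalanceModel {
    private RebalanceModel() {}

    /** Executors usable after a rebalance: capped by the fixed task count (assumption stated above). */
    static int effectiveExecutors(int numTasks, int requestedExecutors) {
        if (numTasks < 1 || requestedExecutors < 1) {
            throw new IllegalArgumentException("counts must be positive");
        }
        return Math.min(numTasks, requestedExecutors);
    }

    /** Largest number of tasks that any single executor has to run serially. */
    static int maxTasksPerExecutor(int numTasks, int requestedExecutors) {
        int executors = effectiveExecutors(numTasks, requestedExecutors);
        return (numTasks + executors - 1) / executors; // ceiling division
    }
}
```

With the 4 tasks from the green-bolt example, rebalancing from 2 to 4 executors drops the serial load from two tasks per thread to one. In this model a request for 8 executors gains nothing further, because only 4 tasks exist to spread out.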

Increasing the number of workers (each responsible for running one or more executors for one or more components) might also give you a performance benefit, but this too is relative, as I found from this discussion where nathanmarz says:

Having more workers might have better performance, depending on where your bottleneck is. Each worker has a single thread that passes tuples on to the 0mq connections for transfer to other workers, so if you're bottlenecked on CPU and each worker is dealing with lots of tuples, more workers will probably net you better throughput.

So basically there is no definite answer to this; you should try different configurations based on your environment and design.
