Configuring parallelism in Storm
Question
I am new to Apache Storm, and I am trying to figure out for myself how to configure Storm parallelism. There is a great article, "Understanding the Parallelism of a Storm Topology", but it only raises more questions.
When you have a multi-node Storm cluster, each topology is distributed as a whole according to the TOPOLOGY_WORKERS configuration parameter. So if you have 5 workers, you have 5 copies of the spout (1 per worker), and the same goes for bolts.
How can I deal with situations like these inside a Storm cluster (preferably without creating external services):
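- I need only one spout for all instances of the topology, for example when input data is pushed to the cluster through a network folder that is scanned for new files.
- A similar problem exists for a specific type of bolt, for example when the data is processed by a licensed third-party library that is locked to a specific physical machine.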
Recommended answer
First, the basics:
- Workers - run executors; each worker has its own JVM
- Executors - run tasks; Storm distributes executors across the different workers
- Tasks - instances running your spout/bolt code
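To make the hierarchy above concrete, here is a minimal, self-contained Java model of how one component's tasks could be spread over its executors. It is a sketch under stated assumptions, not Storm code: the class TaskSplitModel and the method splitTasks are hypothetical, the even split is a simplifying assumption, and the check that a component has no more executors than tasks reflects the rule from the Storm parallelism article that the number of threads never exceeds the number of tasks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of one component's tasks spread over its executors.
// This is NOT Storm's scheduler code; it only mirrors "each executor runs one or more tasks".
final class TaskSplitModel {

    // Splits task ids 1..numTasks into numExecutors groups whose sizes differ by at most one.
    static List<List<Integer>> splitTasks(int numTasks, int numExecutors) {
        // Assumption taken from the parallelism article: #executors <= #tasks.
        if (numExecutors < 1 || numTasks < numExecutors) {
            throw new IllegalArgumentException("need 1 <= executors <= tasks");
        }
        List<List<Integer>> executors = new ArrayList<>();
        int base = numTasks / numExecutors;  // minimum number of tasks per executor
        int extra = numTasks % numExecutors; // the first 'extra' executors get one more task
        int nextTask = 1;
        for (int e = 0; e < numExecutors; e++) {
            int size = base + (e < extra ? 1 : 0);
            List<Integer> tasks = new ArrayList<>();
            for (int i = 0; i < size; i++) {
                tasks.add(nextTask++);
            }
            executors.add(tasks);
        }
        return executors;
    }

    public static void main(String[] args) {
        // 5 tasks on 2 executors: one thread runs 3 instances, the other runs 2.
        System.out.println(splitTasks(5, 2)); // [[1, 2, 3], [4, 5]]
        // 1 task on 1 executor: exactly one instance of the component.
        System.out.println(splitTasks(1, 1)); // [[1]]
    }
}
```

Each inner list is one executor (thread), and each number in it is one task, that is, one instance of the spout or bolt object.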
Second, a correction... having 5 workers does NOT mean you will automatically have 5 copies of your spout. Having 5 workers means you have 5 separate JVMs where Storm can assign executors to run (think of this as 5 buckets).
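The "5 buckets" picture can be sketched in a few lines of Java. The class WorkerBucketModel and its methods are hypothetical, and dealing executors round-robin into workers is a simplifying assumption; Storm's actual scheduler is more involved. What the sketch illustrates is that the worker count only sets the number of buckets, while each component's executor count decides how many copies of it exist.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of workers as buckets: executors are dealt round-robin into worker JVMs.
// Storm's real scheduler is more involved; this only shows that workers are slots, not copies.
final class WorkerBucketModel {

    // Returns one list of executor labels per worker.
    static List<List<String>> assign(List<String> executors, int numWorkers) {
        List<List<String>> workers = new ArrayList<>();
        for (int w = 0; w < numWorkers; w++) {
            workers.add(new ArrayList<>());
        }
        for (int i = 0; i < executors.size(); i++) {
            workers.get(i % numWorkers).add(executors.get(i)); // round-robin placement
        }
        return workers;
    }

    // Builds labels such as "spout#0", "spout#1" for a component with the given parallelism hint.
    static List<String> executorsFor(String component, int parallelism) {
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            labels.add(component + "#" + i);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<String> executors = new ArrayList<>(executorsFor("spout", 1)); // one spout executor
        executors.addAll(executorsFor("bolt", 4));                          // four bolt executors
        List<List<String>> workers = assign(executors, 5);                  // five worker JVMs
        // Only one worker holds the spout: 5 workers did not create 5 spouts.
        System.out.println(workers); // [[spout#0], [bolt#0], [bolt#1], [bolt#2], [bolt#3]]
    }
}
```

A spout with a single executor lands in exactly one of the five workers; the other workers receive only bolt executors, or stay empty if there are too few executors to fill them.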
The number of instances of your spout is configured when you first create and submit your topology:
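// spoutParallelism = number of executors (threads); spoutTasks = number of spout instances (tasks)
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("0-spout", new MySpout(), spoutParallelism).setNumTasks(spoutTasks);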
Since you want only one spout for the entire cluster, you'd set both spoutParallelism and spoutTasks to 1.
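The effect of the two numbers in the builder call can be modeled as follows. ComponentParallelism is a hypothetical class, not part of Storm's API. It encodes the default described in the Storm parallelism article (the task count equals the number of executors when setNumTasks is not called), and it deliberately takes no worker count, because the number of workers does not change how many instances of a component exist. Rejecting more executors than tasks is this model's own choice, based on the article's rule that threads never outnumber tasks.

```java
// Hypothetical stand-in for the parallelism hint and setNumTasks value of one component.
// It mirrors the documented default (tasks = executors) but is not Storm's API.
final class ComponentParallelism {
    final int executors; // parallelism hint: number of threads
    final int tasks;     // setNumTasks: number of spout/bolt instances

    ComponentParallelism(int parallelismHint, Integer numTasks) {
        this.executors = parallelismHint;
        // When setNumTasks is not called (null here), the task count defaults to the hint.
        this.tasks = (numTasks == null) ? parallelismHint : numTasks;
        // Modeling choice: reject configurations with more threads than tasks.
        if (executors < 1 || executors > tasks) {
            throw new IllegalArgumentException("expected 1 <= executors <= tasks");
        }
    }

    // The worker count is intentionally not a parameter: it does not affect the instance count.
    int instanceCount() {
        return tasks;
    }

    public static void main(String[] args) {
        System.out.println(new ComponentParallelism(1, 1).instanceCount());    // 1: one spout in the cluster
        System.out.println(new ComponentParallelism(2, null).instanceCount()); // 2: tasks default to the hint
        System.out.println(new ComponentParallelism(2, 4).instanceCount());    // 4: two threads, two tasks each
    }
}
```

Under this model, setting both values to 1 yields a single spout instance for the whole cluster, however many workers the topology requests.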