Storm-Kafka multiple spouts: how to share the load?

Question

I am trying to share work among multiple spouts. I receive one tuple/message at a time from an external source, and I want to run multiple instances of a spout, mainly to share the load and improve performance.

I can do this with a single spout, but I want to spread the load across multiple spouts. I cannot work out the logic for distributing the load, because the offset of the messages is not known until a particular spout finishes consuming its part (i.e. based on the configured buffer size).

Can anyone shed some light on how to work out the logic/algorithm?

Thanks in advance for your time.

Update in response to answers:
I am now using multiple partitions on Kafka (5 of them).
The following code is used:
builder.setSpout("spout", new KafkaSpout(cfg), 5);

I tested by flooding each partition with 800 MB of data, and it took ~22 sec to finish reading.

Then I ran the same code with parallelism_hint = 1, i.e.
builder.setSpout("spout", new KafkaSpout(cfg), 1);

Now it took only a little more: ~23 sec! Why?

According to the Storm docs, setSpout() is declared as follows:

public SpoutDeclarer setSpout(java.lang.String id,
                              IRichSpout spout,
                              java.lang.Number parallelism_hint)

where
parallelism_hint is the number of tasks that should be assigned to execute this spout. Each task will run on a thread in a process somewhere around the cluster.

Recommended answer

I came across a discussion on storm-user that covers something similar.

Read: Relationship between spout parallelism and the number of Kafka partitions.

Two things to note when using kafka-spout with Storm:

  1. The maximum parallelism you can have on a KafkaSpout is the number of partitions.
  2. We can split the load across multiple Kafka topics and have a separate spout instance for each, i.e. each spout handles a separate topic.

So suppose Kafka is configured with 1 partition per host and there are 2 hosts. Even if we set the spout parallelism to 10, the maximum value that is respected will be only 2, which is the number of partitions.
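To make the 2-partitions, 10-tasks case concrete, here is a minimal sketch of a round-robin partition assignment. It assumes the spout deals partitions out to tasks by task index, which is roughly how static partition assignment is usually described; the class and method names (PartitionAssignmentSketch, partitionsForTask) are hypothetical and are not part of the Storm or storm-kafka API.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionAssignmentSketch {

    // Returns the partition indices one spout task would read, assuming
    // partitions are handed out round-robin by task index (an assumption
    // made for illustration, not code taken from storm-kafka).
    static List<Integer> partitionsForTask(int taskIndex, int totalTasks, int totalPartitions) {
        List<Integer> assigned = new ArrayList<>();
        for (int p = taskIndex; p < totalPartitions; p += totalTasks) {
            assigned.add(p);
        }
        return assigned;
    }

    public static void main(String[] args) {
        int totalPartitions = 2; // 2 hosts x 1 partition per host
        int totalTasks = 10;     // spout parallelism set to 10

        for (int task = 0; task < totalTasks; task++) {
            System.out.println("task " + task + " -> "
                    + partitionsForTask(task, totalTasks, totalPartitions));
        }
    }
}
```

Under that assumption only tasks 0 and 1 receive a partition; tasks 2 to 9 get an empty list and sit idle, which is why a parallelism above the partition count brings no extra throughput.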

How to specify the number of partitions in kafka-spout?

List<HostPort> hosts = new ArrayList<HostPort>();
hosts.add(new HostPort("localhost",9092));
SpoutConfig objConfig=new SpoutConfig(new KafkaConfig.StaticHosts(hosts, 4), "spoutCaliber", "/kafkastorm", "discovery");

As you can see, brokers can be added using hosts.add, and the number of partitions is specified as 4 in the new KafkaConfig.StaticHosts(hosts, 4) snippet.

How to specify the parallelism hint in kafka-spout?

builder.setSpout("spout", spout,4);

You can specify it when adding your spout to the topology with the setSpout method. Here 4 is the parallelism hint.
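The two settings can be related with a little arithmetic. The sketch below assumes the second argument of StaticHosts is a per-host partition count (as the "partitions per host" wording earlier suggests) and that every host serves the same number of partitions; SpoutParallelismSketch and its methods are hypothetical helpers, not storm-kafka classes.

```java
class SpoutParallelismSketch {

    // Total partitions for a static host list, assuming each host
    // serves the same number of partitions.
    static int totalPartitions(int hostCount, int partitionsPerHost) {
        return hostCount * partitionsPerHost;
    }

    // Number of spout tasks that can actually be given a partition:
    // a hint above the partition count adds only idle tasks.
    static int effectiveParallelism(int parallelismHint, int totalPartitions) {
        return Math.min(parallelismHint, totalPartitions);
    }

    public static void main(String[] args) {
        // One broker (localhost:9092) with 4 partitions, as in StaticHosts(hosts, 4).
        int partitions = totalPartitions(1, 4);

        System.out.println("hint 4  -> " + effectiveParallelism(4, partitions) + " busy tasks");
        System.out.println("hint 10 -> " + effectiveParallelism(10, partitions) + " busy tasks");
    }
}
```

With one host and 4 partitions, a hint of 4 keeps every task busy, while a hint of 10 would still leave only 4 tasks with data to read.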

More links that may be useful:

Understanding the parallelism of a Storm topology

What is the "task" in Twitter Storm parallelism

Disclaimer: I am new to both Storm and Java, so please edit or add wherever required.
