How to achieve distributed processing and high availability simultaneously in Kafka?

Question

I have a topic consisting of n partitions. To get distributed processing, I create two processes running on different machines. They subscribe to the topic with the same group id and each allocates n/2 threads, each of which processes a single stream (n/2 partitions per process).

With this I achieve load distribution, but if process 1 crashes, then process 2 cannot consume messages from the partitions allocated to process 1, because it listened on only n/2 streams at the start.

Alternatively, if I configure for HA and start n threads/streams on both processes, then when one node fails, all partitions will be processed by the other node. But here we have compromised distribution, since all partitions are processed by a single node at a time.
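The trade-off between the two setups can be sketched with a toy model. This is only an illustration of the reasoning in the question, under the question's own assumption that each thread reads exactly one partition and a process never gains threads after startup; it does not model the behavior of any real Kafka client. The names `after_crash`, `process1`, and `process2` are hypothetical.

```python
def after_crash(assignment, threads, crashed):
    """Return (new_assignment, lost_partitions) once `crashed` is gone.

    Assumption from the question: one thread per partition, so a survivor
    can take over orphaned partitions only while it has idle threads.
    """
    orphaned = sorted(assignment[crashed])
    result = {p: set(parts) for p, parts in assignment.items() if p != crashed}
    for proc, parts in result.items():
        idle = max(0, threads[proc] - len(parts))
        take, orphaned = orphaned[:idle], orphaned[idle:]
        parts.update(take)
    return result, set(orphaned)


n = 8  # number of partitions in the toy topic

# Setup 1: n/2 threads per process, each process owns half the partitions.
split_assignment = {
    "process1": set(range(0, n // 2)),
    "process2": set(range(n // 2, n)),
}
split_threads = {"process1": n // 2, "process2": n // 2}
split_result, split_lost = after_crash(split_assignment, split_threads, "process1")

# Setup 2: n threads on both processes, but one node holds all partitions
# at a time while the other sits idle.
ha_assignment = {"process1": set(range(n)), "process2": set()}
ha_threads = {"process1": n, "process2": n}
ha_result, ha_lost = after_crash(ha_assignment, ha_threads, "process1")

# Busiest node before any crash: the cost paid for each setup.
split_peak = max(len(parts) for parts in split_assignment.values())
ha_peak = max(len(parts) for parts in ha_assignment.values())

print("setup 1: peak load", split_peak, "lost after crash", sorted(split_lost))
print("setup 2: peak load", ha_peak, "lost after crash", sorted(ha_lost))
```

In this model, setup 1 halves the peak load but leaves process 1's partitions unread after the crash, while setup 2 loses nothing but puts every partition on one node.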

Is there a way to achieve both simultaneously, and if so, how?

Recommended answer

Yes, use an existing stream processing engine. Storm is a good choice, as are Spark and Samza, depending on your use case.

Now, you could roll your own, but as you've already discovered, managing failing processes and high availability is tricky. Generally speaking, distributed processing is full of subtle problems that someone else has already solved. In your shoes, I'd use existing software to deal with that problem.
