How to achieve distributed processing and high availability simultaneously in Kafka?


Question

I have a topic consisting of n partitions. For distributed processing, I run two processes on different machines. Both subscribe to the topic with the same group id, and each allocates n/2 threads, each of which processes a single stream (n/2 partitions per process).

This gives me load distribution, but if process 1 crashes, then process 2 cannot consume messages from the partitions allocated to process 1, because it has only listened on n/2 streams since startup.

Alternatively, if I configure for HA and start n threads/streams on both processes, then when one node fails, all partitions are processed by the other node. But this compromises distribution, because all partitions are processed by a single node at any given time.
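
The trade-off between the two setups can be sketched with a toy model. This is not Kafka client code and says nothing about how Kafka itself reassigns partitions: it only encodes the question's own assumptions, namely that each thread is pinned to at most one partition and that partitions are handed to threads in a fixed order. The names `assign` and `summarize`, the process labels, and the choice of n = 4 are all hypothetical.

```python
def assign(n_partitions, threads_per_process, live_processes):
    """Map each partition to a (process, thread) owner, or None if uncovered.

    Assumption from the question (not verified Kafka behavior): one thread
    handles at most one partition, and threads are filled in process order.
    """
    threads = [(proc, t)
               for proc in live_processes
               for t in range(threads_per_process)]
    return {part: (threads[part] if part < len(threads) else None)
            for part in range(n_partitions)}


def summarize(owners):
    """Count partitions per process; uncovered partitions count as 'unassigned'."""
    counts = {}
    for owner in owners.values():
        key = owner[0] if owner is not None else "unassigned"
        counts[key] = counts.get(key, 0) + 1
    return counts


n = 4  # hypothetical partition count

# Setup A: n/2 threads per process (load is distributed).
print(summarize(assign(n, n // 2, ["p1", "p2"])))  # {'p1': 2, 'p2': 2}
print(summarize(assign(n, n // 2, ["p2"])))        # {'p2': 2, 'unassigned': 2}

# Setup B: n threads per process (highly available).
print(summarize(assign(n, n, ["p1", "p2"])))       # {'p1': 4}
print(summarize(assign(n, n, ["p2"])))             # {'p2': 4}
```

Under these assumptions, setup A balances the load but leaves half the partitions uncovered after a crash, while setup B survives the crash but keeps every partition on one node.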

Is there a way to achieve both simultaneously, and if so, how?

Recommended answer

Yes, use an existing stream processing engine. Storm is a good choice, as are Spark and Samza, depending on your use case.

You could roll your own, but as you've already discovered, managing failing processes and high availability is tricky. Generally speaking, distributed processing is full of subtle problems that someone else has already solved. In your shoes, I'd use existing software to deal with them.
