Is Apache Kafka appropriate for use as an unordered task queue?

Question

Kafka splits incoming messages up into partitions, according to the partition assigned by the producer. Messages from partitions then get consumed by consumers in different consumer groups.

This architecture makes me wary of using Kafka as a work/task queue, because I have to specify the partition at time of production, which indirectly limits which consumers can work on it because a partition is sent to only one consumer in a consumer group. I would rather not specify the partition ahead of time, so that whichever consumer is available to take that task can do so. Is there a way to structure partitions/producers in a Kafka architecture where tasks can be pulled by the next available consumer, without having to split up work ahead of time by choosing a partition when the work is produced?

Using only one partition for this topic would put all the tasks in the same queue, but then the number of consumers is limited to 1 per consumer group, so each consumer would have to be in a different group. In that case, though, every task gets delivered to every consumer group, which is not the kind of work queue I'm looking for.

Is Apache Kafka appropriate for use as a task queue?

Recommended answer

Using Kafka for a task queue is a bad idea. Use RabbitMQ instead; it does the job much better and more elegantly.

Although you can use Kafka for a task queue, you will run into some issues. By design, Kafka does not allow multiple consumers in the same consumer group to consume a single partition, so if, for example, one partition fills up with many tasks while the consumer that owns it is busy, the tasks in that partition will starve. It also means that the order in which tasks in the topic are consumed will not be identical to the order in which they were produced, which can cause serious problems if the tasks need to be consumed in a specific order. (To fully achieve that in Kafka you must have only one consumer and one partition, which means serial consumption by a single node. With multiple consumers and multiple partitions, the order of task consumption is not guaranteed at the topic level.)
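To make the starvation point concrete, here is a minimal pure-Python sketch (no Kafka client involved). It compares the time to finish a batch when each partition is drained by exactly one worker against a single shared queue where the next free worker pulls the next task. The function names, task ids, and durations are all made up for illustration.

```python
def partitioned_finish_time(partitions, durations):
    """Each partition is drained serially by exactly one worker,
    as within a single Kafka consumer group.

    partitions: list of lists of task ids
    durations:  dict mapping task id -> processing time
    Returns the time at which the last task finishes.
    """
    return max(sum(durations[t] for t in tasks) for tasks in partitions)


def shared_queue_finish_time(tasks, durations, workers):
    """One shared FIFO queue: the next available worker takes the next task."""
    free_at = [0] * workers  # time at which each worker becomes idle
    for t in tasks:
        i = free_at.index(min(free_at))  # pick the worker that frees up first
        free_at[i] += durations[t]
    return max(free_at)


# Hypothetical workload: one slow task and three quick ones.
durations = {"a": 10, "b": 1, "c": 1, "d": 1}

# The producer happened to put the slow task and two quick ones in partition 0.
partitions = [["a", "b", "c"], ["d"]]

print(partitioned_finish_time(partitions, durations))                      # 12
print(shared_queue_finish_time(["a", "b", "c", "d"], durations, workers=2))  # 10
```

In the partitioned case, tasks "b" and "c" wait behind "a" even though the second worker is idle after one time unit. That is the starvation described above.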

In fact, Kafka topics are not queues in the computer science sense. A queue means first in, first out, and that is not what you get from Kafka at the topic level.
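A small simulation may help show what "not FIFO at the topic level" looks like. This is only a sketch in plain Python: messages are spread round-robin over two partitions (one possible partitioning strategy, assumed here for simplicity), and a hypothetical `schedule` lists which partition gets polled next, standing in for consumers running at different speeds.

```python
def produce_round_robin(messages, num_partitions):
    """Spread messages over partitions in round-robin order (assumed strategy)."""
    partitions = [[] for _ in range(num_partitions)]
    for i, m in enumerate(messages):
        partitions[i % num_partitions].append(m)
    return partitions


def consume(partitions, schedule):
    """Read one message per poll from the partition named in `schedule`.

    Each partition is read strictly in order (per-partition FIFO), but the
    interleaving across partitions depends on which consumer polls when.
    """
    offsets = [0] * len(partitions)
    seen = []
    for p in schedule:
        if offsets[p] < len(partitions[p]):
            seen.append(partitions[p][offsets[p]])
            offsets[p] += 1
    return seen


partitions = produce_round_robin([1, 2, 3, 4, 5, 6], 2)
print(partitions)  # [[1, 3, 5], [2, 4, 6]]

# The consumer of partition 1 happens to run ahead of the consumer of partition 0.
order = consume(partitions, [1, 1, 0, 1, 0, 0])
print(order)       # [2, 4, 1, 6, 3, 5]
```

Within each partition the order is preserved (1, 3, 5 and 2, 4, 6), but the overall consumption order differs from the production order 1 through 6.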

Another issue is that it is difficult to change the number of partitions dynamically, whereas adding or removing workers should be dynamic. If you want to ensure that new workers will get tasks in Kafka, you have to set the number of partitions to the maximum possible number of workers. This is not elegant.
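The partition count acting as a ceiling on useful workers can be sketched like this. The `assign_partitions` function below is a simplified round-robin stand-in written for illustration, not Kafka's actual assignor code, and the worker names are hypothetical.

```python
def assign_partitions(num_partitions, consumers):
    """Hand out partitions to the consumers of one group, round-robin.

    A simplified stand-in for a consumer-group assignor: every partition
    goes to exactly one consumer in the group.
    """
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


# Hypothetical group: five workers, but the topic has only three partitions.
workers = ["w1", "w2", "w3", "w4", "w5"]
assignment = assign_partitions(3, workers)
idle = [w for w, parts in assignment.items() if not parts]

print(assignment)  # {'w1': [0], 'w2': [1], 'w3': [2], 'w4': [], 'w5': []}
print(idle)        # ['w4', 'w5']
```

Workers beyond the partition count receive nothing, which is why the partition count has to be sized for the largest worker pool you expect.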

So the bottom line: use RabbitMQ or another queue instead.

Having said all of that, Samza (by LinkedIn) uses Kafka as a kind of stream-based task queue.

Scale considerations: I forgot to mention that Kafka is a big-data, large-scale tool. If your job rate is huge, Kafka might be a good option for you despite what I wrote earlier, since dealing with huge scale is very challenging and Kafka is very good at it. At smaller scales (say, up to a few dozen or a few hundred jobs per second), Kafka is again a poor choice compared to RabbitMQ.
