Is Apache Kafka appropriate for use as an unordered task queue?

Question

Kafka splits incoming messages up into partitions, according to the partition assigned by the producer. Messages from partitions then get consumed by consumers in different consumer groups.

This architecture makes me wary of using Kafka as a work/task queue, because I have to specify the partition at time of production, which indirectly limits which consumers can work on it because a partition is sent to only one consumer in a consumer group. I would rather not specify the partition ahead of time, so that whichever consumer is available to take that task can do so. Is there a way to structure partitions/producers in a Kafka architecture where tasks can be pulled by the next available consumer, without having to split up work ahead of time by choosing a partition when the work is produced?
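
To make the concern concrete, here is a toy Python model (not Kafka client code) of what it means for the partition to be fixed at production time. The function `choose_partition`, the CRC32 hash, and the static `OWNER` mapping are illustrative assumptions; Kafka's own partitioner does not use CRC32. The consequence it shows is the one described above: once a task lands in a partition, only the group member that owns that partition can process it.

```python
import zlib

NUM_PARTITIONS = 3

# Hypothetical static assignment of partitions to the consumers of one group.
OWNER = {0: "worker-a", 1: "worker-b", 2: "worker-c"}


def choose_partition(key: str, num_partitions: int) -> int:
    """Stand-in partitioner: hash the task key, then take it modulo the partition count.

    Assumption: CRC32 is used only for illustration; it is not the hash
    that Kafka's default partitioner uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


tasks = ["resize-img-1", "resize-img-2", "send-email-7", "send-email-8"]

# The partition is decided when each task is produced, before anyone knows
# which worker will be free when the task is actually processed.
placement = {task: choose_partition(task, NUM_PARTITIONS) for task in tasks}

for task, partition in placement.items():
    print(f"{task} -> partition {partition} -> only {OWNER[partition]} can take it")
```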

Using only one partition for this topic would put all the tasks in the same queue, but then the number of consumers is limited to one per consumer group, so each consumer would have to be in a different group. Then, though, every task gets delivered to every consumer group, which is not the kind of work queue I'm looking for.
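
The single-partition workaround can be modelled the same way. In this toy sketch (plain Python, not a Kafka client), the partition is a list and each consumer group keeps its own offset; `poll` and `group_offsets` are illustrative names, not Kafka APIs. Because offsets are tracked per group, every group reads the whole log, so each task is processed once per consumer rather than once overall.

```python
# One topic with a single partition, modelled as an append-only list.
log = ["t1", "t2", "t3", "t4"]

# Each consumer sits in its own group, and each group tracks its own offset.
group_offsets = {"group-a": 0, "group-b": 0, "group-c": 0}
received: dict[str, list[str]] = {group: [] for group in group_offsets}


def poll(group: str, max_records: int = 2) -> list[str]:
    """Return the next batch for `group` and advance (commit) its offset."""
    start = group_offsets[group]
    batch = log[start:start + max_records]
    group_offsets[group] = start + len(batch)
    return batch


# Every group polls until it reaches the end of the log.
for group in group_offsets:
    while batch := poll(group):
        received[group].extend(batch)

# Each task was delivered once per group: broadcast, not a work queue.
for group, got in received.items():
    print(group, got)
```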

Is Apache Kafka appropriate for use as a task queue?

Answer

Using Kafka for a task queue is a bad idea. Use RabbitMQ instead; it does the job much better and more elegantly.

Although you can use Kafka as a task queue, you will run into some issues. By design, Kafka does not allow a single partition to be consumed by more than one consumer in the same consumer group. So if, for example, a single partition fills up with many tasks and the consumer that owns it is busy, the tasks in that partition will starve, even if other consumers are idle. This also means that tasks in the topic will not be consumed in the order in which they were produced, which can cause serious problems if the tasks need to be consumed in a specific order. (To fully achieve that in Kafka, you must have only one consumer and one partition, which means serial consumption by just one node. If you have multiple consumers and multiple partitions, the order of task consumption is not guaranteed at the topic level.)
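
The starvation and ordering effects can be seen in a small discrete-time simulation. This is a hypothetical model with made-up task durations: `partitioned` gives each partition to exactly one consumer, as within a Kafka consumer group, while `shared_queue` models a generic competing-consumers work queue in which whichever worker is free first takes the next task. Neither function is real Kafka or RabbitMQ code.

```python
import heapq

# (task id, duration, partition chosen by the producer), in production order.
# Hypothetical numbers: t0 is slow, everything else is fast.
TASKS = [("t0", 10, 0), ("t1", 1, 1), ("t2", 1, 0),
         ("t3", 1, 1), ("t4", 1, 0), ("t5", 1, 1)]


def partitioned(tasks):
    """One consumer per partition; each consumer drains its partition in order."""
    free_at = {}    # partition -> time its consumer becomes free
    finished = {}   # task id -> completion time
    for task_id, duration, partition in tasks:
        start = free_at.get(partition, 0)
        free_at[partition] = start + duration
        finished[task_id] = free_at[partition]
    return finished


def shared_queue(tasks, num_workers=2):
    """Any free worker takes the next task in production order."""
    free_at = [0] * num_workers   # min-heap of times at which workers become free
    finished = {}
    for task_id, duration, _ in tasks:
        start = heapq.heappop(free_at)            # earliest-free worker
        finished[task_id] = start + duration
        heapq.heappush(free_at, finished[task_id])
    return finished


p = partitioned(TASKS)
s = shared_queue(TASKS)
print("partitioned:  t2 done at", p["t2"], "| all done at", max(p.values()))
print("shared queue: t2 done at", s["t2"], "| all done at", max(s.values()))
print("partitioned completion order:", sorted(p, key=p.get))
```

With these numbers, `t2` finishes at time 11 in the partitioned model, stuck behind the slow `t0` while the other worker has been idle since time 3, versus time 2 with a shared queue. In the partitioned model `t3` and `t5` also complete before `t2`, even though they were produced after it.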

In fact, Kafka topics are not queues in the computer-science sense. A queue means first in, first out (FIFO), and that is not what you get from Kafka at the topic level; ordering is guaranteed only within a single partition.

Another issue is that it is difficult to change the number of partitions dynamically (partitions can be added to an existing topic but not removed). Adding or removing workers should be dynamic. If you want to ensure that new workers will get tasks in Kafka, you have to set the number of partitions to the maximum number of workers you might ever run. This is not elegant.
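
Why the partition count caps the number of useful workers can be shown with a simplified round-robin assignment. This is a sketch, not Kafka's own assignor implementation; `assign` and the worker names are hypothetical. Within one consumer group each partition goes to exactly one consumer, so scaling a three-partition topic out to five workers leaves two of them idle.

```python
def assign(partitions: list[int], consumers: list[str]) -> dict[str, list[int]]:
    """Round-robin partitions over the consumers of one group (simplified).

    Each partition goes to exactly one consumer, so consumers beyond the
    partition count receive nothing.
    """
    assignment: dict[str, list[int]] = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


partitions = [0, 1, 2]
before = assign(partitions, ["w1", "w2", "w3"])
after = assign(partitions, ["w1", "w2", "w3", "w4", "w5"])  # scale out to 5 workers

idle = [c for c, owned in after.items() if not owned]
print(before)
print(after)
print("idle workers:", idle)
```

Over-provisioning partitions up front avoids the idle workers, which is the inelegant trade-off the answer describes.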

So, the bottom line: use RabbitMQ or another queue instead.

Having said all of that, Samza (by LinkedIn) uses Kafka as a sort of stream-based task queue.

Scale considerations: I forgot to mention that Kafka is a big-data, large-scale tool. If your job rate is huge, Kafka might be a good option for you despite everything I wrote above, because handling huge scale is very challenging and Kafka is very good at it. If we are talking about smaller scales (say, up to a few dozen or a few hundred jobs per second), then, again, Kafka is a poor choice compared to RabbitMQ.