Kafka work queue with a dynamic number of parallel consumers

Question

I want to use Kafka to "divide the work". I want to publish instances of work to a topic, and run a cloud of identical consumers to process them. As each consumer finishes its work, it will pluck the next work from the topic. Each work should only be processed once by one consumer. Processing work is expensive, so I will need many consumers running on many machines to keep up. I want the number of consumers to grow and shrink as needed (I plan to use Kubernetes for this).

I found a pattern where a unique partition is created for each consumer. This "divides the work", but the number of partitions is set when the topic is created. Furthermore, the topic must be created on the command line e.g.

bin/kafka-topics.sh --zookeeper localhost:2181 --partitions 3 --topic divide-topic --create --replication-factor 1

...

# Assumes the kafka-python client; each iteration pins one consumer
# to a single partition of the topic.
from kafka import KafkaConsumer, TopicPartition

for n in range(0, 3):
    consumer = KafkaConsumer(
                     bootstrap_servers=['localhost:9092'])
    partition = TopicPartition('divide-topic', n)
    consumer.assign([partition])
    ...

I could create a unique topic for each consumer, and write my own code to assign work to those topics. That seems gross, and I still have to create topics via the command line.

A work queue with a dynamic number of parallel consumers is a common architecture. I can't be the first to need this. What is the right way to do it with Kafka?

Answer

The pattern you found is accurate. Note that topics can also be created using the Kafka Admin API and partitions can also be added once a topic has been created (with some gotchas).
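As a sketch of that second point: once a topic exists, its partition count can be raised (never lowered) with the same CLI tool used above. The invocation below mirrors the zookeeper-style flags from the question; newer brokers take --bootstrap-server instead. One of the gotchas: adding partitions changes which partition a given message key maps to.

```shell
# Grow divide-topic from 3 partitions to 10 (partition counts can only increase).
bin/kafka-topics.sh --zookeeper localhost:2181 --alter --topic divide-topic --partitions 10
```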

In Kafka, the way to divide work and allow scaling is to use partitions. This is because in a consumer group, each partition is consumed by a single consumer at any time.

For example, you can have a topic with 50 partitions and a consumer group subscribed to this topic:

  • When the throughput is low, you can have only a few consumers in the group and they should be able to handle the traffic.

  • When the throughput increases, you can add consumers, up to the number of partitions (50 in this example), to pick up some of the work.

In this scenario, 50 consumers is the limit in terms of scaling. Consumers expose a number of metrics (like lag), allowing you to decide if you have enough of them at any time.
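The scaling behaviour above can be illustrated without a broker. Below is a stdlib-only simulation of a round-robin-style assignment (the function name assign_partitions and the worker labels are made up for illustration, not part of any Kafka API), showing why the partition count caps useful parallelism:

```python
def assign_partitions(num_partitions, workers):
    """Deal each partition to exactly one worker, round-robin, so no
    partition is ever consumed by more than one group member."""
    assignment = {w: [] for w in workers}
    for p in range(num_partitions):
        assignment[workers[p % len(workers)]].append(p)
    return assignment

# 50 partitions shared by 3 consumers: each owns 16-17 partitions.
few = assign_partitions(50, ["c1", "c2", "c3"])
print({w: len(ps) for w, ps in few.items()})  # {'c1': 17, 'c2': 17, 'c3': 16}

# Scaling past the partition count leaves the extra consumers idle.
many = assign_partitions(50, [f"c{i}" for i in range(60)])
print(sum(1 for ps in many.values() if not ps))  # 10 consumers get nothing
```

When consumers join or leave the group, Kafka reruns an assignment like this (a rebalance), which is what lets a Kubernetes deployment scale the worker count up and down.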
