Kafka multiple producers writing to the same topic - Ordering of messages and data bursts


Question

I am trying to learn about Kafka. Say I have multiple producers, each writing to the same topic. (We can't have more topics because of our design, and we use Avro for serialization.) Because our messages are too big, we need to split each one into smaller parts before sending it to Kafka.

In this scenario, can messages from different producers intermix? How can I avoid that? Any ideas?

Also, we have huge data bursts: there will be thousands of messages in 2 minutes, then very few messages for the next 5 to 7 minutes. What can we do in such scenarios?

Recommended answer

Because our messages are too big, we need to split each one into smaller parts before sending it to Kafka.

Do you? Have you run basic tests and hit problems? Have you tried adjusting the buffers? I'm pretty sure that Kafka can handle relatively big messages (tens of megabytes) without much hassle, provided the size limits on the broker, producer and consumer are raised above their roughly 1 MB defaults. In fact, you will likely get better throughput than with a huge number of tiny messages.
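If raising the limits is an option, the settings that usually have to move together can be sketched as follows. The property names are existing Kafka settings, but the helper `size_limits`, the 10 MiB message size and the 1 KiB headroom are assumptions made for illustration; none of them comes from the question.

```python
def size_limits(max_message_bytes, headroom=1024):
    """Return per-role Kafka size settings for a given largest message.

    headroom is a hypothetical allowance for record and batch overhead.
    """
    limit = max_message_bytes + headroom
    return {
        # Broker side: accept the record and let followers replicate it.
        "broker": {
            "message.max.bytes": limit,
            "replica.fetch.max.bytes": limit,
        },
        # Producer side: allow a request large enough to carry the record.
        "producer": {"max.request.size": limit},
        # Consumer side: fetch enough bytes per partition to read it back.
        "consumer": {"max.partition.fetch.bytes": limit},
    }


# Example: size the limits for an assumed 10 MiB message.
limits = size_limits(10 * 1024 * 1024)
for role, settings in limits.items():
    for key, value in settings.items():
        print(f"{role}: {key}={value}")
```

Which of these actually needs changing depends on the Kafka version and on any topic-level overrides, so treat the output as a checklist rather than a finished configuration.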

Also, we have huge data bursts: there will be thousands of messages in 2 minutes, then very few messages for the next 5 to 7 minutes. What can we do in such scenarios?

A single Kafka broker on a modern server machine can easily handle ~20k-40k messages per second (batches of 1000 messages, each 2 KB in size, sync mode). I don't see a problem there.
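To put the burst in proportion, here is a rough calculation. The question only says "1000s of messages", so the figure of 5000 messages is an assumption, and the capacity used is the lower bound of the estimate above.

```python
def burst_utilization(messages, window_seconds, broker_capacity_per_second):
    """Return (average burst rate in msg/s, fraction of broker capacity used)."""
    rate = messages / window_seconds
    return rate, rate / broker_capacity_per_second


# Assumed burst: 5000 messages in 2 minutes, against 20,000 msg/s capacity.
rate, share = burst_utilization(5000, 120, 20_000)
print(f"{rate:.1f} msg/s, {share:.2%} of capacity")
```

Under these assumptions the burst averages about 42 messages per second, roughly 0.2% of what a single broker is estimated to handle.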

Can messages from different producers intermix?

Yes, they might intermix. In fact, this point is not well defined by the Kafka protocol, and a particular implementation may change the behaviour, so for a stable solution take a look at the section below.

How can I avoid this scenario?

Kafka has a concept of partitions: each topic has 1 partition by default, and each partition can be thought of as a unit of parallelism. Kafka preserves message order only within a partition. Set up an appropriate partitioner so that each producer writes to its own partition, in an isolated manner.
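The effect of one partition per producer can be illustrated with a small in-memory simulation. This is only a sketch: it does not use a Kafka client, the `topic` dictionary stands in for a real topic, and `partition_for` and `split` are hypothetical helpers. It assumes the topic has at least as many partitions as there are producers.

```python
from collections import defaultdict


def partition_for(producer_index, num_partitions):
    """Map a producer to a fixed partition; requires one partition per producer."""
    if not 0 <= producer_index < num_partitions:
        raise ValueError("need at least one partition per producer")
    return producer_index


def split(payload, chunk_size):
    """Cut a payload into chunk_size pieces, preserving order."""
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]


# In-memory stand-in for a topic: partition number -> ordered list of records.
topic = defaultdict(list)
payloads = {0: b"AAAAAAAAAA", 1: b"BBBBBBBBBB"}
chunks = {p: split(data, 4) for p, data in payloads.items()}

# Interleave sends from the two producers, as could happen on a real topic.
for step in range(3):
    for producer in (0, 1):
        topic[partition_for(producer, 2)].append(chunks[producer][step])

# Reading each partition in order yields each producer's chunks unmixed.
for producer, data in payloads.items():
    assert b"".join(topic[partition_for(producer, 2)]) == data
```

With a real client, the same idea is usually expressed either by passing an explicit partition number when sending a record or by configuring a custom partitioner; which one fits depends on the client library in use.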
