Kafka: multiple producers writing to the same topic - message ordering and data bursts


Question

I am trying to learn about Kafka. Say I have multiple producers, each writing to the same topic. (We can't have more topics because of our design, and we are using Avro for serialization.) Because our messages are too big, we need to split them into small parts before sending them to Kafka.

In this scenario, can messages from different producers intermix? How can I avoid that? Any ideas?

Also, we have huge data bursts: there will be thousands of messages in 2 minutes, then very few messages for the next 5 to 7 minutes. What can we do in such scenarios?

Recommended answer

"Because our message is too big, we need to divide it into small parts and we send it to Kafka."

Do you? Have you run basic tests and hit problems? Have you tried adjusting the size limits and buffers? I'm pretty sure Kafka can handle relatively big messages (tens of megabytes) without much hassle, provided the size limits are raised from their defaults of roughly 1 MB (for example message.max.bytes on the broker and max.request.size on the producer). In fact, you will likely get better throughput than with a huge number of tiny messages.

"Also, we have huge data bursts: there will be thousands of messages in 2 minutes, then very few messages for the next 5 to 7 minutes. What can we do in such scenarios?"

A single Kafka broker on a modern server machine can easily handle roughly 20k-40k messages per second (batches of 1000 messages, 2 KB each, sync mode). I don't see a problem there.
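To put the burst next to that throughput figure, here is a rough back-of-the-envelope calculation. The question only says "1000s of messages in 2 minutes", so the burst size of 10,000 below is an assumption chosen for illustration, not a number from the thread.

```python
# Assumed figure: the question does not give an exact burst size.
burst_messages = 10_000      # hypothetical "1000s of messages"
burst_window_s = 2 * 60      # 2 minutes, as stated in the question
broker_rate_low = 20_000     # msg/s, low end of the estimate above

# Average arrival rate during the burst.
burst_rate = burst_messages / burst_window_s

# How many times over the broker could absorb that rate.
headroom = broker_rate_low / burst_rate

print(f"burst rate: {burst_rate:.1f} msg/s")
print(f"headroom:   {headroom:.0f}x")
```

Under these assumptions the burst averages well under 100 messages per second, which is far below the quoted broker capacity even at its low end.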

"Can messages from different producers intermix?"

Yes, they might intermix. In fact, this point is not well defined by the Kafka protocol, and a particular implementation may change the behaviour, so for a stable solution see the next section.
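If you do end up splitting messages, one way to tolerate interleaving is to tag every part so the consumer can regroup them. The sketch below is a minimal illustration in plain Python with no Kafka client involved; the names Chunk, split_message and Reassembler are hypothetical and not part of any Kafka API.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Chunk:
    message_id: str   # identical for every chunk of one logical message
    index: int        # position of this chunk within the message
    total: int        # number of chunks the message was split into
    payload: bytes


def split_message(data: bytes, chunk_size: int) -> List[Chunk]:
    """Split one large payload into tagged chunks."""
    message_id = uuid.uuid4().hex
    parts = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    return [Chunk(message_id, i, len(parts), p) for i, p in enumerate(parts)]


class Reassembler:
    """Collects chunks per message_id and returns the payload once complete."""

    def __init__(self) -> None:
        self._pending: Dict[str, Dict[int, bytes]] = {}

    def add(self, chunk: Chunk) -> Optional[bytes]:
        parts = self._pending.setdefault(chunk.message_id, {})
        parts[chunk.index] = chunk.payload
        if len(parts) < chunk.total:
            return None  # still waiting for more chunks
        del self._pending[chunk.message_id]
        return b"".join(parts[i] for i in range(chunk.total))


# Two producers' chunks arriving interleaved on the consumer side.
a = split_message(b"A" * 10, chunk_size=4)   # 3 chunks
b = split_message(b"B" * 6, chunk_size=4)    # 2 chunks
interleaved = [a[0], b[0], a[1], b[1], a[2]]

reassembler = Reassembler()
done = [m for m in (reassembler.add(c) for c in interleaved) if m is not None]
print(done)
```

With a real client you would probably also use message_id as the record key, since records with the same key are normally routed to the same partition by the default partitioner.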

"How can I avoid this scenario?"

Kafka has a concept of partitions: by default each topic has 1 partition, and each partition can be thought of as a unit of parallelism. Set up an appropriate partitioner so that each producer writes to its own partition, in isolation from the others.
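The effect of giving each producer its own partition can be shown with a toy model. This is not Kafka code: TopicModel is a hypothetical in-memory stand-in for a partitioned topic, and the producer names and the static PRODUCER_PARTITION mapping are assumptions made for the example.

```python
class TopicModel:
    """Toy stand-in for a topic: one append-only list per partition."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition: int, record: str) -> None:
        self.partitions[partition].append(record)


# Hypothetical static assignment: every producer owns exactly one partition.
PRODUCER_PARTITION = {"producer-a": 0, "producer-b": 1}

topic = TopicModel(num_partitions=2)

# Sends from the two producers arrive interleaved in time...
sends = [
    ("producer-a", "a1"),
    ("producer-b", "b1"),
    ("producer-a", "a2"),
    ("producer-b", "b2"),
]
for producer, record in sends:
    topic.append(PRODUCER_PARTITION[producer], record)

# ...but each partition only ever holds one producer's records, in send order.
for number, log in enumerate(topic.partitions):
    print(number, log)
```

This only works if the topic has at least as many partitions as producers. With a real client the same idea is usually expressed either through a custom partitioner or by passing an explicit partition number when sending a record.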
