Kafka having duplicate messages
Problem description
I don't see any failures while producing or consuming the data, yet there are a bunch of duplicate messages in production. For a small topic that gets around 100k messages, there are ~4k duplicates, even though, as I said, there are no failures; on top of that, no retry logic is implemented and no retry config value is set.
I also checked the offset values for those duplicate messages, and each has a distinct value, which tells me the issue is on the producer side.
Any help would be appreciated.
Recommended answer
Read more about message delivery in Kafka:
https://kafka.apache.org/08/design.html#semantics
So effectively Kafka guarantees at-least-once delivery by default, and allows the user to implement at-most-once delivery by disabling retries on the producer and committing its offset prior to processing a batch of messages. Exactly-once delivery requires cooperation with the destination storage system, but Kafka provides the offset, which makes implementing this straightforward.
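The way at-least-once delivery produces duplicates at distinct offsets (exactly what the questioner observed) can be sketched with a small, self-contained simulation; this is plain Python with no real broker, and the `Broker` and `send_with_retry` names are illustrative only:

```python
# Hypothetical sketch of at-least-once delivery: the broker appends the
# message successfully, but the acknowledgement is lost, so the producer
# retries and a second copy lands in the log at a new offset.

class Broker:
    def __init__(self):
        self.log = []          # the partition's message log

    def append(self, msg, ack_lost=False):
        self.log.append(msg)   # the write itself succeeds
        if ack_lost:
            # The producer never learns the write succeeded.
            raise TimeoutError("ack lost")
        return len(self.log) - 1  # offset of the appended message

def send_with_retry(broker, msg, ack_lost_once=False):
    """Retry on timeout, as an at-least-once producer would."""
    try:
        return broker.append(msg, ack_lost=ack_lost_once)
    except TimeoutError:
        # The retry re-appends a message the broker already stored.
        return broker.append(msg)

broker = Broker()
send_with_retry(broker, "m1")
send_with_retry(broker, "m2", ack_lost_once=True)  # ack lost -> retried

print(broker.log)  # ['m1', 'm2', 'm2'] -- duplicate, each at a distinct offset
```

Note that the duplicate is indistinguishable from a legitimate message on the broker side, which is why each copy gets its own offset.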
Probably you are looking for "exactly-once delivery", like in JMS:
https://cwiki.apache.org/confluence/display/KAFKA/FAQ#FAQ-HowdoIgetexactly-oncemessagingfromKafka?
There are two approaches to getting exactly-once semantics during data production:

1. Use a single writer per partition, and every time you get a network error check the last message in that partition to see whether your last write succeeded.
2. Include a primary key (UUID or something) in the message and deduplicate on the consumer.
We implemented the second approach in our systems.
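A minimal sketch of that consumer-side deduplication, assuming the producer stamps every message with a UUID primary key; this is plain Python with no real Kafka client, and the message shape and `seen` set are illustrative:

```python
import uuid

# Producer side: attach a primary key (UUID) to every message before sending.
def make_message(payload):
    return {"id": str(uuid.uuid4()), "payload": payload}

# Consumer side: remember which IDs were already processed and drop repeats.
# In production the seen-ID set would live in a durable store (a database or
# cache keyed by message id), not in process memory.
def deduplicate(messages):
    seen = set()
    unique = []
    for msg in messages:
        if msg["id"] in seen:
            continue           # duplicate delivery: skip it
        seen.add(msg["id"])
        unique.append(msg)
    return unique

m1, m2 = make_message("order-1"), make_message("order-2")
# Simulate at-least-once delivery: m2 arrives twice.
delivered = [m1, m2, m2]
processed = deduplicate(delivered)
print([m["payload"] for m in processed])  # ['order-1', 'order-2']
```

The key design point is that the ID must be assigned once by the producer, before any retry, so that both copies of a retried message carry the same key and collapse to one on the consumer.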