Kafka log compaction returns two records with the same key

Question

I am seeing strange behaviour with log compaction in Kafka. I created a topic with the following configuration:

kafka-topics --zookeeper ... \
--create --topic myTopic \
--partitions 12 \
--replication-factor 3 \
--config "min.insync.replicas=2" \
--config "cleanup.policy=compact" \
--config "delete.retention.ms=100" \
--config "retention.bytes=-1" \
--config "segment.ms=100" \
--config "min.cleanable.dirty.ratio=0.000001" \
--config "min.compaction.lag.ms=10"

I send messages with the same key to the topic, and after compaction runs, reading the topic still returns the last two messages.

Example:

Writing message with key="1" and value="A" into topic "myTopic"
Writing message with key="1" and value="B" into topic "myTopic"
Writing message with key="1" and value="C" into topic "myTopic"

(compaction runs)

print 'myTopic' from beginning

{"ROWTIME":1549444994905,"ROWKEY":"1","value=B"}
{"ROWTIME":1549444994905,"ROWKEY":"1","value=C"}

Writing message with key="1" and value="D" into topic "myTopic"

(compaction runs)

print 'myTopic' from beginning

{"ROWTIME":1549444994905,"ROWKEY":"1","value=C"}
{"ROWTIME":1549444994905,"ROWKEY":"1","value=D"}

The same happens with segment.ms=60000.

Any ideas?

Thanks!

Recommended answer

Before compacting, Kafka determines the lowest offset that cannot take part in compaction (firstUncleanableDirtyOffset).

This position is calculated from:

  • the first unstable offset
  • the base offset of the active segment
  • min.compaction.lag.ms
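
As a rough illustration of how these three inputs could combine, here is a minimal Python sketch. It is a simplified model, not Kafka's code: the Segment class, the first_uncleanable_offset function, and the sample timestamps are all hypothetical, and it assumes the result is simply the smallest of the candidate offsets.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    base_offset: int           # offset of the first record in the segment
    largest_timestamp_ms: int  # newest record timestamp in the segment


def first_uncleanable_offset(
    segments: List[Segment],
    first_unstable_offset: Optional[int],
    now_ms: int,
    min_compaction_lag_ms: int,
) -> int:
    """Simplified model: segments are sorted by base_offset, the last one is active."""
    # The active segment is never cleaned, so its base offset is always a candidate.
    candidates = [segments[-1].base_offset]
    # Records at or above the first unstable offset are not cleaned either.
    if first_unstable_offset is not None:
        candidates.append(first_unstable_offset)
    # With a compaction lag, the first segment that still holds a "too recent"
    # record also bounds the cleanable range.
    if min_compaction_lag_ms > 0:
        for seg in segments:
            if seg.largest_timestamp_ms > now_ms - min_compaction_lag_ms:
                candidates.append(seg.base_offset)
                break
    return min(candidates)


# Hypothetical log: three segments with one record each, the last one active.
segments = [Segment(0, 1000), Segment(1, 1200), Segment(2, 1400)]
tiny_lag = first_uncleanable_offset(segments, None, now_ms=5000, min_compaction_lag_ms=10)
large_lag = first_uncleanable_offset(segments, None, now_ms=5000, min_compaction_lag_ms=10000)
print(tiny_lag, large_lag)
```

With a tiny lag, none of the segments counts as too recent, so the bound comes from the active segment alone; with a large lag, the bound drops to the first segment.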

In your case min.compaction.lag.ms is very low, so the lowest offset that cannot take part in compaction is taken from the active segment. Because of that, only one message for the key lies in the cleanable part of the log, so the cleaner has nothing to remove, while the newest message (e.g. key=1, value=C) stays in the active segment untouched. That is why you see two records.

If you produce an extra message with a different key, the earlier messages for key=1 should then be compacted.
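
The behaviour in the question can be reproduced with a small simulation. This is a sketch under stated assumptions, not Kafka's implementation: the compact function is hypothetical, records are (offset, key, value) tuples, and each message is assumed to land in its own segment, so the active segment starts at the offset of the newest message.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (offset, key, value)


def compact(records: List[Record], first_uncleanable: int) -> List[Record]:
    """Keep the latest record per key below first_uncleanable; keep everything at or above it."""
    latest: Dict[str, int] = {}
    for offset, key, _ in records:
        if offset < first_uncleanable:
            latest[key] = offset  # later offsets overwrite earlier ones
    return [
        r for r in records
        if r[0] >= first_uncleanable or latest[r[1]] == r[0]
    ]


# A, B, C written with key "1"; C is in the active segment (base offset 2).
log = [(0, "1", "A"), (1, "1", "B"), (2, "1", "C")]
after_c = compact(log, first_uncleanable=2)

# D arrives, so C's segment is closed and D becomes the active segment.
after_d = compact(after_c + [(3, "1", "D")], first_uncleanable=3)

# Alternative: a message with a different key arrives instead of D.
after_other_key = compact(after_c + [(3, "2", "X")], first_uncleanable=3)

print(after_c)
print(after_d)
print(after_other_key)
```

In this model, the record in the active segment is always returned next to the latest compacted record for the same key, which matches the B/C and C/D pairs in the question. Once a message with another key occupies the active segment, only one record remains for key=1.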

Note: You should also be aware of the segment.bytes property, which determines the segment size. If the messages are small compared to segment.bytes, they may all remain in the active segment and will not be compacted.
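
To make the note concrete, here is a hedged sketch of the roll decision. The should_roll function is hypothetical and simplified. It assumes a segment is rolled when an incoming append would not fit within segment.bytes, or when a non-empty segment is older than segment.ms, and it ignores other conditions such as a full index.

```python
def should_roll(
    segment_size_bytes: int,
    incoming_bytes: int,
    segment_age_ms: int,
    segment_bytes: int,
    segment_ms: int,
) -> bool:
    """Simplified model of when the active segment is closed and a new one is started."""
    # Size-based roll: the incoming batch would push the segment past segment.bytes.
    size_full = segment_size_bytes > segment_bytes - incoming_bytes
    # Time-based roll: only considered for a segment that already holds data.
    too_old = segment_size_bytes > 0 and segment_age_ms > segment_ms
    return size_full or too_old


ONE_GIB = 1024 * 1024 * 1024  # a large segment.bytes value, used here for illustration

# Small messages and a young segment: everything stays in the active segment.
stays_active = should_roll(100, 50, segment_age_ms=10, segment_bytes=ONE_GIB, segment_ms=60000)

# The same small messages, but the segment is older than segment.ms when the append arrives.
rolls_on_time = should_roll(100, 50, segment_age_ms=70000, segment_bytes=ONE_GIB, segment_ms=60000)

print(stays_active, rolls_on_time)
```

In this model the check happens when a message is appended, so a quiet topic with small messages can keep its latest records in the active segment for a long time.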
