Kafka Log compaction returns two records with the same key
Question
I am seeing strange behaviour with log compaction in Kafka. I created a topic with the following configuration:
kafka-topics --zookeeper ... \
--create --topic myTopic \
--partitions 12 \
--replication-factor 3 \
--config "min.insync.replicas=2" \
--config "cleanup.policy=compact" \
--config "delete.retention.ms=100" \
--config "retention.bytes=-1" \
--config "segment.ms=100" \
--config "min.cleanable.dirty.ratio=0.000001" \
--config "min.compaction.lag.ms=10"
I send messages with the same key to the topic, and when compaction runs, reading the topic returns the last two messages for that key.
Example:
Writing message with key="1" and value="A" into topic "myTopic"
Writing message with key="1" and value="B" into topic "myTopic"
Writing message with key="1" and value="C" into topic "myTopic"
Compaction
print 'myTopic' from beginning
{"ROWTIME":1549444994905,"ROWKEY":"1","value=B"} {"ROWTIME":1549444994905,"ROWKEY":"1","value=C"}
Writing message with key="1" and value="D" into topic "myTopic"
Compaction
print 'myTopic' from beginning
{"ROWTIME":1549444994905,"ROWKEY":"1","value=C"} {"ROWTIME":1549444994905,"ROWKEY":"1","value=D"}
The same happens with segment.ms=60000.
Any ideas?
Thanks!!
Recommended answer
Before compaction, Kafka determines the lowest offset that cannot take part in compaction (firstUncleanableDirtyOffset).
That position is calculated from:
- the first unstable offset
- the active segment offset
- min.compaction.lag.ms
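The three inputs listed above can be combined roughly as follows. This is a simplified Python model written for illustration, not Kafka's actual code: the `Segment` class, the function name and its parameters are all hypothetical, and the sketch assumes the result is the minimum over the candidate offsets, with the lag check applied only to non-active segments.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    # Hypothetical stand-in for a log segment.
    base_offset: int
    largest_timestamp_ms: int


def first_uncleanable_offset(
    segments: List[Segment],
    first_unstable_offset: Optional[int],
    now_ms: int,
    min_compaction_lag_ms: int,
) -> int:
    """Return the lowest offset that must be left out of compaction."""
    active = segments[-1]  # the last segment is the active one
    # The active segment is always a candidate.
    candidates = [active.base_offset]

    if first_unstable_offset is not None:
        candidates.append(first_unstable_offset)

    if min_compaction_lag_ms > 0:
        # First older segment that still holds a record younger than the lag.
        for seg in segments[:-1]:
            if seg.largest_timestamp_ms > now_ms - min_compaction_lag_ms:
                candidates.append(seg.base_offset)
                break

    return min(candidates)


segments = [Segment(0, 1_000), Segment(1, 2_000), Segment(2, 3_000)]
# With a tiny lag, only the active segment's base offset limits compaction.
print(first_uncleanable_offset(segments, None, now_ms=10_000, min_compaction_lag_ms=10))
```

With a lag as small as the 10 ms used in the question, the lag candidate rarely applies, so the active segment's base offset decides the result.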
In your case min.compaction.lag.ms is very low, so the lowest offset that cannot take part in compaction is taken from the active segment. The active segment is never compacted, so the most recent record always survives next to the one record kept from the older segments. Because of that, only one message can take part in compaction (e.g. key=1, value=C), so there is nothing to do.
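To make the effect concrete, here is a toy Python simulation of the scenario from the question. It is a sketch under stated assumptions, not Kafka's cleaner: records are hypothetical `(offset, key, value)` tuples, each message is assumed to land in its own segment, and everything at or above `active_base_offset` is simply left untouched.

```python
from typing import List, Tuple

Record = Tuple[int, str, str]  # (offset, key, value), a hypothetical shape


def compact(log: List[Record], active_base_offset: int) -> List[Record]:
    """Keep the latest record per key below the active segment; keep the rest as-is."""
    cleanable = [r for r in log if r[0] < active_base_offset]
    untouched = [r for r in log if r[0] >= active_base_offset]

    # Highest offset seen for each key within the cleanable part.
    latest_offset = {}
    for offset, key, _value in cleanable:
        latest_offset[key] = offset

    kept = [r for r in cleanable if latest_offset[r[1]] == r[0]]
    return kept + untouched


log = [(0, "1", "A"), (1, "1", "B"), (2, "1", "C")]
log = compact(log, active_base_offset=2)  # C sits in the active segment
print(log)  # [(1, '1', 'B'), (2, '1', 'C')]

log.append((3, "1", "D"))  # D now sits in the new active segment
log = compact(log, active_base_offset=3)
print(log)  # [(2, '1', 'C'), (3, '1', 'D')]
```

In this model the reader always sees two records for key 1: one survivor from the compacted part and one in the active segment, which matches the output shown in the question.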
If you produce an extra message with a different key, it should compact your messages for key=1.
Note: You also have to be aware of the segment.bytes property, which determines the segment size. If messages are small compared to segment.bytes, they might be in the active segment and won't be compacted.