Kafka log compaction returns two records with the same key
Question
I am seeing strange behaviour with log compaction in Kafka. I created a topic with the following configuration:
kafka-topics --zookeeper ... \
--create --topic myTopic \
--partitions 12 \
--replication-factor 3 \
--config "min.insync.replicas=2" \
--config "cleanup.policy=compact" \
--config "delete.retention.ms=100" \
--config "retention.bytes=-1" \
--config "segment.ms=100" \
--config "min.cleanable.dirty.ratio=0.000001" \
--config "min.compaction.lag.ms=10"
I send messages with the same key to the topic, and after compaction runs, reading the topic still returns the last two messages.
Example:
Writing message with key="1" and value="A" into topic "myTopic"
Writing message with key="1" and value="B" into topic "myTopic"
Writing message with key="1" and value="C" into topic "myTopic"
COMPACTION
print 'myTopic' from beginning
{"ROWTIME":1549444994905,"ROWKEY":"1","value=B"} {"ROWTIME":1549444994905,"ROWKEY":"1","value=C"}
Writing message with key="1" and value="D" into topic "myTopic"
COMPACTION
print 'myTopic' from beginning
{"ROWTIME":1549444994905,"ROWKEY":"1","value=C"} {"ROWTIME":1549444994905,"ROWKEY":"1","value=D"}
The same happens with segment.ms=60000.
Any ideas?
Thanks!
Recommended answer
Before compaction, Kafka determines the lowest offset that cannot take part in compaction (firstUncleanableDirtyOffset).
That position is calculated from:
- the first unstable offset
- the base offset of the active segment
- min.compaction.lag.ms
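As a rough illustration of how those three inputs combine, the Python sketch below takes the minimum of the candidate offsets. It is a simplified model, not Kafka's actual code: the function name, the Segment tuple and its fields are hypothetical, and the lag rule is assumed to protect every segment from the first one whose newest record is younger than min.compaction.lag.ms.

```python
from typing import NamedTuple, Optional, Sequence

class Segment(NamedTuple):  # hypothetical stand-in for a log segment
    base_offset: int           # offset of the first record in the segment
    largest_timestamp_ms: int  # timestamp of the newest record in the segment

def first_uncleanable_offset(
    segments: Sequence[Segment],           # ordered by base_offset; the last one is active
    first_unstable_offset: Optional[int],  # None when there is no open transaction
    min_compaction_lag_ms: int,
    now_ms: int,
) -> int:
    # The active segment is never cleaned, so its base offset is always a candidate.
    candidates = [segments[-1].base_offset]
    if first_unstable_offset is not None:
        candidates.append(first_unstable_offset)
    if min_compaction_lag_ms > 0:
        # Assumed rule: the first segment that is still "too young" caps the cleanable range.
        for seg in segments:
            if now_ms - seg.largest_timestamp_ms < min_compaction_lag_ms:
                candidates.append(seg.base_offset)
                break
    return min(candidates)

segments = [Segment(0, 1_000), Segment(1, 2_000), Segment(2, 3_000)]
# With a tiny lag (10 ms) only the active segment limits the cleaner.
print(first_uncleanable_offset(segments, None, 10, now_ms=10_000))
# With a large lag (60 s) even the oldest segment is still protected.
print(first_uncleanable_offset(segments, None, 60_000, now_ms=10_000))
```

With a lag as small as the 10 ms used in the question, the lag term stops mattering almost immediately and the active segment's base offset becomes the limit.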
In your case min.compaction.lag.ms is very low, so the lowest offset that cannot take part in compaction is taken from the active segment. Because of that, only one message can take part in compaction (e.g. key=1, value=C), so there is nothing to do. The record in the active segment is never compacted, which is why you still see two records for the same key.
If you produce an extra message with a different key, it should compact your messages for key=1.
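The behaviour described above can be reproduced with a small in-memory model. This is only a sketch under stated assumptions: compact and read_all are hypothetical helpers, a log is modelled as a list of segments, and each record is assumed to land in its own segment (that is, segment.ms elapsed between sends). It does not talk to a real broker.

```python
def compact(segments):
    """Deduplicate closed segments by key; leave the last (active) segment untouched."""
    *closed, active = segments
    latest = {}  # key -> (offset, value) of the newest record in the closed segments
    for seg in closed:
        for offset, key, value in seg:
            latest[key] = (offset, value)
    cleaned = sorted((off, key, val) for key, (off, val) in latest.items())
    return ([cleaned] if cleaned else []) + [active]

def read_all(segments):
    """Read the topic from the beginning as (key, value) pairs."""
    return [(key, value) for seg in segments for _, key, value in seg]

# Assumption: each record is a tuple (offset, key, value) in its own segment.
log = [[(0, "1", "A")], [(1, "1", "B")], [(2, "1", "C")]]
log = compact(log)
print(read_all(log))  # [('1', 'B'), ('1', 'C')]: C sits in the active segment

# A record with another key rolls a new active segment, so C's segment is closed.
log.append([(3, "2", "X")])
log = compact(log)
print(read_all(log))  # [('1', 'C'), ('2', 'X')]: only one record left for key "1"
```

In this model the newest record always survives in the active segment next to the survivor of the cleaned segments, which matches the B/C and C/D pairs shown in the question.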
Note: You also have to be aware of the segment.bytes property, which determines the segment size. If messages are small compared to segment.bytes, they may all stay in the active segment and won't be compacted.
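To get a feel for the scale involved, the sketch below estimates how many small records fit into one segment before a size-based roll. It is a simplification: should_roll and records_before_roll are hypothetical helpers, batch and header overhead is ignored, and time-based rolling via segment.ms is not modelled. The 1 GiB value is Kafka's default for segment.bytes.

```python
DEFAULT_SEGMENT_BYTES = 1_073_741_824  # Kafka's default segment.bytes (1 GiB)

def should_roll(segment_size: int, record_size: int, segment_bytes: int) -> bool:
    """Size-based roll only: True when the record would not fit in the active segment."""
    return segment_size + record_size > segment_bytes

def records_before_roll(record_size: int, segment_bytes: int = DEFAULT_SEGMENT_BYTES) -> int:
    """How many equally sized records fit before a size-based roll is triggered."""
    return segment_bytes // record_size

# Roughly ten million 100-byte records fit into one default-sized segment.
print(records_before_roll(100))
print(should_roll(0, 100, DEFAULT_SEGMENT_BYTES))
```

Until one of the roll conditions is met, all of those records stay in the active segment and are skipped by the cleaner.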