Kafka log compaction returns two records with the same key


Question

I am seeing strange behaviour with log compaction in Kafka. I created a topic with the following configuration:

kafka-topics --zookeeper ... \
--create --topic myTopic \
--partitions 12 \
--replication-factor 3 \
--config "min.insync.replicas=2" \
--config "cleanup.policy=compact" \
--config "delete.retention.ms=100" \
--config "retention.bytes=-1" \
--config "segment.ms=100" \
--config "min.cleanable.dirty.ratio=0.000001" \
--config "min.compaction.lag.ms=10"

I send messages with the same key to the topic, and after compaction runs, reading the topic still returns the last two messages.

Example:

Writing message with key="1" and value="A" into topic "myTopic"
Writing message with key="1" and value="B" into topic "myTopic"
Writing message with key="1" and value="C" into topic "myTopic"

COMPACTION

print 'myTopic' from beginning

{"ROWTIME":1549444994905,"ROWKEY":"1","value=B"} {"ROWTIME":1549444994905,"ROWKEY":"1","value=C"}

Writing message with key="1" and value="D" into topic "myTopic"

COMPACTION

print 'myTopic' from beginning

{"ROWTIME":1549444994905,"ROWKEY":"1","value=C"} {"ROWTIME":1549444994905,"ROWKEY":"1","value=D"}

The same happens with segment.ms=60000.

Any ideas?

Thanks!

Answer

Before compacting, Kafka determines the lowest offset that cannot take part in compaction (firstUncleanableDirtyOffset).

That position is calculated from:

  • the first unstable offset
  • the offset of the active segment
  • min.compaction.lag.ms
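
As a rough illustration of how these three inputs combine, here is a minimal sketch. It assumes the result is simply the smallest of the candidate offsets that apply, which is how "lowest offset" reads here. The function and parameter names are hypothetical and are not Kafka's API.

```python
from typing import Optional


def first_uncleanable_dirty_offset(
    first_unstable_offset: Optional[int],
    active_segment_base_offset: int,
    first_offset_within_compaction_lag: Optional[int],
) -> int:
    """Toy model: the lowest offset that must NOT be compacted.

    All names are hypothetical. A candidate of None means it does not
    apply (e.g. no open transaction, or no record is still inside
    min.compaction.lag.ms).
    """
    candidates = [
        first_unstable_offset,
        active_segment_base_offset,  # the active segment is never cleaned
        first_offset_within_compaction_lag,
    ]
    return min(c for c in candidates if c is not None)


# With a tiny min.compaction.lag.ms and no transactions, only the
# active segment limits the cleaner.
print(first_uncleanable_dirty_offset(None, 2, None))
```

With a very small min.compaction.lag.ms the third candidate rarely applies, so the active segment's offset ends up being the limit.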

In your case min.compaction.lag.ms is very low, so the lowest offset that cannot take part in compaction is taken from the active segment. Because of that, only one message can take part in compaction (e.g. key=1, value=C), so there is nothing to do.

If you produce an extra message with a different key, Kafka should compact your messages for key=1.
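
To make this concrete, here is a toy simulation, not Kafka's actual cleaner. It assumes every record lands in its own segment (plausible with segment.ms=100 and messages sent more than 100 ms apart), that the last segment is the active one, and that the cleaner keeps only the latest record per key among the non-active segments. The names `compact` and `read_all` are made up for this sketch.

```python
from typing import Dict, List, Tuple

Record = Tuple[int, str, str]  # (offset, key, value)
Segment = List[Record]


def compact(segments: List[Segment]) -> List[Segment]:
    """Toy compaction: clean every segment except the last (active) one."""
    if len(segments) < 2:
        return list(segments)
    cleanable, active = segments[:-1], segments[-1]
    # Latest offset per key, looking only at the cleanable records.
    latest: Dict[str, int] = {}
    for seg in cleanable:
        for offset, key, _value in seg:
            latest[key] = offset
    cleaned = [rec for seg in cleanable for rec in seg if latest[rec[1]] == rec[0]]
    return ([cleaned] if cleaned else []) + [active]


def read_all(segments: List[Segment]) -> List[Tuple[str, str]]:
    """Read the whole log from the beginning as (key, value) pairs."""
    return [(key, value) for seg in segments for _offset, key, value in seg]


# Assumption: one record per segment.
log = [[(0, "1", "A")], [(1, "1", "B")], [(2, "1", "C")]]
log = compact(log)
print(read_all(log))  # B survives in the cleaned part, C sits in the active segment

log.append([(3, "1", "D")])  # D rolls a new active segment
log = compact(log)
print(read_all(log))

log.append([(4, "2", "X")])  # a record with another key becomes the active segment
log = compact(log)
print(read_all(log))  # only one record is left for key "1"
```

Under these assumptions the model reproduces the B/C and C/D pairs from the question: the newest record is always in the active segment, which the cleaner never touches.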

Note: you also have to be aware of the segment.bytes property, which determines the segment size. If the messages are small compared to segment.bytes, they may all still be in the active segment and will not be compacted.
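
A simplified sketch of that rolling rule follows. It assumes a segment is rolled when it would outgrow segment.bytes or when it is older than segment.ms; the real broker checks more conditions (index size, jitter), and `should_roll` is a hypothetical helper, not a Kafka function.

```python
DEFAULT_SEGMENT_BYTES = 1024 * 1024 * 1024     # documented default for segment.bytes (1 GiB)
DEFAULT_SEGMENT_MS = 7 * 24 * 60 * 60 * 1000   # documented default for segment.ms (7 days)


def should_roll(
    current_size: int,
    incoming_size: int,
    segment_age_ms: int,
    segment_bytes: int = DEFAULT_SEGMENT_BYTES,
    segment_ms: int = DEFAULT_SEGMENT_MS,
) -> bool:
    """Simplified rule: roll the active segment on size or age (assumption)."""
    too_big = current_size + incoming_size > segment_bytes
    too_old = segment_age_ms > segment_ms
    return too_big or too_old


# Small messages with default settings: everything stays in the active
# segment, so nothing becomes eligible for compaction.
print(should_roll(current_size=100, incoming_size=50, segment_age_ms=5_000))

# With segment.ms=100, a segment that is 500 ms old is rolled on the next append.
print(should_roll(current_size=100, incoming_size=50, segment_age_ms=500, segment_ms=100))
```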
