Log compaction to keep exactly one message per key


Question

I want to create a topic which contains unique keys along with their corresponding most recent values. So when a message with an existing key is inserted into the topic, the old message is removed.

To do so, I have configured the following parameters in server.properties file:

log.cleaner.enable=true
log.cleanup.policy=compact

# The minimum age of a log file to be eligible for deletion due to age
log.retention.minutes=3

log.retention.bytes=10737418

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=60000

# The maximum time before a new log segment is rolled out (in milliseconds).
# If not set, the value in log.roll.hours is used
log.roll.ms=600000
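If the goal is just to see compaction happen quickly while testing, the retention settings above matter less than the cleaner- and segment-related configs. One way to experiment is to set overrides on the topic itself rather than the broker; the following is a sketch only (the values are illustrative, and a single broker at the usual local addresses is assumed):

```shell
# Topic-level overrides that make the cleaner run aggressively for testing.
# segment.ms forces frequent segment rolls; min.cleanable.dirty.ratio=0.01
# lets the cleaner fire even when very little of the log is "dirty".
kafka-configs --zookeeper localhost:2181 --alter \
  --entity-type topics --entity-name retention_test \
  --add-config segment.ms=10000,min.cleanable.dirty.ratio=0.01,delete.retention.ms=100
```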

So compaction should take place every 3 minutes. In order to test the compaction policy, I created a topic retention_test:

kafka-topics --zookeeper localhost:2181 --create --topic retention_test --replication-factor 1 --partitions 1

and using the console producer, kafka-console-producer --broker-list localhost:9092 --topic retention_test --property parse.key=true --property key.separator=:, I produced the following messages:

>1:first
>2:second
>3:third

where the console consumer kafka-console-consumer --bootstrap-server localhost:9092 --topic retention_test --from-beginning consumes them successfully:

first
second
third

Now when I try to insert a message with a key which has already been added, the older message doesn't seem to be ignored and remains in the topic:

On the producer side:

>1:updatedFirst

Note that in order to test the behavior, I have restarted the consumer multiple times, long after the retention period of 3 minutes has passed. The output is

first
second
third
updatedFirst

The desired output should be

second
third
updatedFirst

since first and updatedFirst have the same key.

According to the documentation:

Log compaction gives us a more granular retention mechanism so that we are guaranteed to retain at least the last update for each primary key

Is it possible to keep exactly one message (the most recent one) per key instead of at least one message (including the most recent one)?

Answer

I'd say it's not generally possible. Kafka stores messages in segments for each partition of each topic. Each segment is a file, and segments are only ever appended to (or deleted as a whole). Compaction works by rewriting the existing segment files, skipping any message for which a later message with the same key exists. However, the head segment (the one new messages are currently being appended to) is not compacted until a new segment is created and becomes the head segment.
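Because only non-head segments get cleaned, one way to get closer to the desired behavior is to make segments roll quickly, so the head segment closes and becomes eligible for compaction sooner. A sketch using topic-level overrides at creation time (the values are illustrative, not a guaranteed recipe, and assume the same single-broker setup as in the question):

```shell
# Recreate the topic with a short segment lifetime and an aggressive cleaner,
# so older segments are compacted soon after they stop being the head segment.
kafka-topics --zookeeper localhost:2181 --create --topic retention_test \
  --replication-factor 1 --partitions 1 \
  --config cleanup.policy=compact \
  --config segment.ms=10000 \
  --config min.cleanable.dirty.ratio=0.01
```

Even then, compaction guarantees at least one message per key, not exactly one: at any given moment the head segment (and any not-yet-cleaned portion of the log) can still hold superseded values.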

The 3 minutes you configured via the log.retention.* settings are not in play when log.cleanup.policy=compact; they only take effect when log.cleanup.policy=delete.

Why is having exactly one message for a given key important? Perhaps an alternative approach can be suggested if you provide more info about your use case.
