Log compaction to keep exactly one message per key

Problem description

I want to create a topic that contains unique keys along with their most recent values, so that when a message with an existing key is inserted into the topic, the old message is removed.

To do so, I configured the following parameters in the server.properties file:

log.cleaner.enable=true
log.cleanup.policy=compact

# The minimum age of a log file to be eligible for deletion due to age
log.retention.minutes=3

log.retention.bytes=10737418

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=60000

# The maximum time before a new log segment is rolled out (in milliseconds).
# If not set, the value in log.roll.hours is used
log.roll.ms=600000
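The size and time settings above use different units. The following sketch only does arithmetic on the values listed (it does not talk to Kafka); putting them side by side shows, for example, that log.roll.ms works out to 10 minutes, which is longer than the 3-minute window the question expects compaction to follow.

```python
# Values copied from the server.properties excerpt above (no Kafka involved).
PROPS_TEXT = """
log.retention.minutes=3
log.retention.bytes=10737418
log.segment.bytes=1073741
log.retention.check.interval.ms=60000
log.roll.ms=600000
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


props = parse_properties(PROPS_TEXT)

MIB = 1024 * 1024
MS_PER_MIN = 60_000

retention_min = int(props["log.retention.minutes"])
check_interval_min = int(props["log.retention.check.interval.ms"]) / MS_PER_MIN
roll_min = int(props["log.roll.ms"]) / MS_PER_MIN
segment_mib = int(props["log.segment.bytes"]) / MIB
retention_mib = int(props["log.retention.bytes"]) / MIB

print(f"retention:      {retention_min} min")
print(f"check interval: {check_interval_min:.0f} min")
print(f"segment roll:   {roll_min:.0f} min")
print(f"segment size:   {segment_mib:.3f} MiB")
print(f"retention size: {retention_mib:.3f} MiB")
```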

I expected compaction to take place every 3 minutes. To test the compaction policy, I created a topic named retention_test:

kafka-topics --zookeeper localhost:2181 --create --topic retention_test --replication-factor 1 --partitions 1

Using the console producer (kafka-console-producer --broker-list localhost:9092 --topic retention_test --property parse.key=true --property key.separator=:), I produced the following messages:

>1:first
>2:second
>3:third

The console consumer (kafka-console-consumer --bootstrap-server localhost:9092 --topic retention_test --from-beginning) consumes them successfully:

first
second
third

Now, when I insert a message with a key that has already been used, the older message does not seem to be removed and remains in the topic:

On the producer side:

>1:updatedFirst

Note that, to test the behavior, I restarted the consumer multiple times, long after the 3-minute retention period had passed. The output is:

first
second
third
updatedFirst

The desired output is:

second
third
updatedFirst

since first and updatedFirst have the same key.

According to the documentation:

Log compaction gives us a more granular retention mechanism so that we are guaranteed to retain at least the last update for each primary key

Is it possible to keep exactly one message (the most recent one) per key instead of at least one message (including the most recent one)?
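The gap between the documented guarantee and the requested behavior can be stated as two checks. The sketch below (the helper names are hypothetical, not part of any Kafka API) models each output from the question as (key, value) pairs in offset order: the observed output already satisfies "at least the last update for each key", while only the desired output also has exactly one record per key.

```python
from collections import Counter

# Everything written to the topic, as (key, value) pairs in offset order.
produced = [("1", "first"), ("2", "second"), ("3", "third"), ("1", "updatedFirst")]

# What the consumer actually saw vs. what the question wants to see.
observed = list(produced)
desired = [("2", "second"), ("3", "third"), ("1", "updatedFirst")]


def latest_values(records):
    """Map each key to the value of its highest-offset record."""
    latest = {}
    for key, value in records:
        latest[key] = value  # later offsets overwrite earlier ones
    return latest


def keeps_latest_per_key(log, written):
    """Documented guarantee: the last update for every key is still in the log."""
    return all((k, v) in log for k, v in latest_values(written).items())


def exactly_one_per_key(log):
    """Stronger property the question asks for: no key appears more than once."""
    return all(count == 1 for count in Counter(k for k, _ in log).values())


for name, log in [("observed", observed), ("desired", desired)]:
    print(name, keeps_latest_per_key(log, produced), exactly_one_per_key(log))
```

Running it prints `observed True False` followed by `desired True True`, i.e. the observed topic contents do not violate the "at least" wording of the documentation.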

Answer

I'd say it's not generally possible. Kafka stores the messages of each partition of each topic in segments. Each segment is a file that is only ever appended to (or deleted as a whole). Compaction works by rewriting the existing segment files, skipping every message for which a later message with the same key exists. However, the head segment (the one new messages are currently being appended to) is not compacted until a new segment is created and becomes the head segment.
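The segment behavior described above can be illustrated with a toy model. It is a hypothetical simplification, not Kafka's actual cleaner: a segment rolls after a fixed number of records (a stand-in for log.segment.bytes or log.roll.ms), and compaction rewrites only the closed segments, leaving the head segment, and any duplicate keys in it, untouched. Other cleaner settings are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy model of one partition (hypothetical, not Kafka's implementation)."""
    records_per_segment: int  # stand-in for log.segment.bytes / log.roll.ms
    segments: list = field(default_factory=lambda: [[]])
    next_offset: int = 0

    def append(self, key, value):
        # Appends only go to the last (head) segment; roll when it is full.
        if len(self.segments[-1]) >= self.records_per_segment:
            self.segments.append([])
        self.segments[-1].append((self.next_offset, key, value))
        self.next_offset += 1

    def compact(self):
        # Only closed segments are cleanable; the head segment is left as-is
        # and its records are not consulted when deciding what to drop.
        closed = self.segments[:-1]
        last_offset = {}
        for segment in closed:
            for offset, key, _ in segment:
                last_offset[key] = offset
        # Rewrite each closed segment, keeping only the latest record per key.
        self.segments[:-1] = [
            [rec for rec in segment if last_offset[rec[1]] == rec[0]]
            for segment in closed
        ]

    def values(self):
        return [value for segment in self.segments for _, _, value in segment]


p = Partition(records_per_segment=4)
for key, value in [("1", "first"), ("2", "second"), ("3", "third"), ("1", "updatedFirst")]:
    p.append(key, value)
p.compact()
print(p.values())  # all four records still sit in the head segment

p.append("4", "fourth")  # the fifth record rolls a new head segment
p.compact()
print(p.values())  # "first" is dropped from the now-closed segment

p.append("2", "updatedSecond")  # lands in the head segment
p.compact()
print(p.values())  # key "2" appears twice until this head segment is closed
```

With the question's four messages all in the head segment, the first compaction changes nothing; once a fifth message rolls a new segment, "first" disappears, yet a fresh update for key 2 landing in the head segment again leaves two records for that key.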

The 3 minutes you configured via log.retention.minutes have no effect when log.cleanup.policy=compact; that setting only applies when log.cleanup.policy=delete.

Why is having exactly one message for a given key important? Perhaps an alternative approach can be suggested if you provide more info about your use case.