kafka __consumer_offsets topic logs rapidly growing in size reducing disk space
Problem description
I find that the __consumer_offsets topic log size is growing rapidly, and after studying it further I found the topics with the highest volume. I changed the retention policy on these topics to slow the rate of growth, but I would like to free up disk space and delete all the old logs for the __consumer_offsets topic.
But this could cause all the other topics and consumers/producers to get corrupted or lose valuable metadata. Is there a way I can accomplish this? I'm looking at the topic config parameters, which include cleanup policy and compression, but I'm not sure how to apply them specifically to the topics that caused this rapid growth.
https://docs.confluent.io/current/installation/configuration/topic-configs.html
Appreciate any help here.
Recommended answer
In Kafka, there are two types of log retention: size-based and time-based. The former is triggered by log.retention.bytes, while the latter by log.retention.hours.
In your case, you should pay attention to size retention, which can sometimes be quite tricky to configure. Assuming that you want a delete cleanup policy, you'd need to configure the following parameters:
log.cleaner.enable=true
log.cleanup.policy=delete
Then you need to think about the configuration of log.retention.bytes, log.segment.bytes and log.retention.check.interval.ms. To do so, you have to take into consideration the following factors:
log.retention.bytes is a minimum guarantee for a single partition of a topic, meaning that if you set log.retention.bytes to 512MB, you will always have at least 512MB of data (per partition) on your disk.
Again, if you set log.retention.bytes to 512MB and log.retention.check.interval.ms to 5 minutes (which is the default value), then at any given time you will have at least 512MB of data, plus the size of the data produced within the 5-minute window, before the retention policy is triggered.
A topic log on disk is made up of segments. The segment size depends on the log.segment.bytes parameter. For log.retention.bytes=1GB and log.segment.bytes=512MB, you will always have up to 3 segments on disk (2 segments that have reached the retention limit, plus a 3rd active segment where data is currently being written).
Finally, you should do the math and compute the maximum size that might be reserved by Kafka logs at any given time on your disk, and tune the aforementioned parameters accordingly. Of course, I would also advise setting a time retention policy as well and configuring log.retention.hours accordingly. If after 2 days you don't need your data anymore, set log.retention.hours=48.
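The worst-case sizing described above can be sketched with a quick shell calculation. The numbers below mirror the 1GB/512MB example (per partition); substitute your own broker settings:

```shell
# Worst-case disk usage for a single partition, given:
#   log.retention.bytes=1GB, log.segment.bytes=512MB
retention_bytes=$((1024 * 1024 * 1024))
segment_bytes=$((512 * 1024 * 1024))

# Full segments kept until retention triggers, plus the active segment
max_segments=$(( retention_bytes / segment_bytes + 1 ))
max_bytes=$(( max_segments * segment_bytes ))

echo "up to ${max_segments} segments, ${max_bytes} bytes per partition"
```

Multiply the result by the topic's partition count (and replication factor, across the cluster) to estimate total footprint.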
Now in order to change the retention policy just for the __consumer_offsets topic, you can simply run:
bin/kafka-configs.sh \
--zookeeper localhost:2181 \
--alter \
--entity-type topics \
--entity-name __consumer_offsets \
--add-config retention.bytes=...
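To confirm the override took effect, you can describe the topic's configuration with the same tool (this assumes the same ZooKeeper connection string as above and a running cluster, so treat it as a sketch rather than copy-paste):

```shell
bin/kafka-configs.sh \
  --zookeeper localhost:2181 \
  --describe \
  --entity-type topics \
  --entity-name __consumer_offsets
```

Note that on newer Kafka versions the --zookeeper flag has been replaced by --bootstrap-server, so check which option your version supports.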
As a side note, you must be very careful with the retention policy for the __consumer_offsets topic, as this might mess up all your consumers.