kafka __consumer_offsets topic logs rapidly growing in size reducing disk space


Question

I find that the __consumer_offsets topic log size is growing rapidly, and after studying it further I found the topics with the highest volume. I changed the retention policy on those topics to slow the rate of growth, but I would also like to free up disk space and delete all the old logs for the __consumer_offsets topic.

But this may cause all the other topics and consumers/producers to get corrupted or lose valuable metadata. Is there a way I can accomplish this? I'm looking at the topic-level configuration parameters, which include cleanup policy and compression, but I'm not sure how to apply them specifically to the topics that caused this rapid growth.

https://docs.confluent.io/current/installation/configuration/topic-configs.html

Any help here is appreciated.

Answer

In Kafka, there are two types of log retention: size-based and time-based. The former is triggered by log.retention.bytes, while the latter is triggered by log.retention.hours.

In your case, you should pay attention to size retention, which can sometimes be quite tricky to configure. Assuming that you want a delete cleanup policy, you'd need to configure the following parameters:

log.cleaner.enable=true
log.cleanup.policy=delete

Then you need to think about the configuration of log.retention.bytes, log.segment.bytes and log.retention.check.interval.ms. To do so, you have to take the following factors into consideration:

  • log.retention.bytes is a minimum guarantee for a single partition of a topic: if you set log.retention.bytes to 512MB, you will always have at least 512MB of data (per partition) on disk.

Again, if you set log.retention.bytes to 512MB and log.retention.check.interval.ms to 5 minutes (which is the default value), then at any given time before the retention policy is triggered you will have at least 512MB of data, plus the size of the data produced within the 5-minute window.

A topic log on disk is made up of segments. The segment size depends on the log.segment.bytes parameter. For log.retention.bytes=1GB and log.segment.bytes=512MB, you will always have up to 3 segments on disk (2 segments that have reached retention, plus the 3rd, active segment that data is currently being written to).
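The segment arithmetic above can be sketched as follows (the parameter names mirror Kafka's settings, but the function itself is purely illustrative, not part of any Kafka API):

```python
def max_segments_per_partition(retention_bytes, segment_bytes):
    """Upper bound on log segments kept on disk for one partition:
    enough closed segments to cover retention.bytes, plus the
    active segment currently being written to."""
    closed = -(-retention_bytes // segment_bytes)  # ceiling division
    return closed + 1

# The example from the text: retention.bytes=1GB, segment.bytes=512MB
print(max_segments_per_partition(1024**3, 512 * 1024**2))  # -> 3
```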

Finally, you should do the math and compute the maximum size that Kafka logs might reserve on your disk at any given time, and tune the aforementioned parameters accordingly. Of course, I would also advise setting a time retention policy as well and configuring log.retention.hours accordingly. If after 2 days you don't need your data anymore, then set log.retention.hours=48.
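That back-of-the-envelope calculation could look like this (an illustrative sketch: produce_rate_bytes_per_sec is an assumed workload figure, not a Kafka setting):

```python
def worst_case_bytes_per_partition(retention_bytes,
                                   segment_bytes,
                                   check_interval_ms,
                                   produce_rate_bytes_per_sec):
    """Worst-case on-disk size for one partition before the delete
    policy kicks in: the retention.bytes minimum guarantee, plus one
    active segment, plus whatever arrives during one
    log.retention.check.interval.ms window."""
    window_bytes = produce_rate_bytes_per_sec * check_interval_ms / 1000
    return retention_bytes + segment_bytes + window_bytes

# 512MB retention, 512MB segments, default 5-minute check interval,
# and an assumed steady ingest of 1MB/s:
total = worst_case_bytes_per_partition(512 * 1024**2, 512 * 1024**2,
                                       300_000, 1024**2)
print(f"{total / 1024**2:.0f} MB")  # -> 1324 MB
```

Multiply the result by the partition count (and replication factor) to size the disk for the whole topic.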

Now, in order to change the retention policy just for the __consumer_offsets topic, you can simply run:

bin/kafka-configs.sh \
    --zookeeper localhost:2181 \
    --alter \
    --entity-type topics \
    --entity-name __consumer_offsets \
    --add-config retention.bytes=...


As a side note, you must be very careful with the retention policy for __consumer_offsets, as this might mess up all your consumers.

