Kafka __consumer_offsets growing in size


Problem Description

We are using Kafka as a strictly ordered queue, hence a single topic / single partition / single consumer group combination is in use. I should be able to use multiple partitions later in the future.

My consumer is a Spring Boot app listener that produces to and consumes from the same topic(s), so the consumer group is fixed and there is always a single consumer.

Kafka version 0.10.1.1

In this scenario the log files for topic-0 and a few __consumer_offsets_XX partitions grow. In fact __consumer_offsets_XX grows very large, even though it is supposedly cleaned up periodically every 60 minutes (by default). The consumer doesn't read all the time, but it has enable.auto.commit=true.
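To see the growth concretely, you can measure the on-disk size of the internal topic's partition directories on the broker. A minimal sketch using standard Unix tools; the data directory below is an assumption, so check log.dirs in your server.properties:

    # Sum the on-disk size of every __consumer_offsets partition directory
    # (/var/lib/kafka/data is an assumed path; use your broker's log.dirs value)
    du -sh /var/lib/kafka/data/__consumer_offsets-* | sort -hr | head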

By default, log.retention.minutes (default 7 days) > offsets.retention.minutes (default 1 day); but in my case, since my consumer group/consumer is fixed and single, it may not make any sense to keep the messages in topic-0 once they have been consumed. Shall I lower log.retention.minutes to, say, 3 days?
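For reference, both retention knobs are broker settings in server.properties. A sketch with illustrative values (assumptions, not recommendations):

    # Broker-side retention settings (server.properties); values are illustrative
    log.retention.minutes=4320        # keep topic data for 3 days
    offsets.retention.minutes=1440    # expire offsets of dead consumer groups after 1 day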

Can I lower offsets.retention.minutes to control the growing size of __consumer_offsets_XX without touching the auto.commit settings?

Answer

The offsets.retention.minutes and log.retention.XXX properties only affect the physical removal of records/messages/log segments when a roll of the offsets file occurs.
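You can check whether a given __consumer_offsets partition has ever rolled by listing its segment files. A sketch, assuming a hypothetical data directory and partition number:

    # List segment files of one partition (path and partition number are assumptions)
    ls -lh /var/lib/kafka/data/__consumer_offsets-15/
    # If only 00000000000000000000.log exists, the partition never rolled,
    # so none of its records are eligible for physical deletion yet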

In general, the offsets.retention.minutes property dictates that a broker should forget about your consumer if it has disappeared for the specified amount of time, and the broker can do that even without removing log files from the disk.

If you set this value to a relatively low number and check your __consumer_offsets topic while there are no active consumers, over time you will notice something like:

    [group,topic,7]::OffsetAndMetadata(offset=7, leaderEpoch=Optional.empty, metadata=, commitTimestamp=1557475923142, expireTimestamp=None)
    [group,topic,8]::OffsetAndMetadata(offset=6, leaderEpoch=Optional.empty, metadata=, commitTimestamp=1557475923142, expireTimestamp=None)
    [group,topic,6]::OffsetAndMetadata(offset=7, leaderEpoch=Optional.empty, metadata=, commitTimestamp=1557475923142, expireTimestamp=None)
    [group,topic,19]::NULL
    [group,topic,5]::NULL
    [group,topic,22]::NULL

The ::NULL entries mark the offsets that have expired, which signifies how event-store systems like Kafka work in general: they record new events instead of changing the existing ones.
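For completeness, such a dump can be produced with the console consumer. A sketch assuming a local broker; the formatter class name below is the one shipped with Kafka 0.11+ (on 0.10.x it was kafka.coordinator.GroupMetadataManager$OffsetsMessageFormatter):

    # Read the internal offsets topic in human-readable form
    kafka-console-consumer.sh --bootstrap-server localhost:9092 \
      --topic __consumer_offsets --from-beginning \
      --consumer-property exclude.internal.topics=false \
      --formatter "kafka.coordinator.group.GroupMetadataManager\$OffsetsMessageFormatter"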

I am not aware of any Kafka version where topics are deleted/cleaned up every 60 minutes by default, and I have a feeling you misinterpreted something in the documentation.

It seems that __consumer_offsets is managed very differently from regular topics. The only way to get __consumer_offsets data deleted is to force a roll of its segment files. That, however, doesn't happen the same way it does for regular log files: while regular log files (for your data topics) are rolled automatically whenever they are due for deletion, regardless of the log.roll.* properties, __consumer_offsets segments are not. If they never roll and stay in the initial ...00000 segment, they are not deleted at all. So it seems the way to reduce your __consumer_offsets files is to (a configuration sketch follows the list):

  1. Set a relatively small log.roll.* value;
  2. Manipulate offsets.retention.minutes if you can afford to disconnect your consumers;
  3. Otherwise adjust the log.retention.XXX properties.
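A minimal server.properties sketch of those three steps; the values are illustrative assumptions, not recommendations:

    # Broker settings corresponding to the steps above (values are assumptions)
    log.roll.hours=24                 # step 1: force a segment roll at least daily
    offsets.retention.minutes=1440    # step 2: expire offsets of inactive groups after 1 day
    log.retention.hours=72            # step 3: keep data-topic messages for 3 days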

