Data still remains in Kafka topic even after retention time/size


Problem description

We set the log retention hours to 1 hour as follows (the previous setting was 72 hours).

Using the Kafka command-line tools below, we set retention.ms to 1 hour. Our aim is to purge the data older than 1 hour in the topic topic_test, so we used the following command:

kafka-configs.sh --alter \
  --zookeeper localhost:2181  \
  --entity-type topics \
  --entity-name topic_test \
  --add-config retention.ms=3600000

and also:

kafka-topics.sh --zookeeper localhost:2181 --alter \
  --topic topic_test \
  --config retention.ms=3600000

Both commands ran without errors.

But the problem is that Kafka data older than 1 hour still remains!

In fact, no data was removed from the topic_test partitions. We are running an HDP Kafka cluster, version 1.0.x, with Ambari.

We do not understand why the data on topic topic_test still remains, and has not decreased, even after we ran both CLI commands described above.

What is wrong with the following Kafka CLI commands?

kafka-configs.sh --alter --zookeeper localhost:2181  --entity-type topics  --entity-name topic_test --add-config retention.ms=3600000

kafka-topics.sh --zookeeper localhost:2181 --alter --topic topic_test --config retention.ms=3600000
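As a sanity check (a verification step not shown in the question; it uses the same tool with `--describe` instead of `--alter`), the overrides actually recorded for the topic can be listed:

```shell
# Show the per-topic overrides currently stored for topic_test.
# If the --alter commands took effect, retention.ms=3600000 should appear here.
kafka-configs.sh --describe \
  --zookeeper localhost:2181 \
  --entity-type topics \
  --entity-name topic_test
```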

From the Kafka server.log we can see the following:

[2020-07-28 14:47:27,394] INFO Processing override for entityPath: topics/topic_test with config: Map(retention.bytes -> 2165441552, retention.ms -> 3600000) (kafka.server.DynamicConfigManager)
[2020-07-28 14:47:27,397] WARN retention.ms for topic topic_test is set to 3600000. It is smaller than message.timestamp.difference.max.ms's value 9223372036854775807. This may result in frequent log rolling. (kafka.server.TopicConfigHandler)

Reference: https://ronnieroller.com/kafka/cheat-sheet

Answer

The log cleaner only works on inactive (sometimes also referred to as "old" or "clean") segments. As long as all data fits into the active ("dirty", "unclean") segment, whose size is bounded by the segment.bytes limit, no cleaning will happen.
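This can be seen on disk by inspecting the partition directory directly (a hedged sketch: `/var/kafka-logs` and partition 0 are assumptions — check the broker's `log.dirs` setting for the real path):

```shell
# List the segment files for partition 0 of topic_test; the newest
# .log file is the active segment, which retention will not touch.
ls -lh /var/kafka-logs/topic_test-0/

# Dump the records (including timestamps) of one segment file
# (the file name below is illustrative).
kafka-run-class.sh kafka.tools.DumpLogSegments \
  --files /var/kafka-logs/topic_test-0/00000000000000000000.log \
  --print-data-log
```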

The configuration cleanup.policy is described as:

A string that is either "delete" or "compact" or both. This string designates the retention policy to use on old log segments. The default policy ("delete") will discard old segments when their retention time or size limit has been reached. The "compact" setting will enable log compaction on the topic.

Furthermore, segment.bytes is:

This configuration controls the segment file size for the log. Retention and cleaning are always done one file at a time, so a larger segment size means fewer files but less granular control over retention.

The configuration segment.ms can also be used to steer the deletion:

This configuration controls the period of time after which Kafka will force the log to roll, even if the segment file isn't full, to ensure that retention can delete or compact old data.

As it defaults to one week, you might want to reduce it to fit your needs.
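For reference, both retention.ms and segment.ms are plain millisecond values; the default segment.ms of one week versus the one-hour target works out as:

```shell
# Millisecond values for the intervals discussed above.
echo $((7 * 24 * 60 * 60 * 1000))   # one week (default segment.ms): 604800000
echo $((60 * 60 * 1000))            # one hour: 3600000
```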

Therefore, if you want to set the retention of a topic to, e.g., one hour, you could set:

cleanup.policy=delete
retention.ms=3600000
segment.ms=3600000
file.delete.delay.ms=1 (The time to wait before deleting a file from the filesystem)
segment.bytes=1024
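Applied with the same tool used in the question, the full set of overrides could look like this (a sketch only, not run against a live cluster; kafka-configs accepts a comma-separated list in --add-config):

```shell
kafka-configs.sh --alter \
  --zookeeper localhost:2181 \
  --entity-type topics \
  --entity-name topic_test \
  --add-config cleanup.policy=delete,retention.ms=3600000,segment.ms=3600000,file.delete.delay.ms=1,segment.bytes=1024
```

Note that segment.bytes=1024 forces very frequent segment rolls; once the old data has been purged, you would typically raise it back to a production-sized value to avoid creating a huge number of tiny files.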

Note: I am not referring to retention.bytes; segment.bytes is a very different thing, as described above. Also, be aware that log.retention.hours is a cluster-wide configuration, so if you plan to have different retention times for different topics, per-topic overrides like the ones above are what solve it.

