Kafka optimal retention and deletion policy


Question

I am fairly new to Kafka, so forgive me if this question is trivial. I have a very simple setup for the purpose of timing tests, as follows:

Machine A -> writes to topic 1 (Broker) -> Machine B reads from topic 1
Machine B -> writes the message just read to topic 2 (Broker) -> Machine A reads from topic 2

Now I am sending messages of roughly 1400 bytes in an infinite loop, filling up the space on my small broker very quickly. I'm experimenting with different values for log.retention.ms, log.retention.bytes, log.segment.bytes, and log.segment.delete.delay.ms. First I set all of the values to the minimum allowed, but that seemed to degrade performance; then I set them to the maximum my broker could take before being completely full, but performance again degrades when a deletion occurs. Is there a best practice for setting these values to get the absolute minimum delay?

Thanks for the help!

Answer

Apache Kafka uses a Log data structure to manage its messages. A Log is basically an ordered set of Segments, where a Segment is a collection of messages. Kafka applies retention at the Segment level rather than the Message level; hence, it keeps removing the oldest Segments from the Log as they violate the retention policies.

Apache Kafka provides the following retention policies:

  1. Time-based retention

Under this policy, we configure the maximum time a Segment (and hence its messages) can live. Once a Segment has exceeded the configured retention time, it is marked for deletion or compaction, depending on the configured cleanup policy. The default retention time for Segments is 7 days.

Here are the parameters (in decreasing order of priority) that you can set in your Kafka broker properties file:

# Configures retention time in milliseconds
log.retention.ms=1680000

# Used if log.retention.ms is not set
log.retention.minutes=1680

# Used if log.retention.minutes is not set
log.retention.hours=168
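The segment-level check this policy performs can be sketched in Python (a simplified, hypothetical model, not Kafka's actual code): a segment becomes eligible for deletion once the age of its newest message exceeds the retention window.

```python
import time

def expired_segments(segment_close_ms, retention_ms, now_ms=None):
    """Return indices of segments whose newest message is older than the
    retention window. segment_close_ms holds the timestamp (ms) of the
    last message appended to each segment, oldest segment first."""
    now = now_ms if now_ms is not None else int(time.time() * 1000)
    return [i for i, ts in enumerate(segment_close_ms) if now - ts > retention_ms]

# Segments closed at t=0s, 1s, and 5s; a 2s retention evaluated at t=6s
# marks the first two segments for deletion:
print(expired_segments([0, 1000, 5000], retention_ms=2000, now_ms=6000))  # [0, 1]
```

Note that retention is coarse: a segment is only deleted once its *newest* message has expired, so older messages in the same segment can outlive the configured retention time.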

  2. Size-based retention

In this policy, we configure the maximum size of the Log data structure for a Topic partition. Once the Log reaches this size, Kafka starts removing the oldest Segments. This policy is not popular, as it does not provide good visibility into message expiry. However, it can come in handy when we need to control the size of a Log due to limited disk space.
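The pruning behaviour can be sketched as follows (again a simplified, hypothetical model): the oldest segments are dropped until the log fits the configured byte budget, while the active (newest) segment is always kept.

```python
def prune_by_size(segment_sizes, retention_bytes):
    """Drop oldest segments until the log fits within retention_bytes.
    segment_sizes is ordered oldest-first; the newest (active) segment
    is never deleted, so the log can still exceed the budget briefly."""
    sizes = list(segment_sizes)
    while sum(sizes) > retention_bytes and len(sizes) > 1:
        sizes.pop(0)  # delete the oldest segment
    return sizes

# Three 100-byte segments against a 150-byte budget: the two oldest
# segments are removed, leaving only the active one.
print(prune_by_size([100, 100, 100], retention_bytes=150))  # [100]
```

This also shows why smaller log.segment.bytes values give finer-grained deletion: space is reclaimed one whole segment at a time.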

Here is the parameter that you can set in your Kafka broker properties file:

# Configures the maximum size of a Log
log.retention.bytes=104857600

So, for your use case, you should configure log.retention.bytes so that your disk does not get full.
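Putting it together, a broker configuration that bounds both age and size might look like the sketch below. The values are illustrative only, not tuned recommendations; the properties themselves are the ones discussed in the question.

```properties
# Delete segments older than 7 days...
log.retention.hours=168
# ...or once the partition's log exceeds 100 MiB, whichever comes first
log.retention.bytes=104857600
# Roll a new segment every 10 MiB so deletion is fine-grained
log.segment.bytes=10485760
# Wait 1 minute before physically removing a deleted segment's file
log.segment.delete.delay.ms=60000
```

When both time-based and size-based limits are set, a segment is deleted as soon as either limit is violated.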
