Kafka optimal retention and deletion policy

Question

I am fairly new to Kafka, so forgive me if this question is trivial. I have a very simple setup for purposes of timing tests, as follows:

Machine A -> writes to topic 1 (Broker) -> Machine B reads from topic 1
Machine B -> writes the message just read to topic 2 (Broker) -> Machine A reads from topic 2

Now I am sending messages of roughly 1400 bytes in an infinite loop, filling up the space on my small broker very quickly. I am experimenting with different values for log.retention.ms, log.retention.bytes, log.segment.bytes, and log.segment.delete.delay.ms. First I set all of the values to the minimum allowed, but that seemed to degrade performance; then I set them to the maximum my broker could take before being completely full, but again performance degraded when a deletion occurred. Is there a best practice for setting these values to get the absolute minimum delay?

Any help is appreciated!

Answer

Apache Kafka uses a Log data structure to manage its messages. A Log is basically an ordered set of Segments, where a Segment is a collection of messages. Apache Kafka applies retention at the Segment level rather than at the Message level; hence, Kafka keeps removing the oldest Segments as they violate the retention policies.
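Because retention works on whole Segments, the Segment settings the question mentions control how coarse deletion is. As a sketch, the relevant broker properties look like this (the values shown are the stock defaults as I understand them; verify against your Kafka version's documentation):

Maximum size of a single Segment file; smaller Segments roll, and thus become deletable, sooner, at the cost of more files and more frequent rolls:

log.segment.bytes=1073741824

How long a Segment file waits on disk after being marked for deletion before it is actually removed:

log.segment.delete.delay.ms=60000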

Apache Kafka provides us with the following retention policies:

1. Time-based retention

Under this policy, we configure the maximum time a Segment (and hence its messages) can live. Once a Segment has exceeded the configured retention time, it is marked for deletion or compaction, depending on the configured cleanup policy. The default retention time for Segments is 7 days.

Here are the parameters (in decreasing order of priority) that you can set in your Kafka broker properties file:

Configures retention time in milliseconds:

log.retention.ms=1680000

Used if log.retention.ms is not set:

log.retention.minutes=1680

Used if log.retention.minutes is not set:

log.retention.hours=168
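Since the question is about minimizing deletion delay, one more broker property is relevant (not covered in the answer above, so treat the value as illustrative): retention is only enforced when the background cleaner runs, at an interval controlled by log.retention.check.interval.ms. 300000 ms (5 minutes) is the usual default; lowering it makes deletion more prompt at some CPU cost:

log.retention.check.interval.ms=300000

Retention can also be overridden per topic rather than broker-wide. Assuming a Kafka version whose kafka-configs.sh accepts --bootstrap-server, something like:

kafka-configs.sh --bootstrap-server localhost:9092 --alter --entity-type topics --entity-name topic1 --add-config retention.ms=1680000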

2. Size-based retention

In this policy, we configure the maximum size of the Log data structure for a Topic partition. Once the Log reaches this size, Kafka starts removing the oldest Segments. This policy is not popular, as it does not provide good visibility into message expiry. However, it can come in handy when we need to control the size of a Log due to limited disk space.

Here are the parameters that you can set in your Kafka broker properties file:

Configures the maximum size of a Log:

log.retention.bytes=104857600
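One caveat worth adding here (my note, with illustrative values): because deletion happens whole Segments at a time and the active Segment is never deleted, a partition can temporarily exceed log.retention.bytes by up to one Segment. It is therefore common to keep log.segment.bytes noticeably smaller than log.retention.bytes, for example 10 MB Segments against a 100 MB cap:

log.retention.bytes=104857600
log.segment.bytes=10485760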

So, according to your use case, you should configure log.retention.bytes so that your disk does not get full.
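To size that concretely, note that log.retention.bytes applies per partition, not per topic or per broker. A back-of-the-envelope bound (with made-up numbers) for a broker hosting 10 partition replicas of this topic:

10 replicas x 104857600 bytes per partition ~ 1 GB

plus up to one extra log.segment.bytes per partition while deletion lags behind, plus headroom for any other topics on the broker.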
