How to spread existing kafka topic partitions into more directories?


Question

By default, Kafka uses a single directory to keep its logs. To increase performance, it is advised to mount additional disks on the broker, assign each disk its own directory, and then set log.dirs in server.properties to a comma-separated list of those directories. The documentation says that partitions will be distributed among the directories in round-robin fashion. As I understand it, this is only true for new topics.
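The relevant server.properties setting might look like the following (the directory paths are illustrative, one per mounted disk):

```properties
# Illustrative server.properties snippet; the paths are examples.
log.dirs=/mnt/disk1/kafka-logs,/mnt/disk2/kafka-logs
```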

I would like to distribute half of the partitions of my already created topic to a newly created log.dir while keeping the other half where they are. Is there a supported way to do that?

Answer

Approach 1: Simply delete the contents of the existing data directories and configure the new data directory locations

In this approach, Kafka replicates the partition data from the other members of the cluster. The complete partition data is replicated from the beginning, and all partitions are evenly allocated across the directory locations. Replication time depends on the data size: with huge data, replicas may take a long time to rejoin the ISR. This also puts a lot of load on the network and cluster, which may cause problems for the Kafka cluster such as ISR changes and client errors. This approach should be fine for small clusters (GBs of data).

Note: In Kafka, the broker id is stored in the log.dir/meta.properties file. If broker.id has not been configured explicitly, Kafka generates a new broker id by default. To avoid this, retain the existing meta.properties file in the log.dirs directories.
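For reference, a meta.properties file typically looks like the following (the broker.id value here is illustrative; newer Kafka versions may include additional fields):

```properties
# Illustrative contents of <log.dir>/meta.properties; broker.id value is an example.
version=0
broker.id=1
```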

Approach 2: Move the partition directories to the new data directory (without touching the checkpoint files)

This is similar to the approach above, but here Kafka only replicates the moved partitions.

Approach 3: Move the partition directories and split the checkpoint files

Each data directory contains three checkpoint files, namely replication-offset-checkpoint, recovery-point-offset-checkpoint, and cleaner-offset-checkpoint. These files contain the last committed offset, log-end checkpoint, and cleaner checkpoint details for the partitions stored in that directory. Each file contains a version number, the number of entries, and one row per entry.
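As a concrete illustration of that layout, a replication-offset-checkpoint file with two entries might look like this (topic name and offsets are made up for the example):

```text
0
2
mytopic 0 100
mytopic 1 250
```

The first line is the format version, the second is the entry count, and each remaining line is "topic partition offset".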

We need to copy/create these files in the new directory and update them: the entries in both directories (old and new) must be adjusted so that each file lists only the partitions that remain in its directory. This can be tedious if we have a large number of partitions, but it is the best approach when we have huge data. With this approach, replicas join the ISR quickly, and the load on the cluster/network is low.
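The entry adjustment described above can be sketched in Python. This is a minimal illustration, not a supported tool: it assumes the version-0 checkpoint format shown earlier ("version, entry count, then one 'topic partition offset' line per entry") and simply partitions the entries between the old and new directory, given the set of partitions being moved.

```python
# Sketch: split one Kafka checkpoint file (e.g. replication-offset-checkpoint)
# between the old and the new log directory. Assumes the version-0 format:
#   line 1: version number
#   line 2: number of entries
#   then one "topic partition offset" line per entry.
# This is illustrative only; back up your data directories before editing them.

def split_checkpoint(lines, moved):
    """lines: the checkpoint file's lines; moved: set of (topic, partition)
    tuples being moved to the new directory.
    Returns (old_dir_lines, new_dir_lines), each a complete checkpoint body."""
    version = lines[0].strip()
    entries = [ln.strip().split() for ln in lines[2:] if ln.strip()]
    stay, move = [], []
    for topic, partition, offset in entries:
        target = move if (topic, int(partition)) in moved else stay
        target.append(f"{topic} {partition} {offset}")
    def render(rows):
        # Rebuild the file: version, updated entry count, then the entries.
        return [version, str(len(rows))] + rows
    return render(stay), render(move)
```

For example, moving partitions 1 and 2 of "mytopic" to the new directory leaves the old checkpoint with one entry and gives the new checkpoint two, each with its count line corrected. The same logic would be applied to all three checkpoint files in each directory.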
