Why do .index files exist in the kafka-log directory?


Question

I just created a new topic and haven't produced any messages yet. A file named 00000000000000000000.index was created in the directory /tmp/kafka-logs-1/topicname-0/, and that file is really big. I opened the binary file in vi, and its contents are only "0000 0000 0000 0000...". What does this mean? What is this index file for?

Recommended answer

Every segment of a log (the *.log files) has its corresponding index (the *.index files). Both files share the same name, which represents the base offset of the segment.

To help with understanding: the log file contains the actual messages, structured in a message format. For each message in this file, the first 64 bits hold the message's offset, which increases with every message. Searching this file for a message with a specific offset is expensive, since log files may grow into the gigabyte range. And to be able to produce messages, the broker has to perform this kind of lookup to determine the latest offset, so that it can assign correctly incremented offsets to incoming messages.

This is why the index file exists. Each entry in the index file consists of only two fields, each 32 bits long:

  1. 4 bytes: relative offset
  2. 4 bytes: physical position
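
The two-field layout above can be decoded with a few lines of Python. This is a minimal sketch, not Kafka's own code: the function name `parse_index` is made up for illustration, and it assumes entries are stored big-endian and that an all-zero slot marks the unused, preallocated tail of the file.

```python
import struct

ENTRY_SIZE = 8  # 4-byte relative offset + 4-byte physical position


def parse_index(data: bytes, base_offset: int):
    """Decode (absolute offset, position) pairs from raw index bytes."""
    entries = []
    for i in range(0, len(data) - ENTRY_SIZE + 1, ENTRY_SIZE):
        # Assumption: both fields are big-endian signed 32-bit integers.
        relative_offset, position = struct.unpack_from(">ii", data, i)
        # Simplification: treat the first all-zero slot as the start of
        # the unused, zero-filled part of the file and stop there.
        if relative_offset == 0 and position == 0:
            break
        entries.append((base_offset + relative_offset, position))
    return entries
```

Under these assumptions, an index file that contains only zeros decodes to an empty list, which matches what a freshly created topic would show.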

As described above, the file name represents the base offset. In contrast to the log file, where each message carries its full offset, the entries in the index file store an offset relative to the base offset. The second field is the physical position in the log file of the message whose offset is base offset + relative offset. With this, the broker can locate a message by a fast search of the small index followed by a short scan of the log, instead of reading the log from the start. Strictly speaking this is a binary search over the index entries rather than a true O(1) lookup.
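
One way to use such an index is to search it for the largest entry at or below the target offset and then scan the log from the returned position. The sketch below is illustrative only: `lookup` is a hypothetical helper, and `entries` is assumed to be a sorted list of `(relative_offset, position)` pairs already read from the index file.

```python
import bisect


def lookup(entries, base_offset, target_offset):
    """Return the log position from which to start scanning for target_offset."""
    relative = target_offset - base_offset
    keys = [rel for rel, _ in entries]
    # Index of the last entry whose relative offset is <= the target.
    i = bisect.bisect_right(keys, relative) - 1
    if i < 0:
        # Target precedes the first index entry: scan from the segment start.
        return 0
    return entries[i][1]
```

For example, with entries `[(5, 4200), (11, 8500)]` and base offset 100, a request for offset 108 returns position 4200, and the caller reads forward in the log from there until it reaches offset 108.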

Finally, note that not every message in a log has a corresponding entry in the index. The configuration parameter index.interval.bytes, which is 4096 bytes by default, sets an index interval that describes how frequently (after how many bytes of log data) an index entry is added.
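
The effect of the interval can be sketched as follows. This is a simplified model under stated assumptions, not the broker's implementation: `build_sparse_index` is a hypothetical function, messages are represented only by their sizes in bytes, and an entry is written for a message once more than `index_interval_bytes` have been appended since the previous entry.

```python
def build_sparse_index(message_sizes, index_interval_bytes=4096):
    """Return (relative_offset, position) entries for a sequence of message sizes."""
    entries = []
    position = 0                # byte position of the next message in the log
    bytes_since_last_entry = 0  # log bytes appended since the last index entry
    for relative_offset, size in enumerate(message_sizes):
        if bytes_since_last_entry > index_interval_bytes:
            entries.append((relative_offset, position))
            bytes_since_last_entry = 0
        position += size
        bytes_since_last_entry += size
    return entries
```

With twelve messages of 1000 bytes each and the default interval, this model indexes only the messages at relative offsets 5 and 10; the other ten are found by scanning forward from the nearest preceding entry.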

Regarding the size of the .index file: the configuration parameter segment.index.bytes, which is 10 MB by default, determines the size of this file. The space is preallocated and is shrunk only after the log rolls. This is also why the index of a new topic with no messages is large and contains only zeros: the file has been preallocated, but no entries have been written to it yet.
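
As a rough back-of-the-envelope check of these defaults, the arithmetic below shows how many 8-byte entries fit into a preallocated index and roughly how much log data that could cover. It assumes 10 MB means 10 * 1024 * 1024 bytes and one entry per index interval; the variable names are illustrative.

```python
SEGMENT_INDEX_BYTES = 10 * 1024 * 1024  # segment.index.bytes default (assumed 10 MiB)
ENTRY_SIZE = 8                          # 4-byte relative offset + 4-byte position
INDEX_INTERVAL_BYTES = 4096             # index.interval.bytes default

# Number of entries the preallocated file has room for.
max_entries = SEGMENT_INDEX_BYTES // ENTRY_SIZE

# Approximate log bytes covered if one entry is added per interval.
approx_log_bytes_covered = max_entries * INDEX_INTERVAL_BYTES

print(max_entries, approx_log_bytes_covered)
```

So the preallocated file has room for about 1.3 million entries, which at one entry per 4096 bytes corresponds to roughly 5 GiB of log data.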
