What does the git index contain EXACTLY?


Question

What does the Git index exactly contain, and what command can I use to view the content of the index?


Thanks for all your answers. I know that the index acts as a staging area, and what is committed is in the index rather than the working tree. I am just curious about what an index object consists of. I guess it might be a list of filename/directory names, SHA-1 pairs, a kind of virtual tree maybe?

Is there, in Git terminology, any plumbing command that I can use to list the contents of the index?

Answer

The Git book contains an article on what an index includes:

The index is a binary file (generally kept in .git/index) containing a sorted list of path names, each with permissions and the SHA1 of a blob object; git ls-files can show you the contents of the index:

$ git ls-files --stage
100644 63c918c667fa005ff12ad89437f2fdc80926e21c 0   .gitignore
100644 5529b198e8d14decbe4ad99db3f7fb632de0439d 0   .mailmap
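If you want to post-process that listing, here is a minimal Python sketch that splits one line of `git ls-files --stage` output into its four fields (mode, object name, stage number, path). The names `StageEntry` and `parse_stage_line` are made up for this illustration, and the sketch assumes plain paths; it does not handle the quoting Git applies to paths with unusual characters.

```python
from typing import NamedTuple


class StageEntry(NamedTuple):
    """One line of `git ls-files --stage` output (hypothetical helper type)."""
    mode: str          # file mode in octal, e.g. "100644"
    object_name: str   # hex object ID of the blob
    stage: int         # 0 normally, 1-3 during a merge conflict
    path: str


def parse_stage_line(line: str) -> StageEntry:
    # The first three fields are whitespace-separated; everything after
    # that is the path, which may itself contain spaces.
    mode, object_name, stage, path = line.rstrip("\n").split(None, 3)
    return StageEntry(mode, object_name, int(stage), path)


sample = "100644 63c918c667fa005ff12ad89437f2fdc80926e21c 0\t.gitignore"
entry = parse_stage_line(sample)
print(entry.mode, entry.stage, entry.path)
```

In practice you would feed it the lines produced by running `git ls-files --stage` inside a repository.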

The Racy git problem gives some more details on that structure:

The index is one of the most important data structures in git.
It represents a virtual working tree state by recording list of paths and their object names and serves as a staging area to write out the next tree object to be committed.
The state is "virtual" in the sense that it does not necessarily have to, and often does not, match the files in the working tree.


Nov. 2021: see also "Make your monorepo feel small with Git’s sparse index" from Derrick Stolee (Microsoft/GitHub)

The Git index is a critical data structure in Git. It serves as the "staging area" between the files you have on your filesystem and your commit history.

  • When you run git add, the files from your working directory are hashed and stored as objects in the index, leading them to be "staged changes".
  • When you run git commit, the staged changes as stored in the index are used to create that new commit.
  • When you run git checkout, Git takes the data from a commit and writes it to the working directory and the index.

In addition to storing your staged changes, the index also stores filesystem information about your working directory.
This helps Git report changed files more quickly.
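To illustrate why cached filesystem information speeds things up, here is a rough Python sketch of the idea: remember a file's stat data, then compare it later instead of re-reading and re-hashing the content. This is not Git's code. `CachedStat`, `snapshot` and `maybe_changed` are hypothetical names, and only two stat fields are compared here, whereas a real index entry records more (ctime, mtime, device, inode, mode, uid, gid, size).

```python
import os
from typing import NamedTuple


class CachedStat(NamedTuple):
    """Subset of stat data remembered for a path (hypothetical helper type)."""
    mtime_ns: int
    size: int


def snapshot(path: str) -> CachedStat:
    # Record the stat data as it is right now.
    st = os.stat(path)
    return CachedStat(st.st_mtime_ns, st.st_size)


def maybe_changed(path: str, cached: CachedStat) -> bool:
    # Cheap check: no file content is read. A True result means the
    # content must be re-examined; False means it is assumed unchanged.
    return snapshot(path) != cached
```

A file rewritten with the same size within the timestamp granularity would slip past this check, which is exactly the situation the "Racy git" document mentioned above deals with.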


To see more, cf. "git/git/Documentation/technical/index-format.txt":

The Git index file has the following format

All binary numbers are in network byte order.
Version 2 is described here unless stated otherwise.

  • A 12-byte header consisting of:
      • 4-byte signature:
        The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")
      • 4-byte version number:
        The current supported versions are 2, 3 and 4.
      • 32-bit number of index entries.
  • A number of sorted index entries.
  • Extensions:
    Extensions are identified by signature.
    Optional extensions can be ignored if Git does not understand them.
    Git currently supports cached tree and resolve undo extensions.
      • 4-byte extension signature. If the first byte is 'A'..'Z' the extension is optional and can be ignored.
      • 32-bit size of the extension
      • Extension data
  • 160-bit SHA-1 over the content of the index file before this checksum.
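The header and trailing checksum described above can be read with a few lines of Python. This is a minimal sketch, not Git's implementation: the function names `parse_index_header` and `checksum_ok` are invented for the example, and it only looks at the 12-byte header and the final hash, not the entries or extensions.

```python
import hashlib
import struct


def parse_index_header(data: bytes):
    """Return (version, entry_count) from the 12-byte index header."""
    # ">" = network (big-endian) byte order: 4-byte signature,
    # 32-bit version number, 32-bit number of entries.
    signature, version, entry_count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    if version not in (2, 3, 4):
        raise ValueError(f"unsupported index version {version}")
    return version, entry_count


def checksum_ok(data: bytes, hash_name: str = "sha1") -> bool:
    """Compare the trailing hash with a hash of everything before it."""
    digest_size = hashlib.new(hash_name).digest_size
    body, trailer = data[:-digest_size], data[-digest_size:]
    return hashlib.new(hash_name, body).digest() == trailer


# Synthetic example: an empty version-2 index built in memory.
body = struct.pack(">4sII", b"DIRC", 2, 0)
fake_index = body + hashlib.sha1(body).digest()
print(parse_index_header(fake_index))  # (2, 0)
print(checksum_ok(fake_index))         # True
```

To try it on a real repository, read the bytes of `.git/index` in binary mode and pass them to the same two functions.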


mljrg comments:

If the index is the place where the next commit is prepared, why doesn't "git ls-files -s" return nothing after commit?

Because the index represents what is being tracked, and right after a commit, what is being tracked is identical to the last commit (git diff --cached returns nothing).

So git ls-files -s lists all files tracked (object name, mode bits and stage number in the output).

That list (of tracked elements) is initialized with the content of a commit.
When you switch branches, the index content is reset to the commit referenced by the branch you just switched to.


Git 2.20 (Q4 2018) adds an Index Entry Offset Table (IEOT):

See commit 77ff112, commit 3255089, commit abb4bb8, commit c780b9c, commit 3b1d9e0, commit 371ed0d (10 Oct 2018) by Ben Peart (benpeart).
See commit 252d079 (26 Sep 2018) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit e27bfaa, 19 Oct 2018)

ieot: add Index Entry Offset Table (IEOT) extension

This patch enables addressing the CPU cost of loading the index by adding additional data to the index that will allow us to efficiently multi-thread the loading and conversion of cache entries.

It accomplishes this by adding an (optional) index extension that is a table of offsets to blocks of cache entries in the index file.

To make this work for V4 indexes, when writing the cache entries, it periodically "resets" the prefix-compression by encoding the current entry as if the path name for the previous entry is completely different and saves the offset of that entry in the IEOT.
Basically, with V4 indexes, it generates offsets into blocks of prefix-compressed entries.

With the new index.threads config setting, the index loading is now faster.
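The "reset" idea from that commit message can be shown with a toy model in Python. This is only a sketch under simplifying assumptions: each entry is a `(strip_len, suffix)` tuple rather than the real on-disk encoding, the previous path is treated as empty at each reset, and the table stores entry indices instead of byte offsets. The names `compress_paths` and `decode_block` are hypothetical.

```python
def common_prefix_len(a: bytes, b: bytes) -> int:
    # Number of leading bytes shared by a and b.
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def compress_paths(paths, block_size):
    """Prefix-compress sorted paths, restarting every block_size entries.

    Returns (entries, block_starts). Each entry is (strip_len, suffix):
    drop strip_len bytes from the end of the previous path, then append
    suffix. block_starts plays the role of the offset table.
    """
    entries, block_starts = [], []
    previous = b""
    for i, path in enumerate(paths):
        if i % block_size == 0:
            previous = b""            # the periodic "reset"
            block_starts.append(i)
        common = common_prefix_len(previous, path)
        entries.append((len(previous) - common, path[common:]))
        previous = path
    return entries, block_starts


def decode_block(entries, start, end):
    """Decode entries[start:end] without looking at any earlier entry."""
    previous, out = b"", []
    for strip_len, suffix in entries[start:end]:
        previous = previous[:len(previous) - strip_len] + suffix
        out.append(previous)
    return out


paths = [b"a/b.c", b"a/b.h", b"a/c.c", b"d/e"]
entries, starts = compress_paths(paths, block_size=2)
print(decode_block(entries, starts[1], len(entries)))  # [b'a/c.c', b'd/e']
```

Because every block starts from an empty previous path, each block can be decoded on its own, which is what makes handing blocks to separate threads possible.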


As a result (of using IEOT), commit 7bd9631 cleans up the read-cache.c load_cache_entries_threaded() function for Git 2.23 (Q3 2019).

See commit 8373037, commit d713e88, commit d92349d, commit 113c29a, commit c95fc72, commit 7a2a721, commit c016579, commit be27fb7, commit 13a1781, commit 7bd9631, commit 3c1dce8, commit cf7a901, commit d64db5b, commit 76a7bc0 (09 May 2019) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit c0e78f7, 13 Jun 2019)

read-cache: drop unused parameter from threaded load

The load_cache_entries_threaded() function takes a src_offset parameter that it doesn't use. This has been there since its inception in 77ff112 (read-cache: load cache entries on worker threads, 2018-10-10, Git v2.20.0-rc0).

Digging on the mailing list, that parameter was part of an earlier iteration of the series, but became unnecessary when the code switched to using the IEOT extension.


With Git 2.29 (Q4 2020), the format description adjusts to the recent SHA-256 work.

See commit 8afa50a, commit 0756e61, commit 123712b, commit 5b6422a (15 Aug 2020) by Martin Ågren (none).
(Merged by Junio C Hamano -- gitster -- in commit 74a395c, 19 Aug 2020)

index-format.txt: document SHA-256 index format

Signed-off-by: Martin Ågren

Document that in SHA-1 repositories, we use SHA-1 and in SHA-256 repositories, we use SHA-256, then replace all other uses of "SHA-1" with something more neutral.
Avoid referring to "160-bit" hash values.

technical/index-format now includes in its man page:

All binary numbers are in network byte order.
In a repository using the traditional SHA-1, checksums and object IDs (object names) mentioned below are all computed using SHA-1.
Similarly, in SHA-256 repositories, these values are computed using SHA-256.

Version 2 is described here unless stated otherwise.
