What algorithm does git use to detect changes on your working tree?


Question

This is about the internals of git.

I've been reading the great 'Pro Git' book and learning a little about how git works internally (all about SHA1, blobs, references, trees, commits, etc.). Pretty clever architecture, by the way.

So, to put this into context, git references the content of a file by its SHA1 value, so it's able to know whether specific content has changed just by comparing hash values. But my question is specifically about how git checks whether the content in the working tree has changed or not.

The naive approach would be that, each time you run a command such as git status, it walks through all the files in the working directory, calculates the SHA1 of each, and compares it with the one from the last commit. But that seems very inefficient for big projects such as the Linux kernel.

Another idea could be to check the last modification date of each file, but I think git does not store that information (when you clone a repository, all the files get a new time).

I'm sure it does this in an efficient way (git is really fast). Does anyone know how that is achieved?

PS: Just to add an interesting link about the git index, specifically stating that the index keeps information about file timestamps, even though the tree objects do not.

Answer

Git’s index maintains timestamps of when git last wrote each file into the working tree (and updates these whenever files are cached from the working tree or from a commit). You can see the metadata with git ls-files --debug. In addition to the timestamp, it records the size, inode, and other information from lstat to reduce the chance of a false positive.
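To make the idea of per-file metadata concrete, here is a minimal Python sketch of the kind of record involved. It is not git's code: StatSnapshot and snapshot are hypothetical names, and the field list is an assumption modelled on the stat fields that git's index format documentation describes for each entry (ctime, mtime, dev, ino, mode, uid, gid, size).

```python
import os
from typing import NamedTuple


class StatSnapshot(NamedTuple):
    """Hypothetical record of lstat fields cached for one path."""
    ctime_ns: int  # last metadata change
    mtime_ns: int  # last content modification
    dev: int       # device id
    ino: int       # inode number
    mode: int      # file type and permission bits
    uid: int
    gid: int
    size: int      # size in bytes


def snapshot(path: str) -> StatSnapshot:
    """Capture the metadata of a path without reading its content."""
    st = os.lstat(path)  # lstat does not follow symlinks
    return StatSnapshot(
        st.st_ctime_ns, st.st_mtime_ns, st.st_dev, st.st_ino,
        st.st_mode, st.st_uid, st.st_gid, st.st_size,
    )
```

Two snapshots of an untouched file compare equal, so checking a file costs a single lstat call and no content read.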

When you perform git-status, it simply calls lstat on every file in the working tree and compares the metadata in order to quickly determine which files are unchanged. This is described in the documentation under racy-git and update-index.
