How does git detect similar files, for its rename detection?

Question

Wikipedia explains the automatic rename detection:

Briefly, given a file in revision N, a file of the same name in revision N−1 is its default ancestor. However, when there is no like-named file in revision N−1, Git searches for a file that existed only in revision N−1 and is very similar to the new file.

Rename detection apparently boils down to similar file detection. Is that algorithm documented anywhere? It would be nice to know what kinds of transformations are detected automatically.

Answer

Git tracks file contents, not filenames, so renaming a file without changing its content is easy for git to detect. (Git does not track renames; it detects them after the fact. Using git mv, or git rm followed by git add, is effectively the same.)

When a file is added to the repository, its filename is recorded in a tree object, while the actual file contents are stored as a binary large object (blob). Git will not add another blob for additional files that contain the same content. In fact, it cannot: objects are addressed by the hash of their content, and a loose object is stored on the filesystem with the first two characters of the hash as the directory name and the rest as the file name within it. So detecting the rename of an unchanged file is a matter of comparing hashes.
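The hash comparison described above can be sketched in a few lines of Python. This is only an illustration, assuming a repository that uses the default SHA-1 object format; the function names (git_blob_id, loose_object_path, exact_renames) are made up for this example and are not part of git.

```python
from __future__ import annotations

import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID git assigns to a blob with this content."""
    # A blob is hashed as: "blob <size in bytes>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """Path of a loose object, relative to the .git directory."""
    # First two hex characters name the directory, the rest name the file.
    return f"objects/{object_id[:2]}/{object_id[2:]}"


def exact_renames(
    deleted: dict[str, bytes], added: dict[str, bytes]
) -> list[tuple[str, str]]:
    """Pair deleted paths with added paths whose content hashes are identical."""
    by_id: dict[str, list[str]] = {}
    for path, content in sorted(deleted.items()):
        by_id.setdefault(git_blob_id(content), []).append(path)

    pairs = []
    for new_path, content in sorted(added.items()):
        candidates = by_id.get(git_blob_id(content))
        if candidates:
            # Same blob ID means same content: treat it as an exact rename.
            pairs.append((candidates.pop(0), new_path))
    return pairs
```

Because the filename never enters the hash, the same content under two different names yields the same blob ID, which is why an unchanged rename is cheap to spot.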

To detect a rename where the file also changed slightly, Git uses a similarity heuristic with a threshold to decide whether a deleted file and an added file count as a rename. For example, have a look at the -M flag for git diff, which takes an optional similarity threshold (the default is 50%). There are also configuration values such as merge.renameLimit (the number of files to consider when performing rename detection during a merge).
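To make the idea of a similarity threshold concrete, here is a rough sketch. It is not git's actual scoring: it swaps in Python's difflib.SequenceMatcher on lines as a stand-in measure, and the names similarity and detect_renames are hypothetical. The 0.5 default only mirrors the 50% default of -M.

```python
from difflib import SequenceMatcher


def similarity(old: bytes, new: bytes) -> float:
    """Rough line-based similarity in [0, 1]. A stand-in, not git's real score."""
    if old == new:
        return 1.0
    matcher = SequenceMatcher(None, old.splitlines(), new.splitlines(), autojunk=False)
    return matcher.ratio()


def detect_renames(deleted, added, threshold=0.5):
    """Greedily pair deleted and added paths whose similarity meets the threshold.

    `deleted` and `added` map paths to file contents (bytes).
    Returns a list of (old_path, new_path, score) tuples.
    """
    # Score every deleted/added pair; this is quadratic in the number of files.
    scored = [
        (similarity(old, new), old_path, new_path)
        for old_path, old in deleted.items()
        for new_path, new in added.items()
    ]
    # Best matches first; paths break ties so the result is deterministic.
    scored.sort(key=lambda item: (-item[0], item[1], item[2]))

    used_old, used_new, renames = set(), set(), []
    for score, old_path, new_path in scored:
        if score < threshold:
            break  # everything after this is below the threshold too
        if old_path in used_old or new_path in used_new:
            continue  # each file takes part in at most one rename
        used_old.add(old_path)
        used_new.add(new_path)
        renames.append((old_path, new_path, score))
    return renames
```

Comparing every deleted file against every added file grows quadratically with the number of candidates, which is presumably why a cap such as merge.renameLimit is useful.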

To understand how git treats similar files (i.e., what file transformations are considered renames), explore the configuration options and flags mentioned above; you need not be concerned with the how. To understand how git actually accomplishes these tasks, look at the algorithms for finding differences in text, and read the git source code.

These algorithms are applied only for diff, merge, and log purposes; they do not affect how git stores objects. Any change in file content, however small, means a new object is added for it. There is no delta or diff happening at that level. Of course, the objects might later be packed, with deltas stored in packfiles, but that is unrelated to rename detection.
