Git internals: how does Git store small differences between revisions?


Question

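As I understand it, some VCSs store the differences between revisions, because those differences are sometimes small: one line of source code is changed, or a comment is added in a subsequent revision. Git, on the other hand, stores a compressed "snapshot" for each revision.

If only a small change has been made (one line in a large text file), how does Git handle it? Does it store two copies that are almost identical? That would be an inefficient use of space, I'd think.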

Solution

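Git does store a snapshot of every version, but it does not keep near-identical copies around forever. When objects are packed, Git computes the difference (a delta) between two similar versions and stores one of them as that delta.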


"Does it store two copies that are almost identical? That would be an inefficient use of space, I'd think."


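When packing, Git uses heuristics to find similar objects and stores their differences only once. Identical content is stored as a single object no matter how many files or commits contain it, because objects are named by a hash of their content. Where content is only similar, a delta refers back to the matching parts of its base object instead of repeating them.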
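The point that identical content is stored only once can be checked with a small Python sketch. It reproduces how Git names a blob in a SHA-1 repository (a hash over a "blob <size>" header, a NUL byte, and the content) and how a loose object is zlib-compressed. The function names blob_object, git_blob_id and loose_object_bytes are made up for this illustration and are not part of Git.

```python
import hashlib
import zlib


def blob_object(content: bytes) -> bytes:
    """Build the uncompressed object: 'blob <size>', a NUL byte, then the content."""
    return b"blob " + str(len(content)).encode("ascii") + b"\0" + content


def git_blob_id(content: bytes) -> str:
    """Object id Git assigns to this content in a SHA-1 repository."""
    return hashlib.sha1(blob_object(content)).hexdigest()


def loose_object_bytes(content: bytes) -> bytes:
    """Bytes of a loose object: the whole snapshot, zlib-compressed."""
    return zlib.compress(blob_object(content))


# A "large" text file and a copy with a single line changed.
original = b"".join(b"line %d\n" % i for i in range(1000))
edited = original.replace(b"line 500\n", b"line 500 changed\n")

# Same content gives the same id, so it is stored once.
print(git_blob_id(original) == git_blob_id(bytes(original)))
# A one-line change gives a different id, so a second full snapshot exists
# until the objects are packed and delta-compressed.
print(git_blob_id(original) == git_blob_id(edited))
print(len(loose_object_bytes(original)), len(loose_object_bytes(edited)))
```

Before packing, the two versions really are two separate compressed snapshots. The space saving for near-identical versions comes later, from delta compression inside pack files.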



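To keep it simple: the real mechanism is much more complicated than what is explained below. This is a simplified picture so that it is easier to understand.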



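Conceptually, Git compares your code with the previous version and splits it into regions (hunks) around each change.
If you add code in the middle of a file, the file splits into 3 hunks (top = old code, middle = new code, bottom = old code). The two old regions can be reused from the earlier version, so only the new middle region has to be stored.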
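The three-way split described above can be reproduced with Python's standard difflib module. This is only an illustration of the idea, not Git's own diff code, and the sample lines are invented.

```python
from difflib import SequenceMatcher

# Old version of a file, and a new version with two lines added in the middle.
old = ["line 1", "line 2", "line 3", "line 4"]
new = ["line 1", "line 2", "new line A", "new line B", "line 3", "line 4"]

matcher = SequenceMatcher(None, old, new, autojunk=False)

# Each opcode describes one region: "equal" means the lines are unchanged,
# "insert" means the lines exist only in the new version.
regions = [(tag, new[j1:j2]) for tag, i1, i2, j1, j2 in matcher.get_opcodes()]

for tag, lines in regions:
    print(tag, lines)
```

The output lists an unchanged top region, an inserted middle region, and an unchanged bottom region, which matches the top/middle/bottom picture.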



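For example: say you have a bunch of files with the same license agreement at the top of each file.

The license text needs to be stored in full only once. A file that is stored as a delta can refer to that region in its base object instead of carrying another copy of it.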
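The idea of pointing at shared content instead of copying it can be sketched as a list of "copy from base" and "insert literal bytes" instructions. This is a simplified model under stated assumptions: the tuple layout and the helper names make_delta and apply_delta are hypothetical, matching is done with Python's difflib rather than Git's own algorithm, and Git's real pack format is a compact binary encoding.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Describe target as copy-from-base and insert-literal instructions."""
    delta = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Shared region: store only where it lives in the base.
            delta.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # New bytes: these have to be stored literally.
            delta.append(("insert", target[j1:j2]))
        # "delete": bytes only in the base, nothing to emit.
    return delta


def apply_delta(base: bytes, delta: list) -> bytes:
    """Rebuild the target from the base and the instruction list."""
    out = bytearray()
    for instruction in delta:
        if instruction[0] == "copy":
            _, offset, length = instruction
            out += base[offset:offset + length]
        else:
            out += instruction[1]
    return bytes(out)


# Two files that share the same license header (invented sample text).
license_header = b"# License: example text, identical in every file\n" * 5
file_a = license_header + b"print('a')\n"
file_b = license_header + b"print('b')\n"

delta = make_delta(file_a, file_b)
literal_bytes = sum(len(ins[1]) for ins in delta if ins[0] == "insert")
print(delta)
print(literal_bytes, "literal bytes instead of", len(file_b))
```

Whether two particular files end up as base and delta in a real repository depends on Git's pack heuristics, so the sketch shows the mechanism rather than a guaranteed outcome.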



This way Git stores the information very efficiently.




If you want to see hunks in action, use git add -p and select s to split a hunk.








As explained above, a hunk is a piece of a diff. Hunk is a term related to diff, and here is how Git displays it visually (as a patch):


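The format starts with the same two-line header as the context format, except that the original file is preceded by --- and the new file is preceded by +++.

Following this are one or more change hunks that contain the line differences in the file.

The unchanged, contextual lines are preceded by a space character, added lines are preceded by a plus sign, and deleted lines are preceded by a minus sign.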
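A patch in this format can be produced with Python's standard difflib.unified_diff, which makes the header and the line prefixes easy to inspect. The file names and contents here are invented for the example.

```python
from difflib import unified_diff

# Two versions of a small file; each line keeps its trailing newline.
old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "beta changed\n", "gamma\n"]

# fromfile/tofile become the "---" and "+++" header lines.
patch = list(unified_diff(old, new, fromfile="a/example.txt", tofile="b/example.txt"))

print("".join(patch), end="")
```

The printed patch starts with the --- and +++ lines, followed by one hunk introduced by an @@ line. Inside the hunk, the unchanged lines start with a space, the removed line with a minus sign, and the added line with a plus sign.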







More info:



https://github.com/mirage/ocaml-git/blob/master/doc/pack-heuristics.txt


