How does git store files?


Question


I just started learning git and to do so I started reading the Git Community Book, and in this book they say that SVN and CVS store the difference between files and that git stores a snapshot of all the files.

But I didn't really get what they mean by snapshot. Does git really make a copy of all the files in each commit? That's what I understood from their explanation.

PS: If anyone has a better source for learning git, I would appreciate it.

Answer

Git does include for each commit a full copy of all the files, except that, for the content already present in the Git repo, the snapshot will simply point to said content rather than duplicate it.
That also means that several files with the same content are stored only once.
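
To make the "stored only once" point concrete, here is a minimal Python sketch. It relies on one fact about Git's object model: in a SHA-1 repository, a blob's ID is the SHA-1 of the header `blob <size>\0` followed by the file content. The `store` dict and the `add_file` helper are hypothetical stand-ins for the object database, not Git's on-disk format.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the ID a SHA-1 Git repository assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

# Hypothetical toy object store: blob ID -> content.
store: dict[str, bytes] = {}

def add_file(content: bytes) -> str:
    """Store content under its ID; identical content maps to the same entry."""
    oid = git_blob_id(content)
    store.setdefault(oid, content)
    return oid

a = add_file(b"same text\n")
b = add_file(b"same text\n")    # a second file with identical content
c = add_file(b"other text\n")

print(a == b, len(store))       # True 2
```

Because the ID depends only on the content, the file name and the commit play no part in whether the content is already in the store.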

So a snapshot is basically a commit, referring to the content of a directory structure.
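
As a rough illustration of "referring to the content of a directory structure", the sketch below computes the ID of a Git tree object for a single flat directory of regular files. It assumes a SHA-1 repository and only handles mode `100644` entries with no subdirectories; the helper names `object_id` and `tree_id` are made up for this example.

```python
import hashlib

def object_id(kind: bytes, body: bytes) -> str:
    """SHA-1 over '<kind> <size>\\0<body>', the way Git names its objects."""
    header = kind + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()

def tree_id(files: dict[str, bytes]) -> str:
    """ID of a tree listing regular files (mode 100644) in one flat directory."""
    body = b""
    for name in sorted(files, key=lambda n: n.encode("utf-8")):
        # Each entry stores the blob's ID (20 raw bytes), not the content itself.
        blob = bytes.fromhex(object_id(b"blob", files[name]))
        body += b"100644 " + name.encode("utf-8") + b"\0" + blob
    return object_id(b"tree", body)

before = {"README": b"hi\n", "main.c": b"int main(void) { return 0; }\n"}
after = dict(before, README=b"hello\n")

print(tree_id(before) == tree_id(dict(before)))  # True: same content, same snapshot ID
print(tree_id(before) == tree_id(after))         # False: one changed file changes the tree
```

A commit object then points at one such tree (plus parent commits, author and message), which is why a commit can be read as a snapshot of the whole directory.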

Some good references are:

You tell Git you want to save a snapshot of your project with the git commit command and it basically records a manifest of what all of the files in your project look like at that point.

Lab 12 illustrates how to get previous snapshots


The progit book has the more comprehensive description of a snapshot:

The major difference between Git and any other VCS (Subversion and friends included) is the way Git thinks about its data.
Conceptually, most other systems store information as a list of file-based changes. These systems (CVS, Subversion, Perforce, Bazaar, and so on) think of the information they keep as a set of files and the changes made to each file over time.

Git doesn’t think of or store its data this way. Instead, Git thinks of its data more like a set of snapshots of a mini filesystem.
Every time you commit, or save the state of your project in Git, it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot.
To be efficient, if files have not changed, Git doesn’t store the file again—just a link to the previous identical file it has already stored.
Git thinks about its data more like the figure that follows in the book (not reproduced here).

This is an important distinction between Git and nearly all other VCSs. It makes Git reconsider almost every aspect of version control that most other systems copied from the previous generation. This makes Git more like a mini filesystem with some incredibly powerful tools built on top of it, rather than simply a VCS.
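
The "link to the previous identical file" idea from the quote can be sketched in a few lines of Python. This is a toy model rather than Git's implementation: `objects`, `store_blob` and `snapshot` are hypothetical names, and a snapshot here is just a dict from path to blob ID.

```python
import hashlib

objects: dict[str, bytes] = {}   # toy store: blob ID -> content

def store_blob(content: bytes) -> str:
    """Store content under its Git-style SHA-1 blob ID, once per distinct content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    oid = hashlib.sha1(header + content).hexdigest()
    objects.setdefault(oid, content)
    return oid

def snapshot(worktree: dict[str, bytes]) -> dict[str, str]:
    """Record a full manifest: every path mapped to the ID of its content."""
    return {path: store_blob(data) for path, data in worktree.items()}

v1 = snapshot({"a.txt": b"alpha\n", "b.txt": b"beta\n", "c.txt": b"gamma\n"})
v2 = snapshot({"a.txt": b"alpha\n", "b.txt": b"beta v2\n", "c.txt": b"gamma\n"})

print(len(v1), len(v2))             # 3 3  (each snapshot lists every file)
print(len(objects))                 # 4    (only the changed file added a new blob)
print(v1["a.txt"] == v2["a.txt"])   # True (unchanged file points at the same blob)
```

Both snapshots are complete on their own, yet the second one only costs storage for the one file that changed.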


Jan Hudec adds this important comment:

While that's true and important on the conceptual level, it is NOT true at the storage level.
Git does use deltas for storage.
Not only that, but it's more efficient at it than any other system. Because it does not keep per-file history, when it wants to do delta compression, it takes each blob, selects some blobs that are likely to be similar (using heuristics that include the closest approximation of the previous version and some others), tries to generate the deltas and picks the smallest one. This way it can (often, depending on the heuristics) take advantage of other similar files or of older versions that are more similar than the previous one. The "pack window" parameter allows trading performance for delta compression quality. The default (10) generally gives decent results, but when space is limited or to speed up network transfers, git gc --aggressive uses a value of 250, which makes it run very slowly but provides extra compression for history data.
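
The "try several candidates and keep the smallest delta" idea can be sketched as follows. This is only an illustration of the selection step described in the comment: Git uses its own binary delta format and its own candidate ordering, whereas this sketch uses Python's `difflib` and an invented cost model (`delta_size`). The function names, the cost constants and the sample data are all assumptions made for the example.

```python
import difflib

def delta_size(base: bytes, target: bytes) -> int:
    """Rough cost of rebuilding target from base under an invented cost model."""
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    cost = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            cost += 4                  # assumed fixed cost of a "copy from base" step
        elif tag in ("replace", "insert"):
            cost += 1 + (j2 - j1)      # one step plus the literal bytes it carries
        # "delete" needs no output: those base bytes are simply not copied
    return cost

def pick_base(target: bytes, candidates: list[bytes], window: int = 10):
    """Try at most `window` candidates; return (index, cost) of the cheapest delta.

    The index is None when storing the target whole is at least as cheap.
    """
    best_index, best_cost = None, len(target)
    for index, base in enumerate(candidates[:window]):
        cost = delta_size(base, target)
        if cost < best_cost:
            best_index, best_cost = index, cost
    return best_index, best_cost

target = b"line one\nline two\nline three\n"
candidates = [b"completely different", b"line one\nline two\nline 3\n"]

index, cost = pick_base(target, candidates)
print(index, cost < len(target))                       # 1 True
print(pick_base(target, candidates, window=1)[0])      # None
```

A larger window means more candidates are tried per object, so the chance of finding a small delta goes up along with the running time. In Git itself the window is set with the `pack.window` configuration value or the `--window` option of `git repack`.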
