How does git store files?

Question

I just started learning Git, and to do so I started reading the Git Community Book. The book says that SVN and CVS store the differences between file versions, whereas Git stores a snapshot of all the files.

But I didn't really understand what they mean by a snapshot. Does Git really make a copy of all the files in each commit? That's what I understood from their explanation.

PS: If anyone has a better source for learning Git, I would appreciate it.

Answer

For each commit, Git does store a full snapshot of all the files. However, for content that is already present in the repository, the snapshot simply points to that content rather than duplicating it.
That also means that several files with identical content are stored only once.

So a snapshot is basically a commit, referring to the content of a directory structure.

Some good references are:

You tell Git you want to save a snapshot of your project with the git commit command, and it basically records a manifest of what all of the files in your project look like at that point.

Lab 12 illustrates how to get previous snapshots


The progit book has the more comprehensive description of a snapshot:

The major difference between Git and any other VCS (Subversion and friends included) is the way Git thinks about its data.
Conceptually, most other systems store information as a list of file-based changes. These systems (CVS, Subversion, Perforce, Bazaar, and so on) think of the information they keep as a set of files and the changes made to each file over time.

Git doesn’t think of or store its data this way. Instead, Git thinks of its data more like a set of snapshots of a mini filesystem.
Every time you commit, or save the state of your project in Git, it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot.
To be efficient, if files have not changed, Git doesn’t store the file again—just a link to the previous identical file it has already stored.
Git thinks about its data as a stream of snapshots of the project taken over time.

This is an important distinction between Git and nearly all other VCSs. It makes Git reconsider almost every aspect of version control that most other systems copied from the previous generation. This makes Git more like a mini filesystem with some incredibly powerful tools built on top of it, rather than simply a VCS.
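
The snapshot idea can be modelled in a few lines. The following is only a toy sketch, not Git's real object format: the ToyRepo class and its methods are hypothetical, and a plain SHA-1 of the content stands in for Git's object ID. Each commit is a manifest mapping paths to content IDs, and a file that has not changed simply reuses the ID already stored.

```python
import hashlib


class ToyRepo:
    """Toy model of snapshot storage (illustrative, not Git's on-disk format)."""

    def __init__(self):
        self.objects = {}  # content ID -> file bytes, each content kept once
        self.commits = []  # each commit is a manifest: path -> content ID

    def commit(self, files):
        """Record a full snapshot of `files` (path -> bytes); return its index."""
        manifest = {}
        for path, content in files.items():
            blob_id = hashlib.sha1(content).hexdigest()
            # Unchanged content is already in the store, so nothing new is written.
            self.objects.setdefault(blob_id, content)
            manifest[path] = blob_id
        self.commits.append(manifest)
        return len(self.commits) - 1

    def checkout(self, index):
        """Rebuild the complete file set of a commit from its manifest."""
        return {path: self.objects[blob_id]
                for path, blob_id in self.commits[index].items()}


repo = ToyRepo()
first = repo.commit({"a.txt": b"alpha\n", "b.txt": b"beta\n"})
second = repo.commit({"a.txt": b"alpha\n", "b.txt": b"beta v2\n"})

# Two snapshots of two files each, but only three distinct contents stored.
print(len(repo.objects))  # 3
print(repo.commits[first]["a.txt"] == repo.commits[second]["a.txt"])  # True
```

Every commit can still be checked out in full, which is what makes it a snapshot, yet the unchanged a.txt costs no extra storage.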


Jan Hudec adds this important comment:

While that's true and important on the conceptual level, it is NOT true at the storage level.
Git does use deltas for storage.
Not only that, but it is more efficient at it than any other system. Because it does not keep per-file history, when it wants to do delta compression it takes each blob, selects some blobs that are likely to be similar (using heuristics that include the closest approximation of the previous version and some others), tries to generate the deltas, and picks the smallest one. This way it can (often, depending on the heuristics) take advantage of other similar files, or of older versions that are more similar than the previous one. The "pack window" parameter allows trading performance for delta compression quality. The default (10) generally gives decent results, but when space is limited, or to speed up network transfers, git gc --aggressive uses a value of 250, which makes it run very slowly but provides extra compression for history data.
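
The "try several candidate bases and keep the smallest delta" idea from this comment can be illustrated with a rough sketch. This is not Git's actual delta algorithm or its candidate-sorting heuristics: the functions delta_size and pick_base are hypothetical, the cost model is invented for the example, and difflib stands in for Git's own delta encoder. The window argument plays the role of the "pack window" setting (pack.window in Git's configuration).

```python
import difflib


def delta_size(base: bytes, target: bytes) -> int:
    """Rough cost of rebuilding `target` from `base`.

    Assumed cost model: one unit per "copy this range from base"
    instruction, plus one unit per byte that must be stored literally.
    """
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    size = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            size += 1  # a single copy instruction
        elif tag in ("replace", "insert"):
            size += j2 - j1  # literal bytes carried in the delta
        # "delete" costs nothing: those base bytes are simply not copied
    return size


def pick_base(target: bytes, candidates, window: int = 10):
    """Try at most `window` candidates; return (index, size) of the best.

    Returns None when no delta is smaller than storing `target` whole.
    """
    best = None
    for index, base in enumerate(candidates[:window]):
        size = delta_size(base, target)
        if size < len(target) and (best is None or size < best[1]):
            best = (index, size)
    return best


target = b"line one\nline two\nline three\n"
candidates = [
    b"completely unrelated text",
    b"line one\nline two\n",
    b"line one\nline 2\nline three\n",
]
print(pick_base(target, candidates))
```

A larger window means more candidates are compared for every object, which is why a higher value tends to compress better and run slower.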
