git: why exactly is the claim "git is based on differences between files" wrong?
Question
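I know `git add` saves just a new snapshot of a particular file. But I'm a bit confused about the term "snapshot". As I understood git from (e.g.) that or that source, a snapshot is actually just the difference to the last commit.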
Quote from 2:
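> It basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git doesn't store the file again, just a link to the previous identical file it has already stored.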
That sounds to me exactly like the description of a system based on file differences. -.-
EDIT: To be a bit more specific: I understand that if a blob isn't modified, its hash doesn't change, so the same blob is reused in later commits. It also makes sense to me that git can detect similarities between blobs and thereby eliminate redundancy.

=> Would it be equivalent if I called e.g. vimdiff (just to illustrate the concept) on a blob and saved the output in a new blob?
=> What does a changed blob that has things in common with other blobs look like?

Answer

Short answer: No, Git always records the entire file.

Longer answer: Okay, that's not quite true. Logically, Git always records the entire file. In the storage backend, however, Git performs delta compression across all files from all revisions, so it even detects identical content between different files and across the entire history of all branches, not just the parent commit. And since the network protocol and the storage backend share the same format ("pack files"), you get the same efficiency for `push` and `fetch`.

However, it is important to remember that this is an internal implementation detail of the storage backend. It is not part of the object model. The object model is that each commit contains the entire tree.
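The difference between "logically the whole file" and "stored as a delta" can be made concrete with a toy sketch in Python. It describes a changed version as "copy this range from a base object" plus "insert these literal bytes", which is the general idea behind pack-file deltas. This is an illustration under assumptions, not Git's actual binary delta format or delta search. It uses Python's `difflib`, and the helpers `make_delta` and `apply_delta` are hypothetical names. What matters is what happens on read: the delta is expanded back into the whole file, and the object's id is computed from that full content. So a delta is a storage encoding, not a new blob, which is why the "vimdiff into a new blob" idea is not equivalent.

```python
# Toy delta encoding (hypothetical helpers): a sketch of the copy/insert idea
# behind pack-file deltas, NOT Git's real binary delta format.
import difflib
import hashlib


def blob_id(content: bytes) -> str:
    """Git's blob id: SHA-1 over the header "blob <size>\\0" plus the full content."""
    return hashlib.sha1(b"blob %d\0" % len(content) + content).hexdigest()


def make_delta(base: bytes, target: bytes) -> list:
    """Describe `target` as copy/insert instructions relative to `base`."""
    ops = []
    matcher = difflib.SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))      # reuse a byte range of the base
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))  # new bytes stored literally
        # "delete": base bytes the target simply does not copy
    return ops


def apply_delta(base: bytes, ops: list) -> bytes:
    """Rebuild the complete target content from the base and the instructions."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


v1 = b"line 1\nline 2\nline 3\n" * 50
v2 = v1.replace(b"line 2\n", b"line two\n", 1)

delta = make_delta(v1, v2)
restored = apply_delta(v1, delta)

# Reading the object back yields the whole file; its id is the id of the
# full content, so the delta never shows up at the object-model level.
assert restored == v2
assert blob_id(restored) == blob_id(v2)
literal = sum(len(op[1]) for op in delta if op[0] == "insert")
print(len(v2), "bytes in the file,", literal, "literal bytes in the delta")
```

The base a delta is computed against is purely a storage decision. In Git's case it can be any sufficiently similar object, not necessarily the parent commit's version of the same file, which is why deltas can stay invisible to the object model.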
This is Git's object model:

- blob: a bytestream. Basically, a file, but only its contents. It doesn't have a name. In this way, Git works like a Unix filesystem: files don't have names; rather, directories associate names with files.
- tree: a flat(!!!) list of (mode, name, {tree|blob}) triples. This is the equivalent of a Unix directory. It associates names and modes (mainly executable or not) with blobs or trees, i.e. trees can be recursive.
- commit: a pointer to a tree and pointers to zero, one, or many parent commits. It also contains the author and committer identities (each with a timestamp) and, most importantly, the commit message.
- (local tag): technically not a Git object. Just a local file pointing to a commit.
- annotated tag: contains a pointer to a commit, a name, and an annotation message.
- signed tag: contains an annotated tag(???) and a digital signature [not sure about this one: is it built on top of an annotated tag or does it duplicate it?]
- note: a piece of text that can be attached to any Git object. This can be used to add arbitrary user-defined metadata to any Git object. For example, a CI server could attach code coverage results to commits, a bug tracker could attach ticket links to the commits that fix a bug, a web server could attach MIME types to blobs, a release management system could attach go/no-go votes to annotated tags, …

Note that only blobs actually contain file data. The rest is just pointers. And blobs don't have names, which means that as long as a blob has the same content, it is the same blob, and thus exists only once in the object store. In fact, it even exists only once in the entire Git universe! For example, the FSF's GPL `COPYING` file, as long as you keep it unmodified, will be the exact same blob, even in totally unrelated repositories!
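The same object model can be sketched in a few lines of Python. The code hashes blobs, trees and commits the way Git forms SHA-1 object ids: a `"<type> <size>\0"` header followed by the object body. It assumes a SHA-1 repository (the classic default). The helper names are hypothetical, directories are given as nested dicts, and the license text and author identity are made-up stand-ins, not the real GPL or a real person. It shows three things from the answer: a blob's id ignores the file name, a commit points at a complete tree rather than a diff, and an unchanged subtree keeps its id and is therefore shared between snapshots.

```python
# Toy model of Git object ids (hypothetical helpers, SHA-1 repositories assumed).
import hashlib


def object_id(kind: str, body: bytes) -> str:
    """Every Git object id is SHA-1 over "<kind> <size>\\0" followed by the body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def blob_id(content: bytes) -> str:
    # A blob is only content: no file name goes into the hash.
    return object_id("blob", content)


def tree_id(entries: dict) -> str:
    """Hash a directory given as {name: bytes (file) or dict (subdirectory)}.

    A tree is a flat list of (mode, name, id) entries; subdirectories are
    themselves trees, so the structure is recursive.
    """
    rows = []
    for name, value in entries.items():
        if isinstance(value, dict):
            rows.append(("40000", name, tree_id(value), name + "/"))
        else:
            rows.append(("100644", name, blob_id(value), name))
    rows.sort(key=lambda row: row[3].encode())  # subtrees sort as if named "name/"
    body = b"".join(
        f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
        for mode, name, oid, _ in rows
    )
    return object_id("tree", body)


def commit_id(tree: str, parents: list, who: str, when: str, message: str) -> str:
    """A commit points at one complete tree and zero or more parent commits."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {who} {when}", f"committer {who} {when}", "", message]
    return object_id("commit", "\n".join(lines).encode())


license_text = b"GNU GENERAL PUBLIC LICENSE (stand-in text)\n"  # placeholder only
repo_a = {"COPYING": license_text, "src": {"main.c": b"int main(void) { return 0; }\n"}}
repo_b = {"docs": {"LICENSE.txt": license_text}}  # an unrelated "repository"

# Same bytes => same blob id, regardless of file name or repository.
assert blob_id(repo_a["COPYING"]) == blob_id(repo_b["docs"]["LICENSE.txt"])

# A second snapshot that changes only COPYING: the root tree id changes,
# but the untouched "src" subtree keeps its id, so it is shared, not copied.
repo_a2 = {"COPYING": license_text + b"extra line\n", "src": repo_a["src"]}
assert tree_id(repo_a2) != tree_id(repo_a)
assert tree_id(repo_a2["src"]) == tree_id(repo_a["src"])

who, when = "A U Thor <author@example.com>", "1700000000 +0000"  # made-up identity
c1 = commit_id(tree_id(repo_a), [], who, when, "first\n")
c2 = commit_id(tree_id(repo_a2), [c1], who, when, "second\n")
print(c1, c2)
```

Each commit in this sketch names a full snapshot (its root tree). "Not storing unchanged files again" falls out of content addressing, with no diff involved.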