git: why exactly is the claim "git is based on differences between files" wrong?


Question



    I know git add saves just a new snapshot of a particular file. But I'm a bit confused about the term "snapshot". As I understood Git (e.g. from this or that source), a snapshot is actually just the difference from the last commit.

    Quote from 2:

    it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git doesn’t store the file again, just a link to the previous identical file it has already stored.

    That sounds to me exactly like the description of a system based on file differences -.-

    EDIT:

    To be a bit more specific:

    I understand that if a blob isn't modified, its hash doesn't change, so the same blob is reused in later commits. It also makes sense to me that Git can detect similarities between blobs and hence eliminate redundancy.

    => Would it be equivalent if I called e.g. vimdiff (just to illustrate the concept) on a blob and saved the output in a new blob?

    => What does a changed blob look like when it has things in common with other blobs?

    Answer

    Short answer: No, Git always records the entire file.

    Longer answer: Okay, that's not quite true. Logically, Git always records the entire file. In the storage backend, however, Git performs delta compression when it packs objects, and a delta can be taken against any other object in the pack, from any revision. So it can exploit similar content between different files and across the entire history of all branches, not just relative to the parent commit. And since the network protocol and the storage backend share the same format ("pack files"), you get the same efficiency for push and fetch.

    However, it is important to remember that this is an internal implementation detail of the storage backend. It is not a part of the object model. The object model is that each commit contains the entire tree.
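To illustrate the split between the logical model and the storage backend, here is a minimal sketch of delta encoding. It is a toy format, not Git's actual pack-file delta encoding; `make_delta` and `apply_delta` are hypothetical names, and Python's `difflib` stands in for whatever matching heuristics Git really uses. The point is only that a target can be stored as "copy" and "insert" instructions against a base, yet always reconstructs to the complete file.

```python
from difflib import SequenceMatcher


def make_delta(base: bytes, target: bytes) -> list:
    """Encode target as copy/insert instructions against base (toy format)."""
    delta = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Reuse bytes that already exist in the base object.
            delta.append(("copy", i1, i2 - i1))
        elif tag in ("replace", "insert"):
            # Store only the bytes that are new in the target.
            delta.append(("insert", target[j1:j2]))
        # "delete" needs no instruction: those base bytes are just not copied.
    return delta


def apply_delta(base: bytes, delta: list) -> bytes:
    """Rebuild the full target from the base plus the delta."""
    out = bytearray()
    for op in delta:
        if op[0] == "copy":
            _, offset, length = op
            out += base[offset:offset + length]
        else:
            out += op[1]
    return bytes(out)


v1 = b"line one\nline two\nline three\n"
v2 = b"line one\nline 2\nline three\n"
delta = make_delta(v1, v2)
restored = apply_delta(v1, delta)
print(restored == v2)
```

Whoever reads the object back gets the whole file; whether it was kept whole or as a delta is invisible at the object-model level.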

    This is Git's object model:

    • blob: a bytestream. Basically, a file, but only its contents. It doesn't have a name. In this way, Git works like a Unix filesystem: files don't have names; rather, directories associate names with files.

    • tree: a flat(!!!) list of (mode, name, {tree|blob}) triples. This is the equivalent of a Unix directory. It associates names and modes (mainly executable or not) with blobs or trees. That is, trees can contain other trees, so the structure is recursive.

    • commit: a pointer to a tree and pointers to zero, one, or many parent commits. Also contains two identity strings (author and committer), each with its own timestamp, and, most importantly, the commit message.

    • (local tag): technically, not a Git object. Just a local file pointing to a commit.

    • annotated tag: contains a pointer to an object (usually a commit), a name, the tagger, and an annotation message.

    • signed tag: an annotated tag whose message has a digital signature appended. It is built on top of the annotated tag (the same object type) rather than duplicating it.

    • note: a piece of text that can be attached to any Git object. This can be used to add arbitrary user-defined metadata to any Git object, e.g. a CI server could attach code coverage results to commits or a bug tracker could attach links to tickets to commits which fix a bug, a web server could attach MIME types to blobs, a release management system could attach go/no-go votes to annotated tags, …
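The core of the list above (blob, tree, commit) can be sketched as a toy content-addressed store. This is a simplified illustration under stated assumptions, not Git's real on-disk encoding: `ObjectStore`, `write_tree`, and `write_commit` are hypothetical names, trees are kept flat with no subdirectories, and the serialization is made up for readability. It shows that every commit points at a complete tree, while an unchanged file simply resolves to the blob that is already stored.

```python
import hashlib


class ObjectStore:
    """Toy content-addressed store: the key is a hash of the stored bytes."""

    def __init__(self):
        self.objects = {}

    def put(self, kind: str, payload: bytes) -> str:
        oid = hashlib.sha1(kind.encode() + b"\0" + payload).hexdigest()
        # Same content gives the same key, so storing it again changes nothing.
        self.objects[oid] = (kind, payload)
        return oid


def write_tree(store: ObjectStore, files: dict) -> str:
    """files maps name -> content; writes one blob per file plus one flat tree."""
    entries = []
    for name in sorted(files):
        blob_id = store.put("blob", files[name])
        entries.append(f"100644 {name} {blob_id}")
    return store.put("tree", "\n".join(entries).encode())


def write_commit(store: ObjectStore, tree_id: str, parents: list, message: str) -> str:
    """A commit is a pointer to a tree, pointers to parents, and a message."""
    lines = [f"tree {tree_id}"] + [f"parent {p}" for p in parents] + ["", message]
    return store.put("commit", "\n".join(lines).encode())


store = ObjectStore()
tree1 = write_tree(store, {"README": b"hello\n", "main.c": b"int main(){}\n"})
c1 = write_commit(store, tree1, [], "first")
tree2 = write_tree(store, {"README": b"hello\n", "main.c": b"int main(){return 0;}\n"})
c2 = write_commit(store, tree2, [c1], "second")

blobs = [oid for oid, (kind, _) in store.objects.items() if kind == "blob"]
print(len(blobs), len(store.objects))
```

Both commits describe all files, but only three blobs exist: the unchanged README is stored once, and the changed main.c is stored as a second complete blob, not as a diff.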

    Note that only blobs actually contain file data. The rest is just pointers. And blobs don't have names, which means that as long as a blob has the same content, it is the same blob, and thus only exists once in the object store. In fact, it is the same blob in the entire Git universe! For example, the FSF's GPL COPYING file, as long as you keep it unmodified, will be the exact same blob, with the exact same ID, even in totally unrelated repositories!
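The reason identical content is the same blob everywhere is that a blob's ID is derived purely from its bytes. The sketch below reproduces that computation in Python, assuming the default SHA-1 object format (repositories created with the SHA-256 object format produce different IDs); `git_blob_id` is a hypothetical helper name.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID of a blob: SHA-1 over a 'blob <size>\\0' header plus the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The file name never enters the computation, only the bytes do.
empty_id = git_blob_id(b"")
print(empty_id)  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Two "files" with the same content in unrelated repositories get the same ID.
copying_in_repo_a = b"some license text\n"
copying_in_repo_b = b"some license text\n"
print(git_blob_id(copying_in_repo_a) == git_blob_id(copying_in_repo_b))
```

The result should match what `git hash-object <file>` prints for a file with the same bytes in a SHA-1 repository.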
