Git的blob数据和差异信息 [英] Git's blob data and diff information

查看:257
本文介绍了Git的blob数据和差异信息的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

据我所知,Git的blob使用SHA1哈希作为文件名,以便不复制存储库中的文件。例如,如果文件A的内容为abc,并且SHA1哈希为12345,则只要内容不变,提交/分支可以指向相同的SHA1。



但是,如果将文件A修改为def以使SHA哈希为23456,会发生什么? Git是否存储文件A和修改后的文件A(不仅仅是区别,而是整个文件)?


  • 如果是这样,为什么?存储差异信息是否更好?

  • 如果不是,diff如何跟踪文件中的更改?

  • 另一个VCS系统 - CVS / SVN / Perforce ...?



ADDED



以下是'Git Community Book'我的问题。


需要注意的是,这与您可能熟悉的大多数SCM系统有很大不同。 Subversion,CVS,Perforce,Mercurial等都使用Delta Storage系统 - 它们存储一次提交和下一次提交之间的差异。 Git不会这样做 - 它会在您每次提交时存储项目中所有文件在此树结构中的外观。这是了解使用Git时非常重要的概念。

解决方案

git按内容存储文件而不是diff,所以在你的例子中,两个版本的A(abc和def)将存储在对象数据库中。




  • 它可以更好地存储整个对象,因为它非常通过比较它们的SHA很容易看出两个版本的文件是否相同。查看 git-book 以获取详细信息如何存储对象。这样做会更好,因为如果使用diff来跟踪文件,则需要重建文件的整个历史记录。容易在中央系统中执行,但不能在可能对文件进行许多不同更改的分布式系统中执行。

  • Git直接从对象。

As far as I know, Git's blob has SHA1 hash as file name, in order not to duplicate the file in the repository.

For example, if file A has a content of "abc" and has a SHA1 hash as "12345", as long as the content doesn't change, the commits/branches can point to the same SHA1.

But, what would happen if file A is modified to "def" to have SHA hash "23456"? Does Git store file A, and modified file A (not the difference only, but the whole file)?

  • If so, why is that? Isn't it better to store the diff info?
  • If not, how does diff track the changes in a file?
  • How about the other VCS systems - CVS/SVN/Perforce...?

ADDED

The following from 'Git Community Book' answers most of my questions.

It is important to note that this is very different from most SCM systems that you may be familiar with. Subversion, CVS, Perforce, Mercurial and the like all use Delta Storage systems - they store the differences between one commit and the next. Git does not do this - it stores a snapshot of what all the files in your project look like in this tree structure each time you commit. This is a very important concept to understand when using Git.

解决方案

git stores files by content rather than diffs so in your example, both versions of A ("abc" and "def") would be stored in the object database.

  • It works out better to store whole objects because it is very easy to see if two versions of the file are the same or not just by comparing their SHAs. Have a look at the git-book for details on how the objects are stored. This works out better because if files were tracked with diffs you would need the entire history of a file to reconstruct it. Easy to do in a centralised system, but not in a distributed system where there can be many different changes to a file.

  • Git performs the diff directly from the objects.

这篇关于Git的blob数据和差异信息的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆