为什么Git会在Blob文件中存储(和散列)Blob大小? [英] Why does Git store (and hash) blob size in the blob file?

查看:254
本文介绍了为什么Git会在Blob文件中存储(和散列)Blob大小?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Git的blob目标文件格式为blob <size string>\0<data>. 识别blob的SHA-1哈希不是单独根据blob的内容计算的,而是根据标头增强的blob数据(如上所述)计算的.

Git's blob object file format is blob <size string>\0<data>. The blob-identifying SHA-1 hash is calculated not from the blob contents alone, but from the header-augmented blob data (as described above).

作为一个纯粹主义者,我不喜欢那种架构.它将数据的通用属性(其SHA1哈希)与一些特定于git的标头混合在一起.

As a purist I do not like that architecture. It mixes the universal property of the data (its SHA1 hash) with some git-specific header.

纯数据Blob存储的另一个优点是可以使用写时复制"将文件添加到索引中,而不用复制整个文件.所需的空间可以减半,并且某些操作可以变得更快.

Another advantage of pure-data blob storage is that the files can be added to the index using "copy-on-write" instead of copying the whole file. The required space could be halved and some operations could become faster.

那么,为什么Git开发人员选择使用基于标头的格式而不是纯数据格式?

So, why did Git developers choose to use the header-based format instead of the pure data format?

P.S. Git早期的AFAIK SHA-1哈希是基于压缩数据的.

P.S. AFAIK in the early days of Git the SHA-1 hash was based on the compressed data.

推荐答案

Git早期的AFAIK SHA-1哈希基于压缩数据.

AFAIK in the early days of Git the SHA-1 hash was based on the compressed data.

是的,这会导致各种优化",例如提交65c2e0c,git 0.99, 2015年6月:

Yes, and that lead to all kind of "optimizations" like commit 65c2e0c, git 0.99, June 2015:

查找SHA1对象的大小,而不会使所有内容膨胀.

Find size of SHA1 object without inflating everything.

但是" git如何计算文件哈希?中说明的新格式可以追溯到:

But that new format, illustrated in "How does git compute file hashes?", can be traced back to:

  • git diff, in commit 051308f (git 1.4.0-rc1, May 2006)
  • git fast-import, started in commit db5e523 (git 1.5.0, Aug. 2006)

每次都需要数据长度来对数据本身做任何事情.

Each time, the length of the data is needed to do anything with the data itself.

这篇关于为什么Git会在Blob文件中存储(和散列)Blob大小?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆