Why does Git store (and hash) blob size in the blob file?
Question
Git's blob object file format is blob <size string>\0<data>.
The SHA-1 hash that identifies a blob is not calculated from the blob contents alone, but from the header-augmented blob data (as described above).
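To make the format above concrete, here is a minimal Python sketch of how the blob ID can be reproduced by hand. The function name git_blob_hash is made up for illustration and is not part of Git. The sketch assumes the size string is the decimal byte length of the data.

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """Hash `data` the way the question describes: header first, then content.

    Hypothetical helper for illustration; not a Git API.
    """
    # Header is "blob <decimal size>" followed by a single NUL byte.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    content = b"hello world\n"
    print("git blob id:", git_blob_hash(content))
    # The plain SHA-1 of the same bytes differs, which is the question's complaint.
    print("plain sha1: ", hashlib.sha1(content).hexdigest())
```

The first value should match what `git hash-object` reports for a file with the same content.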
As a purist, I do not like that architecture. It mixes a universal property of the data (its SHA-1 hash) with a Git-specific header.
Another advantage of pure-data blob storage is that files could be added to the index using "copy-on-write" instead of copying the whole file. The required space could be halved, and some operations could become faster.
So, why did the Git developers choose the header-based format instead of the pure-data format?
P.S. AFAIK, in the early days of Git the SHA-1 hash was based on the compressed data.
Recommended answer
AFAIK in the early days of Git the SHA-1 hash was based on the compressed data.
Yes, and that led to all kinds of "optimizations", such as commit 65c2e0c (git 0.99, June 2005):
Find size of SHA1 object without inflating everything.
But that new format, illustrated in "How does git compute file hashes?", can be traced back to:
- git diff, in commit 051308f (git 1.4.0-rc1, May 2006)
- git fast-import, started in commit db5e523 (git 1.5.0, Aug. 2006)
Each time, the length of the data is needed to do anything with the data itself.
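As a rough illustration of why a length stored up front is convenient, the sketch below reads the type and size of a loose object by inflating only its first few bytes. It assumes the loose-object layout of a zlib stream wrapping the header and the data. The name read_object_header and the 64-byte limit are choices made for this example, not anything taken from Git's source.

```python
import zlib


def read_object_header(compressed: bytes, max_header: int = 64):
    """Return (type, size) from a zlib-compressed "<type> <size>\\0<data>" stream.

    Hypothetical helper: inflates at most `max_header` bytes instead of the
    whole object, then parses the text before the first NUL byte.
    """
    inflater = zlib.decompressobj()
    # The second argument caps how many decompressed bytes are returned.
    head = inflater.decompress(compressed, max_header)
    if b"\0" not in head:
        raise ValueError("no header terminator in the first bytes")
    header = head.split(b"\0", 1)[0]
    kind, _, size = header.partition(b" ")
    return kind.decode("ascii"), int(size)


if __name__ == "__main__":
    payload = b"x" * 1_000_000
    raw = b"blob " + str(len(payload)).encode("ascii") + b"\0" + payload
    print(read_object_header(zlib.compress(raw)))
```

With a pure-data format, the same question ("how big is this object?") could only be answered by inflating the entire stream or by keeping the size somewhere else.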