How would Git handle a SHA-1 collision on a blob?

Question

This has probably never happened in the real world, and may never happen, but let's consider this: say you have a git repository, make a commit, and get very, very unlucky: one of the blobs ends up having the same SHA-1 as another object that is already in your repository. The question is, how would Git handle this? Simply fail? Find a way to link the two blobs and check which one is needed according to the context?

More a brain-teaser than an actual problem, but I found the issue interesting.

Answer

I did an experiment to find out exactly how Git would behave in this case. This is with version 2.7.0~rc0+next.20151210 (Debian version). I basically just reduced the hash size from 160 bits to 4 bits, leaving only 16 possible object IDs, by applying the following diff and rebuilding git:

--- git-2.7.0~rc0+next.20151210.orig/block-sha1/sha1.c
+++ git-2.7.0~rc0+next.20151210/block-sha1/sha1.c
@@ -246,6 +246,8 @@ void blk_SHA1_Final(unsigned char hashou
    blk_SHA1_Update(ctx, padlen, 8);

    /* Output hash */
-   for (i = 0; i < 5; i++)
-       put_be32(hashout + i * 4, ctx->H[i]);
+   for (i = 0; i < 1; i++)
+       put_be32(hashout + i * 4, (ctx->H[i] & 0xf000000));
+   for (i = 1; i < 5; i++)
+       put_be32(hashout + i * 4, 0);
 }

Then I did a few commits and noticed the following.

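  1. If a blob with the same hash already exists, you will not get any warning. Everything seems fine, but when you push, someone clones, or you revert, you will lose the latest version, because Git keeps the object already stored under that hash and never writes the new one.
  2. If a tree object already exists and you create a blob with the same hash: everything looks normal until you try to push or someone clones your repository. Then you will see that the repository is corrupt.
  3. If a commit object already exists and you create a blob with the same hash: same as #2, corrupt.
  4. If a blob already exists and you create a commit object with the same hash, it will fail when updating the "ref".
  5. If a blob already exists and you create a tree object with the same hash, it will fail when creating the commit.
  6. If a tree object already exists and you create a commit object with the same hash, it will fail when updating the "ref".
  7. If a tree object already exists and you create a tree object with the same hash, everything seems OK. But when you commit, the whole repository will reference the wrong tree.
  8. If a commit object already exists and you create a commit object with the same hash, everything seems OK. But when you commit, the commit will never be created, and the HEAD pointer will be moved to an old commit.
  9. If a commit object already exists and you create a tree object with the same hash, it will fail when creating the commit.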

For #2 you will typically get an error like this when you run "git push":

error: object 0400000000000000000000000000000000000000 is a tree, not a blob
fatal: bad blob object
error: failed to push some refs to origin

or:

error: unable to read sha1 file of file.txt (0400000000000000000000000000000000000000)

if you delete the file and then run "git checkout file.txt".

For #4 and #6, you will typically get an error like this:

error: Trying to write non-commit object f000000000000000000000000000000000000000 to branch refs/heads/master
fatal: cannot update HEAD ref

when running "git commit". In this case you can usually just run "git commit" again, since the changed timestamp gives the new commit object a different hash.

For #5 and #9, you will typically get an error like this:

fatal: 1000000000000000000000000000000000000000 is not a valid 'tree' object

when running "git commit".

If someone tries to clone your corrupt repository, they will typically see something like:

git clone (one repo with collided blob,
d000000000000000000000000000000000000000 is commit,
f000000000000000000000000000000000000000 is tree)

Cloning into 'clonedversion'...
done.
error: unable to read sha1 file of s (d000000000000000000000000000000000000000)
error: unable to read sha1 file of tullebukk (f000000000000000000000000000000000000000)
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

What "worries" me is that in two cases (2, 3) the repository becomes corrupt without any warning, and in three cases (1, 7, 8) everything seems OK, but the repository content differs from what you expect. People cloning or pulling will get different content from what you have. Cases 4, 5, 6 and 9 are OK, since Git stops with an error. I suppose it would be better if it at least failed with an error in all cases.
