How would Git handle a SHA-1 collision on a blob?

Problem description

This has probably never happened in the real world yet, and may never happen, but let's consider this: say you have a Git repository, make a commit, and get very, very unlucky: one of the blobs ends up having the same SHA-1 as another one that is already in your repository. The question is, how would Git handle this? Would it simply fail? Or would it find a way to link the two blobs and check which one is needed according to the context?

More a brain-teaser than an actual problem, but I found the issue interesting.

Solution

I did an experiment to find out exactly how Git would behave in this case. This is with version 2.7.0~rc0+next.20151210 (Debian version). I basically just reduced the hash size from 160 bits to 4 bits by applying the following diff and rebuilding Git:

--- git-2.7.0~rc0+next.20151210.orig/block-sha1/sha1.c
+++ git-2.7.0~rc0+next.20151210/block-sha1/sha1.c
@@ -246,6 +246,8 @@ void blk_SHA1_Final(unsigned char hashou
    blk_SHA1_Update(ctx, padlen, 8);

    /* Output hash */
-   for (i = 0; i < 5; i++)
-       put_be32(hashout + i * 4, ctx->H[i]);
+   for (i = 0; i < 1; i++)
+       put_be32(hashout + i * 4, (ctx->H[i] & 0xf000000));
+   for (i = 1; i < 5; i++)
+       put_be32(hashout + i * 4, 0);
 }

Then I made a few commits and observed the following.

  1. If a blob already exists with the same hash, you will not get any warnings at all. Everything seems to be OK, but when you push, someone clones, or you revert, you will lose the latest version, because Git keeps the object it already stored under that hash and never writes the new one.
  2. If a tree object already exists and you make a blob with the same hash, everything will seem normal until you either try to push or someone clones your repository. Then you will see that the repo is corrupt.
  3. If a commit object already exists and you make a blob with the same hash: same as #2 (corrupt).
  4. If a blob already exists and you make a commit object with the same hash, it will fail when updating the ref.
  5. If a blob already exists and you make a tree object with the same hash, it will fail when creating the commit.
  6. If a tree object already exists and you make a commit object with the same hash, it will fail when updating the ref.
  7. If a tree object already exists and you make a tree object with the same hash, everything will seem OK. But when you commit, the whole repository will reference the wrong tree.
  8. If a commit object already exists and you make a commit object with the same hash, everything will seem OK. But when you commit, the commit will never be created, and the HEAD pointer will be moved to an old commit.
  9. If a commit object already exists and you make a tree object with the same hash, it will fail when creating the commit.

For #2 you will typically get an error like this when you run "git push":

error: object 0400000000000000000000000000000000000000 is a tree, not a blob
fatal: bad blob object
error: failed to push some refs to origin

or:

error: unable to read sha1 file of file.txt (0400000000000000000000000000000000000000)

if you delete the file and then run "git checkout file.txt".

For #4 and #6, you will typically get an error like this:

error: Trying to write non-commit object
f000000000000000000000000000000000000000 to branch refs/heads/master
fatal: cannot update HEAD ref

when running "git commit". In this case, you can usually just run "git commit" again, since this will create a new hash (because the timestamp has changed).

For #5 and #9, you will typically get an error like this:

fatal: 1000000000000000000000000000000000000000 is not a valid 'tree' object

when running "git commit".

If someone tries to clone your corrupt repository, they will typically see something like:

git clone (one repo with collided blob,
d000000000000000000000000000000000000000 is commit,
f000000000000000000000000000000000000000 is tree)

Cloning into 'clonedversion'...
done.
error: unable to read sha1 file of s (d000000000000000000000000000000000000000)
error: unable to read sha1 file of tullebukk (f000000000000000000000000000000000000000)
fatal: unable to checkout working tree
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry the checkout with 'git checkout -f HEAD'

What "worries" me is that in two cases (2 and 3) the repository becomes corrupt without any warning, and in three cases (1, 7 and 8) everything seems OK, but the repository content is different from what you expect it to be. People cloning or pulling will get different content from what you have. Cases 4, 5, 6 and 9 are OK, since Git stops with an error. I suppose it would be better if it at least failed with an error in all cases.