Hash collision in git

Question

What would actually happen if I had a hash collision while using git?

For example, if I manage to commit two files with the same SHA-1 checksum, would git notice it, or would it corrupt one of the files?

Could git be improved to live with that, or would I have to change to a new hash algorithm?

(Please do not deflect this question by discussing how unlikely that is - Thanks)

Answer

Picking atoms on 10 Moons

An SHA-1 hash is a 40-character hex string: 4 bits per character times 40 characters is 160 bits. Now, 10 bits give approximately 1000 values (1024 to be exact), so 160 bits give roughly 1000^16. That means there are about 1 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 000 different SHA-1 hashes... 10^48. (The exact count is 2^160 ≈ 1.46 × 10^48.)

What is this equivalent to? Well, the Moon is made up of about 10^47 atoms. So suppose we have 10 Moons (about 10^48 atoms in all). You randomly pick one atom on one of these moons, and then go ahead and pick a random atom on them again. The likelihood that you'll pick the same atom twice is the likelihood that two given git commits will have the same SHA-1 hash.
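The arithmetic above can be checked directly. Here is a small Python sketch. The Moon's atom count is the answer's rough figure of about 10^47, not a precise value.

```python
from fractions import Fraction

HEX_CHARS = 40          # a SHA-1 hash written as hex
BITS_PER_HEX_CHAR = 4   # each hex digit encodes 4 bits

bits = HEX_CHARS * BITS_PER_HEX_CHAR   # 160
hash_space = 2 ** bits                 # number of distinct SHA-1 values

# Rule of thumb from the text: 2**10 = 1024 ≈ 10**3, so 2**160 = (2**10)**16 ≈ 10**48.
rough_estimate = (10 ** 3) ** (bits // 10)

# The "10 Moons" picture: about 10**47 atoms per Moon (rough figure), times 10 Moons.
atoms_on_ten_moons = 10 * 10 ** 47

# Chance that two specific, uniformly random hashes are equal: exactly 1 / 2**160.
pair_probability = Fraction(1, hash_space)

print(bits)                                   # 160
print(f"{hash_space:.2e}")                    # 1.46e+48
print(rough_estimate == atoms_on_ten_moons)   # True
print(f"{float(pair_probability):.2e}")       # 6.84e-49
```

The pair probability assumes SHA-1 behaves like a uniformly random function, which is the premise of the answer's estimate. It says nothing about deliberately constructed collisions.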

Expanding on this we can ask the question...

How many commits do you need in a repository before you should start worrying about collisions?

This relates to so-called "birthday attacks", which in turn refer to the "Birthday Paradox" or "Birthday Problem". It states that when you pick randomly from a given set, you need surprisingly few picks before you are more likely than not to have picked something twice. But "surprisingly few" is a very relative term here.

Wikipedia has a table on the probability of Birthday Paradox collisions. There is no entry for a 40-character hash. But interpolating between the entries for 32 and 48 characters (128-bit and 192-bit hashes) lands us in the range of 5 × 10^22 git commits for a 0.1% probability of a collision. That is fifty thousand billion billion different commits, or fifty zettacommits, before you have reached even a 0.1% chance of a collision.
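The interpolated figure can be reproduced with the standard birthday-bound approximation p ≈ 1 - e^(-n^2 / 2N), solved for n. This is a minimal sketch, assuming hashes behave like independent uniform random values. The helper name `picks_for_collision` is made up for illustration.

```python
import math

def picks_for_collision(p: float, space: float) -> float:
    """Approximate number of random picks from `space` equally likely values
    after which the chance of at least one repeat reaches `p`.

    Uses p ≈ 1 - exp(-n**2 / (2 * space)), i.e. n ≈ sqrt(2 * space * ln(1 / (1 - p))).
    """
    return math.sqrt(2.0 * space * -math.log1p(-p))

# Classic birthday problem: 365 days, 50% chance -> about 22.5 (the exact answer is 23 people).
print(round(picks_for_collision(0.5, 365), 1))

# SHA-1: 2**160 values, 0.1% chance -> about 5.4e22, in line with the "5 × 10^22" estimate.
print(f"{picks_for_collision(0.001, 2 ** 160):.1e}")
```

The same formula shows both sides of the paradox. The number of picks needed grows only with the square root of the space, which is why 23 people suffice for birthdays. For a space as large as 2^160, though, even the square root is astronomically large.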

The hashes alone for these commits would add up to more data than is generated on Earth in a year, which is to say you would need to churn out code faster than YouTube streams out video. Good luck with that. :D
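For scale, here is the size of the hashes alone for 5 × 10^22 commits. The sketch only checks the hash arithmetic. The comparison with a year of worldwide data is the answer's own claim and is not computed here.

```python
import hashlib

COMMITS = 5 * 10 ** 22                          # commit count from the 0.1% estimate
RAW_BYTES = hashlib.sha1().digest_size          # 20: a binary SHA-1 digest is 160 bits
HEX_BYTES = len(hashlib.sha1().hexdigest())     # 40: the same digest as ASCII hex

YOTTABYTE = 10 ** 24                            # SI prefix, in bytes

raw_total = COMMITS * RAW_BYTES                 # 10**24 bytes
hex_total = COMMITS * HEX_BYTES                 # 2 * 10**24 bytes

print(raw_total / YOTTABYTE, hex_total / YOTTABYTE)   # 1.0 2.0
```

In other words, that is roughly one to two yottabytes of hash values, before counting any of the actual commit contents.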

The point of this is that unless someone is deliberately causing a collision, the probability of one happening at random is so staggeringly small that you can ignore this issue.

"But when a collision does occur, then what actually happens?"

Ok, suppose the improbable does happen, or suppose someone managed to tailor a deliberate SHA-1 hash collision. What happens then?

In that case, there is an excellent answer from someone who experimented with exactly this. I will quote from that answer:

  1. If a blob already exists with the same hash, you will not get any warnings at all. Everything seems to be ok, but when you push, someone clones, or you revert, you will lose the latest version (in line with what is explained above).
  2. If a tree object already exists and you make a blob with the same hash: Everything will seem normal, until you either try to push or someone clones your repository. Then you will see that the repo is corrupt.
  3. If a commit object already exists and you make a blob with the same hash: same as #2 - corrupt
  4. If a blob already exists and you make a commit object with the same hash, it will fail when updating the "ref".
  5. If a blob already exists and you make a tree object with the same hash, it will fail when creating the commit.
  6. If a tree object already exists and you make a commit object with the same hash, it will fail when updating the "ref".
  7. If a tree object already exists and you make a tree object with the same hash, everything will seem ok. But when you commit, all of the repository will reference the wrong tree.
  8. If a commit object already exists and you make a commit object with the same hash, everything will seem ok. But when you commit, the commit will never be created, and the HEAD pointer will be moved to an old commit.
  9. If a commit object already exists and you make a tree object with the same hash, it will fail when creating the commit.
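The cases above involve blobs, trees and commits sharing a hash, so it helps to know what git actually hashes. The object ID is the SHA-1 of a short header `<type> <size>` plus a NUL byte, followed by the object's bytes; this is the value `git hash-object` reports. The object type is therefore part of the hashed data. A cross-type collision means two complete serialized objects, header included, produce the same digest.

Below is a minimal Python sketch. The function name is hypothetical, and tree and commit contents must already be in git's serialized format.

```python
import hashlib

def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object name git assigns to an object.

    The hashed data is the header "<type> <size>" plus a NUL byte, followed by
    the raw content. `obj_type` is "blob", "tree", "commit" or "tag"; for trees
    and commits, `content` must already be in git's serialized object format.
    """
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()

# The same (empty) content gets different IDs as a blob and as a tree,
# because the type is hashed along with the content.
print(git_object_id("blob", b""))                # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(git_object_id("tree", b""))                # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
print(git_object_id("blob", b"test content\n"))  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
```

The last value should match `echo 'test content' | git hash-object --stdin` in a SHA-1 repository.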

As you can see, some cases are not good; cases #2 and #3 in particular mess up your repository. However, the fault does seem to stay within that repository, and the attack or bizarre improbability does not propagate to other repositories.
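Roughly speaking, case #1 comes from git treating an object name that already exists as already stored. When it writes an object whose ID is present, it keeps the existing object rather than overwriting it. The toy model below imitates that behaviour.

Real SHA-1 collisions cannot be found by brute force. The toy therefore uses a hypothetical `weak_id` that keeps only 2 hex digits of SHA-1 (256 possible names), which guarantees a collision within 257 distinct inputs. For brevity, it also hashes raw bytes without git's header. All names here are made up for illustration and are not git's code.

```python
import hashlib
from itertools import count

ID_HEX_DIGITS = 2  # hypothetical weakening: real git object names use all 40 hex digits

def weak_id(content: bytes) -> str:
    """Truncated SHA-1, so that collisions are easy to find (toy only)."""
    return hashlib.sha1(content).hexdigest()[:ID_HEX_DIGITS]

class ToyObjectStore:
    """Content-addressed store that never overwrites an existing object name."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        oid = weak_id(content)
        # First writer wins: if the name is already taken, the new bytes are dropped silently.
        self._objects.setdefault(oid, content)
        return oid

    def get(self, oid: str) -> bytes:
        return self._objects[oid]

def find_colliding_pair() -> tuple[bytes, bytes]:
    """Return two different contents with the same weak_id (pigeonhole: at most 257 tries)."""
    seen: dict[str, bytes] = {}
    for i in count():
        content = f"version {i}\n".encode()
        oid = weak_id(content)
        if oid in seen:
            return seen[oid], content
        seen[oid] = content
    raise AssertionError("unreachable")

older, newer = find_colliding_pair()
store = ToyObjectStore()
oid = store.put(older)
print(store.put(newer) == oid)   # True: both contents map to the same name
print(store.get(oid) == older)   # True: the newer content is silently lost
```

Nothing warns the caller, which matches the "you will not get any warnings at all" part of case #1.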

Also, the issue of deliberate collisions seems to be recognised as a real threat; GitHub, for instance, is taking measures to prevent it.
