How does the newly found SHA-1 collision affect Git?

Question

Recently a team of researchers generated two files with the same SHA-1 hash (https://shattered.it/).

Since Git uses this hash for its internal storage, how far does this kind of attack influence Git?

Recommended answer

Edit, late December 2017: Git version 2.16 is gradually acquiring internal interfaces to allow for different hashes. There is a long way to go yet.

The short (but unsatisfying) answer is that the example files are not a problem for Git—but two other (carefully calculated) files could be.

I downloaded both of these files, shattered-1.pdf and shattered-2.pdf, and put them into a new empty repository:

macbook$ shasum shattered-*
38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-1.pdf
38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-2.pdf
macbook$ cmp shattered-*
shattered-1.pdf shattered-2.pdf differ: char 193, line 8
macbook$ git init
Initialized empty Git repository in .../tmp/.git/
macbook$ git add shattered-1.pdf 
macbook$ git add shattered-2.pdf 
macbook$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

    new file:   shattered-1.pdf
    new file:   shattered-2.pdf

Even though the two files have the same SHA-1 checksum (and display mostly the same, although one has a red background and the other has a blue background), they get different Git hashes:

macbook$ git ls-files --stage
100644 ba9aaa145ccd24ef760cf31c74d8f7ca1a2e47b0 0   shattered-1.pdf
100644 b621eeccd5c7edac9b7dcba35a8d5afd075e24f2 0   shattered-2.pdf

Those are the two SHA-1 checksums for the files as stored in Git: one is ba9aa... and the other is b621e.... Neither is 38762c.... But—why?

The answer is that Git does not hash a file's raw bytes alone. It stores (and hashes) each file as the literal string blob, a space, the file size in decimal, an ASCII NUL byte, and then the file data. Both files are exactly the same size:

macbook$ ls -l shattered-?.pdf
...  422435 Feb 24 00:55 shattered-1.pdf
...  422435 Feb 24 00:55 shattered-2.pdf

so both are prefixed with the literal text blob 422435\0 (where \0 represents a single NUL byte, à la C or Python octal escapes in strings).
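The header construction described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a SHA-1 repository and no content filters or line-ending conversion; the function names git_blob_id and plain_sha1 are made up for this example and are not part of Git.

```python
import hashlib


def plain_sha1(data: bytes) -> str:
    """SHA-1 of the raw bytes, as a tool like shasum would report it."""
    return hashlib.sha1(data).hexdigest()


def git_blob_id(data: bytes) -> str:
    """SHA-1 of the bytes as Git stores a blob: header first, then the data.

    The header is the word "blob", a space, the size in decimal, and a NUL
    byte. For a 422435-byte file it is b"blob 422435\\0".
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    sample = b""  # the empty file is a convenient, well-known test case
    print(plain_sha1(sample))   # da39a3ee5e6b4b0d3255bfef95601890afd80709
    print(git_blob_id(sample))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

For an unfiltered file, the value from git_blob_id should match what git hash-object prints for the same bytes, which is one way to check the sketch against a real Git installation.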

Perhaps surprisingly (or not, if you know anything of how SHA-1 is calculated), adding the same prefix to two different files that previously produced the same checksum causes them to produce different checksums.

The reason this should become unsurprising is that if the final checksum result were not exquisitely sensitive to the position, as well as the value, of each input bit, it would be easy to produce collisions on demand by taking a known input file and merely re-arranging some of its bits. These two input files produce the same sum despite having a different byte at char 193, line 8, but this result was achieved, according to the researchers, by trying over 9 quintillion (short scale) inputs. To get that result, they put in carefully chosen blocks of raw data, at a position they controlled, that would affect the sums, until they found pairs of inputs that resulted in a collision.

By adding the blob header, Git moved the position, destroying the 110-GPU-years of computation in a single more or less accidental burp.
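The effect of a prepended header can be illustrated with a toy model. The sketch below is not SHA-1 and not the researchers' attack: it is a deliberately tiny chained hash (a 24-bit state, 4-byte blocks, no length padding, all chosen here purely for illustration) in which a collision can be found by brute force in a moment. It shows that a collision built for one starting state survives an identical suffix, while an identical prefix changes the starting state and will almost certainly break it.

```python
import hashlib
from itertools import count

STATE_BYTES = 3   # toy 24-bit chaining state; real SHA-1 carries 160 bits
BLOCK_BYTES = 4   # toy block size; real SHA-1 uses 64-byte blocks
IV = b"\x00" * STATE_BYTES


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated SHA-1 of state followed by block."""
    return hashlib.sha1(state + block).digest()[:STATE_BYTES]


def toy_hash(message: bytes) -> bytes:
    """Feed fixed-size blocks through compress(), carrying the state along."""
    assert len(message) % BLOCK_BYTES == 0, "toy model: whole blocks only"
    state = IV
    for i in range(0, len(message), BLOCK_BYTES):
        state = compress(state, message[i:i + BLOCK_BYTES])
    return state


def find_collision():
    """Brute-force two different one-block messages with the same toy_hash."""
    seen = {}
    for n in count():
        block = n.to_bytes(BLOCK_BYTES, "big")
        digest = toy_hash(block)
        if digest in seen:
            return seen[digest], block
        seen[digest] = block


if __name__ == "__main__":
    a, b = find_collision()
    header = b"blob"   # one toy block standing in for Git's blob header
    suffix = b"tail"   # one toy block of identical trailing data
    print("colliding pair:      ", toy_hash(a) == toy_hash(b))
    print("with same suffix:    ", toy_hash(a + suffix) == toy_hash(b + suffix))
    # The header changes the state that a and b are fed into, so the
    # collision found for the original starting state very likely disappears.
    print("with same prefix:    ", toy_hash(header + a) == toy_hash(header + b))
```

In this toy the search takes a few thousand attempts; the real computation differs in scale and method, but the dependence on the state that precedes the colliding blocks is the same idea.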

Now, knowing that Git will do this, they could repeat their 110-GPU-years of computation with inputs that begin with blob 422435\0 (provided their sacrificial blocks don't get pushed around too much; the actual number of GPU-years needed would probably vary, as the process is somewhat stochastic). They would then come up with two different files from which the blob header could be stripped. These two files would have different SHA-1 checksums from each other, but when added with git add, both would produce the same SHA-1 checksum.

In that particular case, the first file added would "win" the slot. (Let's assume it's named shattered-3.pdf.) A good-enough Git—I'm not at all sure that the current Git is this good; see Ruben's experiment-based answer to How would Git handle a SHA-1 collision on a blob?—would notice that git add shattered-4.pdf, attempting to add the second file, collided with the first-but-different shattered-3.pdf and would warn you and fail the git add step. In any case you would be unable to add both of these files to a single repository.
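What a "good-enough" store might do can be sketched as follows. This is a hypothetical model, not a description of how any Git version actually behaves: ToyObjectStore, CollisionError, and length_only_hash are names invented for this example, and the hash is deliberately weak (it depends only on the input length) so that a collision is easy to trigger.

```python
import hashlib


class CollisionError(Exception):
    """Raised when different contents map to an object ID already in use."""


def length_only_hash(data: bytes) -> str:
    """Deliberately weak stand-in hash: any two equal-length inputs collide."""
    return hashlib.sha1(str(len(data)).encode("ascii")).hexdigest()


class ToyObjectStore:
    """Content-addressed store in which the first object added keeps its slot."""

    def __init__(self, hash_fn=length_only_hash):
        self._hash_fn = hash_fn
        self._objects = {}

    def add(self, data: bytes) -> str:
        object_id = self._hash_fn(data)
        existing = self._objects.get(object_id)
        if existing is None:
            self._objects[object_id] = data   # first writer wins the slot
        elif existing != data:
            # Same ID, different bytes: refuse rather than silently reuse.
            raise CollisionError("object %s already holds different data" % object_id)
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]


if __name__ == "__main__":
    store = ToyObjectStore()
    first_id = store.add(b"shattered-3 contents")
    store.add(b"shattered-3 contents")        # identical data: harmless no-op
    try:
        store.add(b"shattered-4 contents")    # same length, so same toy ID
    except CollisionError as err:
        print("add refused:", err)
    print(store.get(first_id))
```

The byte-for-byte comparison on an ID match is the design choice that matters here: without it, the second file would be treated as already stored and the first file's contents would be returned in its place.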

But first, someone has to spend a lot more time and money to compute the new hash collision.
