How does the newly found SHA-1 collision affect Git?


Question

Recently a team of researchers generated two files with the same SHA-1 hash (https://shattered.it/).

Since Git uses this hash for its internal storage, how far does this kind of attack influence Git?

Recommended answer

Edit, late December 2017: Git version 2.16 is gradually acquiring internal interfaces to allow for different hashes. There is a long way to go yet.

The short (but unsatisfying) answer is that the example files are not a problem for Git—but two other (carefully calculated) files could be.

I downloaded both of these files, shattered-1.pdf and shattered-2.pdf, and put them into a new empty repository:

macbook$ shasum shattered-*
38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-1.pdf
38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-2.pdf
macbook$ cmp shattered-*
shattered-1.pdf shattered-2.pdf differ: char 193, line 8
macbook$ git init
Initialized empty Git repository in .../tmp/.git/
macbook$ git add shattered-1.pdf 
macbook$ git add shattered-2.pdf 
macbook$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

    new file:   shattered-1.pdf
    new file:   shattered-2.pdf

Even though the two files have the same SHA-1 checksum (and display mostly the same, although one has a red background and the other has a blue background), they get different Git hashes:

macbook$ git ls-files --stage
100644 ba9aaa145ccd24ef760cf31c74d8f7ca1a2e47b0 0   shattered-1.pdf
100644 b621eeccd5c7edac9b7dcba35a8d5afd075e24f2 0   shattered-2.pdf

Those are the two SHA-1 checksums for the files as stored in Git: one is ba9aa... and the other is b621e.... Neither is 38762c.... But—why?

The answer is that Git does not store (or hash) a file's raw contents alone. It stores the string literal blob, a space, the size of the file in decimal, an ASCII NUL byte, and then the file data, and the hash is computed over all of that. Both files are exactly the same size:

macbook$ ls -l shattered-?.pdf
...  422435 Feb 24 00:55 shattered-1.pdf
...  422435 Feb 24 00:55 shattered-2.pdf

So both are prefixed with the literal text blob 422435\0 (where \0 represents a single NUL byte, as in C or Python octal escapes in strings).
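The header scheme described above can be reproduced outside Git. Here is a minimal Python sketch, assuming a repository that uses SHA-1 object names; the function names git_blob_id and plain_sha1 are made up for this illustration and are not part of Git.

```python
import hashlib


def plain_sha1(data: bytes) -> str:
    """SHA-1 of the raw bytes, as a tool like shasum would report it."""
    return hashlib.sha1(data).hexdigest()


def git_blob_id(data: bytes) -> str:
    """SHA-1 of the bytes as Git hashes a blob: header first, then content.

    The header is the word "blob", a space, the content length in decimal,
    and a single NUL byte.
    """
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    sample = b"hello\n"
    print("plain SHA-1:", plain_sha1(sample))
    print("Git blob ID:", git_blob_id(sample))
```

For the empty file, git_blob_id(b"") gives e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the well-known ID of Git's empty blob, while plain_sha1(b"") gives a different value. The two shattered PDFs behave the same way: their plain sums match, but their header-prefixed sums do not.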

Perhaps surprisingly (or not, if you know anything of how SHA-1 is calculated), adding the same prefix to two different files that previously produced the same checksum causes them to produce different checksums.

The reason this should become unsurprising is that if the final checksum result were not exquisitely sensitive to the position, as well as the value, of each input bit, it would be easy to produce collisions on demand by taking a known input file and merely re-arranging some of its bits. These two input files produce the same sum despite having a different byte at char 193, line 8, but this result was achieved, according to the researchers, by trying over 9 quintillion (short scale) inputs. To get that result, they put in carefully chosen blocks of raw data, at a position they controlled, that would affect the sums, until they found pairs of inputs that resulted in a collision.
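The position sensitivity described above is easy to observe on small inputs. The following Python sketch hashes three byte strings that contain essentially the same bytes in different positions; the sample strings and helper names are made up for this illustration.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as a hex string."""
    return hashlib.sha1(data).hexdigest()


def differing_bits(hex_a: str, hex_b: str) -> int:
    """Count how many of the 160 digest bits differ between two digests."""
    return bin(int(hex_a, 16) ^ int(hex_b, 16)).count("1")


original = b"The quick brown fox"
swapped = b"The quikc brown fox"   # same bytes, two adjacent bytes exchanged
shifted = b"\x00" + original       # same bytes, each moved one position later

if __name__ == "__main__":
    base = sha1_hex(original)
    for label, data in (("swapped", swapped), ("shifted", shifted)):
        digest = sha1_hex(data)
        print(label, digest, "bits changed:", differing_bits(base, digest))
```

Running it prints a different digest for each variant, and typically somewhere around half of the 160 bits change. This is why moving the researchers' carefully placed blocks, even with an identical prefix on both files, undoes the collision.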

By adding the blob header, Git moved the position, destroying the 110-GPU-years of computation in a single more or less accidental burp.

Now, knowing that Git will do this, they could repeat their 110-GPU-years of computation with inputs that begin with blob 422435\0 (provided their sacrificial blocks don't get pushed around too much; and the actual number of GPU-years of computation needed would probably vary, as the process is a bit stochastic). They would then come up with two different files that could have the blob header stripped off. These two files would now have different SHA-1 checksums from each other, but when git add-ed, both would produce the same SHA-1 checksum.

In that particular case, the first file added would "win" the slot. (Let's assume it's named shattered-3.pdf.) A good-enough Git would notice that git add shattered-4.pdf, attempting to add the second file, collided with the first-but-different shattered-3.pdf, and would warn you and fail the git add step. (I'm not at all sure that the current Git is this good; see Ruben's experiment-based answer to "How would Git handle a SHA-1 collision on a blob?".) In any case you would be unable to add both of these files to a single repository.
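To make the "first file wins the slot" behavior concrete, here is a toy content-addressed store in Python that behaves like the "good-enough Git" described above: it keeps the first object stored under an ID and refuses a later, different object that maps to the same ID. This is only a sketch and not how Git is implemented; ToyObjectStore, CollisionError, and weak_hash are hypothetical names, and weak_hash is a deliberately crippled hash used only so that a collision is cheap to produce.

```python
import hashlib
from typing import Callable, Dict, Optional


class CollisionError(Exception):
    """Raised when different content maps to an ID that is already in use."""


def weak_hash(payload: bytes) -> str:
    # Hypothetical, deliberately weak hash: only the first two hex digits
    # of SHA-1 are kept, so there are just 256 possible IDs.
    return hashlib.sha1(payload).hexdigest()[:2]


class ToyObjectStore:
    """Toy content-addressed store; an illustration, not Git's object database."""

    def __init__(self, hash_fn: Optional[Callable[[bytes], str]] = None) -> None:
        self._hash_fn = hash_fn or (lambda b: hashlib.sha1(b).hexdigest())
        self._objects: Dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        # Prefix the content with a Git-style blob header before hashing.
        payload = b"blob %d\x00" % len(data) + data
        oid = self._hash_fn(payload)
        existing = self._objects.get(oid)
        if existing is None:
            self._objects[oid] = payload      # the first object wins the slot
        elif existing != payload:
            raise CollisionError("different content already stored as " + oid)
        return oid                            # identical content is a no-op


if __name__ == "__main__":
    store = ToyObjectStore(hash_fn=weak_hash)
    try:
        for i in range(300):                  # more inputs than possible IDs
            store.add(b"file %d" % i)
    except CollisionError as err:
        print("rejected:", err)
```

With the default full SHA-1 the check never fires in practice. With weak_hash, some input among the first 257 must collide, and the store rejects it rather than silently keeping the earlier content under the shared ID.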

But first, someone has to spend a lot more time and money to compute the new hash collision.
