How does the newly found SHA-1 collision affect Git?

Question

Recently a team of researchers generated two files with the same SHA-1 hash (https://shattered.it/).

Since Git uses this hash for its internal storage, how much does this kind of attack affect Git?

Answer

Edit, late December 2017: Git version 2.16 is gradually acquiring internal interfaces to allow for different hashes. There is a long way to go yet.


The short (but unsatisfying) answer is that the example files are not a problem for Git—but two other (carefully calculated) files could be.

I downloaded both of these files, shattered-1.pdf and shattered-2.pdf, and put them into a new empty repository:

macbook$ shasum shattered-*
38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-1.pdf
38762cf7f55934b34d179ae6a4c80cadccbb7f0a  shattered-2.pdf
macbook$ cmp shattered-*
shattered-1.pdf shattered-2.pdf differ: char 193, line 8
macbook$ git init
Initialized empty Git repository in .../tmp/.git/
macbook$ git add shattered-1.pdf 
macbook$ git add shattered-2.pdf 
macbook$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

    new file:   shattered-1.pdf
    new file:   shattered-2.pdf

Even though the two files have the same SHA-1 checksum (and display mostly the same, although one has a red background and the other has a blue background), they get different Git hashes:

macbook$ git ls-files --stage
100644 ba9aaa145ccd24ef760cf31c74d8f7ca1a2e47b0 0   shattered-1.pdf
100644 b621eeccd5c7edac9b7dcba35a8d5afd075e24f2 0   shattered-2.pdf

Those are the two SHA-1 checksums for the files as stored in Git: one is ba9aa... and the other is b621e.... Neither is 38762c.... But—why?

The answer is that Git stores (and hashes) each file not as itself, but as the string literal blob, a space, the file size in decimal, an ASCII NUL byte, and then the file data. Both files are exactly the same size:

macbook$ ls -l shattered-?.pdf
...  422435 Feb 24 00:55 shattered-1.pdf
...  422435 Feb 24 00:55 shattered-2.pdf

so both are prefixed with the literal text blob 422435\0 (where \0 represents a single NUL byte, as in C or Python string escapes).
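As a rough illustration of the header described above, a blob's ID can be reproduced in a few lines of Python. This is a minimal sketch, assuming a repository that uses SHA-1 object IDs and no content filters; the function names git_blob_id and plain_sha1 are made up for this example and are not part of Git.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Return the SHA-1 ID Git would assign to `data` stored as a blob."""
    # Header: the literal "blob", a space, the size in decimal, a NUL byte.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def plain_sha1(data: bytes) -> str:
    """Return the SHA-1 of the raw bytes, which is what shasum reports."""
    return hashlib.sha1(data).hexdigest()


if __name__ == "__main__":
    sample = b"hello world\n"
    print("shasum-style:", plain_sha1(sample))
    print("git blob id: ", git_blob_id(sample))
```

For a given file, git hash-object should print the same value that git_blob_id returns for that file's contents, while shasum prints the plain_sha1 value.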

Perhaps surprisingly (or not, if you know anything of how SHA-1 is calculated), adding the same prefix to two different files that previously produced the same checksum causes them to produce different checksums.

This becomes unsurprising once you consider the alternative: if the final checksum were not exquisitely sensitive to the position, as well as the value, of each input bit, it would be easy to produce collisions on demand by taking a known input file and merely rearranging some of its bits. These two input files produce the same sum despite differing at char 193, line 8, but according to the researchers this result was achieved by trying over 9 quintillion (short scale) inputs. To get it, they inserted carefully chosen blocks of raw data, at a position they controlled, that would affect the sums, and kept trying until they found a pair of inputs that collided.
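The position-sensitivity argument can be seen with a toy comparison. The sketch below contrasts a deliberately weak additive checksum, which ignores byte order, with SHA-1, which does not. The byte_sum function and the sample strings are invented for illustration only.

```python
import hashlib


def byte_sum(data: bytes) -> int:
    """Toy checksum: adds up byte values, so byte order does not matter."""
    return sum(data) % 65536


def sha1_hex(data: bytes) -> str:
    """SHA-1 digest of the raw bytes, as a hex string."""
    return hashlib.sha1(data).hexdigest()


original = b"pay alice 10"
shuffled = b"pay alice 01"  # same bytes, last two swapped

# Rearranging bytes yields a "collision" for the additive checksum...
print(byte_sum(original) == byte_sum(shuffled))   # True
# ...but not for SHA-1, which depends on where each bit sits.
print(sha1_hex(original) == sha1_hex(shuffled))   # False

# A Git-style header in front of both inputs (each is 12 bytes long)
# changes every SHA-1 digest, because all later state depends on it.
prefix = b"blob 12\0"
print(sha1_hex(prefix + original) == sha1_hex(original))  # False
```

With an order-insensitive checksum, collisions are free; with SHA-1, anything placed in front of the data changes the state that all later blocks are mixed into.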

By adding the blob header, Git moved the position, destroying the 110-GPU-years of computation in a single more or less accidental burp.

Now, knowing that Git will do this, they could repeat their 110-GPU-years of computation with inputs that begin with blob 422435\0 (provided their sacrificial blocks don't get pushed around too much; and the actual number of GPU-years of computation needed would probably vary, as the process is a bit stochastic). They would then come up with two different files that could have the blob header stripped off. These two files would now have different SHA-1 checksums from each other, but when git add-ed, both would produce the same SHA-1 checksum.

In that particular case, the first file added would "win" the slot. (Let's assume it's named shattered-3.pdf.) A good-enough Git would notice that git add shattered-4.pdf, attempting to add the second file, collided with the first-but-different shattered-3.pdf, and would warn you and fail the git add step. (I'm not at all sure that the current Git is this good; see Ruben's experiment-based answer to "How would Git handle a SHA-1 collision on a blob?") In any case you would be unable to add both of these files to a single repository.
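The "first file wins, second file is rejected" behaviour can be sketched with a toy content-addressed store. This is not how Git is implemented; ToyObjectStore, CollisionError, and the injectable hash function are all hypothetical names used only to show the idea of comparing contents when two objects map to the same ID.

```python
import hashlib
from typing import Callable, Dict


class CollisionError(Exception):
    """Raised when different contents map to an ID already in the store."""


def sha1_id(raw: bytes) -> str:
    return hashlib.sha1(raw).hexdigest()


class ToyObjectStore:
    def __init__(self, hash_fn: Callable[[bytes], str] = sha1_id) -> None:
        self._hash_fn = hash_fn
        self._objects: Dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        # Same header layout as a Git blob: "blob <size>\0" + data.
        raw = b"blob " + str(len(data)).encode("ascii") + b"\0" + data
        key = self._hash_fn(raw)
        existing = self._objects.get(key)
        if existing is None:
            self._objects[key] = raw      # first file "wins" the slot
        elif existing != raw:
            raise CollisionError(f"different content already stored as {key}")
        return key


if __name__ == "__main__":
    # A deliberately weak hash (hypothetical) so a collision is easy to force.
    store = ToyObjectStore(hash_fn=lambda raw: f"len-{len(raw)}")
    store.add(b"red background")
    try:
        store.add(b"blu background")      # same length, different bytes
    except CollisionError as err:
        print("rejected:", err)
```

The weak hash stands in for the expensive collision search; with real SHA-1, reaching the CollisionError branch would require a pair of inputs computed against the blob header.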

But first, someone has to spend a lot more time and money to compute the new hash collision.
