How does the newly found SHA-1 collision affect Git?
Question
Recently a team of researchers generated two files with the same SHA-1 hash (https://shattered.it/).
Since Git uses this hash for its internal storage, to what extent does this kind of attack affect Git?
Answer
Edit, late December 2017: Git version 2.16 is gradually acquiring internal interfaces to allow for different hashes. There is a long way to go yet.
The short (but unsatisfying) answer is that the example files are not a problem for Git—but two other (carefully calculated) files could be.
I downloaded both of these files, shattered-1.pdf and shattered-2.pdf, and put them into a new empty repository:
macbook$ shasum shattered-*
38762cf7f55934b34d179ae6a4c80cadccbb7f0a shattered-1.pdf
38762cf7f55934b34d179ae6a4c80cadccbb7f0a shattered-2.pdf
macbook$ cmp shattered-*
shattered-1.pdf shattered-2.pdf differ: char 193, line 8
macbook$ git init
Initialized empty Git repository in .../tmp/.git/
macbook$ git add shattered-1.pdf
macbook$ git add shattered-2.pdf
macbook$ git status
On branch master
Initial commit
Changes to be committed:
(use "git rm --cached <file>..." to unstage)
new file: shattered-1.pdf
new file: shattered-2.pdf
Even though the two files have the same SHA-1 checksum (and display mostly the same, although one has a red background and the other has a blue background), they get different Git hashes:
macbook$ git ls-files --stage
100644 ba9aaa145ccd24ef760cf31c74d8f7ca1a2e47b0 0 shattered-1.pdf
100644 b621eeccd5c7edac9b7dcba35a8d5afd075e24f2 0 shattered-2.pdf
Those are the two SHA-1 checksums for the files as stored in Git: one is ba9aa... and the other is b621e.... Neither is 38762c.... But why?
The answer is that Git does not store files as themselves. It stores each one as the string literal blob, a blank, the file's size written in decimal, an ASCII NUL byte, and then the file data. Both files are exactly the same size:
macbook$ ls -l shattered-?.pdf
... 422435 Feb 24 00:55 shattered-1.pdf
... 422435 Feb 24 00:55 shattered-2.pdf
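The following is a minimal Python sketch of the blob-header scheme described above. It uses only hashlib and assumes a repository that still uses SHA-1 object IDs. The function names git_blob_header, git_blob_sha1 and plain_sha1 are hypothetical and chosen for illustration. Git does not expose them. The script hashes each file twice: once over the raw bytes, as shasum does, and once over the header plus the bytes, as Git does for blobs.

```python
import hashlib
import sys


def git_blob_header(size: int) -> bytes:
    """Header Git prepends to a blob before hashing it: the ASCII text
    "blob", a space, the size in decimal, and a single NUL byte."""
    return b"blob " + str(size).encode("ascii") + b"\x00"


def git_blob_sha1(data: bytes) -> str:
    """SHA-1 over header + contents, i.e. the object ID `data` would get
    when stored as a blob (assuming a SHA-1 repository)."""
    return hashlib.sha1(git_blob_header(len(data)) + data).hexdigest()


def plain_sha1(data: bytes) -> str:
    """SHA-1 over the raw contents only, which is what shasum prints."""
    return hashlib.sha1(data).hexdigest()


if __name__ == "__main__":
    # For every file named on the command line, print both digests.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            data = f.read()
        print(path)
        print("  shasum-style SHA-1:", plain_sha1(data))
        print("  Git blob SHA-1:    ", git_blob_sha1(data))
```

Run it with the two PDFs as arguments, for example python blob_sha1.py shattered-1.pdf shattered-2.pdf (the script name is arbitrary). If the sketch matches Git's behaviour, the first digest should be 38762c... for both files. The second digest should match the IDs that git ls-files --stage reported above, and git hash-object shattered-1.pdf gives an independent cross-check. For files of this size, git_blob_header(422435) evaluates to b'blob 422435\x00'.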
So both are prefixed with the literal text blob 422435 (where