Why does Git use a cryptographic hash function?


Question


Why does Git use SHA-1, a cryptographic hash function, instead of a faster non-cryptographic hash function?

Related question:

Stack Overflow question Why does Git use SHA-1 as version numbers? asks why Git uses SHA-1 as opposed to sequential numbers for commits.

Answer

You can hear the reasoning from Linus Torvalds himself, in the talk where he presented Git at Google back in 2007:
(emphasis mine)

We check checksums that is considered cryptographically secure. Nobody has been able to break SHA-1, but the point is, SHA-1 as far as git is concerned, isn't even a security feature. It's purely a consistency check.
The security parts are elsewhere. A lot of people assume since git uses SHA-1 and SHA-1 is used for cryptographically secure stuff, they think that it's a huge security feature. It has nothing at all to do with security, it's just the best hash you can get.

Having a good hash is good for being able to trust your data, it happens to have some other good features, too, it means when we hash objects, we know the hash is well distributed and we do not have to worry about certain distribution issues.

Internally it means from the implementation standpoint, we can trust that the hash is so good that we can use hashing algorithms and know there are no bad cases.

So there are some reasons to like the cryptographic side too, but it's really about the ability to trust your data.
I guarantee you, if you put your data in git, you can trust the fact that five years later, after it is converted from your harddisc to DVD to whatever new technology and you copied it along, five years later you can verify the data you get back out is the exact same data you put in. And that is something you really should look for in a source code management system.
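
To make the "consistency check" concrete, here is a minimal sketch of how a blob's object ID can be reproduced outside Git: SHA-1 over a `blob <size>\0` header followed by the content. The function names `git_blob_id` and `verify_blob` are mine, for illustration only, and this covers blobs only (not trees or commits).

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a small header ("blob <size>" plus a NUL byte) and then the content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def verify_blob(content: bytes, expected_id: str) -> bool:
    """Consistency check: does the content still hash to the recorded ID?"""
    return git_blob_id(content) == expected_id


if __name__ == "__main__":
    original = b"hello world\n"
    object_id = git_blob_id(original)
    print(object_id)
    print(verify_blob(original, object_id))          # unchanged copy
    print(verify_blob(b"hello w0rld\n", object_id))  # one byte corrupted
```

The printed ID should match what `git hash-object` reports for a file with the same bytes. Any single-byte change to the content gives a different ID, which is the "you get back exactly what you put in" guarantee described in the talk.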


Update Dec. 2017 with Git 2.16 (Q1 2018): an effort to support an alternative hash function is underway: see "Why doesn't Git use more modern SHA?".


I mentioned in "How would git handle a SHA-1 collision on a blob?" that you could engineer a commit with a particular SHA-1 prefix (still an extremely costly endeavor).
But the point remains, as Eric Sink mentions in "Git: Cryptographic Hashes" (from the book Version Control by Example, 2011):

It is rather important that the DVCS never encounter two different pieces of data which have the same digest. Fortunately, good cryptographic hash functions are designed to make such collisions extremely unlikely.

It is harder to find a good non-cryptographic hash with a low collision rate, unless you consider research like "Finding State-of-the-Art Non-cryptographic Hashes with Genetic Programming".

You can also read "Consider use of non-cryptographic hash algorithm for hashing speed-up", which mentions for instance "xxhash", an extremely fast non-cryptographic hash algorithm that works at speeds close to RAM limits.
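
To put "extremely unlikely" into rough numbers, here is a sketch using the standard birthday approximation. It assumes an idealized hash whose outputs are uniformly random, so it only illustrates the effect of digest width on accidental collisions and says nothing about deliberate attacks. The function name and the sample object counts are mine.

```python
import math


def collision_probability(num_objects: int, hash_bits: int = 160) -> float:
    """Birthday approximation: p ~= 1 - exp(-n(n-1) / 2^(bits+1)).

    Assumes hash outputs are uniformly random (an idealization).
    """
    exponent = -num_objects * (num_objects - 1) / 2 ** (hash_bits + 1)
    # expm1 keeps precision when the probability is vanishingly small.
    return -math.expm1(exponent)


if __name__ == "__main__":
    # A billion objects under a 160-bit digest (the width of SHA-1).
    print(collision_probability(10**9, hash_bits=160))
    # A hundred thousand objects under a 32-bit digest, for contrast.
    print(collision_probability(10**5, hash_bits=32))
```

Under these assumptions the 160-bit case stays astronomically small even for a billion objects, while a 32-bit digest is more likely than not to collide at a hundred thousand objects.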


Discussions around changing the hash in Git are not new:

(Linus Torvalds)

There's not really anything remaining of the mozilla code, but hey, I started from it. In retrospect I probably should have started from the PPC asm code that already did the blocking sanely - but that's a "20/20 hindsight" kind of thing.

Plus hey, the mozilla code being a horrid pile of crud was why I was so convinced that I could improve on things. So that's a kind of source for it, even if it's more about the motivational side than any actual remaining code ;)

And you need to be careful about how you measure the actual optimization gain:

(Linus Torvalds)

I pretty much can guarantee you that it improves things only because it makes gcc generate crap code, which then hides some of the P4 issues.

(John Tapsell - johnflux)

The engineering cost for upgrading git from SHA-1 to a new algorithm is much higher. I'm not sure how it can be done well.

First of all we probably need to deploy a version of git (let's call it version 2 for this conversation) which allows there to be a slot for a new hash value even though it doesn't read or use that space -- it just uses the SHA-1 hash value which is in the other slot.

That way once we eventually deploy yet a newer version of git, let's call it version 3, which produces SHA-3 hashes in addition to SHA-1 hashes, people using git version 2 will be able to continue to inter-operate.
(Although, per this discussion, they may be vulnerable and people who rely on their SHA-1-only patches may be vulnerable.)

In short, switching to any hash is not easy.
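
The "slot" idea from that quote can be sketched as a data structure. This is only an illustration of the proposal as quoted, not Git's actual object format or transition plan: `ObjectId`, `write_v2`, `write_v3` and `same_object_v2` are hypothetical names, and SHA3-256 is just one possible choice for the new hash.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ObjectId:
    sha1: str                       # slot that every version understands
    new_hash: Optional[str] = None  # reserved slot; "version 2" never reads it


def write_v2(data: bytes) -> ObjectId:
    """Hypothetical 'version 2': leaves the new slot empty."""
    return ObjectId(sha1=hashlib.sha1(data).hexdigest())


def write_v3(data: bytes) -> ObjectId:
    """Hypothetical 'version 3': fills both slots."""
    return ObjectId(
        sha1=hashlib.sha1(data).hexdigest(),
        new_hash=hashlib.sha3_256(data).hexdigest(),
    )


def same_object_v2(a: ObjectId, b: ObjectId) -> bool:
    """A 'version 2' reader compares only the SHA-1 slot."""
    return a.sha1 == b.sha1


if __name__ == "__main__":
    old = write_v2(b"some content")
    new = write_v3(b"some content")
    print(same_object_v2(old, new))  # the two versions still interoperate
    print(new.new_hash)
```

Note that the "version 2" reader gains no protection from the new slot, which is the vulnerability the quote points out.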


Update February 2017: yes, it is in theory possible to compute a SHA-1 collision: shattered.io

How is GIT affected?

GIT strongly relies on SHA-1 for the identification and integrity checking of all file objects and commits.
It is essentially possible to create two GIT repositories with the same head commit hash and different contents, say a benign source code and a backdoored one.
An attacker could potentially selectively serve either repository to targeted users. This will require attackers to compute their own collision.

But:

This attack required over 9,223,372,036,854,775,808 SHA1 computations. This took the equivalent processing power as 6,500 years of single-CPU computations and 110 years of single-GPU computations.

So let's not panic just yet.
See more at "How would Git handle a SHA-1 collision on a blob?".
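
For scale, the computation count quoted from shattered.io is exactly 2^63. A quick check, comparing it with the generic 2^80 birthday bound for a 160-bit digest (the variable names are mine):

```python
# Number of SHA-1 computations quoted above.
SHATTERED_SHA1_COMPUTATIONS = 9_223_372_036_854_775_808

# The quoted figure is an exact power of two.
exponent = SHATTERED_SHA1_COMPUTATIONS.bit_length() - 1
assert 2 ** exponent == SHATTERED_SHA1_COMPUTATIONS
print(exponent)  # 63

# A generic birthday search on a 160-bit digest needs about 2^80 computations.
GENERIC_BIRTHDAY_COMPUTATIONS = 2 ** (160 // 2)
speedup = GENERIC_BIRTHDAY_COMPUTATIONS // SHATTERED_SHA1_COMPUTATIONS
print(speedup)  # 131072, i.e. 2^17
```

So the attack is roughly a hundred thousand times cheaper than a generic collision search, yet still far out of reach for a casual attacker.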
