SHA-1是否安全用于密码存储? [英] Is SHA-1 secure for password storage?

查看:206
本文介绍了SHA-1是否安全用于密码存储?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

结论: SHA-1与任何防止preimage攻击的安全一样安全,但它很容易计算,这意味着更容易安装bruteforce或字典攻击。 (对于诸如SHA-256的后继者也是如此。)根据情况,被设计为计算上昂贵的哈希函数(例如bcrypt)可能是更好的选择。






有些人很多地抛出SHA-1坏了这样的话,所以我想了解到底是什么意思。让我们假设我有一个SHA-1密码哈希的数据库,一个攻击者拥有一个最先进的SHA-1破解算法,一个拥有100,000台机器的僵尸网络可以访问它。 (控制超过10万台家用电脑意味着他们每秒可以做10 ^ 15次操作。)他们需要多少时间。


  1. 找到任何一个用户的密码?

  2. 找出给定用户的密码

  3. 找出所有用户的密码?

  4. 找到一种以特定用户身份登录的方法?

如果密码被盐化,这会如何变化?是否盐化方法(前缀,后缀,两者,或更复杂的东西,如xoring)很重要?



这是我目前的理解,一些谷歌之后。




  • 如果没有盐,彩虹攻击会立即找到所有密码

  • 如果有足够长的随机盐,找出密码的最有效的方法是强力或字典攻击。无论是冲突还是前瞻性攻击都无法帮助找到实际的密码,因此对SHA-1的加密攻击在这里没有帮助。它甚至没有什么使用什么算法 - 甚至可以使用MD5或MD4和密码将是安全的(有一个细微的差异,因为计算SHA-1哈希更慢)。

  • 为了评估安全的安全性,我们假设单个sha1运行需要1000次操作,密码包含大写,小写和数字(即60个字符)。这意味着攻击者每天可以测试10 15 * 60 * 60 * 24/1000〜= 10 17 潜在密码。对于暴力攻击,这意味着在3小时内测试所有密码,最多9个字符,一周最多10个字符,一年最多11个字符。 (对于每个附加字符,它需要60倍的字符)。字典攻击是多得多,更快(即使一个计算机的攻击者可以在几个小时内将其关闭),但只能找到弱密码。

  • 要以用户身份登录,攻击者不需要找到确切的密码;它足以找到一个字符串,导致相同的哈希。这被称为第一次preimage攻击。就我所能找到的,没有对SHA-1的preimage攻击。 (暴力攻击需要2 160 次操作,这意味着我们的理论攻击者需要10次 30年才能将其拉下来。理论可能性的极限值约为2 60 操作,攻击需要几年时间)。 160 操作下降到2 157 )。存在对SHA-1的碰撞攻击,这在理论上是可能的(我找到的最好的时间从2 80 到2 52



简而言之,使用SHA-1存储密码似乎是非常安全的。我错过了什么?



更新: Marcelo指出了一篇文章,其中提到 2 106 操作中的第二个镜像攻击。 (编辑:由于 Thomas解释,这个攻击是一个假设的结构,不适用于现实生活场景。)我仍然没有看到这是如何使用SHA-1作为密钥导出函数的危险。是否有充分的理由认为碰撞攻击或第二个preimage攻击最终会变成第一个preimage攻击?

解决方案

你的问题的简单答案是:SHA-1是安全的,你可以得到。 MD5也会很好,甚至MD4;但它可以使一些投资者紧张。对于公共关系,最好使用更好的哈希函数,例如SHA-256,即使您将其输出截断为160或128位(以节省存储成本)。部分 SHA-3第2轮候选人 a>似乎比SHA-1更快,同时可以说是更安全;但它们仍然有点新,所以坚持SHA-256或SHA-512将是一个更安全的路线,现在。



请注意,尽可能安全与完全安全是不一样的。


$ b

关于已知的攻击 对MD4,MD5和SHA-1的攻击是关于碰撞,这不影响原像抗性。已经表明,MD4有几个弱点,可以(只有理论上)利用时试图打破HMAC / MD4,但这不适用于您的问题。 Kesley和Schneier的论文中的2 106 秒前象攻击是一种通用的权衡,它只适用于非常长的输入(2 60 字节;这是一个百万兆字节 - - 注意如何106 + 60超过160;这是你看到的权衡在它没有什么神奇的地方。)



这个消息的其余部分假设散列函数你使用(例如SHA-1)是一个黑盒子,没有特殊的属性,攻击者可以使用。



关于彩虹表:

这是您现在所拥有的,即使使用破碎散列函数MD5和SHA-

彩虹攻击实际上是字典的成本分摊或暴力攻击。它是来自的衍生作品时间存储器权衡,首先由Hellman在1980年描述。假设你有 N 个可能的密码(这是你的字典的大小,或2 n 如果你考虑使用 n 位的输出强制执行散列函数),则会发生时间共享攻击,其中预先计算 N 哈希密码并将它们存储在一个大表中。如果对哈希输出进行排序,则可以在单个查找中获取密码。彩虹表是一种智能的方式来存储具有大大减少的空间的表。您只储存了 N / t 散列的密码,并且使用O( t 2 )查找破解密码。彩虹表允许你虚拟地处理比你真正存储的预计算表大得多。



然而,彩虹或者不是,攻击者仍然必须运行完全攻击一旦。这可以被视为几个连续的优化层:


  1. 暴力/字典攻击的开销为 N 破解每个密码。

  2. 使用预先计算的表格,攻击者会支付 N 一次

  3. 如果预先计算的表是彩虹表,则 N 可能有点大,因为存储成本降低。 的瓶颈成为攻击者可以分配的CPU权力,而不是其硬盘的大小。

如果 N 足够大,散列 N 密码的CPU成本是可笑的,那么这种攻击是不可行的,不管彩虹表是否使用。这意味着输出为80位或更多的(防前像)散列函数足以使暴力攻击不可行。



关于盐:



盐是一种破坏预先计算的方法。在上面的描述中,盐使攻击者回到步骤1:盐化防止攻击者在几个被攻击的密码之间共享O( N )成本。



您希望进行盐化,因为当散列数据包含在 密码,即适合随机人类的大脑,那么 N 可能相当低:人类真的很难选择和记住密码。这是什么字典攻击是:使用减少的潜在密码空间(字典),假设许多用户密码将在该特别选择的空间。



因此,盐化将至少防止攻击者使用预先计算的表,特别是预先计算的彩虹表。这假设攻击者 能够打破一个或两个密码;我们不希望他打破1000个其他密码,只需一点额外的开销。



此外,盐腌对公共关系有利。



关于SHA-1费用:



SHA-1的基本成本是关于散列64字节的块。这就是SHA-1的工作原理:数据被填充,然后拆分成64字节块。在Intel Core2系统上处理单个块的成本约为500个时钟周期,这是单个核心的成本。 MD5和MD4更快,分别计数约400和250个周期。不要忘记,大多数现代CPU有几个核心,因此倍增。



一些盐化方案规定了巨大的盐;例如什么进入哈希函数其实是40000连续拷贝一个单一的128位盐,其次是密码本身。这使得密码哈希更昂贵(对我的例子,10000的系数),无论是对合法用户和攻击者。这是否一个好主意取决于设置。对于在桌面系统上登录,这是好的:用户甚至不会注意到花了10毫秒来哈希他的密码,而不是1μs;但攻击者的成本已经上升了一个非常明显的因素10000.在每秒数以千计客户端的共享服务器上,总成本可能变得禁止。从概念上讲,提高合法用户和攻击者的相同因素的阻力不是最终好的安全性;

上述所有内容都是关于离线攻击的。离线攻击是一种攻击,攻击者拥有其所需的所有数据,以便测试密码;例如攻击者可以获取保存有哈希密码的数据库的副本。在离线攻击中,攻击者仅受其计算资源的限制。相反,在线攻击是攻击者的每次猜测必须通过一个诚实的验证者(例如,攻击者只是试图登录受攻击的系统)的攻击。通过强制限制每秒可以尝试多少密码来阻止在线攻击。极端的例子是在三个错误的PIN码之后关闭的智能卡。



通常,对于密码安全,它安排系统不让攻击者建立离线攻击。这是Unix系统所做的:哈希密码,以前在世界上可读的 / etc / password 文件,现在在 / etc / shadow 文件,它受到读访问的保护,除了少数特权应用程序。这里的假设是,如果攻击者可以读取 / etc / shadow ,那么他可能对系统有足够的控制,他不再需要密码了...


Conclusion: SHA-1 is as safe as anything against preimage attacks, however it is easy to compute, which means it is easier to mount a bruteforce or dictionary attack. (The same is true for successors like SHA-256.) Depending on the circumstances, a hash function which was designed to be computationally expensive (such as bcrypt) might be a better choice.


Some people throw around remarks like "SHA-1 is broken" a lot, so I'm trying to understand what exactly that means. Let's assume I have a database of SHA-1 password hashes, and an attacker whith a state of the art SHA-1 breaking algorithm and a botnet with 100,000 machines gets access to it. (Having control over 100k home computers would mean they can do about 10^15 operations per second.) How much time would they need to

  1. find out the password of any one user?
  2. find out the password of a given user?
  3. find out the password of all users?
  4. find a way to log in as one of the users?
  5. find a way to log in as a specific user?

How does that change if the passwords are salted? Does the method of salting (prefix, postfix, both, or something more complicated like xor-ing) matter?

Here is my current understanding, after some googling. Please correct in the answers if I misunderstood something.

In short, storing passwords with SHA-1 seems perfectly safe. Did I miss something?

Update: Marcelo pointed out an article which mentions a second preimage attack in 2106 operations. (Edit: As Thomas explains, this attack is a hypothetical construct which does not apply to real-life scenarios.) I still don't see how this spells danger for the use of SHA-1 as a key derivation function, though. Are there generally good reasons to think that a collision attack or a second preimage attack can be eventually turned into a first preimage attack?

解决方案

The short answer to your question is: SHA-1 is as secure as you can get. MD5 would be fine too, even MD4; but it could make some investors nervous. For public relations, it is best to use a "better" hash function, e.g. SHA-256, even if you truncate its output to 160 or 128 bits (to save on storage cost). Some of the SHA-3 round-2 candidates appear to be faster than SHA-1 while being arguably "more secure"; yet they are still a bit new, so sticking to SHA-256 or SHA-512 would be a safer route right now. It would make you look professional and cautious, which is good.

Note that "as secure as you can get" is not the same as "perfectly safe". See below for rather lengthy explanations.

About known attacks:

The known attacks on MD4, MD5 and SHA-1 are about collisions, which do not impact preimage resistance. It has been shown that MD4 has a few weaknesses which can be (only theoretically) exploited when trying to break HMAC/MD4, but this does not apply to your problem. The 2106 second preimage attack in the paper by Kesley and Schneier is a generic trade-off which applies only to very long inputs (260 bytes; that's a million terabytes -- notice how 106+60 exceeds 160; that's where you see that the trade-off has nothing magic in it).

The rest of this message assumes that the hash function you use (e.g. SHA-1) is a "black box" with no special property that the attacker may use. That's what you have right now even with the "broken" hash functions MD5 and SHA-1.

About rainbow tables:

The "rainbow attack" is actually cost-sharing of a dictionary or brute force attack. It is a derivative from the time-memory trade-off first described by Hellman in 1980. Assuming that you have N possible passwords (that's the size of your dictionary, or 2n if you consider brute-forcing a hash function with an output of n bits), there is a time-sharing attack in which you precompute the N hashed passwords and store them in a big table. If you sort the hash outputs, you can get your password in a single lookup. A rainbow table is a smart way to store that table with a much reduced space. You store only N/t hashed passwords, and you crack passwords with O(t2) lookups. Rainbow tables allow you to virtually handle precomputed tables much larger than what you can realistically store.

However, rainbow or not, the attacker still has to run the full attack at least once. This can be viewed as several successive optimization layers:

  1. The brute-force / dictionary attack has cost N for cracking each password.
  2. With a pre-computed table, the attacker pays that cost N once and can thereafter attack many passwords with very small extra cost per password.
  3. If the pre-computed table is a rainbow table, then N can be somewhat bigger, because storage cost is reduced. The bottleneck on N becomes the CPU power that the attacker can muster, not the size of his harddisks.

If N is large enough that the CPU-cost of hashing N passwords is ludicrous, then such an attack is not feasible, regardless of whether rainbow tables are used or not. This means that a (preimage-resistant) hash function with an output of 80 bits or more is enough to make brute-force attack infeasible.

About salts:

Salts are a way to defeat pre-computations. In the description above, the salt brings back the attacker to step 1: salting prevents the attacker from sharing the O(N) cost between several attacked passwords. Pre-computed tables, a fortiori rainbow tables, are no longer feasible.

You want salting because when the hashed data consists in passwords, i.e. something which fits within the brain of a random human being, then N can be quite low: humans are really bad at choosing and remembering passwords. This is what "dictionary attacks" are about: that's using a reduced space of potential passwords (the "dictionary") under the assumption that many user passwords will be in that specially selected space.

Hence salting will at least prevent the attacker from using pre-computed tables, in particular pre-computed rainbow tables. This assumes that the attacker will be able to break one password or two; we do not want him to break 1000 other passwords with little extra overhead.

Also, salting is good for public relations.

About SHA-1 cost:

The elementary cost of SHA-1 is about hashing a 64-byte block. That's how SHA-1 works: data is padded, then split into 64-byte blocks. The cost of processing a single block is about 500 clock cycles on an Intel Core2 system, and that's for a single core. MD5 and MD4 are faster, count about 400 and 250 cycles, respectively. Do not forget that most modern CPU have several cores, so multiply accordingly.

Some salting schemes prescribe huge salts; e.g. what enters the hash function is actually 40000 successive copies of a single 128-bit salt, followed by the password itself. This makes password hashing more expensive (by a factor of 10000 with my example), both for the legitimate user and for the attacker. Whether this is a good idea depends on the setup. For login on a desktop system, this is good: the user will not even notice that it took 10ms to hash his password, instead of 1µs; but the cost for the attacker has risen by a very noticeable factor 10000. On shared servers with thousands of clients per second, the aggregate cost may become prohibitive. Conceptually, raising the bar by the same factor for the legitimate user and the attacker is not ultimately good security; but it can be a worthwhile idea in some specific situations.

About online attacks:

All of the above is about defeating offline attacks. An offline attack is an attack where the attacker has all the data he needs in order to "test" passwords; e.g. the attacker could get a copy of the database holding the hashed passwords. In an offline attack, the attacker is limited only by his computational resources. Conversely, an online attack is an attack where each guess by the attacker must go through an honest verifier (e.g. the attacker simply tries to log in on the attacked system). Online attacks are thwarted by enforcing limits on how many passwords can be tried per second. Extreme examples are smartcards which shut down after three wrong PINs.

Usually, for password security, it pays off much more to arrange the system for not letting an attacker build an offline attack. That's what Unix systems do: the hashed passwords, which used to be in the world-readable /etc/password file, are now in the /etc/shadow file which is protected against read access, except by a few privileged applications. The assumption here is that if the attacker can read /etc/shadow, then he probably has enough control over the system that he does not really need passwords anymore...

这篇关于SHA-1是否安全用于密码存储?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆