为什么不能反转加密散列? [英] Why is it not possible to reverse a cryptographic hash?

查看:151
本文介绍了为什么不能反转加密散列?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

为什么你不能逆转算法,就像你可以扭转一个数学函数?如何使一个不可逆的算法成为可能?



如果您使用彩虹表,使用盐不可能破解它?如果你用暴力制作彩虹表来生成彩虹表,那么它会发明每个明文值(可能是一个长度),最终会包括每个可能的密码和每个可能的盐的盐(盐和密码/文本将只需一起作为单一文本。

解决方案

MD5设计为加密不可逆。在这种情况下,最重要的特性是在计算上找不到散列的反向是不可行的,但是很容易找到任何数据的散列。例如,让我们考虑一下只是操作数字(二进制​​文件毕竟可以解释为只是一个很长的数字)。



假设我们有数字7 ,而我们想把它的哈希值。也许我们尝试的第一件事是我们的哈希函数是乘以2。我们将看到,这不是一个很好的哈希函数,但是我们将尝试一下,来说明一点。在这种情况下,数字的散列值将为14。这很容易计算。但现在,如果我们看看扭转它有多难,我们发现它也是一样容易!给定任何哈希,我们可以将它除以两个得到原来的数字!这不是一个好的哈希,因为散列的整个点是,比计算哈希的反算要难得多(这是至少在一些上下文中最重要的属性) )。



现在,让我们尝试另一个哈希。对于这一个,我将要介绍时钟算术的想法。在时钟上,没有无数的数字。事实上,它只是从0到11(记住,0和12在时钟上是一样的)。所以如果你把添加一个到11,你只需要零。您可以将乘法,加法和乘幂的想法扩展到时钟。例如,8 + 7 = 15,但15时钟真的只有3!所以在一个时钟,你会说8 + 7 = 3! 6 * 6 = 36,但在时钟上,36 = 0!所以6 * 6 = 0!现在,为了权力的概念,你可以做同样的事情。 2 ^ 4 = 16,但16只是4.所以2 ^ 4 = 4!现在,这是它如何联系到哈希。我们如何尝试散列函数f(x)= 5 ^ x,但是使用时钟算术。如您所见,这将导致一些有趣的结果。让我们尝试像以前一样使用7的哈希值。



我们看到5 ^ 7 = 78125,但在时钟上,只有5(如果你做数学,你会看到我们已经包围了6510次)。所以我们得到f(7)= 5。现在的问题是,如果我告诉你我的号码的哈希数是5,那么你可以弄清楚我的号码是7吗?那么它实际上很难在一般情况下计算出这个函数的相反。比我更聪明的人已经证明,在某些情况下,逆转这个功能是比计算前进更难的方式。 (编辑:Nemo已经指出,这实际上并没有被证明,事实上,唯一的保证就是很多聪明人尝试了很长时间才能找到一个简单的方法,并且都没有成功。)将此操作反转的问题称为离散对数问题。查看更多深入报道。这至少是一个好的哈希函数的开头。



使用真实世界的哈希函数,这个想法基本上是一样的:你发现一些很难扭转的功能。比我更聪明的人已经设计了MD5和其他哈希值,使它们难以扭转。



现在,或许更早的想法已经发生在你身上:这很容易来计算倒数!我只需要使用每个数字的哈希值,直到找到匹配的数据!现在,对于数字不足十二点的情况,这是可行的。但是,对于真实世界哈希函数的模拟,可以想象所有涉及的数字都是巨大的。这个想法是,为这些大数字计算哈希函数还是比较容易的,但是通过所有可能的输入搜索变得越来越快。但是你所绊倒的是仍然是一个非常重要的想法,但是通过输入空间搜索可以给出匹配输出的输入。彩虹表是一个更为复杂的变化,它以智能方式使用输入输出对的预计算表,以便能够快速搜索大量可能的输入。



现在我们假设您正在使用哈希函数来存储计算机上的密码。这个想法是这样的:电脑只是存储正确密码的散列。当用户尝试登录时,将输入密码的哈希与正确密码的散列进行比较。如果它们匹配,则假设用户具有正确的密码。这是有利的原因是,如果有人窃取您的电脑,他们仍然无法访问您的密码,只是其哈希。因为哈希函数是由聪明人设计的,难以反过来,他们无法轻松地从中检索您的密码。



攻击者最好的选择是暴力攻击,他们尝试一堆密码。就像您可能尝试在之前的问题中少于12的数字,攻击者可能尝试所有密码仅由少于7个字符的数字和字母组成,或字典中显示的所有字词。这里重要的是,他无法尝试使用所有可能的密码,因为有太多可能的16个字符的密码,例如 测试。所以重点是,攻击者必须限制他测试的密码,否则他甚至不会检查一小部分。



现在,就盐而言,这个想法是这样的:如果两个用户有相同的密码怎么办?他们会有相同的哈希。如果您想到这一点,攻击者并不需要单独破解每个用户的密码。他只是简单地浏览每个可能的输入密码,并将散列与所有哈希进行比较。如果匹配其中一个,那么他找到了一个新密码。我们真的要强迫他做的是为每个用户+密码组合计算一个新的哈希值。这就是盐的想法,就是让每个用户使哈希函数略有不同,所以他不能为所有用户重用一组预先计算的值。最直接的方法是在使用散列之前,将每个用户密码的一些随机字符串粘贴到一起,其中每个用户的随机字符串不同。所以,例如,如果我的密码是shittypassword,我的哈希可能显示为MD5(6n93nshittypassword),如果您的密码是shittypassword,您的哈希可能会显示为MD5(fa9elshittypassword)。这一点fa9el被称为盐,每个用户都不同。例如,我的盐是6n93n。现在,这个一点点被贴在你的密码上也只是存储在你的电脑上。当您尝试使用密码X登录时,计算机只能计算MD5(fa9el+ X),并查看它是否与存储的哈希匹配。所以登录的基本机制保持不变,但对于攻击者来说,他们现在面临着更为艰巨的挑战:而不是MD5哈希列表,它们是面临MD5和盐的列表。他们基本上有两个选择:


  1. 他们可以忽略散列哈希的事实,并尝试用他们的密码破解密码查找表。然而,他们实际破解密码的机会大大减少。例如,即使shittypassword在他们的输入列表中进行检查,很可能fa9elshittypassword不是。为了获得破译他们以前使用的密码的可能性的一小部分,他们需要测试更多可能密码的数量级。


  2. 他们可以在每个用户的基础上重新计算哈希值。因此,不是计算MD5(passwordguess),对于每个用户X,他们计算MD5(Salt_of_user_X + passwordguess)。这不仅迫使他们为每个想要破解的用户计算一个新的哈希值,而且最重要的是它阻止他们能够使用预先计算的表(例如彩虹表),因为他们不知道什么Salt_of_user_X在手之前,所以他们无法预先计算散列值来测试。


所以基本上,如果他们试图为了使用预先计算的表,使用salt 有效地大大增加了为了破解密码而必须进行测试的可能输入,即使它们没有使用预先计算的表,它仍然会减慢因素N,其中N是您存储的密码数。



希望这可以回答您的所有问题。


Why can't you just reverse the algorithm like you could reverse a math function? How is it possible to make an algorithm that isn't reversible?

And if you use a rainbow table, what makes using a salt impossible to crack it? If you are making a rainbow table with brute force to generate it, then it invents each plaintext value possible (to a length), which would end up including the salt for each possible password and each possible salt (the salt and password/text would just come together as a single piece of text).

解决方案

MD5 is designed to be cryptographically irreversible. In this case, the most important property is that it is computationally unfeasible to find the reverse of a hash, but it is easy to find the hash of any data. For example, let's think about just operating on numbers (binary files after all, could be interpreted as just a very long number).

Let's say we have the number "7", and we want to take the hash of it. Perhaps the first thing we try as our hash function is "multiply by two". As we'll see, this is not a very good hash function, but we'll try it, to illustrate a point. In this case, the hash of the number will be "14". That was pretty easy to calculate. But now, if we look at how hard it is to reverse it, we find that it is also just as easy! Given any hash, we can just divide it by two to get the original number! This is not a good hash, because the whole point of a hash is that it is much harder to calculate the inverse than it is to calculate the hash (this is the most important property in at least some contexts).

Now, let's try another hash. For this one, I'm going to have to introduce the idea of clock arithmetic. On a clock, there aren't an infinite amount of number. In fact, it just goes from 0 to 11 (remember, 0 and 12 are the same on a clock). So if you "add one" to 11, you just get zero. You can extend the ideas of multiplication, addition, and exponentiation to a clock. For example, 8+7=15, but 15 on a clock is really just 3! So on a clock, you would say 8+7=3! 6*6=36, but on a clock, 36=0! so 6*6=0! Now, for the concept of powers, you can do the same thing. 2^4=16, but 16 is just 4. So 2^4=4! Now, here's how it ties into hashing. How about we try the hash function f(x)=5^x, but with clock arithmetic. As you'll see, this leads to some interesting results. Let's try taking the hash of 7 as before.

We see that 5^7=78125 but on a clock, that's just 5 (if you do the math, you see that we've wrapped around the clock 6510 times). So we get f(7)=5. Now, the question is, if I told you that the hash of my number was 5, would you be able to figure out that my number was 7? Well, it's actually very hard to calculate the reverse of this function in the general case. People much smarter than me have proved that in certain cases, reversing this function is way harder than calculating it forward. (EDIT: Nemo has pointed out that this in fact has not been "proven"; in fact, the only guarantee you get is that a lot of smart people have tried a long time to find an easy way to do so, and none of them have succeeded.) The problem of reversing this operation is called the "Discrete Logarithm Problem". Look it up for more in depth coverage. This is at least the beginning of a good hash function.

With real world hash functions, the idea is basically the same: You find some function that is hard to reverse. People much smarter than me have engineered MD5 and other hashes to make them provably hard to reverse.

Now, perhaps earlier the thought has occurred to you: "it would be easy to calculate the inverse! I'd just take the hash of every number until I found the one that matched!" Now, for the case where the numbers are all less than twelve, this would be feasible. But for the analog of a real-world hash function, imagine all the numbers involved are huge. The idea is that it is still relatively easy to calculate the hash function for these large numbers, but to search through all possible inputs becomes harder much quicker. But what you've stumbled upon is the still a very important idea though: searching through the input space for an input which will give a matching output. Rainbow tables are a more complex variation on the idea, which use precomputed tables of input-output pairs in smart ways in order to make it possible to quickly search through a large number of possible inputs.

Now let's say that you are using a hash function to store passwords on your computer. The idea is this: The computer just stores the hash of the correct password. When a user tries to login, you compare the hash of the input password to the hash of the correct password. If they match, you assume the user has the correct password. The reason this is advantageous is because if someone steals your computer, they still don't have access to your password, just the hash of it. Because the hash function was designed by smart people to be hard to take the reverse of, they can't easily retrieve your password from it.

An attacker's best bet is a bruteforce attack, where they try a bunch of passwords. Just like you might try the numbers less that 12 in the previous problem, an attacker might try all the passwords just composed of numbers and letters less than 7 characters long, or all words which show up in the dictionary. The important thing here is that he can't try all possible passwords, because there are way too many possible 16 character passwords, for example, to ever test. So the point is that an attacker has to restrict the possible passwords he tests, otherwise he will never even check a small percentage of them.

Now, as for a salt, the idea is this: What if two users had the same password? They would have the same hash. If you think about it, the attacker doesn't really have to crack every users password individually. He simply goes through every possible input password, and compares the hash to all the hashes. If it matches one of them, then he has found a new password. What we'd really like to force him to do is calculate a new hash for every user+password combination he wants to check. That's the idea of a salt, is that you make the hash function be slightly different for every user, so he can't reuse a single set of precomputed values for all users. The most straightforward way to do this is to tack on some random string to each user's password before you take the hash, where the random string is different for each user. So, for example, if my password is "shittypassword", my hash might show up as MD5("6n93nshittypassword") and if your password is "shittypassword", your hash might show up as MD5("fa9elshittypassword"). This little bit "fa9el" is called the "salt", and it's different for every user. For example, my salt is "6n93n". Now, this little bit which is tacked on to your password is just stored on your computer as well. When you try to login with the password X, the computer can just calculate MD5("fa9el"+X) and see if it matches the stored hash.

So the basic mechanics of logging in remain unchanged, but for an attacker, they are now faced with a more daunting challenge: rather than a list of MD5 hashes, they are faced with a list of MD5 sums and salts. They essentially have two options:

  1. They can ignore the fact that the hashes are salted, and try to crack the passwords with their lookup table as is. However, the chances that they'll actually crack a password are much reduced. For example, even if "shittypassword" is on their list of inputs to check, most likely "fa9elshittypassword" isn't. In order to get even a small percentage of the probability of cracking a password that they had before, they'll need to test orders of magnitude more possible passwords.

  2. They can recalculate the hashes on a per-user basis. So rather than calculating MD5(passwordguess), for each user X, they calculate MD5( Salt_of_user_X + passwordguess). Not only does this force them to calculate a new hash for each user they want to crack, but also most importantly, it prevents them from being able to use precalculated tables (like rainbow table, for example), because they can't know what Salt_of_user_X is before hand, so they can't precalculate the hashes to test.

So basically, if they are trying to use precalculated tables, using a salt effectively greatly increases the possible inputs they have to test in order to crack the password, and even if they aren't using precalculated tables, it still slows them down by a factor of N, where N is the number of passwords you are storing.

Hopefully this answers all your questions.

这篇关于为什么不能反转加密散列?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆