散列和加密算法之间的基本差异 [英] Fundamental difference between Hashing and Encryption algorithms

查看:219
本文介绍了散列和加密算法之间的基本差异的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我发现散列和加密算法之间有很多混乱,我想听听一些关于以下方面的专家建议:


  1. 何时使用哈希与加密


  2. 什么使散列或加密算法不同于理论/数学水平
    即什么使哈希不可逆(没有彩虹树的帮助)


以下是一些不会像我在找的那么详细。



混淆,散列和加密之间有什么区别?

加密和散列之间的区别

解决方案

Well ,您可以在维基百科中查找。但是因为你想要一个解释,我会在这里做最好的:



Hash函数



它们提供任意长度输入和(通常)固定长度或更小长度)输出。它可以是从简单的crc32到完全破解的加密散列函数,例如MD5或SHA1 / 2/256/512。关键是有一个单向映射。它总是一个很多:1映射(意味着将总是有冲突),因为每个函数产生比它能够输入更小的输出(如果你把每个可能的1mb文件送入MD5,你会得到一吨冲突)。

他们很难(或不可能实际)逆转的原因是因为他们如何在内部工作。大多数加密散列函数对输入集进行多次迭代以产生输出。因此,如果我们看每个固定长度的输入块(这是算法依赖),哈希函数将调用当前状态。然后它将对状态进行迭代,并将其更改为一个新的,并将其作为反馈用于自身(MD5对每512位数据块进行64次)。然后,它以某种方式将所有这些迭代的结果状态合并在一起,形成结果散列。



现在,如果你想解码散列,你首先需要找出如何将给定的哈希分割为其迭代状态(对于小于数据块大小的输入,可能有1种可能性,对于较大的输入,很多)。然后你需要反转每个状态的迭代。现在,为了解释为什么这是非常困难的,想象试图从以下公式推导 a b 10 = a + b 。有 a b 可以工作的10个正组合。现在循环一次: tmp = a + b; a = b; b = tmp 。对于64次迭代,你可以尝试超过10 ^ 64的可能性。这只是一个简单的添加,其中一些状态被保留从迭代到迭代。真正的哈希函数做了很多操作(MD5做了大约15个操作对4个状态变量)。由于下一次迭代取决于前一个状态,前一个状态在创建当前状态时被破坏,所以只能不可能确定导致给定输出状态的输入状态(每次迭代不少)。结合,涉及大量的可能性,并且解码甚至MD5将采取接近无限(但不是无限)的资源量。如此多的资源,如果你有一个想法的输入(对于较小的输入),甚至试图解码哈希,实际上是明显更便宜的暴力强制哈希。



加密函数



它们提供任意长度输入和输出之间的1:1映射。它们总是可逆的。重要的是要注意的是,它是可逆的使用一些方法。对于给定的键,它总是1:1。现在,有多个输入:可能生成相同输出的密钥对(实际上通常有,取决于加密函数)。良好的加密数据与随机噪声无法区分。这与总是具有一致格式的好的哈希输出不同。



使用案例



使用哈希函数,当你想要比较一个值,但不能存储纯粹的表示(由于任何数量的原因)。密码应该适合这个用例,因为您不想为了安全原因(而不应该)存储纯文本。但是,如果你想检查文件系统的盗版音乐文件怎么办?对于每个音乐文件存储3mb是不切实际的。所以,取而代之,文件的哈希,并存储(md5将存储16字节,而不是3mb)。这样,你只需要哈希每个文件,并与存储的哈希数据库(这实际上不能很好地工作,因为重新编码,更改文件头等,但它是一个示例用例)。



在检查输入数据的有效性时使用哈希函数。这是他们的设计。如果您有2个输入,并且想要检查它们是否相同,则通过散列函数运行。对于小的输入大小(假定良好的散列函数),碰撞的概率是天文学低的。这就是为什么它的建议密码。对于最多32个字符的密码,md5具有4倍的输出空间。 SHA1具有6倍的输出空间(大约)。 SHA512具有约16倍的输出空间。你真的不在乎什么密码,你关心它是否与存储的密码相同。这就是为什么你应该使用哈希密码。



当你需要输入输入数据时使用加密。请注意需要一词。如果您要存储信用卡号码,则需要在某个时间将其退回,但不想将其存储为纯文本。所以,存储加密的版本,并保持密钥尽可能安全。



散列函数也适用于签名数据。例如,如果您使用HMAC,则通过将数据的散列与已知但未传输的值(秘密值)连接在一起来签名数据。所以你发送纯文本和HMAC哈希。然后,接收器简单地对具有已知值的提交数据进行散列,并检查其是否与所传输的HMAC匹配。如果它是相同的,你知道它没有篡改没有秘密价值的一方。这通常用于通过HTTP框架的安全cookie系统中,以及通过HTTP的数据的消息传输中,您需要一些数据完整性的保证。



对密码的散列:



加密哈希函数的一个关键特性是,它们应该非常快速地创建,并且非常反向(这么多,以至于它几乎不可能)。这会导致密码的问题。如果你存储 sha512(密码),你不会做一个事情来防止彩虹表或暴力攻击。记住,哈希函数是为速度而设计的。因此,攻击者只需通过哈希函数运行字典并测试每个结果即可。



添加一个salt有助于解决问题,因为它会向数据库中添加一些未知数据哈希。因此,不是找到匹配 md5(foo)的任何东西,他们需要找到一些东西,当添加到已知的盐产生 md5(foo.salt) (这是非常困难做)。但是它仍然不能解决速度问题,因为如果他们知道盐,它只是一个运行字典的问题。



所以,有办法处理这个。一种流行的方法称为键强化(或键伸缩)。基本上,你遍历哈希很多次(通常是千位)。这做了两件事。首先,它显着减慢了散列算法的运行时间。第二,如果实施正确(在每次迭代中将输入和盐反馈回来)实际上增加了输出的熵(可用空间),减少了冲突的机会。一个简单的实现是:

  var hash = password + salt; 
for(var i = 0; i <5000; i ++){
hash = sha512(hash + password + salt);
}

还有其他更标准的实现,例如PBKDF2 BCrypt 。但是这种技术被很多安全相关的系统(如PGP,WPA,Apache和OpenSSL)使用。



底线 (密码)不够好。 hash(password + salt)更好,但还不够好...使用拉伸的哈希机制来产生密码哈希...



关于简单伸展的另一个注释



在任何情况下不要将一个散列的输出直接反馈到散列函数

  hash = sha512(password + salt); 
for(i = 0; i <1000; i ++){
hash = sha512(hash); //< - 不要这样做!
}

这样做的原因与碰撞有关。请记住,所有哈希函数都有冲突,因为可能的输出空间(可能的输出数)小于输入空间。看看为什么,让我们看看会发生什么。为了说明这一点,让我们假设从 sha1()(实际上它 更低)有0.001%的碰撞概率,但是对于演示目的)。

  hash1 = sha1(password + salt); 

现在,hash1碰撞概率为0.001%。但是当我们做下一个 hash2 = sha1(hash1); 时, hash1 的所有冲突都会自动变成冲突 hash2 。现在,我们的hash1的比率为0.001%,第二个 sha1()调用增加了。所以现在, hash2 的冲突概率为0.002%。这是机会的两倍!每次迭代将向结果添加另一个 0.001%的机会。因此,使用1000次迭代,碰撞的机会从微不足道的0.001%跳到1%。现在,退化是线性的,并且实际概率较小,但效果是相同的(对 md5 大约是 1 /(2 ^ 128) 1 / 3e38 生日攻击并不像看起来那么小。)



相反,通过每次重新附加salt和密码,你将数据重新引入哈希函数。因此任何特定轮次的任何冲突不再是下一轮的冲突。所以:

  hash = sha512(password + salt); 
for(i = 0; i <1000; i ++){
hash = sha512(hash + password + salt);
}

与原生 sha512碰撞的机会相同函数。这是你想要的。请改用


I see a lot of confusion between hashes and encryption algorithms and I would like to hear some more expert advice about:

  1. When to use hashes vs encryptions

  2. What makes a hash or encryption algorithm different (from a theoretical/mathematical level) i.e. what makes hashes irreversible (without aid of a rainbow tree)

Here are some similar SO Questions that didn't go into as much detail as I was looking for:

What is the difference between Obfuscation, Hashing, and Encryption?
Difference between encryption and hashing

解决方案

Well, you could look it up in Wikipedia... But since you want an explanation, I'll do my best here:

Hash Functions

They provide a mapping between an arbitrary length input, and a (usually) fixed length (or smaller length) output. It can be anything from a simple crc32, to a full blown cryptographic hash function such as MD5 or SHA1/2/256/512. The point is that there's a one-way mapping going on. It's always a many:1 mapping (meaning there will always be collisions) since every function produces a smaller output than it's capable of inputting (If you feed every possible 1mb file into MD5, you'll get a ton of collisions).

The reason they are hard (or impossible in practicality) to reverse is because of how they work internally. Most cryptographic hash functions iterate over the input set many times to produce the output. So if we look at each fixed length chunk of input (which is algorithm dependent), the hash function will call that the current state. It will then iterate over the state and change it to a new one and use that as feedback into itself (MD5 does this 64 times for each 512bit chunk of data). It then somehow combines the resultant states from all these iterations back together to form the resultant hash.

Now, if you wanted to decode the hash, you'd first need to figure out how to split the given hash into its iterated states (1 possibility for inputs smaller than the size of a chunk of data, many for larger inputs). Then you'd need to reverse the iteration for each state. Now, to explain why this is VERY hard, imagine trying to deduce a and b from the following formula: 10 = a + b. There are 10 positive combinations of a and b that can work. Now loop over that a bunch of times: tmp = a + b; a = b; b = tmp. For 64 iterations, you'd have over 10^64 possibilities to try. And that's just a simple addition where some state is preserved from iteration to iteration. Real hash functions do a lot more than 1 operation (MD5 does about 15 operations on 4 state variables). And since the next iteration depends on the state of the previous and the previous is destroyed in creating the current state, it's all but impossible to determine the input state that led to a given output state (for each iteration no less). Combine that, with the large number of possibilities involved, and decoding even an MD5 will take a near infinite (but not infinite) amount of resources. So many resources that it's actually significantly cheaper to brute-force the hash if you have an idea of the size of the input (for smaller inputs) than it is to even try to decode the hash.

Encryption Functions

They provide a 1:1 mapping between an arbitrary length input and and output. And they are always reversible. The important thing to note is that it's reversible using some method. And it's always 1:1 for a given key. Now, there are multiple input:key pairs that might generate the same output (in fact there usually are, depending on the encryption function). Good encrypted data is indistinguishable from random noise. This is different from a good hash output which is always of a consistent format.

Use Cases

Use a hash function when you want to compare a value but can't store the plain representation (for any number of reasons). Passwords should fit this use-case very well since you don't want to store them plain-text for security reasons (and shouldn't). But what if you wanted to check a filesystem for pirated music files? It would be impractical to store 3 mb per music file. So instead, take the hash of the file, and store that (md5 would store 16 bytes instead of 3mb). That way, you just hash each file and compare to the stored database of hashes (This doesn't work as well in practice because of re-encoding, changing file headers, etc, but it's an example use-case).

Use a hash function when you're checking validity of input data. That's what they are designed for. If you have 2 pieces of input, and want to check to see if they are the same, run both through a hash function. The probability of a collision is astronomically low for small input sizes (assuming a good hash function). That's why it's recommended for passwords. For passwords up to 32 characters, md5 has 4 times the output space. SHA1 has 6 times the output space (approximately). SHA512 has about 16 times the output space. You don't really care what the password was, you care if it's the same as the one that was stored. That's why you should use hashes for passwords.

Use encryption whenever you need to get the input data back out. Notice the word need. If you're storing credit card numbers, you need to get them back out at some point, but don't want to store them plain text. So instead, store the encrypted version and keep the key as safe as possible.

Hash functions are also great for signing data. For example, if you're using HMAC, you sign a piece of data by taking a hash of the data concatenated with a known but not transmitted value (a secret value). So you send the plain-text and the HMAC hash. Then, the receiver simply hashes the submitted data with the known value and checks to see if it matches the transmitted HMAC. If it's the same, you know it wasn't tampered with by a party without the secret value. This is commonly used in secure cookie systems by HTTP frameworks, as well as in message transmission of data over HTTP where you want some assurance of integrity in the data.

A note on hashes for passwords:

A key feature of cryptographic hash functions is that they should be very fast to create, and very difficult/slow to reverse (so much so that it's practically impossible). This poses a problem with passwords. If you store sha512(password), you're not doing a thing to guard against rainbow tables or brute force attacks. Remember, the hash function was designed for speed. So it's trivial for an attacker to just run a dictionary through the hash function and test each result.

Adding a salt helps matters since it adds a bit of unknown data to the hash. So instead of finding anything that matches md5(foo), they need to find something that when added to the known salt produces md5(foo.salt) (which is very much harder to do). But it still doesn't solve the speed problem since if they know the salt it's just a matter of running the dictionary through.

So, there are ways of dealing with this. One popular method is called key strengthening (or key stretching). Basically, you iterate over a hash many times (thousands usually). This does two things. First, it slows down the runtime of the hashing algorithm significantly. Second, if implemented right (passing the input and salt back in on each iteration) actually increases the entropy (available space) for the output, reducing the chances of collisions. A trivial implementation is:

var hash = password + salt;
for (var i = 0; i < 5000; i++) {
    hash = sha512(hash + password + salt);
}

There are other, more standard implementations such as PBKDF2, BCrypt. But this technique is used by quite a few security related systems (such as PGP, WPA, Apache and OpenSSL).

The bottom line, hash(password) is not good enough. hash(password + salt) is better, but still not good enough... Use a stretched hash mechanism to produce your password hashes...

Another note on trivial stretching

Do not under any circumstances feed the output of one hash directly back into the hash function:

hash = sha512(password + salt); 
for (i = 0; i < 1000; i++) {
    hash = sha512(hash); // <-- Do NOT do this!
}

The reason for this has to do with collisions. Remember that all hash functions have collisions because the possible output space (the number of possible outputs) is smaller than then input space. To see why, let's look at what happens. To preface this, let's make the assumption that there's a 0.001% chance of collision from sha1() (it's much lower in reality, but for demonstration purposes).

hash1 = sha1(password + salt);

Now, hash1 has a probability of collision of 0.001%. But when we do the next hash2 = sha1(hash1);, all collisions of hash1 automatically become collisions of hash2. So now, we have hash1's rate at 0.001%, and the 2nd sha1() call adds to that. So now, hash2 has a probability of collision of 0.002%. That's twice as many chances! Each iteration will add another 0.001% chance of collision to the result. So, with 1000 iterations, the chance of collision jumped from a trivial 0.001% to 1%. Now, the degradation is linear, and the real probabilities are far smaller, but the effect is the same (an estimation of the chance of a single collision with md5 is about 1/(2^128) or 1/3e38. While that seems small, thanks to the birthday attack it's not really as small as it seems).

Instead, by re-appending the salt and password each time, you're re-introducing data back into the hash function. So any collisions of any particular round are no longer collisions of the next round. So:

hash = sha512(password + salt);
for (i = 0; i < 1000; i++) {
    hash = sha512(hash + password + salt);
}

Has the same chance of collision as the native sha512 function. Which is what you want. Use that instead.

这篇关于散列和加密算法之间的基本差异的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆