哈希和加密算法的基本差异 [英] Fundamental difference between Hashing and Encryption algorithms

查看:166
本文介绍了哈希和加密算法的基本差异的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述


  1. 我看到了哈希和加密算法之间的很多混乱,我想听一些更专业的建议: >何时使用散列与加密


  2. 什么使哈希或加密算法不同(从理论/数学级)
    即使哈希不可逆转(没有彩虹树的帮助)


以下是一些类似的不要像我在寻找的细节:



混淆,散列和加密有什么区别?

加密和散列之间的区别

解决方案

您可以在维基百科 ...但是,由于你想要一个解释,我会尽力在这里:



哈希函数< h2>

它们提供任意长度输入和(通常)固定长度(或较小长度)输出之间的映射。它可以是从简单的crc32到完整的加密散列函数(如MD5或SHA1 / 2/256/512)。关键是有一个单向映射。它始终是一个很多的1映射(意味着总会有碰撞),因为每个函数产生的输出都比输入的要小一些(如果你把每个可能的1mb文件都输入到MD5,你会得到大量的碰撞) p>

他们很难(或不切实际)扭转的原因是因为他们在内部工作。大多数加密哈希函数多次迭代输入集合以产生输出。所以如果我们看每个固定长度的输入块(这是算法依赖的),则哈希函数将调用当前状态。然后它将遍历状态并将其更改为新的状态,并将其用作自身的反馈(MD5对于每个512bit数据块为64次)。然后以某种方式将所有这些迭代的结果状态组合在一起形成合成的哈希。



现在,如果要解码哈希,首先需要找出如何将给定的散列分解成其迭代状态(输入的可能性小于数据块的大小,许多用于较大的输入)。那么你需要反转每个州的迭代。现在,为了解释为什么这是非常困难的,想象试图从以下公式推导出 a b :code> 10 = a + b 。有 a b 可以工作的10个积极的组合。现在循环一遍: tmp = a + b; a = b b = tmp 。对于64次迭代,您可以尝试超过10 ^ 64种可能性。这只是一个简单的补充,其中一些状态从迭代保留到迭代。真正的哈希函数做了大量的操作(MD5在4个状态变量上执行大约15个操作)。并且由于下一次迭代取决于先前的状态,并且前一个在创建当前状态时被破坏,所以决定导致给定输出状态的输入状态(对于每个迭代不少)都是不可能的。结合这一点,在涉及大量可能性的情况下,甚至MD5的解码都将接近无限(但不是无限的)资源。如果你想了解输入的大小(对于较小的输入),甚至尝试解码哈希,那么这么多的资源实际上是显着地便宜的强制哈希。



加密功能



它们提供任意长度输入和输出之间的1:1映射。而且它们总是可逆的。需要注意的是使用某种方法是可逆的。对于给定的键,它总是1:1。现在,有多个输入:可能产生相同输出的密钥对(实际上通常有,取决于加密功能)。良好的加密数据与随机噪声无法区分。这是一个总是一致的格式的好的哈希输出。



使用案例



使用哈希函数当你想比较一个值,但不能存储简单的表示(出于任何原因)。密码应适合这种用例,因为您不希望以安全原因存储他们的纯文本(而不是)。但是如果你想检查一个文件系统的盗版音乐文件怎么办?每个音乐文件存储3 mb将是不切实际的。相反,拿取文件的哈希,并存储(md5将存储16个字节而不是3mb)。这样,你只是散列每个文件,并与存储的哈希数据库进行比较(这在实践中也不是因为重新编码,更改文件头等),而是一个用例)。



在检查输入数据的有效性时使用哈希函数。这就是他们设计的。如果您有2个输入,并且要检查它们是否相同,请通过哈希函数运行。碰撞的概率对于小的输入大小(假设一个很好的散列函数)在天文学上很低。这就是为什么建议使用密码。对于最多32个字符的密码,md5具有4倍的输出空间。 SHA1具有6倍的输出空间(约)。 SHA512具有约16倍的输出空间。您不太在乎密码,您关心的是否与存储密码相同。这就是为什么你应该使用哈希值作为密码。



只要你需要输入数据就可以使用加密。注意需要。如果您正在存储信用卡号码,则需要在某些时间将其退回,但不希望将其存储为纯文本。相反,存储加密的版本并保持密钥尽可能安全。



散列函数对于签名数据也是非常好的。例如,如果您正在使用HMAC,则通过以已知但未传输的值(秘密值)连接的数据的哈希来签名数据。所以,你发送明文和HMAC哈希。然后,接收者将已经提交的数据与已知值进行哈希,并检查它是否与传输的HMAC相匹配。如果是相同的,你知道没有秘密价值的一方没有篡改。这通常在HTTP框架的安全cookie系统中使用,以及通过HTTP的数据的消息传输,您希望在数据中保证完整性。



一个注释密码的哈希值:



加密散列函数的一个关键特性是它们应该非常快速地创建,并且非常难以/慢相反(这么多,这实际上是不可能的)。这造成密码问题。如果您存储 sha512(密码),那么您没有采取任何措施来防范彩虹表或暴力攻击。记住,哈希函数是为速度而设计的。所以攻击者只需要通过散列函数运行一个字典,并测试每一个结果,这是微不足道的。



添加一个盐有帮助,因为它添加了一些未知数据到哈希值。所以,而不是找到与 md5(foo)匹配的任何东西,他们需要找到当添加到已知的盐时产生 md5(foo.salt) (这很难做)。但是它仍然没有解决速度问题,因为如果他们知道盐,那只是运行字典的问题。



所以,有办法处理这个。一种流行的方法称为关键强化(或按键拉伸)。基本上,你可以多次迭代哈希(通常数千)。这有两件事情。首先,它显着减慢了散列算法的运行时间。第二,如果实施权利(在每次迭代中传递输入和返回值)实际上增加了输出的熵(可用空间),减少了碰撞的机会。一个简单的实现是:

  var hash = password + salt; (var i = 0; i <5000; i ++){
hash = sha512(hash + password + salt)

}

还有其他更标准的实现,如 PBKDF2 BCrypt 。但是这种技术被很多安全相关系统(如PGP,WPA,Apache和OpenSSL)所使用。



底线 hash (密码)不够好 hash(password + salt)更好,但还不够好...使用拉伸的哈希机制来产生密码哈希...



另一个简单的说明



不要在任何情况下将一个哈希的输出直接送回哈希函数

  hash = sha512(password + salt); (i = 0; i <1000; i ++)
{
哈希= sha512(哈希); //< - 不要这样做!
}

其原因与碰撞有关。请记住,所有哈希函数都有冲突,因为可能的输出空间(可能输出的数量)小于然后输入空间。看看为什么,让我们来看看会发生什么。为了说明这一点,让我们假设现实中有一个0.001%的碰撞机会,实际上它们的碰撞可能比 sha1()(它的很低目的)。

  hash1 = sha1(password + salt); 

现在, hash1 有碰撞概率为0.001%。但是当我们执行下一个 hash2 = sha1(hash1); hash1 的所有冲突自动成为冲突的 hash2 。所以现在我们的hash1的比率是0.001%,第二个 sha1()调用增加了。所以现在, hash2 的碰撞概率为0.002%。这是机会的两倍!每次迭代都会向结果添加另一个 0.001%碰撞机会。因此,在1000次迭代中,碰撞的机会从微不足道的0.001%跃升到1%。现在,退化是线性的,真实概率比较小,但效果是相同的(与 md5 约为1 /(2 128 )或1 /(3x10 38 )。虽然看起来很小,感谢生日攻击它不是真的像看起来那么小)。



相反,由每次重新添加盐和密码,您将数据重新导入哈希函数。所以任何一轮的任何碰撞都不再是下一轮的碰撞。所以:

  hash = sha512(password + salt); (i = 0; i <1000; i ++)
{
hash = sha512(hash + password + salt);
}

与原生的 sha512有相同的碰撞机会函数。这是你想要的


I see a lot of confusion between hashes and encryption algorithms and I would like to hear some more expert advice about:

  1. When to use hashes vs encryptions

  2. What makes a hash or encryption algorithm different (from a theoretical/mathematical level) i.e. what makes hashes irreversible (without aid of a rainbow tree)

Here are some similar SO Questions that didn't go into as much detail as I was looking for:

What is the difference between Obfuscation, Hashing, and Encryption?
Difference between encryption and hashing

解决方案

Well, you could look it up in Wikipedia... But since you want an explanation, I'll do my best here:

Hash Functions

They provide a mapping between an arbitrary length input, and a (usually) fixed length (or smaller length) output. It can be anything from a simple crc32, to a full blown cryptographic hash function such as MD5 or SHA1/2/256/512. The point is that there's a one-way mapping going on. It's always a many:1 mapping (meaning there will always be collisions) since every function produces a smaller output than it's capable of inputting (If you feed every possible 1mb file into MD5, you'll get a ton of collisions).

The reason they are hard (or impossible in practicality) to reverse is because of how they work internally. Most cryptographic hash functions iterate over the input set many times to produce the output. So if we look at each fixed length chunk of input (which is algorithm dependent), the hash function will call that the current state. It will then iterate over the state and change it to a new one and use that as feedback into itself (MD5 does this 64 times for each 512bit chunk of data). It then somehow combines the resultant states from all these iterations back together to form the resultant hash.

Now, if you wanted to decode the hash, you'd first need to figure out how to split the given hash into its iterated states (1 possibility for inputs smaller than the size of a chunk of data, many for larger inputs). Then you'd need to reverse the iteration for each state. Now, to explain why this is VERY hard, imagine trying to deduce a and b from the following formula: 10 = a + b. There are 10 positive combinations of a and b that can work. Now loop over that a bunch of times: tmp = a + b; a = b; b = tmp. For 64 iterations, you'd have over 10^64 possibilities to try. And that's just a simple addition where some state is preserved from iteration to iteration. Real hash functions do a lot more than 1 operation (MD5 does about 15 operations on 4 state variables). And since the next iteration depends on the state of the previous and the previous is destroyed in creating the current state, it's all but impossible to determine the input state that led to a given output state (for each iteration no less). Combine that, with the large number of possibilities involved, and decoding even an MD5 will take a near infinite (but not infinite) amount of resources. So many resources that it's actually significantly cheaper to brute-force the hash if you have an idea of the size of the input (for smaller inputs) than it is to even try to decode the hash.

Encryption Functions

They provide a 1:1 mapping between an arbitrary length input and output. And they are always reversible. The important thing to note is that it's reversible using some method. And it's always 1:1 for a given key. Now, there are multiple input:key pairs that might generate the same output (in fact there usually are, depending on the encryption function). Good encrypted data is indistinguishable from random noise. This is different from a good hash output which is always of a consistent format.

Use Cases

Use a hash function when you want to compare a value but can't store the plain representation (for any number of reasons). Passwords should fit this use-case very well since you don't want to store them plain-text for security reasons (and shouldn't). But what if you wanted to check a filesystem for pirated music files? It would be impractical to store 3 mb per music file. So instead, take the hash of the file, and store that (md5 would store 16 bytes instead of 3mb). That way, you just hash each file and compare to the stored database of hashes (This doesn't work as well in practice because of re-encoding, changing file headers, etc, but it's an example use-case).

Use a hash function when you're checking validity of input data. That's what they are designed for. If you have 2 pieces of input, and want to check to see if they are the same, run both through a hash function. The probability of a collision is astronomically low for small input sizes (assuming a good hash function). That's why it's recommended for passwords. For passwords up to 32 characters, md5 has 4 times the output space. SHA1 has 6 times the output space (approximately). SHA512 has about 16 times the output space. You don't really care what the password was, you care if it's the same as the one that was stored. That's why you should use hashes for passwords.

Use encryption whenever you need to get the input data back out. Notice the word need. If you're storing credit card numbers, you need to get them back out at some point, but don't want to store them plain text. So instead, store the encrypted version and keep the key as safe as possible.

Hash functions are also great for signing data. For example, if you're using HMAC, you sign a piece of data by taking a hash of the data concatenated with a known but not transmitted value (a secret value). So, you send the plain-text and the HMAC hash. Then, the receiver simply hashes the submitted data with the known value and checks to see if it matches the transmitted HMAC. If it's the same, you know it wasn't tampered with by a party without the secret value. This is commonly used in secure cookie systems by HTTP frameworks, as well as in message transmission of data over HTTP where you want some assurance of integrity in the data.

A note on hashes for passwords:

A key feature of cryptographic hash functions is that they should be very fast to create, and very difficult/slow to reverse (so much so that it's practically impossible). This poses a problem with passwords. If you store sha512(password), you're not doing a thing to guard against rainbow tables or brute force attacks. Remember, the hash function was designed for speed. So it's trivial for an attacker to just run a dictionary through the hash function and test each result.

Adding a salt helps matters since it adds a bit of unknown data to the hash. So instead of finding anything that matches md5(foo), they need to find something that when added to the known salt produces md5(foo.salt) (which is very much harder to do). But it still doesn't solve the speed problem since if they know the salt it's just a matter of running the dictionary through.

So, there are ways of dealing with this. One popular method is called key strengthening (or key stretching). Basically, you iterate over a hash many times (thousands usually). This does two things. First, it slows down the runtime of the hashing algorithm significantly. Second, if implemented right (passing the input and salt back in on each iteration) actually increases the entropy (available space) for the output, reducing the chances of collisions. A trivial implementation is:

var hash = password + salt;
for (var i = 0; i < 5000; i++) {
    hash = sha512(hash + password + salt);
}

There are other, more standard implementations such as PBKDF2, BCrypt. But this technique is used by quite a few security related systems (such as PGP, WPA, Apache and OpenSSL).

The bottom line, hash(password) is not good enough. hash(password + salt) is better, but still not good enough... Use a stretched hash mechanism to produce your password hashes...

Another note on trivial stretching

Do not under any circumstances feed the output of one hash directly back into the hash function:

hash = sha512(password + salt); 
for (i = 0; i < 1000; i++) {
    hash = sha512(hash); // <-- Do NOT do this!
}

The reason for this has to do with collisions. Remember that all hash functions have collisions because the possible output space (the number of possible outputs) is smaller than then input space. To see why, let's look at what happens. To preface this, let's make the assumption that there's a 0.001% chance of collision from sha1() (it's much lower in reality, but for demonstration purposes).

hash1 = sha1(password + salt);

Now, hash1 has a probability of collision of 0.001%. But when we do the next hash2 = sha1(hash1);, all collisions of hash1 automatically become collisions of hash2. So now, we have hash1's rate at 0.001%, and the 2nd sha1() call adds to that. So now, hash2 has a probability of collision of 0.002%. That's twice as many chances! Each iteration will add another 0.001% chance of collision to the result. So, with 1000 iterations, the chance of collision jumped from a trivial 0.001% to 1%. Now, the degradation is linear, and the real probabilities are far smaller, but the effect is the same (an estimation of the chance of a single collision with md5 is about 1/(2128) or 1/(3x1038). While that seems small, thanks to the birthday attack it's not really as small as it seems).

Instead, by re-appending the salt and password each time, you're re-introducing data back into the hash function. So any collisions of any particular round are no longer collisions of the next round. So:

hash = sha512(password + salt);
for (i = 0; i < 1000; i++) {
    hash = sha512(hash + password + salt);
}

Has the same chance of collision as the native sha512 function. Which is what you want. Use that instead.

这篇关于哈希和加密算法的基本差异的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆