Safest way to generate a unique hash?


Question

I need to produce unique identifiers that can be used in filenames and can be reproduced from the same input values. I need millions of these identifiers, because the source input has millions of combinations.

For simplicity's sake, I will use a small set in the example, but the actual sets can be rather large (hundreds, maybe thousands, of items), larger than could be manually encoded into a filename.

I noticed that version 5 UUIDs (uuid.uuid5) let you provide a string input.

> input_set = {'apple', 'banana', 'orange'}
> uuid.uuid5(uuid.NAMESPACE_URL, pickle.dumps(input_set)).hex
'f39926529ad45997984643816c1bc403'

The documentation says it uses SHA1 under the hood. Is the risk of a collision too high? Is there a better way to reliably hash inputs into unique identifiers?

Recommended answer

The odds of getting an accidental SHA1 collision between strings are astoundingly low. The SHA1 collisions known so far were all constructed deliberately, at great computational cost.

First SHA1 collision found

'First ever' SHA-1 hash collision calculated. All it took were five clever brains... and 6,610 years of processor time

SHA1 is no longer considered secure in the cryptography world, but it is certainly more than adequate for your purposes here.

Cryptographic hash functions are designed to be one-way functions. This means the function's inverse is "hard" to calculate (i.e. knowing the output in no way helps you determine the input). As Blender pointed out in the comments, this has nothing to do with the chance of collisions.

Take a look at the Birthday Paradox for some basic information on how the probability of a collision is calculated.
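
As a rough illustration of the birthday bound, the sketch below estimates the probability of at least one collision among n identifiers using the approximation p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)). It assumes the hash behaves like a uniform random function. The function name collision_probability and the figure of ten million items are illustrative choices, not taken from the question. A version 5 UUID keeps 122 bits of the SHA1 digest (6 of its 128 bits are fixed version and variant bits), while a full SHA1 digest has 160 bits.

```python
import math


def collision_probability(n_items: int, hash_bits: int) -> float:
    """Birthday-bound estimate of at least one collision among n_items hashes.

    Assumes an ideal hash whose outputs are uniform over 2**hash_bits values.
    """
    exponent = n_items * (n_items - 1) / (2 * 2 ** hash_bits)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)


# Illustrative workload: ten million identifiers.
for bits in (122, 160):
    print(bits, collision_probability(10_000_000, bits))
```

For ten million identifiers the 122-bit estimate is on the order of 10^-23, which is why accidental collisions are not a practical concern at this scale.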

This question addresses the likelihood of a SHA1 collision. This article states:

A cryptographic hash function has provable security against collision attacks if finding collisions is provably polynomial-time reducible from problem P which is supposed to be unsolvable in polynomial time. The function is then called provably secure, or just provable.

Here is a list of "secure" hash algorithms.

UPDATE: You stated in the comments that your input is much larger than 160 bits. Note that 160 bits is the size of SHA1's output, not a limit on its input; SHA1 accepts inputs of practically any length. If you would like a longer digest anyway, I recommend SHA3, which likewise places no practical limit on the size of your input. Check out the Python documentation for more information.

Here is a basic example:

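# Requires the third-party pysha3 package. keccak_512 is the original Keccak
# function, whose digests differ from the standardized hashlib.sha3_512.
import sha3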
k = sha3.keccak_512()
k.update(b"data")
k.hexdigest()
'1065aceeded3a5e4412e2187e919bffeadf815f5bd73d37fe00d384fe29f55f08462fdabe1007b993ce5b8119630e7db93101d9425d6e352e22ffe3dcb56b825'
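
For the use case in the question (reproducible, filename-safe identifiers for sets of strings), a minimal sketch using only the standard library might look like the following. It swaps in hashlib.sha3_256 (standardized SHA-3, so its digests will not match the Keccak example above) and also shows a uuid5 variant. The helper names canonical_bytes, stable_set_id and stable_set_uuid5 are hypothetical, and the sketch assumes every item is a string. The set is sorted before serialization because set iteration order, and therefore the output of pickle.dumps on a set, can vary between interpreter runs.

```python
import hashlib
import json
import uuid


def canonical_bytes(items):
    """Serialize a set of strings to the same bytes on every run.

    Sorting removes the dependence on set iteration order; JSON quoting
    keeps item boundaries unambiguous.
    """
    return json.dumps(sorted(items), separators=(",", ":")).encode("utf-8")


def stable_set_id(items):
    """Return a 64-character hex SHA3-256 digest, usable in a filename."""
    return hashlib.sha3_256(canonical_bytes(items)).hexdigest()


def stable_set_uuid5(items):
    """Return a 32-character hex version 5 UUID (SHA1-based) for the same input."""
    return uuid.uuid5(uuid.NAMESPACE_URL, canonical_bytes(items).decode("utf-8")).hex


input_set = {"apple", "banana", "orange"}
print(stable_set_id(input_set))
print(stable_set_uuid5(input_set))
```

Either identifier depends only on the contents of the set, so the same inputs produce the same filename component on any machine and in any run.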
