Generate unique hashes for Django models

Question

I want to identify each model instance by a unique hash rather than by its id.

I implemented the following function so that I can use it easily across the board.

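import hashlib
import random
from base64 import urlsafe_b64encode

def set_unique_random_value(model_object, field_name='hash_uuid', length=5, use_sha=True, urlencode=False):
    model_class = model_object.__class__
    # Keep drawing candidates until one is found that no existing row uses.
    while True:
        # Decimal digits of a random float, with the leading '0.' removed.
        uuid_number = str(random.random())[2:]
        # hashlib works on bytes in Python 3, so encode before hashing.
        uuid = hashlib.sha256(uuid_number.encode()).hexdigest() if use_sha else uuid_number
        uuid = uuid[:length]
        if urlencode:
            # Base64 also works on bytes; decode back to str and drop the last character.
            uuid = urlsafe_b64encode(uuid.encode()).decode()[:-1]
        # Accept the candidate only if the database holds no row with this value.
        if not model_class.objects.filter(**{field_name: uuid}).exists():
            setattr(model_object, field_name, uuid)
            return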

I'm seeking feedback. How else could I do it? How can I improve it? What is good, bad, and ugly about it?

Recommended answer

I do not like this bit:

uuid = uuid[:5]

In the best case (the values are uniformly distributed), the probability of a collision already exceeds 0.5 after only about 1.2k elements!

This is because of the birthday problem. In brief, the probability of a collision exceeds 0.5 once the number of elements is larger than roughly the square root of the number of possible labels (more precisely, about 1.2 times that square root).

Five hexadecimal digits give 16^5 = 1,048,576 (about 10^6) distinct labels, so collisions become likely after roughly a thousand generated values.
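The birthday-problem estimate can be checked numerically. The following is a minimal sketch, assuming the labels are drawn uniformly at random; the function names are made up for this illustration. It computes the exact collision probability for a given number of items and searches for the point where that probability passes 0.5, which for five hex digits lands a little above a thousand items.

```python
import math


def collision_probability(n_items: int, n_labels: int) -> float:
    """Probability that at least two of n_items uniform random labels coincide."""
    if n_items > n_labels:
        return 1.0  # pigeonhole: a collision is certain
    # Sum the logs of (1 - i / n_labels) to get the log-probability of no collision.
    log_no_collision = sum(math.log1p(-i / n_labels) for i in range(n_items))
    return 1.0 - math.exp(log_no_collision)


def items_for_even_odds(n_labels: int) -> int:
    """Smallest number of items whose collision probability exceeds 0.5."""
    n = 1
    while collision_probability(n, n_labels) <= 0.5:
        n += 1
    return n


LABELS = 16 ** 5  # five hexadecimal digits, as produced by uuid[:5]

print(collision_probability(1000, LABELS))
print(items_for_even_odds(LABELS))
```

Each extra hex digit multiplies the label count by 16 but moves the break-even point by only a factor of 4, which is why short prefixes run out so quickly.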

Even if you enlarge length (say to -1, which keeps everything but the last character), you still have a problem here:

str(random.random())[2:]

You will start having collisions after about 3 * 10^6 values (by the same calculation).

I think your best bet is to use the uuid module, whose values are far more likely to be unique. Here is an example:

>>> import uuid
>>> uuid.uuid1().hex
'7e0e52d0386411df81ce001b631bdd31'
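Note that uuid1 is derived from the host's network address and a timestamp, so its values are unique but not hard to guess. If the identifier also needs to be unpredictable (an assumption; the question does not say so), a minimal sketch using the standard library's uuid4 and secrets.token_urlsafe could look like this. The helper names are hypothetical.

```python
import secrets
import uuid


def new_hex_id() -> str:
    """Return 32 hexadecimal characters taken from a random (version 4) UUID."""
    return uuid.uuid4().hex


def new_url_token(n_bytes: int = 16) -> str:
    """Return a URL-safe random string built from n_bytes random bytes."""
    return secrets.token_urlsafe(n_bytes)


print(new_hex_id())
print(new_url_token())
```

In Django, a callable such as uuid.uuid4 can be passed as a field default, for example models.UUIDField(default=uuid.uuid4, unique=True), so no manual retry loop is needed.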

Update: if you do not trust the math, just run the following sample to see the collision:

>>> len(set(hashlib.sha256(str(i).encode()).hexdigest()[:5] for i in range(0, 2000)))
1999  # it would print 2000 if there were no collisions
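The same experiment can be repeated for several prefix lengths to see how quickly collisions disappear as the prefix grows. This is only a sketch: the helper name and the chosen lengths are arbitrary, and the inputs are the same str(i) values as in the sample above.

```python
import hashlib


def distinct_prefixes(prefix_length: int, n_values: int = 2000) -> int:
    """Count distinct SHA-256 hex prefixes over str(0) .. str(n_values - 1)."""
    return len({
        hashlib.sha256(str(i).encode()).hexdigest()[:prefix_length]
        for i in range(n_values)
    })


# Any count below 2000 means at least two inputs share a prefix.
for prefix_length in (3, 5, 8, 16):
    print(prefix_length, distinct_prefixes(prefix_length))
```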
