如何使用不同于该表格的整数替换Django的主键 [英] How to replace Django's primary key with a different integer that is unique for that table

查看:163
本文介绍了如何使用不同于该表格的整数替换Django的主键的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个Django Web应用程序使用默认的自动递增的正整数作为主键。该密钥在整个应用程序中使用,并经常插入到URL中。我不想向公众公开这个号码,以便他们可以猜测我的数据库中的用户数或其他实体的数量。



这是一个常见的要求,我已经看到有类似我的答案的问题。大多数解决方案建议对原始主键值进行哈希处理。但是,这些答案都不符合我的需要。这些是我的要求:


  1. 我希望将主键字段类型保留为整数。

  2. 我也不希望在每次读取或写入数据库或与数据库进行比较时都必须对此值进行散列/取消。这似乎浪费了,这样做只是一次:当记录最初插入数据库

  3. 哈希/加密功能不需要是可逆的,因为我不需要恢复原始顺序键。散列值只需要独一无二。

  4. 散列值仅对该表是唯一的 - 不是普遍唯一的。

  5. 散列价值应该尽可能的短。我想避免非常长的(20多个字符)的URL

实现此目的最好的方法是什么?以下工作?

  def hash_function(int):
return fancy-hash-function#使用??


def obfuscate_pk(发件人,实例,创建,** kwargs):
如果创建:
logger.info(MyClass#%s,created with created =%s:%s%(instance.pk,created,instance))
instance.pk = hash_function(instance.pk)
instance.save()
logger.info( \tNew Pk =%s%instance.pk)

class MyClass(models.Model):
blahblah = models.CharField(max_length = 50,null = False,blank = False ,)


post_save.connect(obfuscate_pk,sender = MyClass)


解决方案

想法



我建议您使用与 Instragam 。他们的要求似乎密切关注你。


生成的ID应该按时间排序(所以照片ID列表为
例如,可以排序,而无需获取有关
照片的更多信息)ID应该理想地为64位(对于较小的索引,更好的
存储在像Redis这样的系统中)系统应该引入少量新的
'运动部件'尽可能的大部分是我们能够以非常少的工程师获得
规模的Instagram,通过选择我们信任的简单的
易于理解的解决方案。他们提出了一个基于时间戳的41位系统,数据库分片为13,自动增量部分为10的系统。 Sincce你似乎没有使用碎片。您可以只有41位的时间基于copmonent和23位随机选择。如果您同时插入记录,那么这样做的确不会造成830万的冲突机会。但实际上你绝对不会打这个。对于一些代码如何:



生成ID



  START_TIME =一个表示unix时间戳的常数

def make_id():
''
灵感来源于http://instagram-engineering.tumblr.com/post/10853187575/sharding -id-at-instagram
'''

t = int(time.time()* 1000) - START_TIME
u = random.SystemRandom()。getrandbits(23)
id =(t <<23)| u

return id


def reverse_id(id):
t = id>> 23
返回t + START_TIME

注意, START_TIME 在上面的代码是一些任意的开始时间。您可以使用time.time()* 1000,获取该值并将其设置为 START_TIME



请注意我发布的 reverse_id 方法可以让您了解创建记录的时间。如果您需要跟踪该信息,您可以这样做,而无需为此添加另一个字段!所以你的主要关键在于保存你的存储空间,而不是增加存储空间!



模型



你的模型看起来就像。

  class MyClass(models.Model):
id = models.BigIntegerField(default = fields.make_id,primary_key = True)

如果您在django之外更改数据库,您需要创建相当于 make_id 作为sql函数



作为脚注。这有点像Mongodb使用的方法来生成它的每一个的 _ID 对象。


I have a Django web application that uses the default auto-incremented positive integers as the primary key. This key is used throughout the application and is frequently inserted into the URL. I don't want to expose this number to the public so that they can guess the number of users or other entities in my Database.

This is a frequent requirement and I have seen questions to similar mine with answers. Most solutions recommend hashing the original primary key value. However, none of those answers fit my need exactly. These are my requirements:

  1. I would like to keep the Primary Key field type as Integer.
  2. I also would prefer not to have to hash/unhash this value every time it is read or written or compared to the database. That seems wastefuly It would be nice to do it just once: When the record is initially inserted into the Database
  3. The hashing/encryption function need not be reversible since I don't need to recover the original sequential key. The hashed value just needs to be unique.
  4. The hashed value needs to be unique ONLY for that table -- not universally unique.
  5. The hashed value should be as short as possible. I would like to avoid extremely long (20+ characters) URLs

What is the best way to do achieve this? Would the following work?

def hash_function(int):
    return fancy-hash-function # What function should I use??


def obfuscate_pk(sender, instance, created, **kwargs):
    if created:
        logger.info("MyClass #%s, created with created=%s: %s" % (instance.pk, created, instance))
        instance.pk = hash_function(instance.pk)
        instance.save()
        logger.info("\tNew Pk=%s" % instance.pk)

class MyClass(models.Model):
    blahblah = models.CharField(max_length=50, null=False, blank=False,)


post_save.connect(obfuscate_pk, sender=MyClass)

解决方案

The Idea

I would recommend to you the same approach that is used by Instragam. Their requirements seems to closely follow yours.

Generated IDs should be sortable by time (so a list of photo IDs, for example, could be sorted without fetching more information about the photos) IDs should ideally be 64 bits (for smaller indexes, and better storage in systems like Redis) The system should introduce as few new ‘moving parts’ as possible—a large part of how we’ve been able to scale Instagram with very few engineers is by choosing simple, easy-to-understand solutions that we trust.

They came up with a system that has 41 bits based on the timestamp, 13 o the database shard and 10 for an auto increment portion. Sincce you don't appear to be using shards. You can just have 41 bits for a time based copmonent and 23 bits chosen at random. That does produce an extremely unlikely 1 in 8.3 million chance of getting a conflict if you insert records at the same time. But in practice you are never likely to hit this. Right so how about some code:

Generating IDs

START_TIME = a constant that represents a unix timestamp

def make_id():
    '''
    inspired by http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
        '''

    t = int(time.time()*1000) - START_TIME
    u = random.SystemRandom().getrandbits(23)
    id = (t << 23 ) | u

    return id


def reverse_id(id):
    t  = id >> 23
    return t + START_TIME 

Note, START_TIME in the above code is some arbitary starting time. You can use time.time()*1000 , get the value and set that as START_TIME

Notice that the reverse_id method I have posted allows you to find out at which time the record was created. If you need to keep track of that information you can do so without having to add another field for it! So your primary key is actually saving your storage rather than increasing it!

The Model

Now this is what your model would look like.

class MyClass(models.Model):
   id = models.BigIntegerField(default = fields.make_id, primary_key=True)  

If you make changes to your database outside django you would need to create the equivalent of make_id as an sql function

As a foot note. This is somewhat like the approach used by Mongodb to generate it's _ID for each object.

这篇关于如何使用不同于该表格的整数替换Django的主键的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆