Convert integer to a random but deterministically repeatable choice

Question

How do I convert an unsigned integer (representing a user ID) to a random-looking but actually deterministically repeatable choice? The choice must be selected with equal probability (irrespective of the distribution of the input integers). For example, if I have 3 choices, i.e. [0, 1, 2], the user ID 123 may always be randomly assigned choice 2, whereas the user ID 234 may always be assigned choice 1.

Cross-language and cross-platform algorithmic reproducibility is desirable. I'm inclined to use a hash function and modulo unless there is a better way. Here is what I have:

>>> import hashlib
>>> num_choices = 3
>>> id_num = 123
>>> int(hashlib.sha256(str(id_num).encode()).hexdigest(), 16) % num_choices
2
    

I'm using the latest stable Python 3. Please note that this question is similar, but not identical, to the related question about converting a string to a random but deterministically repeatable uniform probability.

Solution

Using hash and modulo

import hashlib

def id_to_choice(id_num, num_choices):
    # Encode the non-negative (unsigned) ID in the minimal number of big-endian bytes
    id_bytes = id_num.to_bytes((id_num.bit_length() + 7) // 8, 'big')
    id_hash = hashlib.sha512(id_bytes)
    id_hash_int = int.from_bytes(id_hash.digest(), 'big')  # Uses explicit byteorder for system-agnostic reproducibility
    choice = id_hash_int % num_choices  # Modulo bias stays negligible only for small num_choices
    return choice

>>> id_to_choice(123, 3)
0
>>> id_to_choice(456, 3)
1
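To check the requirement that choices come out with roughly equal frequency regardless of how the input IDs are distributed, the sketch below counts assignments for two input sets: 30000 consecutive IDs and 30000 IDs that are all multiples of 3. It repeats the hashing id_to_choice from above so the block runs on its own; the helper choice_counts and the sample sizes are illustrative assumptions, not part of the original answer. A naive id_num % num_choices is included for contrast.

```python
import hashlib
from collections import Counter


def id_to_choice(id_num, num_choices):
    # Same hash-and-modulo mapping as above (SHA-512 over big-endian bytes)
    id_bytes = id_num.to_bytes((id_num.bit_length() + 7) // 8, 'big')
    id_hash_int = int.from_bytes(hashlib.sha512(id_bytes).digest(), 'big')
    return id_hash_int % num_choices


def choice_counts(ids, num_choices, mapper):
    # Hypothetical helper: how many IDs land on each choice 0..num_choices-1
    counts = Counter(mapper(i, num_choices) for i in ids)
    return [counts[c] for c in range(num_choices)]


num_choices = 3
consecutive_ids = range(30000)
skewed_ids = range(0, 90000, 3)  # 30000 IDs, all multiples of 3

print(choice_counts(consecutive_ids, num_choices, id_to_choice))
print(choice_counts(skewed_ids, num_choices, id_to_choice))
# Naive modulo for contrast: every skewed ID maps to choice 0
print(choice_counts(skewed_ids, num_choices, lambda i, n: i % n))  # [30000, 0, 0]
```

Each hashed count should land close to 10000 (the exact values depend on the SHA-512 output, so none are shown), whereas the naive modulo puts every skewed ID on choice 0.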
    

Notes:

  • The built-in hash function must not be used. It can preserve the input's distribution (in CPython, hash(123) is simply 123), and for other types it can return values that differ each time Python is restarted (e.g. hash('123'), because str hashing is salted with a per-process random value by default).

  • For converting an int to bytes, bytes(id_num) must not be used: it returns an array of id_num null bytes, so the value is encoded only through the length of the result, which is grossly inefficient. Using int.to_bytes is better. Using str(id_num).encode() works but wastes a few bytes.

  • Admittedly, using modulo doesn't offer exactly uniform probability,[1][2] but the bias is negligible for this application: id_hash_int ranges over 0 to 2**512 - 1 and num_choices is assumed to be small, so each choice's probability differs from 1/num_choices by less than 1/2**512.
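The three notes can be checked directly in CPython. The sketch below shows that hash() of a small non-negative int is the int itself (so consecutive IDs merely cycle through the choices), shows what bytes(id_num) and the two other encodings produce, and computes the exact worst-case modulo bias for a toy 3-bit hash next to the SHA-512 case. The function names encoded_lengths and max_modulo_bias are hypothetical, and the hash() behaviour is a CPython implementation detail rather than a language guarantee.

```python
from fractions import Fraction


def encoded_lengths(id_num):
    # Byte lengths of the minimal big-endian encoding and the decimal-string encoding
    minimal = id_num.to_bytes((id_num.bit_length() + 7) // 8, 'big')
    decimal = str(id_num).encode()
    return len(minimal), len(decimal)


def max_modulo_bias(hash_bits, num_choices):
    # Largest |P(choice) - 1/num_choices| when a uniform hash_bits-bit value
    # is reduced modulo num_choices; each residue receives low or high values
    space = 2 ** hash_bits
    low = space // num_choices
    high = low + (1 if space % num_choices else 0)
    target = Fraction(1, num_choices)
    return max(abs(Fraction(low, space) - target), abs(Fraction(high, space) - target))


# CPython: consecutive small ints hash to themselves, so choices just cycle
print([hash(i) % 3 for i in range(6)])   # [0, 1, 2, 0, 1, 2]

print(bytes(5))                          # b'\x00\x00\x00\x00\x00'
print(encoded_lengths(123456789))        # (4, 9)

print(max_modulo_bias(3, 3))             # 1/12 for a toy 8-value hash
print(max_modulo_bias(512, 3) < Fraction(1, 2 ** 512))  # True
```

With only 8 possible hash values the bias is large (one choice gets 2/8 instead of 1/3), but with 2**512 values the deviation falls below 1/2**512.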

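Using random

The random module can be used with id_num as its seed. Creating a dedicated random.Random instance, rather than reseeding the module-level generator, addresses concerns about both thread safety and the continuity of the global random sequence. Using randrange in this manner is comparable to, and simpler than, hashing the seed and taking the modulo.

With this approach, not only is cross-language reproducibility a concern, but reproducibility across future versions of Python could also be a concern: the random module documentation only guarantees that random() keeps producing the same sequence for a given seed, while other algorithms such as randrange may change. It is therefore not recommended.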

import random

def id_to_choice(id_num, num_choices):
    # Private generator seeded with the ID; the module-level random state is untouched
    localrandom = random.Random(id_num)
    choice = localrandom.randrange(num_choices)
    return choice

>>> id_to_choice(123, 3)
0
>>> id_to_choice(456, 3)
2
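To illustrate the continuity point, the sketch below seeds the module-level generator, calls the random-based id_to_choice between global draws, and confirms the global sequence matches a run without those calls; it also shows that repeated calls return the same choice within one interpreter. The helper global_draws and the seed 2024 are hypothetical illustrations, and nothing here shows that the choices stay identical across Python versions.

```python
import random


def id_to_choice(id_num, num_choices):
    # Random-based variant from above: a private generator seeded with the ID
    localrandom = random.Random(id_num)
    return localrandom.randrange(num_choices)


def global_draws(interleave):
    # Hypothetical helper: three draws from the module-level generator,
    # optionally calling id_to_choice before each draw
    random.seed(2024)
    draws = []
    for _ in range(3):
        if interleave:
            id_to_choice(123, 3)
        draws.append(random.random())
    return draws


print(global_draws(False) == global_draws(True))  # True: global sequence undisturbed
print({id_to_choice(123, 3) for _ in range(5)})    # a single-element set
```

By contrast, calling random.seed(id_num) followed by random.randrange(num_choices) would yield a deterministic choice too, but it would reset the shared generator for every other caller and could interleave unpredictably with other threads using it.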
    
