Convert integer to a random but deterministically repeatable choice
Problem description
How do I convert an unsigned integer (representing a user ID) to a random-looking but actually deterministically repeatable choice? The choice must be selected with equal probability, irrespective of the distribution of the input integers. For example, if I have 3 choices, i.e. [0, 1, 2], the user ID 123 may always be randomly assigned choice 2, whereas the user ID 234 may always be assigned choice 1.

Cross-language and cross-platform algorithmic reproducibility is desirable. I'm inclined to use a hash function and modulo unless there is a better way. Here is what I have:
>>> import hashlib
>>> num_choices = 3
>>> id_num = 123
>>> int(hashlib.sha256(str(id_num).encode()).hexdigest(), 16) % num_choices
2
I'm using the latest stable Python 3. Please note that this question is similar, but not identical, to the related question about converting a string to a random but deterministically repeatable uniform probability.
Solution
Using hash and modulo
import hashlib

def id_to_choice(id_num, num_choices):
    id_bytes = id_num.to_bytes((id_num.bit_length() + 7) // 8, 'big')
    id_hash = hashlib.sha512(id_bytes)
    id_hash_int = int.from_bytes(id_hash.digest(), 'big')  # Uses explicit byteorder for system-agnostic reproducibility
    choice = id_hash_int % num_choices  # Use with small num_choices only
    return choice

>>> id_to_choice(123, 3)
0
>>> id_to_choice(456, 3)
1
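The claim of equal probability irrespective of the input distribution can be spot-checked empirically. The sketch below repeats the same steps as the answer (minimal big-endian bytes, SHA-512, modulo) and feeds it two deliberately skewed input sets: 30,000 consecutive IDs, and 30,000 multiples of 3, for which plain id_num % 3 would send every ID to choice 0. The ID ranges and sample sizes are arbitrary illustrative choices, and roughly equal counts are evidence of good mixing, not a proof of uniformity.

```python
import hashlib
from collections import Counter


def id_to_choice(id_num, num_choices):
    # Same steps as the answer: minimal big-endian bytes -> SHA-512 -> big int -> modulo.
    id_bytes = id_num.to_bytes((id_num.bit_length() + 7) // 8, 'big')
    id_hash_int = int.from_bytes(hashlib.sha512(id_bytes).digest(), 'big')
    return id_hash_int % num_choices


# Two deliberately non-uniform input sets (illustrative choices).
consecutive_ids = range(30_000)        # 0, 1, 2, ...
multiples_of_3 = range(0, 90_000, 3)   # every ID is divisible by num_choices

counts_seq = Counter(id_to_choice(i, 3) for i in consecutive_ids)
counts_mul3 = Counter(id_to_choice(i, 3) for i in multiples_of_3)

# Each choice should receive roughly a third (about 10,000) of the IDs in both cases.
print(sorted(counts_seq.items()))
print(sorted(counts_mul3.items()))
```

Because the hash scrambles the input before the modulo is taken, any regularity in the IDs (sequential allocation, a shared factor with num_choices) does not carry over into the choice.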
Notes:
- The built-in hash function must not be used, because it can preserve the input's distribution; e.g. in CPython, hash(123) returns 123. Additionally, it can return values that differ when Python is restarted, e.g. hash('123'), because string hashing is randomized per process.
- For converting an int to bytes, bytes(id_num) works but is grossly inefficient, as it returns an array of id_num null bytes, so it must not be used. Using int.to_bytes is better. Using str(id_num).encode() works but wastes a few bytes.
- Admittedly, using modulo doesn't offer exactly uniform probability,[1][2] but this shouldn't introduce much bias for this application, because id_hash_int is expected to be very large and num_choices is assumed to be small.
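Each of the three notes can be checked directly. This is a minimal sketch assuming CPython (where hash() of a small int returns the int itself) and an illustrative id_num of 1000. residue_counts is a hypothetical helper that assumes the hash output is uniformly distributed over [0, 2**bits) and counts how many of those values land on each choice, which is exactly what determines the modulo bias.

```python
from fractions import Fraction

# Note 1: in CPython, hash() returns small ints unchanged, so it preserves
# the input distribution instead of scrambling it.
print(hash(123))

# Note 2: byte counts of three int-to-bytes encodings (id_num is illustrative).
id_num = 1000
null_bytes = bytes(id_num)  # bytes(int) allocates a zero-filled buffer of that length
minimal = id_num.to_bytes((id_num.bit_length() + 7) // 8, 'big')  # b'\x03\xe8'
decimal = str(id_num).encode()  # b'1000'
print(len(null_bytes), len(minimal), len(decimal))  # 1000 2 4


# Note 3: modulo bias, assuming a uniformly distributed `bits`-bit hash value.
def residue_counts(bits, num_choices):
    """Number of values in [0, 2**bits) that map to each choice under modulo."""
    space = 1 << bits
    q, r = divmod(space, num_choices)
    # The first r residues each receive one extra value.
    return [q + 1 if k < r else q for k in range(num_choices)]


print(residue_counts(3, 3))  # [3, 3, 2]: a tiny 3-bit hash is visibly biased

# With a 512-bit hash, the most and least likely choices differ by a single value.
counts_512 = residue_counts(512, 3)
gap = Fraction(max(counts_512) - min(counts_512), 1 << 512)
print(gap == Fraction(1, 1 << 512))  # True
```

With a 3-bit hash, choice 2 gets only 2 of the 8 possible values, while with SHA-512's 512-bit output and num_choices = 3 the probability gap between choices is exactly 1/2**512, which is why the bias is negligible for small num_choices.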
Using random
The random module can be used with id_num as its seed, while addressing concerns about both thread safety and continuity by using a dedicated random.Random instance instead of the shared module-level generator. Using randrange in this manner is comparable to, and simpler than, hashing the seed and taking the modulo.

With this approach, however, not only is cross-language reproducibility a concern, but reproducibility across future versions of Python could also be a concern. It is therefore not recommended.
import random

def id_to_choice(id_num, num_choices):
    localrandom = random.Random(id_num)
    choice = localrandom.randrange(num_choices)
    return choice

>>> id_to_choice(123, 3)
0
>>> id_to_choice(456, 3)
2
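The continuity concern can be illustrated by comparing the private random.Random instance used above with a hypothetical variant, id_to_choice_global, that reseeds the shared module-level generator instead. The seed 42 and the three draws are arbitrary; the sketch only checks whether the global random stream is disturbed, and does not demonstrate thread safety.

```python
import random


def id_to_choice(id_num, num_choices):
    # Same as the answer: a private Random instance seeded with the ID.
    localrandom = random.Random(id_num)
    return localrandom.randrange(num_choices)


def id_to_choice_global(id_num, num_choices):
    # Hypothetical anti-pattern for contrast: reseeds the shared global generator.
    random.seed(id_num)
    return random.randrange(num_choices)


def global_draws(chooser):
    """Draw three floats from the global generator, calling chooser after the first."""
    random.seed(42)
    first = random.random()
    chooser(123, 3)
    return [first, random.random(), random.random()]


random.seed(42)
baseline = [random.random() for _ in range(3)]

print(global_draws(id_to_choice) == baseline)         # True: global stream undisturbed
print(global_draws(id_to_choice_global) == baseline)  # False: global stream was reset
```

Because the private instance owns its own state, other code relying on the global generator sees the same sequence whether or not id_to_choice is called in between.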