Convert string to random but deterministically repeatable uniform probability
Question
How do I convert a string, e.g. a user ID plus salt, to a random-looking but actually deterministically repeatable uniform probability in the semi-open range [0.0, 1.0)? This means that the output is ≥ 0.0 and < 1.0. The output distribution must be uniform irrespective of the input distribution. For example, if the input string is 'a3b2Foobar', the output probability could repeatably be 0.40341504.
Cross-language and cross-platform algorithmic reproducibility is desirable. I'm inclined to use a hash function unless there is a better way. Here is what I have:
```python
>>> import hashlib
>>> in_str = 'a3b2Foobar'
>>> (int(hashlib.sha256(in_str.encode()).hexdigest(), 16) % 1e8) / 1e8
0.40341504
```
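A caveat about the snippet above, based on Python's mixed int/float arithmetic rules: `h % 1e8` is int % float, so the 256-bit integer is first converted to a float and only its top 53 bits survive. That float is a multiple of a large power of two (at least 2**8 for any digest value of 2**61 or more, i.e. all but a vanishing fraction of digests), so the remainder is a multiple of 256 and the output can take at most 5**8 = 390,625 distinct values. The sketch below contrasts it with an integer modulus; both function names are hypothetical, chosen for illustration.

```python
import hashlib


def float_modulo_probability(in_str: str) -> float:
    """The question's one-liner, unchanged in behavior."""
    h = int(hashlib.sha256(in_str.encode()).hexdigest(), 16)
    # int % float: h is converted to a float with a 53-bit significand first,
    # so the low-order digits are rounded away before the remainder is taken.
    return (h % 1e8) / 1e8


def integer_modulo_probability(in_str: str) -> float:
    """Hypothetical variant: exact integer remainder, then one float division."""
    # Same integer as int(hexdigest, 16), built directly from the raw digest bytes.
    h = int.from_bytes(hashlib.sha256(in_str.encode()).digest(), "big")
    # 10**8 is an int, so the remainder is computed exactly on all 256 bits.
    return (h % 10**8) / 10**8


for s in ["a3b2Foobar", "user-1", "user-2"]:
    print(s, float_modulo_probability(s), integer_modulo_probability(s))
```

The integer version still carries a negligible modulo bias, since 2**256 is not a multiple of 10**8.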
I'm using the latest stable Python 3. Please note that this question is similar but not exactly identical to the related question to convert an integer to a random but deterministically repeatable choice.
Answer
Using hash
A cryptographic hash can be assumed to be a uniformly distributed integer in the range [0, MAX_HASH]. Accordingly, it can be scaled to a floating-point number in the range [0, 1) by dividing it by MAX_HASH + 1.
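Written out for an n-bit digest (n = 512 for SHA-512, so MAX_HASH + 1 = 2^n), and assuming the digest's integer value h is uniformly distributed, the scaling gives a value strictly below 1 whose distribution function is within 2^{-n} of the uniform one. This is for the exact rational p; converting it to a float is a separate rounding step.

$$
h \in \{0, 1, \dots, 2^n - 1\}, \qquad p = \frac{h}{2^n} \;\Rightarrow\; 0 \le p \le 1 - 2^{-n} < 1
$$

$$
\Pr[p < x] = \frac{\lceil 2^n x \rceil}{2^n}, \qquad \left| \Pr[p < x] - x \right| < 2^{-n} \quad \text{for } x \in [0, 1]
$$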
```python
import hashlib

Hash = hashlib.sha512
MAX_HASH_PLUS_ONE = 2**(Hash().digest_size * 8)

def str_to_probability(in_str):
    """Return a reproducible uniformly random float in the interval [0, 1) for the given string."""
    seed = in_str.encode()
    hash_digest = Hash(seed).digest()
    hash_int = int.from_bytes(hash_digest, 'big')  # Uses explicit byteorder for system-agnostic reproducibility
    return hash_int / MAX_HASH_PLUS_ONE  # Float division

>>> str_to_probability('a3b2Foobar')
0.3659629991207491
```
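One edge case in the code above: Python's int / int true division returns the correctly rounded float, and the largest float below 1.0 is 1 - 2**-53. So any hash_int of at least MAX_HASH_PLUS_ONE - 2**458 (a probability of about 2**-54 per input) rounds up to exactly 1.0, outside [0, 1). This is mostly theoretical, but if the half-open interval must hold strictly, one sketch (assuming SHA-512 as above; the function and constant names are hypothetical) keeps only the top 53 bits, which a double represents exactly:

```python
import hashlib

Hash = hashlib.sha512
DIGEST_BITS = Hash().digest_size * 8  # 512 for SHA-512
FLOAT_MANTISSA_BITS = 53  # significand precision of an IEEE 754 double


def str_to_probability_53(in_str: str) -> float:
    """Hypothetical variant: float in [0, 1) built from the top 53 hash bits."""
    hash_int = int.from_bytes(Hash(in_str.encode()).digest(), "big")
    # Keep only the top 53 bits; the numerator and 2**53 are both exact floats,
    # so the quotient is exact and at most 1 - 2**-53, never 1.0.
    top_bits = hash_int >> (DIGEST_BITS - FLOAT_MANTISSA_BITS)
    return top_bits / 2**FLOAT_MANTISSA_BITS


# The original formula at its extreme input rounds up to 1.0:
print((2**512 - 1) / 2**512)
print(str_to_probability_53('a3b2Foobar'))
```

The trade-off is that outputs become multiples of 2**-53, so they can differ from the original function's output in the last few bits.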
Notes:
- The built-in `hash` function must not be used because it can preserve the input's distribution, e.g. with `hash(123)`, which in CPython is simply `123`. Additionally, it can return values that differ when Python is restarted, e.g. with `hash('123')`, because string hashing is randomized per process unless `PYTHONHASHSEED` is set.
- Using modulo is not necessary, as float division is sufficient.
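Both problems with the built-in `hash` can be observed directly. The sketch below relies on CPython behavior (small non-negative integers hash to themselves; str hashes are salted per process unless `PYTHONHASHSEED` is fixed) and starts child interpreters with `subprocess` to compare `hash('123')` under two different seeds. The helper name is hypothetical.

```python
import os
import subprocess
import sys


def child_str_hash(text: str, hash_seed: str) -> int:
    """Hypothetical helper: hash(text) as computed by a fresh interpreter."""
    env = {**os.environ, "PYTHONHASHSEED": hash_seed}
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Consecutive integers are not scrambled at all in CPython.
print([hash(n) for n in range(120, 125)])  # [120, 121, 122, 123, 124]

# The same string hashes differently in interpreters started with different
# seeds, which is what happens by default whenever Python is restarted.
print(child_str_hash("123", "1"), child_str_hash("123", "2"))
```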
Using random
The `random` module can be used with `in_str` as its seed. Creating a dedicated `random.Random` instance, instead of reseeding the module-level generator, addresses concerns surrounding both thread safety and continuity (the global generator's sequence is left undisturbed).
With this approach, cross-language reproducibility is a concern, and reproducibility across future versions of Python could be one as well. It is therefore not recommended.
```python
import random

def str_to_probability(in_str):
    """Return a reproducible uniformly random float in the interval [0, 1) for the given seed."""
    return random.Random(in_str).random()

>>> str_to_probability('a3b2Foobar')
0.4662507245848473
```
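The thread-safety and continuity point can be checked directly, along with one small hedge against future default changes. `random.Random.seed` accepts an explicit `version` argument, and `version=2` is the current default that converts a str seed to an int using all of its bits. Pinning it, as in this sketch (function name hypothetical), records that assumption in the code but does not address the cross-language concern.

```python
import random


def str_to_probability_pinned(in_str: str) -> float:
    """Hypothetical variant: private generator with the seeding version pinned."""
    rng = random.Random()        # new instance per call: no shared generator state
    rng.seed(in_str, version=2)  # explicit instead of relying on the default
    return rng.random()


# Reseed the module-level generator, then confirm our function leaves it alone.
random.seed(12345)
expected_next = random.Random(12345).random()  # what the global generator yields next
str_to_probability_pinned("a3b2Foobar")
print(random.random() == expected_next)  # True: the global sequence continues
```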