Convert string to random but deterministically repeatable uniform probability

Problem description

How do I convert a string, e.g. a user ID plus salt, to a random-looking but deterministically repeatable uniform probability in the half-open range [0.0, 1.0)? That is, the output must be ≥ 0.0 and < 1.0. The output distribution must be uniform irrespective of the input distribution. For example, if the input string is 'a3b2Foobar', the output probability could repeatably be 0.40341504.

Cross-language and cross-platform algorithmic reproducibility is desirable. I'm inclined to use a hash function unless there is a better way. Here is what I have:

>>> import hashlib
>>> in_str = 'a3b2Foobar'
>>> (int(hashlib.sha256(in_str.encode()).hexdigest(), 16) % 1e8) / 1e8
0.40341504

I'm using the latest stable Python 3. Please note that this question is similar, but not identical, to the related question on converting an integer to a random but deterministically repeatable choice.

Solution

Using hash

The output of a cryptographic hash can be treated as a uniformly distributed integer in the range [0, MAX_HASH]. Accordingly, it can be scaled to a floating-point number in the range [0, 1) by dividing it by MAX_HASH + 1.

import hashlib

Hash = hashlib.sha512
MAX_HASH_PLUS_ONE = 2**(Hash().digest_size * 8)

def str_to_probability(in_str):
    """Return a reproducible uniformly random float in the interval [0, 1) for the given string."""
    seed = in_str.encode()
    hash_digest = Hash(seed).digest()
    hash_int = int.from_bytes(hash_digest, 'big')  # Uses explicit byteorder for system-agnostic reproducibility
    return hash_int / MAX_HASH_PLUS_ONE  # Float division

>>> str_to_probability('a3b2Foobar')
0.3659629991207491
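
One caveat on the scaling step: Python's `/` on integers rounds to the nearest double, so a hash within roughly 2**-54 of the maximum would come back as exactly 1.0. Languages without arbitrary-precision integers also cannot repeat the 512-bit division directly. The following is a minimal sketch of a variant that keeps only the top 53 bits of the digest. It assumes UTF-8 encoding, and the name `str_to_probability_53` is hypothetical, not part of the answer above. Its result may differ from `str_to_probability` in the last bit, because it truncates where the full division rounds.

```python
import hashlib

def str_to_probability_53(in_str):
    """Map a string to a float in [0, 1) using only the top 53 bits of SHA-512."""
    digest = hashlib.sha512(in_str.encode("utf-8")).digest()
    # The first 7 bytes hold 56 bits; dropping the low 3 bits leaves 53 bits,
    # which is the widest integer a double represents exactly.
    top_53_bits = int.from_bytes(digest[:7], "big") >> 3
    return top_53_bits / 2**53  # exact division; the largest result is 1 - 2**-53

# Dividing the full-width integer can round up to 1.0 at the very top of the range.
print((2**512 - 1) / 2**512 == 1.0)  # True

p = str_to_probability_53("a3b2Foobar")
print(0.0 <= p < 1.0)  # True
print(p == str_to_probability_53("a3b2Foobar"))  # True
```

Because only fixed-width integer operations and one exact division are involved, this form should be straightforward to port to languages that have 64-bit integers and IEEE 754 doubles.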

Notes:

  • The built-in hash function must not be used because it can preserve the input's distribution, e.g. with hash(123). It can also return values that differ each time Python is restarted, e.g. with hash('123').
  • Using modulo is not necessary as float division is sufficient.
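
Both notes can be checked with a short sketch. The first part relies on CPython behavior, where small integers hash to themselves. The second part is a related aside about the expression in the question: `% 1e8` uses a float operand, so the 256-bit integer is converted to a double (53 significant bits) before the modulo is taken. For a hash this large, the remainder is therefore always a multiple of 256, which leaves at most 390,625 of the 10**8 nominal outputs reachable.

```python
import hashlib

# Note 1: in CPython, small integers hash to themselves, so hash() would
# pass the input's distribution straight through to the output.
print([hash(n) for n in (1, 2, 3, 123)])  # [1, 2, 3, 123]

# Note 2 (aside): `big_int % 1e8` converts the integer to a float first,
# so the hash is rounded to 53 significant bits before the modulo.
hash_int = int(hashlib.sha256(b"a3b2Foobar").hexdigest(), 16)
float_mod = hash_int % 1e8    # float arithmetic: low-order digits already lost
exact_mod = hash_int % 10**8  # integer arithmetic: the true last 8 decimal digits

print(float_mod.is_integer(), int(float_mod) % 256 == 0)  # True True
print(exact_mod == int(str(hash_int)[-8:]))  # True
```

Dividing the full integer by MAX_HASH + 1, as in the answer, avoids this loss because no float appears until the final division.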

Using random

A random.Random instance can be created with in_str as its seed. Using a separate instance, rather than the module-level functions, addresses concerns surrounding both thread safety and continuity.

With this approach, not only is cross-language reproducibility a concern, but reproducibility across multiple future versions of Python could also be a concern. It is therefore not recommended.

import random

def str_to_probability(in_str):
    """Return a reproducible uniformly random float in the interval [0, 1) for the given seed."""
    return random.Random(in_str).random()

>>> str_to_probability('a3b2Foobar')
0.4662507245848473
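
As a rough check of the two properties claimed for this approach, the sketch below repeats the function and confirms that the same seed gives the same value, and that the module-level generator's state is left untouched. No specific output value is shown, since it may depend on the Python version.

```python
import random

def str_to_probability(in_str):
    """Return a reproducible float in [0, 1) from a private, seeded generator."""
    return random.Random(in_str).random()

state_before = random.getstate()  # state of the shared module-level generator

first = str_to_probability("a3b2Foobar")
second = str_to_probability("a3b2Foobar")

print(first == second)                    # True: same seed, same value
print(0.0 <= first < 1.0)                 # True: random() returns values in [0, 1)
print(random.getstate() == state_before)  # True: the global generator was not touched
```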
