Efficient random generator for a very large range (in Python)

Problem description

I am trying to create a generator that returns the numbers in a given range that pass a particular test given by a function foo. However, I would like the numbers to be tested in a random order. The following code achieves this:

from random import shuffle

def MyGenerator(foo, num):
    order = list(range(num))
    shuffle(order)
    for i in order:
        if foo(i):
            yield i

The problem

The problem with this solution is that sometimes the range will be quite large (num might be of the order of 10**8 and upwards). Building and shuffling such a large list in memory can make this function slow. I have tried to avoid this problem with the following code:

from random import randint    

def MyGenerator(foo, num):
    tried = set()
    while len(tried) <= num - 1:
        i = randint(0, num-1)
        if i in tried:
            continue
        tried.add(i)
        if foo(i):
            yield i

This works well most of the time, since in most cases num will be quite large, foo will pass a reasonable number of numbers, and the total number of times the __next__ method is called will be relatively small (say, a maximum of 200, often much smaller). Therefore it's reasonably likely that we stumble upon a value that passes the foo test, and the size of tried never gets large. (Even if foo only passes 10% of the time, we wouldn't expect tried to get much larger than about 2000.)

However, when num is small (close to the number of times that the __next__ method is called), or foo fails most of the time, the above solution becomes very inefficient: it keeps randomly guessing numbers until it guesses one that isn't in tried.
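
To put a rough number on that inefficiency: by the standard coupon-collector argument, drawing uniformly from range(num) until every value has been seen takes about num * (1 + 1/2 + ... + 1/num) draws on average. A minimal sketch of that estimate follows; the name expected_draws is only illustrative and not part of the original code.

```python
def expected_draws(num):
    """Expected number of uniform draws from range(num) needed before
    every value has been seen at least once (coupon-collector estimate)."""
    # While k values are still unseen, a draw hits a new one with
    # probability k / num, so that step takes num / k draws on average.
    return sum(num / k for k in range(1, num + 1))


print(expected_draws(10))  # roughly 29 draws to cover just 10 values
```

The wasted draws grow faster than num itself, which is why the set-based generator degrades once most of the range has to be visited.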

My attempted solution...

I was hoping to use some kind of function that maps the numbers 0, 1, 2, ..., n onto themselves in a roughly random way. (This isn't being used for any security purposes, so it doesn't matter if it isn't the most 'random' function in the world.) The function here (Create a random bijective function which has same domain and range) maps signed 32-bit integers onto themselves, but I am not sure how to adapt the mapping to a smaller range. Given num, I don't even need a bijection on exactly 0, 1, ..., num; a bijection on 0, 1, ..., n for some n larger than and 'close' to num would do (using whatever definition of close you see fit). Then I can do the following:

def mix_function_factory(num):
    # something here???
    def foo(index):
        # something else here??
    return foo

def MyGenerator(foo, num):
    mix_function = mix_function_factory(num)
    for i in range(num):
        index = mix_function(i)
        if index <= num:
            if foo(index):
                yield index

(So long as the bijection isn't on a set of numbers massively larger than num, the number of times index <= num isn't True will be small.)

My question

Can you think of one of the following:

  • A potential solution for mix_function_factory or even a few other potential functions for mix_function that I could attempt to generalise for different values of num?
  • A better way of solving the original problem?

Many thanks....

Recommended answer

The problem is basically generating a random permutation of the integers in the range 0..n-1.

Luckily for us, these numbers have a very useful property: they all have a distinct value modulo n. If we can apply some mathematical operations to these numbers while taking care to keep each number distinct modulo n, it's easy to generate a permutation that appears random. And the best part is that we don't need any memory to keep track of numbers we've already generated, because each number is calculated with a simple formula.

Examples of operations we can perform on every number x in the range include:

  • Addition: We can add any integer c to x.
  • Multiplication: We can multiply x with any number m that shares no prime factors with n.
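
The coprimality condition on m can be checked directly. The sketch below uses illustrative helper names (affine_image and is_permutation are not from the answer) and compares a multiplier that shares no prime factor with n against one that does:

```python
from math import gcd


def affine_image(n, c, m):
    """Return the list ((x + c) * m) % n for x in range(n)."""
    return [((x + c) * m) % n for x in range(n)]


def is_permutation(values, n):
    """True if values contains each of 0..n-1 exactly once."""
    return sorted(values) == list(range(n))


n = 10
for m in (3, 4):
    image = affine_image(n, c=1, m=m)
    print(m, gcd(m, n), is_permutation(image, n))
# 3 1 True
# 4 2 False
```

With m = 4 and n = 10, every output is even, so half of the range is never produced and the rest appears twice.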

Applying just these two operations on the range 0..n-1 already gives quite satisfactory results:

>>> n = 7
>>> c = 1
>>> m = 3
>>> [((x+c) * m) % n for x in range(n)]
[3, 6, 2, 5, 1, 4, 0]

Looks random, doesn't it?

If we generate c and m from a random number, it'll actually be random, too. But keep in mind that there is no guarantee that this algorithm will generate all possible permutations, or that each permutation has the same probability of being generated.
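
To see how limited the family of orderings is, one can enumerate every distinct ordering the formula produces for a small n. This is only an illustrative sketch (reachable_permutations is a made-up name); it relies on the fact that only the residues of c and m modulo n matter:

```python
from math import factorial, gcd


def reachable_permutations(n):
    """Collect every distinct ordering ((x + c) * m) % n can produce."""
    seen = set()
    for m in range(1, n):
        if gcd(m, n) != 1:
            continue  # this multiplier would not give a permutation
        for c in range(n):
            seen.add(tuple(((x + c) * m) % n for x in range(n)))
    return seen


n = 5
print(len(reachable_permutations(n)), factorial(n))
# 20 120
```

For n = 5, only 20 of the 120 possible orderings can ever appear, which is fine for shuffling test order but not for anything that needs a uniform random permutation.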

The difficult part about the implementation is really just generating a suitable random m. I used the prime factorization code from this answer to do so.

import random

# credit for prime factorization code goes
# to https://stackoverflow.com/a/17000452/1222951
def prime_factors(n):
    gaps = [1,2,2,4,2,4,2,4,6,2,6]
    length, cycle = 11, 3
    f, fs, next_ = 2, [], 0
    while f * f <= n:
        while n % f == 0:
            fs.append(f)
            n //= f  # integer division keeps n an int under Python 3
        f += gaps[next_]
        next_ += 1
        if next_ == length:
            next_ = cycle
    if n > 1: fs.append(n)
    return fs

def generate_c_and_m(n, seed=None):
    # we need to know n's prime factors to find a suitable multiplier m
    p_factors = set(prime_factors(n))

    def is_valid_multiplier(m):
        # m must not share any prime factors with n
        factors = prime_factors(m)
        return not p_factors.intersection(factors)

    # if no seed was given, generate random values for c and m
    if seed is None:
        c = random.randrange(n)  # randint() needs two arguments; randrange(n) picks from 0..n-1
        m = random.randint(1, 2*n)
    else:
        c = seed
        m = seed

    # make sure m is valid
    while not is_valid_multiplier(m):
        m += 1

    return c, m
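
"Shares no prime factors with n" is equivalent to gcd(m, n) == 1, so the factorization step can be skipped. The following is an alternative sketch, not the answer's code: pick_c_and_m and its rng parameter are hypothetical names, and it assumes n >= 1.

```python
import random
from math import gcd


def pick_c_and_m(n, rng=random):
    """Pick an offset c in 0..n-1 and a multiplier m with gcd(m, n) == 1.

    rng may be the random module or a random.Random instance.
    """
    c = rng.randrange(n)
    m = rng.randint(1, 2 * n)
    # Step upwards until m is coprime to n. This always terminates,
    # because any multiple of n plus 1 is coprime to n.
    while gcd(m, n) != 1:
        m += 1
    return c, m
```

Passing a seeded random.Random instance as rng gives reproducible values of c and m without reusing the seed itself as the multiplier.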

Now that we can generate suitable values for c and m, creating the permutation is trivial:

def random_range(n, seed=None):
    c, m = generate_c_and_m(n, seed)

    for x in range(n):
        yield ((x + c) * m) % n

And your generator function can be implemented as

def MyGenerator(foo, num):
    for x in random_range(num):
        if foo(x):
            yield x
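
Because every value is computed on demand, the generator can be used on a range of 10**8 without building a list. The self-contained sketch below illustrates this with fixed, hand-picked constants rather than random ones: c = 12345 and m = 61803399 are arbitrary assumptions (m is odd and not a multiple of 5, so it is coprime to 10**8), and the divisibility test standing in for foo is purely hypothetical.

```python
from itertools import islice


def affine_range(n, c, m):
    """Lazily yield ((x + c) * m) % n for x in range(n)."""
    for x in range(n):
        yield ((x + c) * m) % n


def my_generator(foo, num, c, m):
    """Yield the values of the shuffled range that pass the test foo."""
    for x in affine_range(num, c, m):
        if foo(x):
            yield x


num = 10**8                # 2**8 * 5**8
c, m = 12345, 61803399     # illustrative constants; m is coprime to num


def passes(x):
    # Stand-in for the real test function foo.
    return x % 7 == 0


# Only as many values are computed as are needed for five hits.
first_hits = list(islice(my_generator(passes, num, c, m), 5))
print(first_hits)
```

Memory use stays constant no matter how large num is, since nothing beyond the current x is stored.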
