Python:保持最小距离的范围内的随机数字列表 [英] Python: Random list of numbers in a range keeping with a minimum distance

查看:47
本文介绍了Python:保持最小距离的范围内的随机数字列表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

让我们说这段代码 随机种子(42)random.sample(range(0,40), 4)输出:[7, 1, 17, 15]我应该在此代码中更改什么以生成随机数,其中列表中任意两个数字之间的最小距离至少为 10 或更多.类似于 [0, 10, 25, 39] 或 [0, 12, 23, 38 ].可能的重复项是 这个.谢谢.

解决方案

排序案例的单行解决方案

这是一个简单的单行,它以相同的可能性生成所有可能性:

[9*i + x for i, x in enumerate(sorted(random.sample(range(13), 4)))]

一些示例输出:

[2, 16, 26, 38][0, 10, 25, 35][2, 12, 25, 36][0, 13, 26, 39][1, 14, 24, 34][1, 11, 29, 39][0, 13, 26, 39][1, 12, 27, 38]

输出总是按排序顺序生成;如果这不是您想要的,您可以轻松地向结果添加随机播放(或参见下文了解一般解决方案).

说明:如果[a, b, c, d]是满足你要求的有序列表,则[a, b-9, c-18, d-27]range(13) 中长度为 4 的有序样本,反之亦然.因此,您需要做的就是从 range(13) 生成样本,对它们进行排序,然后重新添加 9 的必要倍数以获得至少 10 分开.

一般未排序的解决方案

这是一个不需要对随机样本进行排序的通用解决方案.相反,我们计算样本元素的秩并使用这些来计算必要的偏移量.

随机导入定义等级(样本):"""返回整数样本中每个元素的等级."""索引 = 排序(范围(len(样本)),键 = lambda i:样本 [i])返回排序(索引,键= lambda i:索引 [i])def sample_with_minimum_distance(n=40, k=4, d=10):"""来自 range(n) 的 k 个元素的样本,最小距离为 d."""样本 = random.sample(range(n-(k-1)*(d-1)), k)返回 [s + (d-1)*r for s, r in zip(sample,ranks(sample))]

以及一些示例输出:

<预><代码>>>>sample_with_minimum_distance()[17, 27, 3, 38]>>>sample_with_minimum_distance()[27, 38, 10, 0]>>>sample_with_minimum_distance()[36, 13, 1, 24]>>>sample_with_minimum_distance()[1, 25, 15, 39]>>>sample_with_minimum_distance()[26, 12, 1, 38]

便宜的把戏"解决方案

如果原始问题中的各种常数是固定的(population range(40),长度为 4 的样本,最小距离为 10),那么有一个明显的廉价技巧:只有 715 可能有不同的排序样本,因此只需预先创建一个包含所有样本的列表,然后每次需要生成样本时,使用 random.choice 从该预先创建的列表中选择一个.

对于这一代,我们可以采用效率极低但显然正确的蛮力解决方案:

<预><代码>>>>导入迭代工具>>>all_samples = [ # 低效的蛮力解决方案... 样品在 itertools.product(range(40), repeat=4)... if all(x - y >= 10 for x, y in zip(sample[1:], sample))...]>>>len(all_samples)715

这仍然足够快,在我的机器上只需要几秒钟.或者,我们可以使用与上述相同的双射来做一些更精细和直接的事情.

<预><代码>>>>all_samples = [... [9*i + s for i, s in enumerate(sample)]... 对于 itertools.combinations(range(13), 4) 中的示例...]>>>len(all_samples)715

无论哪种方式,我们只生成一次样本列表,然后每次需要时使用 random.choice 选择一个:

<预><代码>>>>随机选择(all_samples)(1, 11, 21, 38)>>>随机选择(all_samples)(0, 10, 23, 33)

当然,这个解决方案不能很好地扩展:对于 range(100) 之外的 7 个样本,最小距离为 5,有超过 20 亿个可能的不同排序样本.

均匀性演示

我之前声称,单行函数以相同的可能性产生所有可能性(当然,假设有一个完美的随机数源,但 Python 的 Mersenne Twister 足够好,我们不太可能检测到核心生成器产生的统计异常在下面的测试中).这是这种一致性的演示.

首先,为了方便起见,我们将单行代码包装在一个函数中.我们还将更改它以返回一个 tuple 而不是 list,因为下一步我们想要一些可散列的东西.

<预><代码>>>>def sorted_sample():... return tuple(9*i + x for i, x in... enumerate(sorted(random.sample(range(13), 4))))

现在我们生成 1000 万个样本(这将需要几分钟),并计算每个样本出现的频率:

<预><代码>>>>从集合导入计数器>>>样本 = Counter(sorted_sample() for _ in range(10**7))

几个快速检查:

<预><代码>>>>镜头(样品)715>>>10**7/71513986.013986013986>>>样本[0, 10, 20, 30]14329>>>样本[0, 11, 22, 33]13995>>>分钟(样本.值())13624>>>最大值(样本.值())14329

我们收集了 715 种不同的组合,一点点数学知识告诉我们这正是我们期望的数字(13 选择 4),因此在均匀分布的情况下,我们希望每个组合大约发生 10**7/715 次,或大约 14000 次.我们上面检查的两个组合都在 14000 左右,出现的最小和最大计数也是如此,但毫不奇怪,存在一些随机变化.

这种随机变化是否在可接受的范围内?为了找出答案,我们可以使用 p = 0.01 进行卡方检验.我们的零假设是我们从抽取的总体是统一的:即,我们的代码以相同的可能性生成每个可能的样本.

SciPy 进行 卡方检验 易于均匀:

<预><代码>>>>从 scipy.stats 导入卡方>>>chisquare(list(samples.values()))Power_divergenceResult(statistic=724.682234, pvalue=0.3825060783237031)

我们得到的 p 值小于 0.01,因此我们无法拒绝原假设:也就是说,我们没有非均匀性的证据.

Let's say this code random.seed(42) random.sample(range(0,40), 4) Output:[7, 1, 17, 15] What should I change in this code to generate random numbers where the minimum distance between any two numbers in the list will be be at least 10 or more. Something like [0, 10, 25, 39] or [0, 12, 23, 38 ]. Possible duplicate would be this. Thanks.

解决方案

One-line solution for the sorted case

Here's a simple one-liner, that generates all possibilities with equal likelihood:

[9*i + x for i, x in enumerate(sorted(random.sample(range(13), 4)))]

A few sample outputs:

[2, 16, 26, 38]
[0, 10, 25, 35]
[2, 12, 25, 36]
[0, 13, 26, 39]
[1, 14, 24, 34]
[1, 11, 29, 39]
[0, 13, 26, 39]
[1, 12, 27, 38]

Outputs are always generated in sorted order; if that's not what you want, you can easily add a shuffle to the result (or see below for a general solution).

Explanation: if [a, b, c, d] is an ordered list satisfying your requirements, then [a, b-9, c-18, d-27] is an ordered sample of length 4 from range(13), and vice versa. So all you need to do is generate samples from range(13), sort them, and then re-add the necessary multiples of 9 to get values that are at least 10 apart.

General unsorted solution

Here's a general solution that doesn't require sorting of the random sample. Instead, we compute the ranks of the elements of the sample and use those to compute the necessary offsets.

import random

def ranks(sample):
    """
    Return the ranks of each element in an integer sample.
    """
    indices = sorted(range(len(sample)), key=lambda i: sample[i])
    return sorted(indices, key=lambda i: indices[i])

def sample_with_minimum_distance(n=40, k=4, d=10):
    """
    Sample of k elements from range(n), with a minimum distance d.
    """
    sample = random.sample(range(n-(k-1)*(d-1)), k)
    return [s + (d-1)*r for s, r in zip(sample, ranks(sample))]

And some sample outputs:

>>> sample_with_minimum_distance()
[17, 27, 3, 38]
>>> sample_with_minimum_distance()
[27, 38, 10, 0]
>>> sample_with_minimum_distance()
[36, 13, 1, 24]
>>> sample_with_minimum_distance()
[1, 25, 15, 39]
>>> sample_with_minimum_distance()
[26, 12, 1, 38]

The "cheap trick" solution

If the various constants here in the original problem are fixed (population range(40), samples of length 4, minimum distance of 10), then there's an obvious cheap trick: there are only 715 possible different sorted samples, so just pre-create a list containing all of them, and then every time you need to generate a sample, pick one from that pre-created list using random.choice.

For the generation, we can either go with a grossly inefficient but clearly correct brute force solution:

>>> import itertools
>>> all_samples = [  # inefficient brute-force solution
...     sample for sample in itertools.product(range(40), repeat=4)
...     if all(x - y >= 10 for x, y in zip(sample[1:], sample))
... ]
>>> len(all_samples)
715

This is still plenty fast enough, taking only a couple of seconds on my machine. Alternatively, we can do something more refined and direct using the same bijection as identified above.

>>> all_samples = [
...     [9*i + s for i, s in enumerate(sample)]
...     for sample in itertools.combinations(range(13), 4)
... ]
>>> len(all_samples)
715

Either way, we generate the list of samples just once, and then use random.choice to pick one every time we need it:

>>> random.choice(all_samples)
(1, 11, 21, 38)
>>> random.choice(all_samples)
(0, 10, 23, 33)

Of course, this solution doesn't scale well: for 7 samples out of range(100) with a minimum distance of 5, there are over 2 billion possible different sorted samples.

Demonstration of uniformity

I claimed earlier that the one-liner produces all possibilities with equal likelihood (assuming a perfect source of random numbers, of course, but Python's Mersenne Twister is good enough that we're unlikely to detect statistical anomalies arising from the core generator in the test below). Here's a demonstration of that uniformity.

First, for convenience, we'll wrap our one-liner in a function. We'll also change it to return a tuple instead of a list, because for the next step we want something hashable.

>>> def sorted_sample():
...     return tuple(9*i + x for i, x in
...                  enumerate(sorted(random.sample(range(13), 4))))

Now we generate 10 million samples (this will take a couple of minutes), and count how often each one occurs:

>>> from collections import Counter
>>> samples = Counter(sorted_sample() for _ in range(10**7))

A couple of quick checks:

>>> len(samples)
715
>>> 10**7 / 715
13986.013986013986
>>> samples[0, 10, 20, 30]
14329
>>> samples[0, 11, 22, 33]
13995
>>> min(samples.values())
13624
>>> max(samples.values())
14329

We've collected 715 different combinations, and a little bit of maths tells us that that's exactly the number we expect (13 choose 4), so with a uniform distribution we'd expect each combination to occur approximately 10**7 / 715 times, or somewhere around 14000 times. Both the combinations we checked above are around 14000, as are the minimum and maximum counts appearing, but not surprisingly, there's some random variation.

Is that random variation within acceptable limits? To find out, we can do a chi-squared test with p = 0.01. Our null hypothesis is that the population we're drawing from is uniform: i.e., that our code generates each possible sample with equal likelihood.

SciPy makes a chi-squared test for uniformity easy:

>>> from scipy.stats import chisquare
>>> chisquare(list(samples.values()))
Power_divergenceResult(statistic=724.682234, pvalue=0.3825060783237031)

The p-value we get is not smaller than 0.01, so we fail to reject the null hypothesis: that is, we have no evidence of non-uniformity.

这篇关于Python:保持最小距离的范围内的随机数字列表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆