Iterate over an array in a certain order, so that it is sampled fairly

Question

I want to iterate over an array in a certain fashion:
Starting with the first and the last element of the array, the next element I want to visit is the one furthest from all previously visited elements, that is, the one whose distance to its nearest visited element is largest.
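A brute-force sketch can pin down exactly what "furthest from all previously visited elements" means: keep, for every index, the distance to its nearest visited index and always take the largest. It is only a reference for small arrays (O(n^2) time, O(n) memory, so not the on-the-fly function the question is after). The name `greedy_farthest_order` is hypothetical, and breaking ties toward the smallest index is an assumption, chosen because it reproduces the f(i, 8) example in this question.

```python
from typing import List


def greedy_farthest_order(n: int) -> List[int]:
    """Reference visiting order for indices 0..n (hypothetical helper).

    Start with 0 and n, then repeatedly visit the unvisited index whose
    nearest visited index is farthest away; ties go to the smallest index.
    O(n^2) time and O(n) memory, so only useful for small n.
    """
    order = [0] if n == 0 else [0, n]
    # dist[i] = distance from i to its nearest visited index; -1 marks visited
    dist = [min(i, n - i) for i in range(n + 1)]
    for v in order:
        dist[v] = -1
    while len(order) < n + 1:
        # max() returns the first maximal index, i.e. the smallest one on ties
        best = max(range(n + 1), key=dist.__getitem__)
        order.append(best)
        dist[best] = -1
        for i in range(n + 1):
            if dist[i] >= 0:  # only unvisited indices need updating
                dist[i] = min(dist[i], abs(i - best))
    return order


print(greedy_farthest_order(8))  # [0, 8, 4, 2, 6, 1, 3, 5, 7]
```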

For an array of length n+1 (indices 0 to n), the sequence would be:

  • 0,
  • n,
  • n/2 (furthest from 0 and n),
  • n/4 and n * 3/4 (furthest from all three previous indices),
  • n/8, n * 3/8, n * 5/8, n * 7/8 (furthest from all five previous indices),
  • n * 1/16, n * 3/16, n * 5/16, n * 7/16, n * 9/16, n * 11/16, n * 13/16, n * 15/16,
  • ...
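The list above can be generated level by level with exact fractions, which makes the rounding problem visible. A minimal sketch (the name `ideal_positions` is hypothetical): after 0 and n, level k contributes n * m / 2**k for every odd m, in ascending order. When n is a power of two, every position up to level log2(n) is an integer; for other n, non-integers such as 3/2 (for n = 6) appear.

```python
from fractions import Fraction
from itertools import islice
from typing import Iterator


def ideal_positions(n: int) -> Iterator[Fraction]:
    """Yield the unrounded visiting order for indices 0..n (hypothetical helper):
    0, n, then n*m/2**k for every odd m < 2**k, level by level (k = 1, 2, ...).
    The generator is infinite; take as many positions as needed."""
    yield Fraction(0)
    yield Fraction(n)
    k = 1
    while True:
        for m in range(1, 2**k, 2):
            yield Fraction(n * m, 2**k)
        k += 1


print([str(p) for p in islice(ideal_positions(16), 9)])
# ['0', '16', '8', '4', '12', '2', '6', '10', '14']
print([str(p) for p in islice(ideal_positions(6), 7)])
# ['0', '6', '3', '3/2', '9/2', '3/4', '9/4']
```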

If n is not a power of two, some of these numbers will have to be rounded up or down, but I am not sure how to avoid duplicates when rounding.
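To see why rounding alone produces duplicates, round the first n + 1 ideal positions (0, n, then n * m / 2**k for odd m, level by level) and compare them against 0..n. A minimal sketch with a hypothetical helper; Python's `round()` on a `Fraction` rounds halves to even, but for n = 6 plain floor and ceil collide as well (on 0 and 3 respectively).

```python
from collections import Counter
from fractions import Fraction
from typing import List


def naive_rounded_prefix(n: int, count: int) -> List[int]:
    """First `count` ideal positions (0, n, then n*m/2**k for odd m),
    each rounded to the nearest integer (Fraction rounds halves to even).
    Hypothetical helper used only to expose collisions."""
    out = [0, n]
    k = 1
    while len(out) < count:
        for m in range(1, 2**k, 2):
            out.append(round(Fraction(n * m, 2**k)))
        k += 1
    return out[:count]


n = 6
prefix = naive_rounded_prefix(n, n + 1)
print(prefix)                                            # [0, 6, 3, 2, 4, 1, 2]
print([i for i, c in Counter(prefix).items() if c > 1])  # [2]
print(sorted(set(range(n + 1)) - set(prefix)))           # [5]
```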

At the end I want an integer sequence that contains all the numbers between 0 and n exactly once. (For any n, not just powers of two)

Does this permutation have a name?

How would a function that generates these numbers work?

I am looking for a function that can generate these numbers on-the-fly.

If there are a billion elements, I do not want to manage a giant list of all previously visited elements, or generate the whole permutation list in advance.

The idea is that I can abort the iteration once I have found an element that fits certain criteria, so I will in most cases not need the whole permutation sequence.

So I am looking for a function f(int currentIndex, int maxIndex) with the following properties:

To iterate over an array with maxIndex 8 (nine elements, indices 0 to 8), I would call

f(0,8) returns 0, to get the index of the first element
f(1,8) returns 8
f(2,8) returns 4
f(3,8) returns 2
f(4,8) returns 6
f(5,8) returns 1
f(6,8) returns 3
f(7,8) returns 5
f(8,8) returns 7

(I am not quite sure how to extend this example to numbers that are not a power of two)
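For the case the question does pin down, maxIndex a power of two, the f(i, 8) table can be reproduced as a pure O(1) function, which also gives on-the-fly generation and early abort without storing anything. A sketch under these assumptions: the signature mirrors the question's f(currentIndex, maxIndex) in snake_case; position 0 maps to 0, position 1 to maxIndex, and the remaining positions fill levels k = 1, 2, ... in ascending order. It deliberately rejects other maxIndex values, because flooring maxIndex * m / 2**k there runs into exactly the duplicate problem the question describes.

```python
def f(current_index: int, max_index: int) -> int:
    """O(1) index lookup matching the f(i, 8) table, for max_index a power of two.

    Position 0 -> 0, position 1 -> max_index; after that, positions are
    grouped into levels k = 1, 2, ... of 2**(k-1) entries each, and the
    entries of level k are max_index * m / 2**k for odd m in ascending order.
    """
    if max_index < 1 or max_index & (max_index - 1):
        raise ValueError("max_index must be a power of two")
    if not 0 <= current_index <= max_index:
        raise IndexError("current_index out of range")
    if current_index <= 1:
        return current_index * max_index
    j = current_index - 1              # 1, 2, 3, ...
    k = j.bit_length()                 # level: j lies in [2**(k-1), 2**k)
    m = 2 * (j - (1 << (k - 1))) + 1   # odd numerator within the level
    return (m * max_index) >> k        # exact, because 2**k divides max_index


print([f(i, 8) for i in range(9)])  # [0, 8, 4, 2, 6, 1, 3, 5, 7]

# Early abort on a hypothetical array with max index 8
data = [5, 9, 7, 1, 4, 8, 6, 2, 9]
order = (f(i, 8) for i in range(len(data)))
hit = next((idx for idx in order if data[idx] < 3), None)
print(hit)  # 3
```

Nothing is precomputed: the generator calls `f` only until the predicate matches, so with the data shown it stops after seven lookups.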

Is there a function with these properties?

Recommended answer

The hopping-about you describe is characteristic of the van der Corput sequence, as mentioned in a task I wrote on Rosetta Code.

I have an exact function to re-order an input sequence, but it needs arrays as large as the input array.

What follows is an approximate solution that yields indices one by one and only takes the length of the input array, then calculates the indices with constant memory.

The testing gives some indication of how "good" the routine is.

>>> from fractions import Fraction
>>> from math import ceil
>>> 
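>>> def vdc(n, base=2):
...     # Van der Corput radical inverse: mirror the base-`base` digits of n
...     # about the radix point, e.g. 6 = 110 (binary) -> 0.011 (binary) = 0.375
...     vdc, denom = 0, 1
...     while n:
...         denom *= base
...         n, remainder = divmod(n, base)
...         vdc += remainder / denom
...     return vdc
...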
>>> [vdc(i) for i in range(5)]
[0, 0.5, 0.25, 0.75, 0.125]
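>>> def van_der_corput_index(sequence):
...     # Approximate: yields len(sequence) indices in constant memory, but
...     # ceil() can map two vdc values to the same index (duplicates)
...     lenseq = len(sequence)
...     if lenseq:
...         lenseq1 = lenseq - 1
...         yield lenseq1   # last element
...         for i in range(lenseq1):
...             # Fraction keeps vdc() exact, avoiding float round-off
...             yield ceil(vdc(Fraction(i)) * lenseq1)
...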
>>> seq = list(range(23))
>>> seq
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]
>>> list(van_der_corput_index(seq))
[22, 0, 11, 6, 17, 3, 14, 9, 20, 2, 13, 7, 18, 5, 16, 10, 21, 1, 12, 7, 18, 4, 15]
>>> len(set(van_der_corput_index(seq)))
21
>>> from collections import Counter
>>> 
>>> for listlen in (2, 3, 5, 7, 11, 13, 17, 19, 23,
...         29, 31, 37, 41, 43, 47, 53, 59, 61,
...         67, 71, 73, 79, 83, 89, 97, 1023,
...         1024, 4095, 4096, 2**16 - 1, 2**16):
...     out = list(van_der_corput_index(list(range(listlen))))
...     outcount = Counter(out)
...     if outcount and outcount.most_common(1)[0][1] > 1:
...         print("Duplicates in %i leaving %i unique nums." % (listlen, len(outcount)))
...     outlen = len(out)
...     if outlen != listlen:
...         print("Length change in %i to %i" % (listlen, outlen))
...

Duplicates in 23 leaving 21 unique nums.
Duplicates in 43 leaving 37 unique nums.
Duplicates in 47 leaving 41 unique nums.
Duplicates in 53 leaving 49 unique nums.
Duplicates in 59 leaving 55 unique nums.
Duplicates in 71 leaving 67 unique nums.
Duplicates in 79 leaving 69 unique nums.
Duplicates in 83 leaving 71 unique nums.
Duplicates in 89 leaving 81 unique nums.
>>> outlen
65536
>>> listlen
65536
>>> 
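If the duplicates reported above matter, one way to get an exact permutation of 0..n with constant memory is a variation on the same dyadic levels. This is not the answer's code, and `spread_order` is a hypothetical name: it floors each scaled position n * m / 2**k and skips it when a coarser level already landed on the same integer. That test needs no visited list, because v was produced before level k exactly when some grid point j * n / 2**(k-1) lies in [v, v + 1), which integer arithmetic decides. Within a level it visits in ascending order, like the question's f(i, 8) example, and for n that is not a power of two it only approximates the "furthest from all visited" rule rather than guaranteeing it.

```python
from itertools import islice
from typing import Iterator


def spread_order(n: int) -> Iterator[int]:
    """Yield each integer 0..n exactly once, coarse-to-fine, in O(1) extra memory.

    Level 0 gives 0 and n. Level k (k >= 1) proposes floor(m * n / 2**k) for
    odd m in ascending order and skips a value if a coarser level already
    produced it, which happens exactly when some point j * n / 2**(k-1)
    (j = 0 .. 2**(k-1)) lies in [v, v + 1).
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    yield 0
    if n == 0:
        return
    yield n
    produced = 2
    k = 1
    while produced <= n:
        prev = 1 << (k - 1)                 # cells in the level-(k-1) grid
        for m in range(1, 1 << k, 2):
            v = (m * n) >> k                # floor(m * n / 2**k)
            j = -((-v * prev) // n)         # smallest j with j*n/prev >= v
            if j * n < (v + 1) * prev:      # that grid point is also < v + 1,
                continue                    # so v was produced earlier
            yield v
            produced += 1
        k += 1


print(list(spread_order(8)))  # [0, 8, 4, 2, 6, 1, 3, 5, 7]
print(list(spread_order(5)))  # [0, 5, 2, 1, 3, 4]
print(list(islice(spread_order(10**9), 5)))
# [0, 1000000000, 500000000, 250000000, 750000000]
```

A full pass examines fewer than 2n + 1 candidates, so each index costs amortized O(1), although a single step late in the final level may skip several candidates before yielding.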
