产生不连续的组合 [英] Generating non-consecutive combinations

查看:151
本文介绍了产生不连续的组合的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想创建一个发电机(迭代器支持做下一个,也许是用收益率​​在python),这也是 - [R元素的所有组合从{1,2,...,N}(N和r参数)等,在选择的R元素,任何两个是连续的。

I am trying to create a generator (iterator which supports doing a next, perhaps using yield in python) which gives all combinations of r elements from {1,2,...n} (n and r are parameters) such that in the selected r elements, no two are consecutive.

例如,当r = 2和n = 4

For example, for r = 2 and n= 4

生成的组合 {1,3},{1,4},{2,4}

我可以生成所有组合(如迭代器)和过滤那些不符合标准,但我们会做不必要的工作。

I could generate all combinations(as an iterator) and filter those which don't satisfy the criteria, but we will be doing unnecessary work.

有一些生成算法,使得接下来是O(1)(如果那是不可能的,O(R)或为O(n))。

Is there some generation algorithm such that the next is O(1) (and if that is not possible, O(r) or O(n)).

,其中集返回的顺序是不相关的(希望将允许一个O(1)算法)。

The order in which the sets are returned is not relevant (and hopefully will allow an O(1) algorithm).

注:我已标记的是蟒蛇,但语言无关的算法,将有助于太

Note: I have tagged it python, but a language-agnostic algorithm will help too.

更新:

我已经找到一种方法将其映射到生成纯组合!在网络上搜索发现,O(1)是可能的组合(虽然它看起来很复杂)。

I have found a way to map it to generating pure combinations! A web search reveals that O(1) is possible for combinations (though it seems complicated).

下面是映射。

假设我们有一个组合X_1,X_2,...,x_r与X_1 + 1< X_2,X_2 + 1< x_3,...

Suppose we have a combination x_1, x_2, ... , x_r with x_1 + 1 < x_2, x_2 + 1 < x_3, ...

我们映射到Y_1,y_2,...,y_r如下:

We map to y_1, y_2, ..., y_r as follows

Y_1 = X_1

y_1 = x_1

y_2 = X_2 - 1

y_2 = x_2 - 1

y_3 = x_3 - 2

y_3 = x_3 - 2

...

y_r = x_r - (R-1)

y_r = x_r - (r-1)

这样,我们有Y_1&LT; y_2&LT; y_3 ...没有非连续约束!

This way we have that y_1 < y_2 < y_3 ... without the non-consecutive constraint!

这基本上相当于选择 - [R元素出N-R + 1。因此,所有我需要做的是运行生成的(N-R + 1选择R)。

This basically amounts to choosing r elements out of n-r+1. Thus all I need to do is run the generation for (n-r+1 choose r).

对于我们来说,使用映射生成的东西后,就足够了。

For our purposes, using the mapping after things are generated is good enough.

原因选择svkcr的答案

所有伟大的答案,但我选择了svkcr的答案。

All great answers, but I have chosen svkcr's answer.

下面是一些原因

1),它是有效的无状态(或马尔可夫更precise)。接下来置换可以从previous 1生成。这是在某种程度上几乎是最理想的:O(R)空间和时间

1) It is effectively stateless (or "Markovian" to be more precise). The next permutation can be generated from the previous one. It is in a way almost optimal: O(r) space and time.

2)是predictable。我们确切地知道其中产生的组合顺序(字典)。

2) It is predictable. We know exactly the order(lexicographic) in which the combinations are generated.

这两个属性可以很容易地并行处理的产生(拆分为predictable点和委托),其中投入了(可以从如果CPU /机器发生故障,最后生成的组合摘下)容错!

These two properties make it easy to parallelise the generation (split at predictable points and delegate), with fault tolerance thrown in (can pick off from the last generated combination if a CPU/machine fails)!

对不起,并行化并没有前面提到的,因为它没有发生,我当我写的问题,我有这个想法只是后来。

Sorry, parallelisation was not mentioned earlier, because it didn't occur to me when I wrote the question and I got that idea only later.

推荐答案

这是有趣!这个怎么样:

This is fun! How about this:

def nonconsecutive_combinations(r, n):
  # first combination, startng at 1, step-size 2
  combination = range(1, r*2, 2)
  # as long as all items are less than or equal to n
  while combination[r-1] <= n:
    yield tuple(combination)
    p = r-1 # pointer to the element we will increment
    a = 1   # number that will be added to the last element
    # find the rightmost element we can increment without
    # making the last element bigger than n
    while p > 0 and combination[p] + a > n:
      p -= 1
      a += 2
    # increment the item and
    # fill the tail of the list with increments of two
    combination[p:] = range(combination[p]+1, combination[p] + 2*(r-p), 2)

每个下一个()调用应该有一个O(R).. 我有这个想法,同时思考如何将其转化为自然数,但花了相当长的一段时间,以获得详细信息的权利。

Each next() call should have an O(r) .. I got the idea while thinking about how this would translate to natural numbers, but it took quite some time to get the details right.

> list(nonconsecutive_combinations(2, 4))
[(1, 3), (1, 4), (2, 4)]
> list(nonconsecutive_combinations(3, 6))
[(1, 3, 5), (1, 3, 6), (1, 4, 6), (2, 4, 6)]

让我试着解释一下这是如何工作的。

条件的元组 C 研究元素是结果集的一部分:

Let me try to explain how this works.

Conditions for a tuple c with r elements to be part of the result set:

  1. 元组的任何元件至少一样大preceding元件加2。 ( C [X]&GT; = C [X-1] + 2
  2. 所有元素都小于或等于 N 。 1.因为它足以说的的最后一个元素的是小于 或等于 N 。 ( C [R-1]; = N
  1. Any element of the tuple is at least as large as the preceding element plus 2. (c[x] >= c[x-1] + 2)
  2. All elements are less than, or equal to n. Because of 1. it is sufficient to say that the last element is less than or equal to n. (c[r-1] <= n)

最小的元组的可以的是结果集的一部分,是(1,3,5,...,2 * R-1)。 当我说一个元组是小比另一个,我假设的字典顺序。

The smallest tuple that may be part of the result set is (1, 3, 5, ..., 2*r-1). When I say a tuple is "smaller" than another, I am assuming the lexicographical order.

由于Blckknght时指出,即使是最小的元组可能是大到满足条件2。

As Blckknght is pointing out, even the smallest possible tuple may be to large to satisfy condition 2.

上面的函数包含两个while循环:

The function above contains two while loops:

  • 通过结果外环步骤,并假定它们出现在字典顺序和满足的条件之一。一旦有问题的元组违反条件二,我们知道,我们已经用尽了结果集,并做了:

  • The outer loop steps through the results and assumes they appear in lexicographical order and satisfy condition one. As soon as the tuple in question violates condition two, we know that we have exhausted the result set and are done:

combination = range(1, r*2, 2)
while combination[r-1] <= n:

第一行初始化的结果,元组根据情况与第一个可能的结果。两线正视转化为状态的2个。

The first line initializes the result-tuple with the first possible result according to condition one. Line two squarely translates to condition two.

内环寻找下一个可能的元组满足条件之一。

The inner loop finds the next possible tuple satisfying condition one.

yield tuple(combination)

由于,而的条件(2)是真实的,我们确定的结果满足条件的一个我们可以得到当前结果元组。

Since the while condition (2) is true and we made sure the result satisfies condition one we can yield the current result-tuple.

接着,为了找出字典顺序下一个元组,我们将增加1的最后一个元素

Next, to find the lexicographically next tuple, we would add "1" to the last element.

# we don't actually do this:
combination[r-1] += 1

不过,这可能太早了突破条件2。因此,如果的该操作会破坏条件2,我们增加preceding元素,并相应地调整最后一个元素。这是一个有点像计数整数基地10:如果最后一个数字是大于9,增加了previous数字,使最后一个数字0。而不是加零但是,我们填补了元组,这样的条件1为真。

However, that may break condition 2 too early. So, if that operation would break condition 2, we increment preceding element and adjust the last element accordingly. This is a little like counting integers base 10: "If the last digit is larger than 9, increment the previous digit and make the last digit a 0." But instead of adding zeros, we fill the tuple so that condition 1 is true.

# if above does not work
combination[r-2] += 1
combination[r-1]  = combination[r-2] + 2

但问题是,第二行还可能再次突破条件两种。所以我们实际上做的是,我们跟踪的最后一个元素,这就是与做了。此外,我们使用 P 变量来引用我们正在寻找在指数当前元素。

Problem is, the second line may break condition two yet again. So what we actually do is, we keep track of the last element and that is what is done with the a. Also we use the p variable to refer to the index current element we are looking at.

p = r-1
a = 1
while p > 0 and combination[p] + a > n:
  p -= 1
  a += 2

我们正在迭代从右到左(p值= R-1,对 - = 1)通过的结果元组的物品。 首先,我们要添加一个到最后一个项目(A = 1),但通过元组步进的时候,我们其实是想用preceding项的值加 2 * X <替换最后一个项目/ code>,其中 X 是这两个项目之间的距离。 (A + = 2,组合[P] +α)

We are iterating right-to-left (p = r-1, p -= 1) through the items of the result tuple. Initially we want to add one to the last item (a = 1) but when stepping through the tuple we actually want to replace the last item with the value of a preceding item plus 2*x, where x is the distance between the two items. (a += 2, combination[p] + a)

最后,我们发现我们想要增加,并填写序列开始在递增项目的元组的其余部分,用2步长的项目:

Finally, we have found the item we want to increment, and fill the rest of the tuple with a sequence starting at the incremented item, with a step size of 2:

combination[p:] = range(combination[p]+1, combination[p] + 2*(r-p), 2)

就是这样。它看起来是那么容易的,当我第一次想到这一点,但是整个函数的所有算法进行了关闭接一个错误,一个伟大的地方,说明这是难度比它应该是。我应该知道我有困难的时候我还说,内环:)

And that's it. It seemed so easy when I first thought of it, but all the arithmetic throughout the function make a great place for off-by-one errors and describing it is harder than it should be. I should have known I'm in trouble when I added that inner loop :)

不幸的是while循环填充算法是不是最有效的事情在Python写的。其他解决方案接受了这一现实,并使用清单COM prehensions或过滤的繁重向下推入Python运行。这在我看来是的的正确的事情的。

Unfortunately while loops filled with arithmetic are not the most efficient thing to write in Python. The other solutions accept that reality and use list comprehensions or filtering to push the heavy lifting down into the Python runtime. This seems to me to be the right thing to do.

在另一方面,我很肯定,我的解决方案会比大多数执行好很多,如果这是C.内部while循环是为O(log r)和它发生变异的结果就地和相同的Ø (登录R)。它不消耗额外的堆栈帧和不消耗之外的结果,两个变量的任何存储器。但显然这不是C,所以没有这真的很重要。

On the other hand, I'm quite certain that my solution would perform a lot better than most if this were C. The inner while loop is O(log r) and it mutates the result in-place and the same O(log r). It does not consume additional stack frames and does not consume any memory besides the result and two variables. But obviously this is not C, so none of this really matters.

这篇关于产生不连续的组合的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆