Algorithm to get every possible subset of a list, in order of their product, without building and sorting the entire list (i.e. generators)
Problem description
Practically, I've got a set of objects with probabilities, and I want to look at each possible group of them, in order of how likely it is that they're all true assuming they're independent -- i.e. in descending order of the product of the elements of the subsets -- or in order of length if the probabilities are the same (so that (1, 0.5) comes after (0.5)).
Example: If I have [ 1, 0.5, 0.1 ]
I want [ (), (1), (0.5), (1, 0.5), (0.1), (1, 0.1), (0.5, 0.1), (1, 0.5, 0.1) ]
In essence, this means I want to iterate over the powerset of a set of elements in order, and I could fairly easily generate this, sort it, and be done. However, powersets get pretty big pretty fast, I expect I'm usually going to want one of the first subsets, and I'd rather not generate a list of thousands of subsets, sort them, and then never look past the third. This is where python generators hopefully save the day!
More formally, I need to work out a way to do sorted(powerset(input), key = lambda l : reduce (lambda (p, n), e: (p * e, n-1), l, (1, 0)), reverse=True) as a generator, or in some other way that lets me avoid building and sorting the entire list.
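For reference, the brute-force baseline described above can be sketched in Python 3 roughly as follows. The names powerset and sorted_subsets are illustrative, and math.prod assumes Python 3.8 or later. This is the version that builds and sorts everything, i.e. the thing the question wants to avoid.

```python
from itertools import chain, combinations
from math import prod

def powerset(items):
    """All subsets of items, as tuples, shortest first."""
    items = list(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )

def sorted_subsets(probs):
    """Eagerly build the whole powerset and sort it:
    descending product first, then ascending length on ties."""
    return sorted(powerset(probs), key=lambda s: (-prod(s), len(s)))

if __name__ == "__main__":
    print(sorted_subsets([1, 0.5, 0.1]))
```

For n elements this materialises all 2**n subsets before returning the first one.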
I'm reasonably sure this is related to the knapsack problem, along with the subset product problem, but I'm really struggling to get a nice algorithm for it that works, and help would be very much appreciated :-). It's not an issue for it to be slower than building + sorting the whole thing in the worst case (iterating all the way to the end), it just needs much better best case (within the first 10%, say) performance.
Recommended answer
Nice question, it was quite tricky to solve. I can't think of a way to generate the combinations in order either, but I wield the mighty heapq (aka a priority queue) to keep the candidates sorted.
from functools import reduce
from heapq import heapify, heappush, heappop
import operator

def prob(ps):
    """ returns the probability that *not* all ps are True """
    return 1 - reduce(operator.mul, ps)

def gen(ps):
    # turn each to a tuple
    items = ((x,) for x in sorted(ps, reverse=True))
    # create a priority queue, sorted by probability
    pq = [(prob(x), x) for x in items]
    heapify(pq)
    # because you wanted this
    yield ()
    # as long as there are valid combinations
    while pq:
        # get the best un-yielded combination, the pq makes sure of that
        p, x = heappop(pq)
        yield x
        # generate all the combinations from this item
        for other in ps:
            # keeping the tuples sorted -> unique combinations
            # (note: this comparison by value skips duplicate probabilities)
            if other < x[-1]:
                # create a new combination
                new = x + (other,)
                item = prob(new), new
                # add it to the queue
                heappush(pq, item)

a = [1, 0.1, 0.5]
print(list(gen(a)))
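A variant sketch of the same heap idea, not the answer's original code: it tracks subsets by index instead of by value, so repeated probabilities are kept, and it puts the subset length into the heap key so that ties are broken by length as the question asks. The name subsets_by_product is hypothetical, and the ordering argument assumes every value lies in [0, 1], so that extending a subset can never increase its product.

```python
from heapq import heappush, heappop
from itertools import islice

def subsets_by_product(probs):
    """Lazily yield subsets of probs in descending order of product,
    breaking ties by ascending length. Assumes all values are in [0, 1]."""
    probs = list(probs)
    # Heap entries are (-product, length, indices). Indices rather than
    # values identify a subset, so duplicate probabilities stay distinct.
    heap = [(-1.0, 0, ())]
    while heap:
        neg_product, length, indices = heappop(heap)
        yield tuple(probs[i] for i in indices)
        # Extend only with later indices, so each subset is pushed exactly
        # once (from the subset obtained by dropping its last index).
        start = indices[-1] + 1 if indices else 0
        for j in range(start, len(probs)):
            heappush(heap, (neg_product * probs[j], length + 1, indices + (j,)))

if __name__ == "__main__":
    # Only the first three subsets are ever computed and yielded.
    print(list(islice(subsets_by_product([1, 0.5, 0.1]), 3)))
```

Because only the subsets actually requested are popped, taking the first few with islice avoids enumerating the full powerset, although the heap can still grow large if the generator is run to the end.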