在给定互斥整数键的情况下,找到多个字典值的最小和的有效方法是什么? [英] What is an efficient way to find the minimum sum of multiple dictionary values, given keys of mutually exclusive integers?

查看:96
本文介绍了在给定互斥整数键的情况下,找到多个字典值的最小和的有效方法是什么?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个字典,它的键由整数0到 n 的所有len(4)组合组成,所有值均为浮点数(表示由另一个函数计算的成本) 。

I have a dictionary, the keys of which consist of all len(4) combinations of the integers 0 to n, with all values being floats (representing a cost that was computed by another function).

例如:

cost_dict = {(0,1,2,3): 6.23, (0,1,2,4): 7.89,
            ...
            (14,15,16,17): 2.57}

我想有效地找到 m 互斥的密钥(也就是说,这些密钥不共享任何(它们的整数),其值总和为最低值(因此,找到了最低的总成本)。也就是说,我不仅想要字典的 m 最小值,而且还希望 m 互斥值总和为最小值。 (或者没有达到绝对最小值,我不会介意有那么高效的东西)。

I would like to efficiently find m mutually exclusive keys (that is, where the keys do not share any of their integers) whose values sum to the lowest number (thus, finding the lowest overall cost). That is, I don't just want the m minimum values of the dictionary, I want m mutually exclusive values that sum to the lowest value. (Or failing the absolute minimum, I wouldn't mind something efficient that comes pretty close).

因此在上面的示例中,对于 m = 3,也许:

So in the above example, for m = 3, maybe:

cost_dict[(0,3,5,11)]
>1.1 
cost_dict[(2,6,7,13)]
>0.24
cost_dict[(4,10,14,15)]
>3.91

...可能是该字典中所有互斥的键的总和为最低值的键。

... could be the keys whose values sum to the lowest possible value, of all mutually exclusively keys in this dictionary.

字典中最小的三个值可能是这样的:

It may be possible that the smallest three values in the dict were something like:

cost_dict[(0,3,7,13)]
>0.5
cost_dict[(2,6,7,13)]
>0.24
cost_dict[(4,6,14,15)]
>0.8

但是鉴于这些键中的整数不是是互斥的,所以这是不正确的。

But given the integers in these keys are not mutually exclusive, this would not be correct.

是否有可能比O(n ** m)时间更好?也就是说,我可以针对 m 级别将每个项目与键与第一个键不相交的其他所有项求和(这需要将键设置为冻结集而不是元组)。鉴于字典的长度最多可以达到10,000,这相当慢。

Is it possible to do better than O(n**m) time? That is, I could sum every item against every other item whose key is disjoint with the first (this would need the keys to be frozensets instead of tuples) for m levels. This is rather slow given the dictionary's length can be up to 10,000.

在此问题的早期版本中似乎有所帮助的事情是创建了所有可能的键组合的列表,这很耗时,但可能更有效鉴于我将需要多次寻找最低成本。

Something that seems to have helped me with an earlier version of this problem is creating a list of all possible combinations of keys, which is time-intensive, but potentially more efficient given that I will need to be finding the minimum cost numerous times.

推荐答案

我尝试通过三种不同的方法解决此问题:优化的暴力破解,动态编程方法和贪婪算法。前两个不能处理 n> 17 ,但生成了最优解,因此我可以使用它们来验证贪婪方法的平均性能。我将从动态编程方法开始,然后描述贪婪的方法。

I tried solving this problem three different ways- an optimized brute force, a dynamic programming approach, and a greedy algorithm. The first two could not handle inputs for n > 17, but generated optimal solutions, so I could use them to verify the average performance of the greedy method. I'll start first with the dynamic programming approach, and then describe the greedy one.

首先,请注意,如果我们能够确定(1、2、3、4)(5、6、7、8)总和为值小于(3、4、5、6)(1、2、7、8),然后您的最佳解决方案绝对不能同时包含(3、4、5、6)(1、2、7、8)-因为您可以将它们换成前者,并且金额较小。扩展此逻辑,将存在(a,b,c,d)(e,f,g,h)导致 x0,x1,x2,x3,x4,x5,x6,x7 的所有组合的总和最小,因此我们可以确定

First, note that we can if we determine that (1, 2, 3, 4) and (5, 6, 7, 8) sum to a smaller value than (3, 4, 5, 6) and (1, 2, 7, 8), then your optimal solution absolutely cannot contain both (3, 4, 5, 6) and (1, 2, 7, 8)- because you could swap them out for the former, and have a smaller sum. Extending this logic, there will be one optimal combination of (a, b, c, d) and (e, f, g, h) that results in the minimal sum from all combinations of x0, x1, x2, x3, x4, x5, x6, x7, and so we can rule out all of the others.

利用这些知识,我们可以成为所有 x0,x1,x2,x3,x4,x5,x6,x7的字典映射通过强行强制的所有组合的总和,将 [0,n)中的个组合最小化x0,x1,x2,x3 。然后,我们可以使用这些映射对 x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11重复该过程。 code> x0,x1,x2,x3,x4,x5,x6,x7 x0,x1,x2,x3 对。我们重复此过程,直到获得 x0,x1 ... x_(4 * m-1)的所有最小和,然后迭代查找最小和。

Using this knowledge, we can be a dictionary mapping of all x0, x1, x2, x3, x4, x5, x6, x7 combinations from the set [0, n), to their minimal sums, by brute forcing the sums of all combinations of x0, x1, x2, x3. Then, we can use these mappings to repeat the process for x0, x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11 from x0, x1, x2, x3, x4, x5, x6, x7 and x0, x1, x2, x3 pairs. We repeat this process until we obtain all minimal sums for x0, x1 ... x_(4*m-1), which we then iterate over to find the minimal sum.

def dp_solve(const_dict, n, m):

    lookup = {comb:(comb,) for comb in const_dict.keys()}

    keys = set(range(n))
    for size in range(8, 4 * m + 1, 4):
        for key_total in combinations(keys, size):
            key_set = set(key_total)
            min_keys = (key_total[:4], key_total[4:])
            min_val = const_dict[min_keys[0]] + const_dict[min_keys[1]]

            key1, key2 = min(zip(combinations(key_total, 4), reversed(list(combinations(key_total, size - 4)))), key=lambda x:const_dict[x[0]]+const_dict[x[1]])

            k = tuple(sorted(x for x in key1 + key2))
            const_dict[k] = const_dict[key1] + const_dict[key2]
            lookup[k] = lookup[key1] + lookup[key2]

    key, val = min(((key, val) for key, val in const_dict.items() if len(key) == 4 * m), key=lambda x: x[1])
    return lookup[key], val

诚然,这种实现还很粗糙,因为我一直在逐段进行微优化,希望能够使其足够快而不必切换到贪婪的方法。

Admittedly this implementation is pretty gnarly, because I kept micro-optimizing piece after piece hoping to make it fast enough without having to switch to a greedy approach.

这可能是您所关心的,因为它可以快速处理相当大的输入,并且非常准确。

This is probably the one you care about, since it handles fairly large inputs quickly, and is quite accurate.

首先为部分和构造一个列表,然后通过增加值对字典中的元素进行迭代。对于每个元素,找到不会与它们的键产生任何冲突的所有部分和,并组合它们。他们成一个新的部分和,并追加到列表中。这样,您可以建立一个最小部分和的列表,该列表可以从字典中的最小 k 值创建。为了加快这一切,我使用散列集来快速检查哪些部分和包含相同密钥的对。

Start by constructing a list for partial sums, and begin iterating over your elements in your dictionary by increasing value. For each element, find all partial sums that don't create any collisions with their keys and "combine" them into a new partial sum, and append to the list. In doing so, you build a list of minimal partial sums that can be created from the smallest k values in your dictionary. To speed this all up, I use hash sets to quickly check which partial sums contain pairs of the same key.

贪婪的方法,当您发现密钥长度为 4 * m (或等价于 m 4元组)。根据我的经验,这通常可以取得不错的结果,但是我想添加一些逻辑以使其在需要时更加准确。为此,我添加了两个因素-

In the "fast" greedy approach, you would abort the moment you find a partial sum that has a key length of 4 * m (or equivalently, of m 4-tuples). This usually nets fairly good results in my experience, but I wanted to add some logic to make it more accurate if need be. To do so, I add two factors-


  • extra_runs -规定了要搜索的额外迭代次数以获得更好的解决方案,然后再打破

  • check_factor -指定当前搜索深度的倍数。向前扫描一个单个新整数,该整数可以使用当前状态创建更好的解决方案。这与上面的不同之处在于,它不保留或保留。每个新的整数都会被检查-它只会进行快速求和,以查看是否创建了新的最小值。这使得速度显着提高,但代价是其中一部分 $ 1中的其他 m-1 个四元组必须已经存在。

  • extra_runs - which dictates how many extra iterations to search for better solutions before breaking
  • check_factor - which specifies a multiple of current search "depth" to scan forward for a single new integer that creates a better solution with the current state. This is different from the above in that it does not "preserve" each new integer checked- it only does a quick sum to see if it creates a new min. This makes it significantly faster, at the cost that the other m - 1 4-tuples must already exist in one of the partial sums.

结合起来,这些检查似乎总能找到真正的最小总和,但所需的运行时间却要长5倍左右(尽管仍然相当快)。要禁用它们,只需为两个因素传递 0

Combined, these checks seem to always find the true minimal sum, at the cost of about 5x longer runtime (still quite fast though). To disable them, just pass 0 for both factors.

def greedy_solve(const_dict, n, m, extra_runs=10, check_factor=2):
    pairs = sorted(const_dict.items(), key=lambda x: x[1])

    lookup = [set([]) for _ in range(n)]
    nset = set([])

    min_sums = []
    min_key, min_val = None, None
    for i, (pkey, pval) in enumerate(pairs):
        valid = set(nset)
        for x in pkey:
            valid -= lookup[x]
            lookup[x].add(len(min_sums))
        
        nset.add(len(min_sums))
        min_sums.append(((pkey,), pval))

        for x in pkey:
            lookup[x].update(range(len(min_sums), len(min_sums) + len(valid)))
        for idx in valid:
            comb, val = min_sums[idx]
            for key in comb:
                for x in key:
                    lookup[x].add(len(min_sums))
            nset.add(len(min_sums))
            min_sums.append((comb + (pkey,), val + pval))
            if len(comb) == m - 1 and (not min_key or min_val > val + pval):
                min_key, min_val = min_sums[-1]
        
        if min_key:
            if not extra_runs: break
            extra_runs -= 1

    for pkey, pval in pairs[:int(check_factor*i)]:
        valid = set(nset)
        for x in pkey:
            valid -= lookup[x]
        
        for idx in valid:
            comb, val = min_sums[idx]
            if len(comb) < m - 1:
                nset.remove(idx)
            elif min_val > val + pval:
                min_key, min_val = comb + (pkey,), val + pval
    return min_key, min_val

我为此测试了 n< 36 m< 9 ,它的运行速度似乎相当快(最差时几秒钟即可完成)。我想它应该可以在您的情况下迅速发挥作用 12< = n< = 24

I tested this for n < 36 and m < 9, and it seemed to run fairly fast (couple of seconds to complete at worst). I'd imagine it should work for your case 12 <= n <= 24 pretty quickly.

这篇关于在给定互斥整数键的情况下,找到多个字典值的最小和的有效方法是什么?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆