快速找到具有最大不同元素总数的列表列表子集 [英] Quickly find subset of list of lists with greatest total distinct elements

查看:49
本文介绍了快速找到具有最大不同元素总数的列表列表子集的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给出一个元组列表的列表,我想找到列表的子集,该列表的子集可以最大化不同整数值的数量而无需重复任何整数.

Given a list of lists of tuples, I would like to find the subset of lists which maximize the number of distinct integer values without any integer being repeated.

列表看起来像这样:

x = [
         [(1,2,3), (8,9,10), (15,16)],
         [(2,3), (10,11)],
         [(9,10,11), (17,18,19), (20,21,22)],
         [(4,5), (11,12,13), (18,19,20)]
    ]

内部元组始终是顺序的->(1,2,3)或(15,16),但它们可以是任意长度.

The internal tuples are always sequential --> (1,2,3) or (15,16), but they may be of any length.

在这种情况下,预期收益为:

In this case, the expected return would be:

maximized_list = [
                  [(1, 2, 3), (8, 9, 10), (15, 16)], 
                  [(4, 5), (11, 12, 13), (18, 19, 20)]
                 ]

这是有效的,因为在每种情况下:

This is valid because in each case:

  1. x的每个内部列表均保持不变
  2. 最大数量的不同整数(在这种情况下为16)
  3. 不重复整数.

如果有多个有效的解决方案,则应将所有解决方案归还列表.

If there are multiple valid solutions, all should be return in a list.

我对此有一个幼稚的实现,主要是基于我之前问过的一个stackoverflow问题,该问题的形式不尽如人意(

I have a naive implementation of this, heavily based on a previous stackoverflow question I had asked, which was not as well formed as it could have been (Python: Find tuples with greatest total distinct values):

import itertools

def maximize(self, x):
    max_ = 0
    possible_patterns = []

    for i in range(1, len(x)+1):
        b = itertools.combinations(x, i)

        for combo in b:
            all_ints = tuple(itertools.chain(*itertools.chain(*combo)))
            distinct_ints = tuple(set(all_ints))

            if sorted(all_ints) != sorted(distinct_ints):
                continue
            else:
                if len(all_ints) >= max_:
                    if len(all_ints) == max_:
                        possible_patterns.append(combo)
                        new_max = len(all_ints)
                    elif len(all_ints) > max_:
                        possible_patterns = [combo]
                        new_max = len(all_ints)
                    max_ = new_max

    return possible_patterns

上面提到的功能似乎可以给我正确的结果,但不能扩展.我将需要接受带有几千个列表(可能多达几万个)的x值,因此需要一种优化的算法.

The above-mentioned function appears to give me the correct result, but does not scale. I will need to accept x values with a few thousand lists (possibly as many as tens of thousands), so an optimized algorithm is required.

推荐答案

以下内容针对基数解决了子列表的最大子集.它通过展平每个子列表,构造一个子列表之间的交集列表,然后在深度优先搜索中搜索具有最多元素(即最大权重")的解的解空间来进行工作.

The following solves for the maximal subset of sublists, with respect to cardinality. It works by flattening each sublist, constructing a list of sets of intersections between the sublists, and then searches the solution space in a depth-first-search for the solution with the most elements (i.e. largest "weight").

def maximize_distinct(sublists):
    subsets = [{x for tup in sublist for x in tup} for sublist in sublists]

    def intersect(subset):
        return {i for i, sset in enumerate(subsets) if subset & sset}

    intersections = [intersect(subset) for subset in subsets]
    weights = [len(subset) for subset in subsets]

    pool = set(range(len(subsets)))
    max_set, _ = search_max(pool, intersections, weights)
    return [sublists[i] for i in max_set]

def search_max(pool, intersections, weights):
    if not pool: return [], 0

    max_set = max_weight = None
    for num in pool:
        next_pool = {x for x in pool - intersections[num] if x > num}
        set_ids, weight = search_max(next_pool, intersections, weights)

        if not max_set or max_weight < weight + weights[num]:
            max_set, max_weight = [num] + set_ids, weight + weights[num]
    return max_set, max_weight

可以通过以下方式进一步优化该代码:保留权重"(子列表的基数之和)的连续总计,并在搜索空间的该分支超出迄今为止的最大解的范围时将其修剪(这将是最小的丢弃物重量).但是,除非遇到性能问题,否则这可能是比其价值还要多的工作,并且对于一小部分列表,计算的开销将超过修剪的速度.

This code can be optimized further by keeping a running total of the "weights" (sum of cardinalities of sublists) discarded, and pruning that branch of the search space when it exceeds that of the maximal solution so far (which will be the minimal discard weight). Unless you run into performance problems however, this will likely be more work than its worth, and for a small list of lists the overhead of the computation will exceed the speedup of pruning.

这篇关于快速找到具有最大不同元素总数的列表列表子集的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆