Efficient powerset algorithm for subsets of minimal length

Question

I am using the following C# function to get a powerset limited to subsets of a minimal length:

string[] PowerSet(int min_len, string set)
{
    IEnumerable<IEnumerable<string>> seed = 
                    new List<IEnumerable<string>>() { Enumerable.Empty<string>() };

    return set.Replace(" ", "")
              .Split(',')
              .Aggregate(seed, (a, b) => a.Concat(a.Select(x => x.Concat(new[] { b }))))
              .Where(subset => subset.Count() >= min_len)
              .Select(subset => string.Join(",", subset))
              .ToArray();
}



The problem is that when the original set is large, the algorithm has to work very hard even if the minimal length is large as well.

For example:

PowerSet(27, "1,11,12,17,22,127,128,135,240,254,277,284,292,296,399,309,322,326,333,439,440,442,447,567,580,590,692,697");



This should be very easy, but it takes far too long with the function above. I am looking for a concise modification of my function that handles these cases efficiently.

Recommended answer

Taking a quick look at your method, one of the inefficiencies is that every possible subset is created, regardless of whether it has enough members to warrant inclusion in the limited result set.
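To put rough numbers on that for the example in the question (28 elements, minimal length 27), the sketch below counts how many subsets pass the length filter, for comparison with the 2^28 = 268,435,456 subsets the original function builds. The class and method names (`SubsetCounts`, `Choose`, `CountAtLeast`) are made up for illustration and are not part of the question or the answer.

```csharp
using System;

public static class SubsetCounts
{
    // Binomial coefficient C(n, k). After step i the running value equals
    // C(n - k + i, i), so every intermediate result is an exact integer.
    public static long Choose(int n, int k)
    {
        if (k < 0 || k > n) return 0;
        k = Math.Min(k, n - k);
        long result = 1;
        for (int i = 1; i <= k; i++)
        {
            result = result * (n - k + i) / i;
        }
        return result;
    }

    // Number of subsets of an n-element set that have at least minLen members.
    public static long CountAtLeast(int n, int minLen)
    {
        long total = 0;
        for (int k = Math.Max(minLen, 0); k <= n; k++)
        {
            total += Choose(n, k);
        }
        return total;
    }
}
```

`SubsetCounts.CountAtLeast(28, 27)` evaluates to 29 (28 subsets of size 27 plus the full set), so almost all of the work done by the original function is thrown away by the `Where` filter.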

Consider implementing the following method instead. It skips subsets that are too small, based on their bit count, so the work of building them is avoided.

public static List<List<T>> PowerSet<T>(List<T> startingSet, int minSubsetSize)
{
    List<List<T>> subsetList = new List<List<T>>();

    //The set bits of each intermediate value represent unique 
    //combinations from the startingSet.
    //We can start checking for combinations at (1<<minSubsetSize)-1 since
    //values less than that will not yield large enough subsets.
    int iLimit = 1 << startingSet.Count;
    for (int i = (1 << minSubsetSize)-1; i < iLimit; i++)
    {
        //Get the number of 1's in this 'i'
        int setBitCount = NumberOfSetBits(i);

        //Only include this subset if it will have at least minSubsetSize members.
        if (setBitCount >= minSubsetSize)
        {
            List<T> subset = new List<T>(setBitCount);

            for (int j = 0; j < startingSet.Count; j++)
            {
                //If the j'th bit in i is set, 
                //then add the j'th element of the startingSet to this subset.
                if ((i & (1 << j)) != 0)
                {
                    subset.Add(startingSet[j]);
                }
            }
            subsetList.Add(subset);
        }
    }
    return subsetList;
}



The number of set bits in each incremental i tells you how many members will be in the subset. If there are not enough set bits, then there is no point in doing the work of creating the subset represented by the bit combination. NumberOfSetBits can be implemented in a number of ways. See the Stack Overflow question "Best algorithm to count the number of set bits in a 32-bit integer?" for various approaches, explanations and references. Here is one example taken from that question.

public static int NumberOfSetBits(int i)
{
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}
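As an alternative to the hand-written bit trick, newer runtimes ship a built-in population count. The sketch below assumes .NET Core 3.0 or later, where `System.Numerics.BitOperations.PopCount` is available. The wrapper class name `BitCounting` is hypothetical.

```csharp
using System.Numerics;

public static class BitCounting
{
    // Assumption: the project targets .NET Core 3.0+ / .NET 5+,
    // where BitOperations.PopCount(uint) exists.
    public static int NumberOfSetBits(int i)
    {
        // Reinterpret the int as unsigned so negative values count their raw bit pattern.
        return BitOperations.PopCount(unchecked((uint)i));
    }
}
```

It can be dropped in wherever `NumberOfSetBits(i)` is called above. On older frameworks the bit-twiddling version remains the option to use.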

Now, while this solution works for your example, I think you will run into long runtimes and memory issues if you lower the minimum subset size too far or continue to grow the size of the startingSet. Without specific requirements posted in your question, I can't judge if this solution will work for you and/or is safe for your range of expected input cases.
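One way to sidestep part of that cost is to stop scanning bit masks altogether and generate only the combinations of each qualifying size. Note that this is a different technique from the bitmask loop above, not a restatement of it. The following is a minimal sketch under that assumption. `MinSizeSubsets`, `Enumerate` and `Combinations` are illustrative names. It yields subsets lazily, so memory use stays small unless the caller materializes the whole sequence, and it does not depend on the set size fitting in an `int` mask.

```csharp
using System;
using System.Collections.Generic;

public static class MinSizeSubsets
{
    // Lazily yields every subset of 'items' that has at least 'minSize' members.
    public static IEnumerable<List<T>> Enumerate<T>(IReadOnlyList<T> items, int minSize)
    {
        for (int size = Math.Max(minSize, 0); size <= items.Count; size++)
        {
            foreach (var combo in Combinations(items, size))
            {
                yield return combo;
            }
        }
    }

    // Yields all 'size'-element combinations in lexicographic order of their indexes.
    // Callers guarantee 0 <= size <= items.Count.
    private static IEnumerable<List<T>> Combinations<T>(IReadOnlyList<T> items, int size)
    {
        int n = items.Count;
        var idx = new int[size];
        for (int i = 0; i < size; i++) idx[i] = i;

        while (true)
        {
            var combo = new List<T>(size);
            foreach (int i in idx) combo.Add(items[i]);
            yield return combo;

            // Find the rightmost index that has not yet reached its maximum position.
            int pos = size - 1;
            while (pos >= 0 && idx[pos] == n - size + pos) pos--;
            if (pos < 0) yield break;

            // Advance it and reset everything to its right to the smallest valid values.
            idx[pos]++;
            for (int i = pos + 1; i < size; i++) idx[i] = idx[i - 1] + 1;
        }
    }
}
```

For the 28-element example with a minimal length of 27, this visits only the 29 qualifying subsets. The total output still grows exponentially as the minimal length drops, so the caveat above about large result sets applies here as well.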

If you find that this solution is still too slow, the operations can be split up for parallel computation, perhaps using PLINQ features.
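A rough sketch of what a PLINQ version might look like is shown below. It uses `ParallelEnumerable.Range` to spread the candidate masks across cores. The class and helper names (`ParallelPowerSet`, `PopCount`, `BuildSubset`) are assumptions made for this example, and it assumes the set has fewer than 31 elements and that `minSubsetSize` is between 0 and 30. The order of the returned subsets is not guaranteed unless `AsOrdered()` is added to the query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelPowerSet
{
    // Assumption: startingSet.Count < 31 and 0 <= minSubsetSize <= 30,
    // so that both shifts below fit in a non-negative int.
    public static List<List<T>> PowerSet<T>(IReadOnlyList<T> startingSet, int minSubsetSize)
    {
        int first = (1 << minSubsetSize) - 1;
        // ParallelEnumerable.Range takes (start, count), not (start, end).
        int count = Math.Max(0, (1 << startingSet.Count) - first);

        return ParallelEnumerable.Range(first, count)
            .Where(mask => PopCount(mask) >= minSubsetSize)
            .Select(mask => BuildSubset(startingSet, mask))
            .ToList();
    }

    // Counts set bits by repeatedly clearing the lowest one.
    private static int PopCount(int value)
    {
        int bits = 0;
        for (uint v = (uint)value; v != 0; v &= v - 1) bits++;
        return bits;
    }

    // Collects the elements whose index bit is set in 'mask'.
    private static List<T> BuildSubset<T>(IReadOnlyList<T> source, int mask)
    {
        var subset = new List<T>();
        for (int j = 0; j < source.Count; j++)
        {
            if ((mask & (1 << j)) != 0) subset.Add(source[j]);
        }
        return subset;
    }
}
```

Parallelism only divides the 2^n scan among the available cores. It does not reduce the number of masks examined, so it helps with moderately sized sets rather than very large ones.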

Lastly, if you would like to dress up the method with LINQ, it would look like the following. However, as written, I think you will see slower performance without some changes to it.

public static IEnumerable<List<T>> PowerSet<T>(List<T> startingSet, int minSubsetSize)
{
    var startingSetIndexes = Enumerable.Range(0, startingSet.Count).ToList();

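    // Enumerable.Range takes (start, count), so pass the number of values
    // rather than the upper bound.
    int first = (1 << minSubsetSize) - 1;
    var candidates = Enumerable.Range(first, Math.Max(0, (1 << startingSet.Count) - first))
                               .Where(p => NumberOfSetBits(p) >= minSubsetSize)
                               .ToList();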

    foreach (int p in candidates)
    {
        yield return startingSetIndexes.Where(setInd => (p & (1 << setInd)) != 0)
                                       .Select(setInd => startingSet[setInd])
                                       .ToList();
    }
}
