排列生成时的System.OutOfMemoryException [英] System.OutOfMemoryException when generating permutations

查看:210
本文介绍了排列生成时的System.OutOfMemoryException的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想生成6字母排列时获得的System.OutOfMemoryException 。 5字母排列仍能正常工作。

I'm getting System.OutOfMemoryException when trying to generate 6 letter permutations. 5 letter permutations still work.

下面是code我使用生成所有排列:

Here is the code I'm using to generate ALL permutations:

private static List<string> getPermutations(int n,string source)
        {
            IEnumerable<string> q = source.Select(x => x.ToString());
            for (int i = 0; i < n - 1; i++)
            {
                q = q.SelectMany(x => source, (x, y) => x + y);
            }
            return q.ToList(); // THIS IS WHERE THE ERROR HAPPENS
        }

之后,我使用的这块code基于正则表达式来进行筛选:

after which I am using this piece of code to filter them based on regex:

private static List<string> filterListByRegex(List<string> list, string regex)
        {
            List<string> newList = list.ToList();
            for (int i = newList.Count - 1; i >= 0; i--)
            {
                Match match = Regex.Match(""+newList[i], regex, RegexOptions.IgnoreCase);
                if (!match.Success)
                {
                    newList.RemoveAt(i);
                }
            }
            return newList;
        }

所以,因为我并不真的需要那些排列,是有办法的正则表达式过滤器,同时生成排列,或者我应该用更高效的一块code,生成排列?

so since I don't really need ALL those permutations, is there a way to regex filter while generating permutations, or should I use a more efficient piece of code to generate permutations?

下面是图片更好地展示我想要实现:

Here is a picture to better demonstrate what I'm trying to achieve:

垂直字母串是一个我说的是code使用。

The vertical alphabet string is the one I'm telling the code to use.

推荐答案

首先,我想提一提的是,我们在这里讨论的是不是真正的排列,而是所谓的 n元组排列与重复 - 维基百科

First, I would like to mention that what we are discussing here is not real permutations, but so called n-tuples or permutations with repetition - Wikipedia.

二,生成排列时关于的System.OutOfMemoryException,我想我们大家都同意,结果不应该被存放​​在一个列表中,但作为枚举提供的,这将使实施过滤和消费(最终存储),只有在兴趣的人。

Second, regarding the System.OutOfMemoryException when generating permutations, I think we all agree that the result should not be stored in a list, but provided as enumerable which will allow applying filtering and consuming (eventually storing) only the ones in interest.

在这方面所@juharr提供的LINQ解决方案表现非常好。所以,我的目标是尽量减少中间的内存分配,包括字符串连接,并与更广泛和更快的解决方案,以结束。

In that regard the LINQ solution provided by @juharr is performing very well. So my goals are to minimize the intermediary memory allocations, including string concatenations and also to end up with a more general and faster solution.

在为了做到这一点,我需要采取一些硬的设计决策。我说的是一般的函数的签名看起来像这样

In order to do that, I need to take some hard design decision. The signature of the general function I'm talking about will look like this

public static IEnumerable<T[]> RepeatingPermutations<T>(this T[] set, int N)

和问题是应该是什么阵列产生。如果我们按照recomendations,他们应该是一个单独的数组实例。但是,请记住我希望尽量减少分配的,我决定打破这一规则,并产生一个相同的数组实例,动不修改它,如果需要克隆它给调用者的责任。例如,这允许执行无成本滤波呼叫者。或实施在上面OP功能上像这样

and the question is what should be the array yielded. If we follow the recomendations, they should be a separate array instances. However, remember I want to minimize allocations, I've decided to break that rules and yield one and the same array instance, moving the responsibility of not modifying it and cloning it if necessary to the caller. For instance, this allows the caller to perform no cost filtering. Or implement the OP function on top on it like this

public static IEnumerable<string> RepeatingPermutations(this string set, int N)
{
    return set.ToCharArray().RepeatingPermutations(N).Select(p => new string(p));
}

关于算法的几句话。而不是看问题递归其他一些积极答复,我想有效地实施这样的事情相当于

A few words about the algorithm. Instead of looking at the problem recursively as some other answerers, I want to effectively implement the equivalent of something like this

from e1 in set
from e2 in set
...
from eN in set
select new [] { e1, e2, .., eN }

有趣的是,最近我回答有关<一个一个的组合href=\"http://stackoverflow.com/questions/35493785/looking-at-each-combination-in-jagged-array/35494340#35494340\">question并意识到算法是pretty大同小异。

Interestingly, I recently answered a combinations related question and realized that the algorithms are pretty much the same.

与所有他这样说,这里是功能:

With all that being said, here is the function:

public static IEnumerable<T[]> RepeatingPermutations<T>(this T[] set, int N)
{
    var result = new T[N];
    var indices = new int[N];
    for (int pos = 0, index = 0; ;)
    {
        for (; pos < N; pos++, index = 0)
        {
            indices[pos] = index;
            result[pos] = set[index];
        }
        yield return result;
        do
        {
            if (pos == 0) yield break;
            index = indices[--pos] + 1;
        }
        while (index >= set.Length);
    }
}

我通过简单地用N = 2,3,... 6调用不同的功能和简单的迭代和计数做了一些测试。这里是我的机器上的结果:

I've did some tests by simply calling the different functions with N=2,3,..6 and simply iterating and counting. Here are the results on my machine:

A : N=2 Count=         676 Time=00:00:00.0000467 Memory=     29K
B1: N=2 Count=         676 Time=00:00:00.0000263 Memory=     16K
B2: N=2 Count=         676 Time=00:00:00.0000189 Memory=      8K

A : N=3 Count=      17,576 Time=00:00:00.0010107 Memory=    657K
B1: N=3 Count=      17,576 Time=00:00:00.0003673 Memory=    344K
B2: N=3 Count=      17,576 Time=00:00:00.0001415 Memory=      8K

A : N=4 Count=     456,976 Time=00:00:00.0184445 Memory=  2,472K
B1: N=4 Count=     456,976 Time=00:00:00.0096189 Memory=  2,520K
B2: N=4 Count=     456,976 Time=00:00:00.0033624 Memory=      8K

A : N=5 Count=  11,881,376 Time=00:00:00.4281349 Memory=    397K
B1: N=5 Count=  11,881,376 Time=00:00:00.2482835 Memory=  4,042K
B2: N=5 Count=  11,881,376 Time=00:00:00.0887759 Memory=      8K

A : N=6 Count= 308,915,776 Time=00:00:11.2697326 Memory=  1,688K
B1: N=6 Count= 308,915,776 Time=00:00:06.5638404 Memory=  1,024K
B2: N=6 Count= 308,915,776 Time=00:00:02.2674431 Memory=      8K

其中,

A - 从@juharr结果LINQ功能
B1 - 我与字符串搜索功能
B2 - 我与CHAR函数[]

A - LINQ function from @juharr
B1 - my function with string
B2 - my function with char[]

我们可以看到,内存明智两个字符串函数具有可比性。性能明智的LINQ功能是慢了〜2倍,这是pretty良好的效果。

As we can see, memory wise both string functions are comparable. Performance wise the LINQ function is only ~2 times slower, which is pretty good result.

如在这样的场景预期,该非清分功能显著优于它们。

As expected in such scenario, the non allocating function significantly outperforms them both.

更新:正如意见中的要求,这里是上述功能(请注意,它们是扩展方法,必须放置在一个静态类你选择的)样本用法:

UPDATE: As requested in the comments, here is the sample usage of the above functions (note that they are extension methods and must be placed in a static class of your choice):

var charSet = Enumerable.Range('A', 'Z' - 'A' + 1).Select(c => (char)c).ToArray();
var charPermutations = charSet.RepeatingPermutations(3);
var stringSet = new string(charset);
var stringPermutations = stringSet.RepeatingPermutations(3);

不过,记得设计选择我做了,所以如果你展开 charPermutations 在调试器中,你会看到一个相同的值(最后置换)。消费上述呼的整个结果为的char [] 应该是这样的。

However, remember the design choice I've made, so if you expand the charPermutations inside the debugger, you'll see one and the same values (the last permutation). Consuming the whole result of the above call for char[] should be like this

var charPermutationList = charSet.RepeatingPermutations(3)
    .Select(p => (char[])p.Clone()).ToList();

其实一个很好的补充,两种方法presented将下面的扩展方法:

Actually a good addition to the two methods presented would be the following extension method:

public static IEnumerable<T[]> Clone<T>(this IEnumerable<T[]> source)
{
    return source.Select(item => (T[])item.Clone());
}

所以消耗的呼叫将是简单

so the consuming call would be simple

var charPermutationList = charSet.RepeatingPermutations(3).Clone().ToList();

这篇关于排列生成时的System.OutOfMemoryException的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆