Optimizing LINQ combining multiple lists into new generic list

Question

Given the following three lists:

    var FirstNames = new List<string>(){ "Bob", "Sondra", "Avery", "Von", "Randle", "Gwen", "Paisley" };
    var LastNames = new List<string>(){ "Anderson", "Carlson", "Vickers", "Black", "Schultz", "Marigold", "Johnson" };
    var Birthdates = new List<DateTime>()
                    { 
                        Convert.ToDateTime("11/12/1980"), 
                        Convert.ToDateTime("09/16/1978"), 
                        Convert.ToDateTime("05/18/1985"), 
                        Convert.ToDateTime("10/29/1980"), 
                        Convert.ToDateTime("01/19/1989"), 
                        Convert.ToDateTime("01/14/1972"), 
                        Convert.ToDateTime("02/20/1981") 
                    };

I'd like to combine them into a new generic type, where the relationship the lists share is their position in the collection, i.e. FirstNames[0], LastNames[0], and Birthdates[0] are related.

So I have come up with this LINQ query, which matches on the indices and seems to work fine for now:

    var students = from fn in FirstNames
                   from ln in LastNames
                   from bd in Birthdates
                   where FirstNames.IndexOf(fn) == LastNames.IndexOf(ln)
                   where FirstNames.IndexOf(fn) == Birthdates.IndexOf(bd)
                   select new { First = fn, Last = ln, Birthdate = bd.Date };

However, I have stress-tested this code (each List<string> and List<DateTime> loaded with a few million records) and I run into a System.OutOfMemoryException.

Is there another way of writing this query to achieve the same result more efficiently using LINQ?

Recommended answer

That is what Zip is for.

// Uses the question's lists; note that the question spells the third one "Birthdates".
var result = FirstNames
  .Zip(LastNames, (f,l) => new {f,l})
  .Zip(Birthdates, (fl, b) => new {First=fl.f, Last = fl.l, BirthDate = b});
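
A minimal, self-contained sketch of the chained Zip above, wrapped in a method so it can be run on small inputs. The class name ZipSketch, the method name Combine, and the string formatting are assumptions added for illustration; they are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class ZipSketch
{
    // Pairs up three sequences by position using two chained Zip calls.
    // Zip stops as soon as the shortest input runs out.
    public static List<string> Combine(
        IEnumerable<string> firstNames,
        IEnumerable<string> lastNames,
        IEnumerable<DateTime> birthdates)
    {
        return firstNames
            .Zip(lastNames, (f, l) => new { f, l })
            .Zip(birthdates, (fl, b) => new { First = fl.f, Last = fl.l, Birthdate = b })
            // Format with the invariant culture so the text does not depend on machine settings.
            .Select(s => s.First + " " + s.Last + " " +
                         s.Birthdate.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))
            .ToList();
    }
}
```

Because each element is matched positionally while the sequences are being enumerated, no IndexOf lookup or cross join is needed.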

As for scaling:

int count = 50000000;
var FirstNames = Enumerable.Range(0, count).Select(x=>x.ToString());
var LastNames = Enumerable.Range(0, count).Select(x=>x.ToString());
var BirthDates = Enumerable.Range(0, count).Select(x=> DateTime.Now.AddSeconds(x));

var sw = new Stopwatch();
sw.Start();

var result = FirstNames
  .Zip(LastNames, (f,l) => new {f,l})
  .Zip(BirthDates, (fl, b) => new {First=fl.f, Last = fl.l, BirthDate = b});

foreach(var r in result)
{
    var x = r;
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds); // Returns 69191 on my machine.

By contrast, these indexed versions blow up with an out-of-memory exception:

int count = 50000000;
var FirstNames = Enumerable.Range(0, count).Select(x=>x.ToString());
var LastNames = Enumerable.Range(0, count).Select(x=>x.ToString());
var BirthDates = Enumerable.Range(0, count).Select(x=> DateTime.Now.AddSeconds(x));

var sw = new Stopwatch();
sw.Start();

var FirstNamesList = FirstNames.ToList(); // Blows up in 32-bit .NET with out of Memory
var LastNamesList = LastNames.ToList();
var BirthDatesList = BirthDates.ToList();

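// Indexed version 1: generate the indices, then look up each list by position.
var result = Enumerable.Range(0, FirstNamesList.Count)
    .Select(i => new 
                 { 
                     First = FirstNamesList[i], 
                     Last = LastNamesList[i], 
                     BirthDate = BirthDatesList[i] 
                 });

// Indexed version 2: use the Select overload that supplies the index.
// It replaces version 1, so only this query is enumerated and timed below.
// The property names must match version 1 exactly for the assignment to compile.
result = BirthDatesList.Select((bd, i) => new
{ 
    First = FirstNamesList[i], 
    Last = LastNamesList[i], 
    BirthDate = bd 
});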

foreach(var r in result)
{
    var x = r;
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);

At lower counts, converting the enumerables to Lists also costs much more than the additional object creation that Zip incurs. Zip was approximately 30% faster than the indexed versions. As you add more columns, Zip's advantage would likely shrink.

The performance characteristics are also very different. The Zip routine starts outputting answers almost immediately, while the others start outputting answers only after the entire enumerables have been read and converted to Lists. So if you take the results and paginate them with .Skip(x).Take(y), or check whether something exists with .Any(...), Zip will be orders of magnitude faster because it doesn't have to convert the entire enumerable.
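
The difference in streaming behavior can be observed by counting how many source elements are actually produced. The following is a rough sketch rather than a benchmark; the class LazyZipSketch and its two methods are hypothetical names introduced here, and a counter incremented inside Select stands in for an expensive source.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyZipSketch
{
    // Zip + Take: returns how many source elements were produced
    // in order to build the first `take` results.
    public static int CountPulledLazy(int sourceSize, int take)
    {
        int pulled = 0;
        IEnumerable<int> a = Enumerable.Range(0, sourceSize).Select(i => { pulled++; return i; });
        IEnumerable<int> b = Enumerable.Range(0, sourceSize).Select(i => { pulled++; return i; });

        // Only as many elements as Take asks for are pulled from each source.
        var page = a.Zip(b, (x, y) => x + y).Take(take).ToList();
        return pulled;
    }

    // ToList + index lookup: both sources are fully read before
    // the first result can be produced.
    public static int CountPulledEager(int sourceSize, int take)
    {
        int pulled = 0;
        List<int> a = Enumerable.Range(0, sourceSize).Select(i => { pulled++; return i; }).ToList();
        List<int> b = Enumerable.Range(0, sourceSize).Select(i => { pulled++; return i; }).ToList();

        var page = Enumerable.Range(0, a.Count).Select(i => a[i] + b[i]).Take(take).ToList();
        return pulled;
    }
}
```

With sourceSize = 1000 and take = 3, the lazy variant should report only a handful of pulled elements, whereas the eager variant reports all 2000 because both lists are materialized first.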

Lastly, if it becomes performance critical and you need to produce many results, you could consider extending Zip to handle more than two enumerables at once, like this three-sequence version (shamelessly stolen from Jon Skeet: https://codeblog.jonskeet.uk/2011/01/14/reimplementing-linq-to-objects-part-35-zip/):

// Declare this inside a non-generic static class; the "this" modifier is what
// allows the FirstNames.Zip(LastNames, BirthDates, ...) call syntax.
public static IEnumerable<TResult> Zip<TFirst, TSecond, TThird, TResult>( 
    this IEnumerable<TFirst> first, 
    IEnumerable<TSecond> second,
    IEnumerable<TThird> third, 
    Func<TFirst, TSecond, TThird, TResult> resultSelector) 
{ 
    using (IEnumerator<TFirst> iterator1 = first.GetEnumerator()) 
    using (IEnumerator<TSecond> iterator2 = second.GetEnumerator()) 
    using (IEnumerator<TThird> iterator3 = third.GetEnumerator()) 
    { 
        // Advance all three in lockstep; stop when any one of them is exhausted.
        while (iterator1.MoveNext() && iterator2.MoveNext() && iterator3.MoveNext()) 
        { 
            yield return resultSelector(iterator1.Current, iterator2.Current, iterator3.Current); 
        } 
    } 
}

Then you can do this:

var result = FirstNames
  .Zip(LastNames, BirthDates, (f,l,b) => new {First=f,Last=l,BirthDate=b});

And now you don't even have the issue of the intermediate object being created, so you get the best of all worlds.
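
For reference, here is one way the three-sequence Zip might be wired up end to end. This is a sketch under assumptions: the class names ZipExtensions and ThreeWayZipDemo, the Run method, and the sample formatting are invented for illustration, and the extension method is declared with the "this" modifier so the call syntax shown above compiles.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class ZipExtensions
{
    // Three-sequence Zip as an extension method; stops at the shortest input.
    public static IEnumerable<TResult> Zip<TFirst, TSecond, TThird, TResult>(
        this IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        IEnumerable<TThird> third,
        Func<TFirst, TSecond, TThird, TResult> resultSelector)
    {
        using (var e1 = first.GetEnumerator())
        using (var e2 = second.GetEnumerator())
        using (var e3 = third.GetEnumerator())
        {
            while (e1.MoveNext() && e2.MoveNext() && e3.MoveNext())
            {
                yield return resultSelector(e1.Current, e2.Current, e3.Current);
            }
        }
    }
}

public static class ThreeWayZipDemo
{
    public static List<string> Run()
    {
        var firstNames = new List<string> { "Bob", "Sondra", "Avery" };
        var lastNames = new List<string> { "Anderson", "Carlson", "Vickers" };
        var birthdates = new List<DateTime>
        {
            new DateTime(1980, 11, 12),
            new DateTime(1978, 9, 16),
            new DateTime(1985, 5, 18)
        };

        // One pass over all three lists, with no intermediate { f, l } object.
        return firstNames
            .Zip(lastNames, birthdates, (f, l, b) => new { First = f, Last = l, BirthDate = b })
            .Select(s => s.First + " " + s.Last + " " +
                         s.BirthDate.ToString("yyyy-MM-dd", CultureInfo.InvariantCulture))
            .ToList();
    }
}
```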

Or use the implementation here to handle any number generically: "Zip multiple/abitrary number of enumerables in C#"
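
The linked implementation is not reproduced on this page, so the following is only an illustrative sketch of the general idea, not that implementation. It assumes all sequences share one element type T; the names ZipManySketch and ZipMany are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipManySketch
{
    // Zips any number of same-typed sequences; each result is built from one
    // "row" holding the current element of every sequence.
    public static IEnumerable<TResult> ZipMany<T, TResult>(
        IEnumerable<IEnumerable<T>> sources,
        Func<IReadOnlyList<T>, TResult> resultSelector)
    {
        List<IEnumerator<T>> enumerators = sources.Select(s => s.GetEnumerator()).ToList();
        try
        {
            // With no sources, All(...) would always be true, so stop early.
            if (enumerators.Count == 0) yield break;

            // Advance every enumerator; stop when any one of them is exhausted.
            while (enumerators.All(e => e.MoveNext()))
            {
                yield return resultSelector(enumerators.Select(e => e.Current).ToArray());
            }
        }
        finally
        {
            foreach (var e in enumerators) e.Dispose();
        }
    }
}
```

Unlike the strongly typed three-sequence version, this allocates an array per row, so it trades some of the allocation savings for generality.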
