的foreach +打破VS LINQ FirstOrDefault性能差异 [英] foreach + break vs linq FirstOrDefault performance difference

查看:149
本文介绍了的foreach +打破VS LINQ FirstOrDefault性能差异的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个执行日期日期范围的数据抓取,尤其是天两班。

 公共类IterationLookup< TItem>
{
    私人的IList<项目>项=无效;

    公共IterationLookup(IEnumerable的< TItem>的项目,Func键< TItem,TKEY的>的KeySelectors)
    {
        this.items = items.OrderByDescending(的KeySelectors).ToList();
    }

    公共TItem的GetItem(日期时间日)
    {
        的foreach(TItem我在this.items)
        {
           如果(i.IsWithinRange(天))
           {
               返回我;
           }
        }
        返回null;
    }
}


公共类LinqLookup< TItem>
{
    私人的IList<项目>项=无效;

    公共IterationLookup(IEnumerable的< TItem>的项目,Func键< TItem,TKEY的>的KeySelectors)
    {
        this.items = items.OrderByDescending(的KeySelectors).ToList();
    }

    公共TItem的GetItem(日期时间日)
    {
        返回this.items.FirstOrDefault(ⅰ=> i.IsWithinRange(天));
    }
}
 

然后,我做这表明,LINQ的版本是关于 5倍慢速度测试。这将使意义时,我会在本地存储内容,而无需使用了ToList 枚举他们。这将使它更慢,因为在每次调用 FirstOrDefault ,LINQ也将执行 OrderByDescending 。但事实并非如此,所以我真的不知道发生了什么事情。 的LINQ应该执行非常相似,迭代。

这是衡量我的计时

的code摘录

 的IList< RangeItem>范围= GenerateRanges(); //返回列表< T>

VAR iterLookup =新IterationLookup< RangeItems>(范围,R => r.Id);
VAR linqLookup =新LinqLookup< RangeItems>(范围,R => r.Id);

秒表计时器=新的秒表();

timer.Start();
的for(int i = 0; I< 1000000;我++)
{
    iterLookup.GetItem(GetRandomDay());
}
timer.Stop();
//显示经过时间

timer.Restart();
的for(int i = 0; I< 1000000;我++)
{
    linqLookup.GetItem(GetRandomDay());
}
timer.Stop();
//显示经过时间
 

为什么我知道它应该有更好的表现?因为当我写了一个非常类似的code,而不使用这些查询类,LINQ的执行非常相似,的foreach 迭代...

  //从previous code座继续

//使用这两种秩序,因为他们在课堂上做的一样好项目
IList的< RangeItem>项= ranges.OrderByDescending性(r => r.Id).ToList();

timer.Restart();
的for(int i = 0; I< 1000000;我++)
{
    日期时间天= GetRandomDay();
    的foreach(在项目RangeItem R)
    {
        如果(r.IsWithinRange(天))
        {
            // RangeItem结果= R;
            打破;
        }
    }
}
timer.Stop();
//显示经过时间

timer.Restart();
的for(int i = 0; I< 1000000;我++)
{
   日期时间天= GetRandomDay();
   items.FirstOrDefault(ⅰ=> i.IsWithinRange(天));
}
timer.Stop();
//显示经过时间
 

这是我的意见高度相似code。 FirstOrDefault ,就像我也知道迭代仅供长,直到它得到一个有效的项目或者直到它到达终点。这是某种相同的的foreach 破发

不过,即使反复类执行有过之而无不及我简单的的foreach 迭代循环,这也是一个谜,因为它拥有的所有开销相比,在类中调用的方法直接访问。

问题

我是什么,它​​执行这样很慢我的LINQ类做错了什么?
我在做什么错在我的迭代类,它执行两次直接慢的foreach 循环?

正被测量的时间?

我做这些步骤:

  1. 在生成的范围(如结果如下所示)
  2. 创建为IterationLookup,LinqLookup(和我的最佳日期范围类BitCountLookup这不是讨论的一部分在这里)
  3. 对象实例
  4. 启动计时器,并通过使用previously实例IterationLookup类执行百万查找的范围内最大的日期范围(如结果可以看出),随机天。
  5. 启动计时器,并通过使用previously实例LinqLookup类执行百万查找的范围内最大的日期范围(如结果可以看出),随机天。
  6. 在使用手册中的foreach启动计时器并执行百万查找(6次)+破环和LINQ调用。

正如你所看到的对象实例化不测量

附录I:结果超过百万查找

不重叠显示在这些结果范围,这将使这两种方法即使在LINQ版本更类似于不破环上匹配成功(这很有可能不会)。

生成的范围:

ID范围000000000111111111122222222223300000000011111111112222222222
                123456789012345678901234567890112345678901234567890123456789
09 22.01.-30.01。 | ------- |
08 14.01.-16.01。 |  -  |
07 16.02.-19.02。 |  -  |
06 15.01.-17.01。 |  -  |
05 19.02.-23.02。 | --- |
04 01.01.-07.01 | ----- |
03 02.01.-10.01。 | ------- |
02 11.01.-13.01。 |  -  |
01 16.01.-20.01。 | --- |
00 29.01.-06.02。 | ------- |

查找类...

 - 迭代:1028ms
  - 的Linq:4517ms!这就是问题所在!
 -  BitCounter:401ms

手动环...

 -  Iter项目:786ms
 -  LINQ的:981ms
 -  Iter项目:787ms
 -  LINQ的:996ms
 -  Iter项目:787ms
 -  LINQ的:977ms
 -  Iter项目:783ms
 -  LINQ的:979ms

附录二:GitHub的:依据code,以测试自己

我已经提出了一个要点,让你可以Ç自己的全部$ C $,看看发生了什么事情。创建一个控制台应用程序,并复制的的Program.cs 的到它的一个补充,是这个主旨的一部分的其他文件。

5.0分据这里

附录III:最后的想法和测量测试

最困难的事情是当然LINQ implementatino这是非常缓慢的。原来,这所有做委托编译器优化。 LukeH提供了最好的,最实用的解决方案,其实让我尝试不同的方法来此。我已经尝试了各种不同的方法在的GetItem 方法(或 GetPointData 因为它称为GIST):

  1. 通常的方式,大多数开发商会做(和要点是实现好,没有更新后的结果显示,这是不是这样做的最佳方式):

     返回this.items.FirstOrDefault(项目=> item.IsWithinRange(天));
     

  2. 定义一个局部predicate变量:

      Func键< TItem,布尔> predicate =项=> item.IsWithinRange(天);
    返回this.items.FirstOrDefault(predicate);
     

  3. 本地predicate建设者:

      Func键<日期时间,Func键和LT; TItem,布尔>>建设者= D =>项目=> item.IsWithinRange(四);
    返回this.items.FirstOrDefault(助洗剂(天));
     

  4. 本地predicate建设者和当地predicate变量:

      Func键<日期时间,Func键和LT; TItem,布尔>>建设者= D =>项目=> item.IsWithinRange(四);
    FUNC< TItem,布尔> predicate =建设者(天);
    返回this.items.FirstOrDefault(predicate);
     

  5. 一流水平(静态或实例)predicate建设者:

     返回this.items.FirstOrDefault(classLevelBuilder(天));
     

  6. 外部定义predicate并作为方法的参数

     公共TItem的GetItem(Func键< TItem,布尔> predicate)
    {
        返回this.items.FirstOrDefault(predicate);
    }
     

    执行该方法时,我也花了两个方法:

    1. predicate直接在方法调用提供了在循环:

       的for(int i = 0; I< 1000000;我++)
      {
          linqLookup.GetItem(项目=> item.IsWithinRange(GetRandomDay()));
      }
       

    2. predicate外定义的建设者循环:

        Func键<日期时间,Func键和LT;游侠,布尔>>建设者= D => R => r.IsWithinRange(四);
      的for(int i = 0; I< 1000000;我++)
      {
          linqLookup.GetItem(助洗剂(GetRandomDay()));
      }
       

结果 - 什么表现最好

使用迭代上课的时候​​

有关相比,它需要它约。 770ms 以随机产生的范围内执行百万查找。

  1. 在#3当地predicate制造商证明是最佳的编译器优化,所以它的性能几乎一样快像往常一样重复。 800ms的
  2. #6.2 predicate外定义的建设者循环: 885ms
  3. #$ 6.1在定义点$ pdicate循环: 1525ms
  4. 在所有其他地方了 4200ms之间 - 4360ms ,并因此被认为不可用

所以每当你使用predicate在外面经常调用的方法,定义一个生成器并执行。这将产生最佳的结果。

我这个最大的惊喜是,代表(或predicates)可能是这么多耗费时间。

解决方案

Gabe的答案,我可以证实,差别似乎是由成本引起的重新构建委托每次调用 GetPointData

如果我加上一行到你的 IterationRangeLookupSingle GetPointData 方法,然后将它减慢一直到同样的爬行速度为 LinqRangeLookupSingle 。试试吧:

  //在IterationRangeLookupSingle< TItem,TKEY的>
公共TItem GetPointData(日期时间点)
{
    //只是一个单一的线,这个委托是从未使用过
    FUNC< TItem,布尔>假= I => i.IsWithinRange(点);

    //该方法的其余部分保持完全一样之前
    // ...
}
 

(我不知道为什么编译器和/或抖动不能忽略我上面添加多余的委托。显然,代表的的必要的,你的 LinqRangeLookupSingle 类。)

一个可能的解决方法是撰写predicate在 LinqRangeLookupSingle 传递给它一个参数。这意味着代表只需要构成的一旦的,并不是每一个的 GetPointData 方法被调用的时间。例如,下面的变化将加快LINQ版本,因此它的pretty的与的foreach 版本多少可比性:

  //在LinqRangeLookupSingle< TItem,TKEY的>
公共TItem GetPointData(日期时间点)
{
    FUNC<日期时间,Func键和LT; TItem,布尔>>建设者= X => Y => y.IsWithinRange(X);
    FUNC< TItem,布尔> predicate =建设者(点);

    返回this.items.FirstOrDefault(predicate);
}
 

I have two classes that perform date date range data fetching for particular days.

public class IterationLookup<TItem>
{
    private IList<Item> items = null;

    public IterationLookup(IEnumerable<TItem> items, Func<TItem, TKey> keySelector)
    {
        this.items = items.OrderByDescending(keySelector).ToList();
    }

    public TItem GetItem(DateTime day)
    {
        foreach(TItem i in this.items)
        {
           if (i.IsWithinRange(day))
           {
               return i;
           }
        }
        return null;
    }
}


public class LinqLookup<TItem>
{
    private IList<Item> items = null;

    public IterationLookup(IEnumerable<TItem> items, Func<TItem, TKey> keySelector)
    {
        this.items = items.OrderByDescending(keySelector).ToList();
    }

    public TItem GetItem(DateTime day)
    {
        return this.items.FirstOrDefault(i => i.IsWithinRange(day));
    }
}

Then I do speed tests which show that Linq version is about 5 times slower. This would make sense when I would store items locally without enumerating them using ToList. This would make it much slower, because with every call to FirstOrDefault, linq would also perform OrderByDescending. But that's not the case so I don't really know what's going on. Linq should perform very similar to iteration.

This is the code excerpt that measures my timings

IList<RangeItem> ranges = GenerateRanges(); // returns List<T>

var iterLookup = new IterationLookup<RangeItems>(ranges, r => r.Id);
var linqLookup = new LinqLookup<RangeItems>(ranges, r => r.Id);

Stopwatch timer = new Stopwatch();

timer.Start();
for(int i = 0; i < 1000000; i++)
{
    iterLookup.GetItem(GetRandomDay());
}
timer.Stop();
// display elapsed time

timer.Restart();
for(int i = 0; i < 1000000; i++)
{
    linqLookup.GetItem(GetRandomDay());
}
timer.Stop();
// display elapsed time

Why do I know it should perform better? Because when I write a very similar code without using these lookup classes, Linq performs very similar to foreach iterations...

// continue from previous code block

// items used by both order as they do in classes as well
IList<RangeItem> items = ranges.OrderByDescending(r => r.Id).ToList();

timer.Restart();
for(int i = 0; i < 1000000; i++)
{
    DateTime day = GetRandomDay();
    foreach(RangeItem r in items)
    {
        if (r.IsWithinRange(day))
        {
            // RangeItem result = r;
            break;
        }
    }
}    
timer.Stop();
// display elapsed time

timer.Restart();
for(int i = 0; i < 1000000; i++)
{
   DateTime day = GetRandomDay();
   items.FirstOrDefault(i => i.IsWithinRange(day));
}
timer.Stop();
// display elapsed time

This is by my opinion highly similar code. FirstOrDefault as much as I know also iterate for only as long until it gets to a valid item or until it reaches the end. And this is somehow the same as foreach with break.

But even iteration class performs worse than my simple foreach iteration loop which is also a mystery because all the overhead it has is the call to a method within a class compared to direct access.

Question

What am I doing wrong in my LINQ class that it performs so very slow?
What am I doing wrong in my Iteration class so it performs twice as slow as direct foreach loop?

Which times are being measured?

I do these steps:

  1. Generate ranges (as seen below in results)
  2. Create object instances for IterationLookup, LinqLookup (and my optimized date range class BitCountLookup which is not part of discussion here)
  3. Start timer and execute 1 million lookups on random days within maximum date range (as seen in results) by using previously instantiated IterationLookup class.
  4. Start timer and execute 1 million lookups on random days within maximum date range (as seen in results) by using previously instantiated LinqLookup class.
  5. Start timer and execute 1 million lookups (6 times) using manual foreach+break loops and Linq calls.

As you can see object instantiation is not measured.

Appendix I: Results over million lookups

Ranges displayed in these results don't overlap, which should make both approaches even more similar in case LINQ version doesn't break loop on successful match (which it highly likely does).

Generated Ranges:

ID Range        000000000111111111122222222223300000000011111111112222222222
                123456789012345678901234567890112345678901234567890123456789
09 22.01.-30.01.                     |-------|
08 14.01.-16.01.             |-|
07 16.02.-19.02.                                              |--|
06 15.01.-17.01.              |-|
05 19.02.-23.02.                                                 |---|
04 01.01.-07.01.|-----|
03 02.01.-10.01. |-------|
02 11.01.-13.01.          |-|
01 16.01.-20.01.               |---|
00 29.01.-06.02.                            |-------|

Lookup classes...

- Iteration: 1028ms
- Linq: 4517ms   !!! THIS IS THE PROBLEM !!!
- BitCounter: 401ms

Manual loops...

- Iter: 786ms
- Linq: 981ms
- Iter: 787ms
- Linq: 996ms
- Iter: 787ms
- Linq: 977ms
- Iter: 783ms
- Linq: 979ms

Appendix II: GitHub:Gist code to test for yourself

I've put up a Gist so you can get the full code yourself and see what's going on. Create a Console application and copy Program.cs into it an add other files that are part of this gist.

Grab it here.

Appendix III: Final thoughts and measurement tests

The most problematic thing was of course LINQ implementatino that was awfully slow. Turns out that this has all to do with delegate compiler optimization. LukeH provided the best and most usable solution that actually made me try different approaches to this. I've tried various different approaches in the GetItem method (or GetPointData as it's called in Gist):

  1. the usual way that most of developers would do (and is implemented in Gist as well and wasn't updated after results revealed this is not the best way of doing it):

    return this.items.FirstOrDefault(item => item.IsWithinRange(day));
    

  2. by defining a local predicate variable:

    Func<TItem, bool> predicate = item => item.IsWithinRange(day);
    return this.items.FirstOrDefault(predicate);
    

  3. local predicate builder:

    Func<DateTime, Func<TItem, bool>> builder = d => item => item.IsWithinRange(d);
    return this.items.FirstOrDefault(builder(day));
    

  4. local predicate builder and local predicate variable:

    Func<DateTime, Func<TItem, bool>> builder = d => item => item.IsWithinRange(d);
    Func<TItem, bool> predicate = builder(day);
    return this.items.FirstOrDefault(predicate);
    

  5. class level (static or instance) predicate builder:

    return this.items.FirstOrDefault(classLevelBuilder(day));
    

  6. externally defined predicate and provided as method parameter

    public TItem GetItem(Func<TItem, bool> predicate)
    {
        return this.items.FirstOrDefault(predicate);
    }
    

    when executing this method I also took two approaches:

    1. predicate provided directly at method call within for loop:

      for (int i = 0; i < 1000000; i++)
      {
          linqLookup.GetItem(item => item.IsWithinRange(GetRandomDay()));
      }
      

    2. predicate builder defined outside for loop:

      Func<DateTime, Func<Ranger, bool>> builder = d => r => r.IsWithinRange(d);
      for (int i = 0; i < 1000000; i++)
      {
          linqLookup.GetItem(builder(GetRandomDay()));
      }
      

Results - what performs best

For comparison when using iteration class, it takes it approx. 770ms to execute 1 million lookups on randomly generated ranges.

  1. #3 local predicate builder turns out to be best compiler optimized so it performs almost as fast as usual iterations. 800ms.
  2. #6.2 predicate builder defined outside for loop: 885ms
  3. #6.1 predicate defined within for loop: 1525ms
  4. all others took somewhere between 4200ms - 4360ms and are thus considered unusable.

So whenever you use a predicate in externally frequently callable method, define a builder and execute that. This will yield best results.

The biggest surprise to me about this is that delegates (or predicates) may be this much time consuming.

解决方案

Further to Gabe's answer, I can confirm that the difference appears to be caused by the cost of re-constructing the delegate for every call to GetPointData.

If I add a single line to the GetPointData method in your IterationRangeLookupSingle class then it slows right down to the same crawling pace as LinqRangeLookupSingle. Try it:

// in IterationRangeLookupSingle<TItem, TKey>
public TItem GetPointData(DateTime point)
{
    // just a single line, this delegate is never used
    Func<TItem, bool> dummy = i => i.IsWithinRange(point);

    // the rest of the method remains exactly the same as before
    // ...
}

(I'm not sure why the compiler and/or jitter can't just ignore the superfluous delegate that I added above. Obviously, the delegate is necessary in your LinqRangeLookupSingle class.)

One possible workaround is to compose the predicate in LinqRangeLookupSingle so that point is passed to it as an argument. This means that the delegate only needs to be constructed once, not every time the GetPointData method is called. For example, the following change will speed up the LINQ version so that it's pretty much comparable with the foreach version:

// in LinqRangeLookupSingle<TItem, TKey>
public TItem GetPointData(DateTime point)
{
    Func<DateTime, Func<TItem, bool>> builder = x => y => y.IsWithinRange(x);
    Func<TItem, bool> predicate = builder(point);

    return this.items.FirstOrDefault(predicate);
}

这篇关于的foreach +打破VS LINQ FirstOrDefault性能差异的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆