foreach + break vs linq FirstOrDefault 性能差异 [英] foreach + break vs linq FirstOrDefault performance difference

查看:19
本文介绍了foreach + break vs linq FirstOrDefault 性能差异的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个类可以执行特定日期的日期范围数据提取.

公共类 IterationLookup{私有 IList项目 = 空;public IterationLookup(IEnumerable items, Func keySelector){this.items = items.OrderByDescending(keySelector).ToList();}公共 TItem GetItem(DateTime 天){foreach(TItem i in this.items){如果(i.IsWithinRange(天)){返回我;}}返回空;}}公共类 LinqLookup{私有 IList项目 = 空;public IterationLookup(IEnumerable items, Func keySelector){this.items = items.OrderByDescending(keySelector).ToList();}公共 TItem GetItem(DateTime 天){返回 this.items.FirstOrDefault(i => i.IsWithinRange(day));}}

然后我进行了速度测试,结果表明 Linq 版本慢了大约 5 倍.当我将项目存储在本地而不使用 ToList 枚举它们时,这将是有意义的.这会使其变慢,因为每次调用 FirstOrDefault 时,linq 也会执行 OrderByDescending.但事实并非如此,所以我真的不知道发生了什么.Linq 的表现应该与迭代非常相似.

这是衡量我的时间的代码摘录

IList范围 = GenerateRanges();//返回列表var iterLookup = new IterationLookup(ranges, r => r.Id);var linqLookup = new LinqLookup(ranges, r => r.Id);秒表计时器 ​​= new Stopwatch();定时器开始();for(int i = 0; i <1000000; i++){iterLookup.GetItem(GetRandomDay());}定时器.停止();//显示经过的时间timer.Restart();for(int i = 0; i <1000000; i++){linqLookup.GetItem(GetRandomDay());}定时器.停止();//显示经过的时间

为什么我知道它应该表现得更好?因为当我在不使用这些查找类的情况下编写非常相似的代码时,Linq 的执行与 foreach 迭代非常相似...

//从上一个代码块继续//两个订单都使用的项目,就像它们在类中一样IListitems = range.OrderByDescending(r => r.Id).ToList();timer.Restart();for(int i = 0; i <1000000; i++){日期时间日 = GetRandomDay();foreach(RangeItem r 在项目中){如果(r.IsWithinRange(天)){//RangeItem 结果 = r;休息;}}}定时器.停止();//显示经过的时间timer.Restart();for(int i = 0; i <1000000; i++){日期时间日 = GetRandomDay();items.FirstOrDefault(i => i.IsWithinRange(day));}定时器.停止();//显示经过的时间

我认为这是高度相似的代码.FirstOrDefault 就我所知,它也只迭代直到到达有效项目或到达终点为止.这在某种程度上与带有 breakforeach 相同.

但即使是迭代类的性能也比我简单的 foreach 迭代循环差,这也是一个谜,因为与直接访问相比,它的所有开销都是对类内方法的调用.

问题

我在 LINQ 类中做错了什么以至于它的执行速度非常慢?
我在 Iteration 类中做错了什么,所以它的执行速度是直接 foreach 循环的两倍?

正在测量哪些时间?

我执行以下步骤:

  1. 生成范围(如下面的结果所示)
  2. 为 IterationLookup、LinqLookup(以及我优化的日期范围类 BitCountLookup 不在这里讨论的一部分)创建对象实例
  3. 使用先前实例化的 IterationLookup 类启动计时器并在最大日期范围内的随机日期执行 100 万次查找(如结果所示).
  4. 使用先前实例化的 LinqLookup 类启动计时器并在最大日期范围内的随机日期执行 100 万次查找(如结果所示).
  5. 启动计时器并使用手动 foreach+break 循环和 Linq 调用执行 100 万次查找(6 次).

如您所见未测量对象实例化.

附录 I:超过百万次查找的结果

这些结果中显示的范围不重叠,这应该会使两种方法更加相似,以防 LINQ 版本在成功匹配时不会中断循环(这很可能会发生).

<前>生成的范围:ID 范围 0000000001111111111222222222233000000001111111111222222222212345678901234567890123456789011234567890123456789012345678909 22.01.-30.01.|-------|08 14.01.-16.01.|-|07 16.02.-19.02.|--|06 15.01.-17.01.|-|05 19.02.-23.02.|---|04 01.01.-07.01.|-----|03 02.01.-10.01.|-------|02 11.01.-13.01.|-|01 16.01.-20.01.|---|00 29.01.-06.02.|-------|查找类...- 迭代:1028ms-Linq:4517 毫秒!!!这就是问题!!!- 位计数器:401ms手动循环...- 迭代:786ms- Linq:981 毫秒- 迭代:787ms- Linq:996 毫秒- 迭代:787ms- Linq:977 毫秒- 迭代:783ms- Linq:979 毫秒

附录二:GitHub:Gist 代码供自己测试

我已经提供了一个 Gist,所以你可以自己获取完整的代码,看看发生了什么.创建一个控制台应用程序并将Program.cs复制到其中,并添加属于此要点的其他文件.

这里获取它.

附录 III:最终想法和测量测试

最有问题的当然是 LINQ 实现速度非常慢.事实证明,这完全与委托编译器优化有关.LukeH 提供了最好和最有用的解决方案,这实际上让我尝试了不同的方法.我在 GetItem 方法(或 GetPointData,因为它在 Gist 中被调用)中尝试了各种不同的方法:

  1. 大多数开发人员通常会采用的方式(并且也在 Gist 中实现,并且在结果显示这不是最好的方式后没有更新):

    return this.items.FirstOrDefault(item => item.IsWithinRange(day));

  2. 通过定义一个局部谓词变量:

    Func谓词 = 项目 =>item.IsWithinRange(day);返回 this.items.FirstOrDefault(predicate);

  3. 本地谓词构建器:

    Func>建造者 = d =>项目 =>item.IsWithinRange(d);返回 this.items.FirstOrDefault(builder(day));

  4. 本地谓词构建器和本地谓词变量:

    Func>建造者 = d =>项目 =>item.IsWithinRange(d);Func<TItem, bool>谓词 = builder(day);返回 this.items.FirstOrDefault(predicate);

  5. 类级别(静态或实例)谓词构建器:

    return this.items.FirstOrDefault(classLevelBuilder(day));

  6. 外部定义的谓词并作为方法参数提供

    public TItem GetItem(Func谓词){返回 this.items.FirstOrDefault(predicate);}

    在执行此方法时,我也采用了两种方法:

    1. 谓词直接在 for 循环内的方法调用中提供:

      for (int i = 0; i <1000000; i++){linqLookup.GetItem(item => item.IsWithinRange(GetRandomDay()));}

    2. for 循环之外定义的谓词构建器:

      Func>建造者 = d =>r=>r.IsWithinRange(d);for (int i = 0; i <1000000; i++){linqLookup.GetItem(builder(GetRandomDay()));}

结果 - 什么表现最好

为了比较使用迭代类时,它需要大约.770 毫秒对随机生成的范围执行 100 万次查找.

  1. 3 本地谓词构建器被证明是最好的编译器优化,因此它的执行速度几乎与通常的迭代一样快.800 毫秒.

  2. 6.2 谓词构建器在 for 循环之外定义:885ms

  3. for 循环中定义的 6.1 谓词:1525ms

  4. 所有其他时间都在 4200 毫秒 - 4360 毫秒之间,因此被认为无法使用.

因此,每当您在外部频繁调用的方法中使用谓词时,请定义一个构建器并执行它.这将产生最佳结果.

对此我最大的惊讶是委托(或谓词)可能会耗费这么多时间.

解决方案

对于 Gabe 的回答,我可以确认差异似乎是由每次调用 GetPointData 时重新构建委托的成本造成的.

如果我在您的 IterationRangeLookupSingle 类中的 GetPointData 方法中添加一行,那么它会减慢到与 LinqRangeLookupSingle 相同的爬行速度.试试看:

//在 IterationRangeLookupSingle公共 TItem GetPointData(DateTime 点){//只是一行,这个委托从来没有被使用过Func<TItem, bool>虚拟 = i =>i.IsWithinRange(point);//该方法的其余部分与之前完全相同//...}

(我不知道为什么编译器和/或抖动不能忽略我上面添加的多余的委托.显然,在您的 LinqRangeLookupSingle 中,委托是必要的代码>类.)

一种可能的解决方法是在 LinqRangeLookupSingle 中组合谓词,以便将 point 作为参数传递给它.这意味着委托只需要被构造一次,而不是每次GetPointData 方法被调用时.例如,以下更改将加快 LINQ 版本的速度,使其与 foreach 版本非常相似:

//在 LinqRangeLookupSingle公共 TItem GetPointData(DateTime 点){Func建造者 = x =>y =>y.IsWithinRange(x);Func<TItem, bool>谓词 = builder(point);返回 this.items.FirstOrDefault(predicate);}

I have two classes that perform date date range data fetching for particular days.

public class IterationLookup<TItem>
{
    private IList<Item> items = null;

    public IterationLookup(IEnumerable<TItem> items, Func<TItem, TKey> keySelector)
    {
        this.items = items.OrderByDescending(keySelector).ToList();
    }

    public TItem GetItem(DateTime day)
    {
        foreach(TItem i in this.items)
        {
           if (i.IsWithinRange(day))
           {
               return i;
           }
        }
        return null;
    }
}


public class LinqLookup<TItem>
{
    private IList<Item> items = null;

    public IterationLookup(IEnumerable<TItem> items, Func<TItem, TKey> keySelector)
    {
        this.items = items.OrderByDescending(keySelector).ToList();
    }

    public TItem GetItem(DateTime day)
    {
        return this.items.FirstOrDefault(i => i.IsWithinRange(day));
    }
}

Then I do speed tests which show that Linq version is about 5 times slower. This would make sense when I would store items locally without enumerating them using ToList. This would make it much slower, because with every call to FirstOrDefault, linq would also perform OrderByDescending. But that's not the case so I don't really know what's going on. Linq should perform very similar to iteration.

This is the code excerpt that measures my timings

IList<RangeItem> ranges = GenerateRanges(); // returns List<T>

var iterLookup = new IterationLookup<RangeItems>(ranges, r => r.Id);
var linqLookup = new LinqLookup<RangeItems>(ranges, r => r.Id);

Stopwatch timer = new Stopwatch();

timer.Start();
for(int i = 0; i < 1000000; i++)
{
    iterLookup.GetItem(GetRandomDay());
}
timer.Stop();
// display elapsed time

timer.Restart();
for(int i = 0; i < 1000000; i++)
{
    linqLookup.GetItem(GetRandomDay());
}
timer.Stop();
// display elapsed time

Why do I know it should perform better? Because when I write a very similar code without using these lookup classes, Linq performs very similar to foreach iterations...

// continue from previous code block

// items used by both order as they do in classes as well
IList<RangeItem> items = ranges.OrderByDescending(r => r.Id).ToList();

timer.Restart();
for(int i = 0; i < 1000000; i++)
{
    DateTime day = GetRandomDay();
    foreach(RangeItem r in items)
    {
        if (r.IsWithinRange(day))
        {
            // RangeItem result = r;
            break;
        }
    }
}    
timer.Stop();
// display elapsed time

timer.Restart();
for(int i = 0; i < 1000000; i++)
{
   DateTime day = GetRandomDay();
   items.FirstOrDefault(i => i.IsWithinRange(day));
}
timer.Stop();
// display elapsed time

This is by my opinion highly similar code. FirstOrDefault as much as I know also iterate for only as long until it gets to a valid item or until it reaches the end. And this is somehow the same as foreach with break.

But even iteration class performs worse than my simple foreach iteration loop which is also a mystery because all the overhead it has is the call to a method within a class compared to direct access.

Question

What am I doing wrong in my LINQ class that it performs so very slow?
What am I doing wrong in my Iteration class so it performs twice as slow as direct foreach loop?

Which times are being measured?

I do these steps:

  1. Generate ranges (as seen below in results)
  2. Create object instances for IterationLookup, LinqLookup (and my optimized date range class BitCountLookup which is not part of discussion here)
  3. Start timer and execute 1 million lookups on random days within maximum date range (as seen in results) by using previously instantiated IterationLookup class.
  4. Start timer and execute 1 million lookups on random days within maximum date range (as seen in results) by using previously instantiated LinqLookup class.
  5. Start timer and execute 1 million lookups (6 times) using manual foreach+break loops and Linq calls.

As you can see object instantiation is not measured.

Appendix I: Results over million lookups

Ranges displayed in these results don't overlap, which should make both approaches even more similar in case LINQ version doesn't break loop on successful match (which it highly likely does).

Generated Ranges:

ID Range        000000000111111111122222222223300000000011111111112222222222
                123456789012345678901234567890112345678901234567890123456789
09 22.01.-30.01.                     |-------|
08 14.01.-16.01.             |-|
07 16.02.-19.02.                                              |--|
06 15.01.-17.01.              |-|
05 19.02.-23.02.                                                 |---|
04 01.01.-07.01.|-----|
03 02.01.-10.01. |-------|
02 11.01.-13.01.          |-|
01 16.01.-20.01.               |---|
00 29.01.-06.02.                            |-------|

Lookup classes...

- Iteration: 1028ms
- Linq: 4517ms   !!! THIS IS THE PROBLEM !!!
- BitCounter: 401ms

Manual loops...

- Iter: 786ms
- Linq: 981ms
- Iter: 787ms
- Linq: 996ms
- Iter: 787ms
- Linq: 977ms
- Iter: 783ms
- Linq: 979ms

Appendix II: GitHub:Gist code to test for yourself

I've put up a Gist so you can get the full code yourself and see what's going on. Create a Console application and copy Program.cs into it an add other files that are part of this gist.

Grab it here.

Appendix III: Final thoughts and measurement tests

The most problematic thing was of course LINQ implementatino that was awfully slow. Turns out that this has all to do with delegate compiler optimization. LukeH provided the best and most usable solution that actually made me try different approaches to this. I've tried various different approaches in the GetItem method (or GetPointData as it's called in Gist):

  1. the usual way that most of developers would do (and is implemented in Gist as well and wasn't updated after results revealed this is not the best way of doing it):

    return this.items.FirstOrDefault(item => item.IsWithinRange(day));
    

  2. by defining a local predicate variable:

    Func<TItem, bool> predicate = item => item.IsWithinRange(day);
    return this.items.FirstOrDefault(predicate);
    

  3. local predicate builder:

    Func<DateTime, Func<TItem, bool>> builder = d => item => item.IsWithinRange(d);
    return this.items.FirstOrDefault(builder(day));
    

  4. local predicate builder and local predicate variable:

    Func<DateTime, Func<TItem, bool>> builder = d => item => item.IsWithinRange(d);
    Func<TItem, bool> predicate = builder(day);
    return this.items.FirstOrDefault(predicate);
    

  5. class level (static or instance) predicate builder:

    return this.items.FirstOrDefault(classLevelBuilder(day));
    

  6. externally defined predicate and provided as method parameter

    public TItem GetItem(Func<TItem, bool> predicate)
    {
        return this.items.FirstOrDefault(predicate);
    }
    

    when executing this method I also took two approaches:

    1. predicate provided directly at method call within for loop:

      for (int i = 0; i < 1000000; i++)
      {
          linqLookup.GetItem(item => item.IsWithinRange(GetRandomDay()));
      }
      

    2. predicate builder defined outside for loop:

      Func<DateTime, Func<Ranger, bool>> builder = d => r => r.IsWithinRange(d);
      for (int i = 0; i < 1000000; i++)
      {
          linqLookup.GetItem(builder(GetRandomDay()));
      }
      

Results - what performs best

For comparison when using iteration class, it takes it approx. 770ms to execute 1 million lookups on randomly generated ranges.

  1. 3 local predicate builder turns out to be best compiler optimized so it performs almost as fast as usual iterations. 800ms.

  2. 6.2 predicate builder defined outside for loop: 885ms

  3. 6.1 predicate defined within for loop: 1525ms

  4. all others took somewhere between 4200ms - 4360ms and are thus considered unusable.

So whenever you use a predicate in externally frequently callable method, define a builder and execute that. This will yield best results.

The biggest surprise to me about this is that delegates (or predicates) may be this much time consuming.

解决方案

Further to Gabe's answer, I can confirm that the difference appears to be caused by the cost of re-constructing the delegate for every call to GetPointData.

If I add a single line to the GetPointData method in your IterationRangeLookupSingle class then it slows right down to the same crawling pace as LinqRangeLookupSingle. Try it:

// in IterationRangeLookupSingle<TItem, TKey>
public TItem GetPointData(DateTime point)
{
    // just a single line, this delegate is never used
    Func<TItem, bool> dummy = i => i.IsWithinRange(point);

    // the rest of the method remains exactly the same as before
    // ...
}

(I'm not sure why the compiler and/or jitter can't just ignore the superfluous delegate that I added above. Obviously, the delegate is necessary in your LinqRangeLookupSingle class.)

One possible workaround is to compose the predicate in LinqRangeLookupSingle so that point is passed to it as an argument. This means that the delegate only needs to be constructed once, not every time the GetPointData method is called. For example, the following change will speed up the LINQ version so that it's pretty much comparable with the foreach version:

// in LinqRangeLookupSingle<TItem, TKey>
public TItem GetPointData(DateTime point)
{
    Func<DateTime, Func<TItem, bool>> builder = x => y => y.IsWithinRange(x);
    Func<TItem, bool> predicate = builder(point);

    return this.items.FirstOrDefault(predicate);
}

这篇关于foreach + break vs linq FirstOrDefault 性能差异的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆