Linq to objects: inner query performance


Question

While answering a question, I came across two LINQ snippets that should produce exactly the same result. I wondered about their performance and found that one is much faster than the other. I cannot understand why.

I took the data structures from that question:

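using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public struct Strc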
{
    public decimal A;
    public decimal B;
    // more stuff
}

public class CLASS
{
    public List<Strc> listStrc = new List<Strc>();
    // other stuff
}

Then I wrote simple benchmarks (using the BenchmarkDotNet library).

UPD: I included all the tests that were requested.

public class TestCases
{
    private Dictionary<string, CLASS> dict;

    public TestCases()
    {
        var m = 100;
        var n = 100;

        dict = Enumerable.Range(0, m)
                .Select(x => new CLASS()
                {
                    listStrc = Enumerable.Range(0, n)
                        .Select(y => new Strc() { A = y % 4, B = y }).ToList()
                })
                .ToDictionary(x => Guid.NewGuid().ToString(), x => x);
    }

    // "Greater than 3" tests

    [Benchmark]
    public void TestJon_Gt3()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc)
            .Where(ls => ls.A > 3)
            .Select(ls => ls.B).ToArray();
    }

    [Benchmark]
    public void TestTym_Gt3()
    {
        var result = dict.Values
                .SelectMany(x => x.listStrc.Where(l => l.A > 3))
                .Select(x => x.B).ToArray();
    }


    [Benchmark]
    public void TestDasblinkenlight_Gt3()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc.Select(v => v))
            .Where(l => l.A > 3)
            .Select(ls => ls.B).ToArray();
    }


    [Benchmark]
    public void TestIvan_Gt3()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc.Where(l => l.A > 3).Select(l => l.B))
            .ToArray();
    }

    // "Return true" tests

    [Benchmark]
    public void TestJon_True()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc)
            .Where(ls => true)
            .Select(ls => ls.B).ToArray();
    }

    [Benchmark]
    public void TestTym_True()
    {
        var result = dict.Values
                .SelectMany(x => x.listStrc.Where(l => true))
                .Select(x => x.B).ToArray();
    }

    [Benchmark]
    public void TestDasblinkenlight_True()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc.Select(v => v))
            .Where(ls => true)
            .Select(ls => ls.B).ToArray();
    }


    [Benchmark]
    public void TestIvan_True()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc.Where(l => true).Select(l => l.B))
            .ToArray();
    }
}

I ran these tests:

static void Main(string[] args)
{
    var summary = BenchmarkRunner.Run<TestCases>();        
}

and got these results:

// * Summary *

BenchmarkDotNet=v0.10.9, OS=Windows 7 SP1 (6.1.7601)
Processor=Intel Core i7-4770 CPU 3.40GHz (Haswell), ProcessorCount=8
Frequency=3312841 Hz, Resolution=301.8557 ns, Timer=TSC
  [Host]     : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.6.1076.0
  DefaultJob : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.6.1076.0


                   Method |       Mean |      Error |     StdDev |
------------------------- |-----------:|-----------:|-----------:|
              TestJon_Gt3 |   655.1 us |  1.3408 us |  1.2542 us |
              TestTym_Gt3 |   353.1 us | 12.9535 us | 10.8167 us |
  TestDasblinkenlight_Gt3 |   943.9 us |  1.9563 us |  1.7342 us |
             TestIvan_Gt3 |   352.6 us |  0.7216 us |  0.6397 us |
             TestJon_True |   801.8 us |  2.7194 us |  2.2708 us |
             TestTym_True | 1,055.8 us |  3.0912 us |  2.7403 us |
 TestDasblinkenlight_True | 1,090.6 us |  2.3084 us |  2.1593 us |
            TestIvan_True |   677.7 us |  3.0427 us |  2.8461 us |

// * Hints *
Outliers
  TestCases.TestTym_Gt3: Default             -> 2 outliers were removed
  TestCases.TestDasblinkenlight_Gt3: Default -> 1 outlier  was  removed
  TestCases.TestIvan_Gt3: Default            -> 1 outlier  was  removed
  TestCases.TestJon_True: Default            -> 2 outliers were removed
  TestCases.TestTym_True: Default            -> 1 outlier  was  removed

// * Legends *
  Mean   : Arithmetic mean of all measurements
  Error  : Half of 99.9% confidence interval
  StdDev : Standard deviation of all measurements
  1 us   : 1 Microsecond (0.000001 sec)

I tried changing the initial data (the n and m parameters), but the results were stable: TestTym was faster than TestJon every time, and TestIvan seems to be the fastest of all the tests. I just want to understand why it is faster. Or did I do something wrong in the testing?

Recommended answer

Since both expressions ultimately filter out all items, the time difference comes from how many times an intermediate iterator returns a value in the combined chain of statements.

To understand what is going on, consider the implementation of SelectMany from the reference source, with argument checking removed:

public static IEnumerable<TResult> SelectMany<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, IEnumerable<TResult>> selector) {
    return SelectManyIterator<TSource, TResult>(source, selector);
}
static IEnumerable<TResult> SelectManyIterator<TSource, TResult>(IEnumerable<TSource> source, Func<TSource, IEnumerable<TResult>> selector) {
    foreach (TSource element in source) {
        foreach (TResult subElement in selector(element)) {
            yield return subElement;
        }
    }
}

Select is implemented with a series of different iterators, chosen by the type of collection being enumerated: WhereSelectArrayIterator, WhereSelectListIterator, or WhereSelectEnumerableIterator.

Your test code generates values of A in the range from zero to three, inclusive:

Select(y => new Strc() { A = y % 4, B = y })
//                       ^^^^^^^^^

Therefore, the condition Where(ls => ls.A > 3) produces no matches.
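As a quick sanity check of that claim, here is a minimal sketch that rebuilds only the A values from the test data. The class and method names (PredicateCheck, MaxA, CountGreaterThan3) are hypothetical and exist only for this illustration, and plain int is used instead of decimal for brevity.

```csharp
using System;
using System.Linq;

public static class PredicateCheck
{
    // Largest A value produced by the generator expression A = y % 4.
    public static int MaxA(int n)
    {
        return Enumerable.Range(0, n).Select(y => y % 4).Max();
    }

    // Number of generated A values that satisfy the benchmark predicate A > 3.
    public static int CountGreaterThan3(int n)
    {
        return Enumerable.Range(0, n).Count(y => y % 4 > 3);
    }

    public static void Main()
    {
        Console.WriteLine(MaxA(100));              // 3
        Console.WriteLine(CountGreaterThan3(100)); // 0
    }
}
```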

In the TestJon example, the yield return inside SelectMany is hit 10,000 times, because everything is selected prior to filtering. After that, Select uses WhereSelectEnumerableIterator, which finds no matches. The number of times an iterator returns a value across both stages is therefore 10,000 + 0 = 10,000.

TestTym, on the other hand, filters everything out during the first stage. SelectMany gets an IEnumerable of empty IEnumerables, so the combined number of times an iterator returns a value across the two stages is 0 + 0 = 0.
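The counts above can be reproduced with an instrumented copy of the SelectMany loop. This is only a sketch under stated assumptions: CountingSelectMany, FilterOutside, FilterInside, and BuildData are hypothetical names invented for this illustration (they are not part of LINQ), the data is reduced to plain int A values, and the counter lives in a one-element array because iterator methods cannot take ref parameters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class YieldCounter
{
    // Hypothetical instrumented copy of SelectManyIterator: the same nested
    // loops, plus a counter incremented every time yield return is hit.
    public static IEnumerable<TResult> CountingSelectMany<TSource, TResult>(
        IEnumerable<TSource> source,
        Func<TSource, IEnumerable<TResult>> selector,
        int[] yields)
    {
        foreach (TSource element in source)
        {
            foreach (TResult subElement in selector(element))
            {
                yields[0]++;
                yield return subElement;
            }
        }
    }

    // TestJon shape: flatten first, filter afterwards.
    public static int FilterOutside(List<List<int>> lists, Func<int, bool> predicate)
    {
        var yields = new int[1];
        CountingSelectMany(lists, x => x).Where(predicate).ToArray();
        return yields[0];
    }

    // TestTym shape: filter inside the selector, then flatten.
    public static int FilterInside(List<List<int>> lists, Func<int, bool> predicate)
    {
        var yields = new int[1];
        CountingSelectMany(lists, x => x.Where(predicate)).ToArray();
        return yields[0];
    }

    // m lists of n values each, generated as y % 4 like field A in the benchmark.
    public static List<List<int>> BuildData(int m, int n)
    {
        return Enumerable.Range(0, m)
            .Select(_ => Enumerable.Range(0, n).Select(y => y % 4).ToList())
            .ToList();
    }

    public static void Main()
    {
        var data = BuildData(100, 100);
        Console.WriteLine(FilterOutside(data, a => a > 3)); // 10000
        Console.WriteLine(FilterInside(data, a => a > 3));  // 0
    }
}
```

The sketch counts only the yields of the SelectMany stage. The stage that follows yields nothing in either shape when the predicate is a > 3.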

I changed the condition in the queries to Where(l => true), and Tym is now slower than Jon. Why?

Now the total number of items returned across both stages is the same in both cases: 10,000 + 10,000 = 20,000. The difference comes down to the way the nested loop of SelectMany operates:

foreach (TResult subElement in selector(element)) {
    yield return subElement; //^^^^^^^^^^^^^^^^^
}

In Jon's case, selector(element) returns the List<Strc> itself. It looks like foreach figures this out and iterates over it with less overhead than in Tym's case, where the selector constructs and returns a new iterator object for every element.
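The difference in what the two selectors return can be observed directly. The following is a minimal sketch with hypothetical helper names (SelectorShapes, ListSelector, WhereSelector) and int elements in place of Strc. It shows only object identity and makes no claim about which internal iterator type LINQ picks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectorShapes
{
    // TestJon shape: the selector hands back the existing list, no new object.
    public static IEnumerable<int> ListSelector(List<int> list)
    {
        return list;
    }

    // TestTym shape: the selector wraps the list in a freshly allocated LINQ iterator.
    public static IEnumerable<int> WhereSelector(List<int> list)
    {
        return list.Where(l => true);
    }

    public static void Main()
    {
        var list = new List<int> { 1, 2, 3 };

        // Same instance: SelectMany enumerates the list itself.
        Console.WriteLine(ReferenceEquals(ListSelector(list), list));  // True

        // A different object that wraps the list.
        Console.WriteLine(ReferenceEquals(WhereSelector(list), list)); // False

        // Every call allocates another wrapper, once per outer element in the benchmark.
        Console.WriteLine(ReferenceEquals(WhereSelector(list), WhereSelector(list))); // False

        // The elements produced are nevertheless identical.
        Console.WriteLine(WhereSelector(list).SequenceEqual(list));    // True
    }
}
```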

Adding Select(v => v) to Jon's version makes this optimization impossible to apply, so the results in the second update are within the margin of error.
