LINQ to Objects: inner query performance
Question
While answering a question, I saw two examples of LINQ code that should produce exactly the same result. I wondered about their performance and found that one is much faster than the other, and I cannot understand why.
I took the data structures from the question:
public struct Strc
{
    public decimal A;
    public decimal B;
    // more stuff
}
public class CLASS
{
    public List<Strc> listStrc = new List<Strc>();
    // other stuff
}
Then I wrote simple benchmarks (using the BenchmarkDotNet library).
UPD: I included all the tests that were requested.
public class TestCases
{
    private Dictionary<string, CLASS> dict;
    public TestCases()
    {
        var m = 100;
        var n = 100;
        dict = Enumerable.Range(0, m)
            .Select(x => new CLASS()
            {
                listStrc = Enumerable.Range(0, n)
                    .Select(y => new Strc() { A = y % 4, B = y }).ToList()
            })
            .ToDictionary(x => Guid.NewGuid().ToString(), x => x);
    }
Greater than 3 tests
    [Benchmark]
    public void TestJon_Gt3()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc)
            .Where(ls => ls.A > 3)
            .Select(ls => ls.B).ToArray();
    }
    [Benchmark]
    public void TestTym_Gt3()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc.Where(l => l.A > 3))
            .Select(x => x.B).ToArray();
    }
    [Benchmark]
    public void TestDasblinkenlight_Gt3()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc.Select(v => v))
            .Where(l => l.A > 3)
            .Select(ls => ls.B).ToArray();
    }
    [Benchmark]
    public void TestIvan_Gt3()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc.Where(l => l.A > 3).Select(l => l.B))
            .ToArray();
    }
Return true tests
    [Benchmark]
    public void TestJon_True()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc)
            .Where(ls => true)
            .Select(ls => ls.B).ToArray();
    }
    [Benchmark]
    public void TestTym_True()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc.Where(l => true))
            .Select(x => x.B).ToArray();
    }
    [Benchmark]
    public void TestDasblinkenlight_True()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc.Select(v => v))
            .Where(ls => true)
            .Select(ls => ls.B).ToArray();
    }
    [Benchmark]
    public void TestIvan_True()
    {
        var result = dict.Values
            .SelectMany(x => x.listStrc.Where(l => true).Select(l => l.B))
            .ToArray();
    }
}
I ran these tests
static void Main(string[] args)
{
    var summary = BenchmarkRunner.Run<TestCases>();
}
and got these results
// * Summary *
BenchmarkDotNet=v0.10.9, OS=Windows 7 SP1 (6.1.7601)
Processor=Intel Core i7-4770 CPU 3.40GHz (Haswell), ProcessorCount=8
Frequency=3312841 Hz, Resolution=301.8557 ns, Timer=TSC
[Host] : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.6.1076.0
DefaultJob : .NET Framework 4.6.1 (CLR 4.0.30319.42000), 32bit LegacyJIT-v4.6.1076.0
Method | Mean | Error | StdDev |
------------------------- |-----------:|-----------:|-----------:|
TestJon_Gt3 | 655.1 us | 1.3408 us | 1.2542 us |
TestTym_Gt3 | 353.1 us | 12.9535 us | 10.8167 us |
TestDasblinkenlight_Gt3 | 943.9 us | 1.9563 us | 1.7342 us |
TestIvan_Gt3 | 352.6 us | 0.7216 us | 0.6397 us |
TestJon_True | 801.8 us | 2.7194 us | 2.2708 us |
TestTym_True | 1,055.8 us | 3.0912 us | 2.7403 us |
TestDasblinkenlight_True | 1,090.6 us | 2.3084 us | 2.1593 us |
TestIvan_True | 677.7 us | 3.0427 us | 2.8461 us |
// * Hints *
Outliers
TestCases.TestTym_Gt3: Default -> 2 outliers were removed
TestCases.TestDasblinkenlight_Gt3: Default -> 1 outlier was removed
TestCases.TestIvan_Gt3: Default -> 1 outlier was removed
TestCases.TestJon_True: Default -> 2 outliers were removed
TestCases.TestTym_True: Default -> 1 outlier was removed
// * Legends *
Mean : Arithmetic mean of all measurements
Error : Half of 99.9% confidence interval
StdDev : Standard deviation of all measurements
1 us : 1 Microsecond (0.000001 sec)
I tried changing the initial data (the n and m parameters), but the results were stable: TestTym was faster than TestJon every time, and TestIvan seems to be the fastest of all the tests. I just want to understand why it is faster. Or did I do something wrong in the tests?
Recommended answer
Since ultimately both expressions filter out all items, the time difference comes from the different number of times an intermediate iterator returns a value in the combined chain of statements.
To understand what is going on, consider the implementation of SelectMany from the reference source, with argument checking removed:
public static IEnumerable<TResult> SelectMany<TSource, TResult>(this IEnumerable<TSource> source, Func<TSource, IEnumerable<TResult>> selector) {
    return SelectManyIterator<TSource, TResult>(source, selector);
}
static IEnumerable<TResult> SelectManyIterator<TSource, TResult>(IEnumerable<TSource> source, Func<TSource, IEnumerable<TResult>> selector) {
    foreach (TSource element in source) {
        foreach (TResult subElement in selector(element)) {
            yield return subElement;
        }
    }
}
Select is implemented with a series of different iterators, chosen by the type of collection being enumerated: WhereSelectArrayIterator, WhereSelectListIterator, or WhereSelectEnumerableIterator.
Your test code generates cases in which the values of A range from zero to three, inclusive:
Select(y => new Strc() { A = y % 4, B = y })
//                       ^^^^^^^^^
Therefore, the condition Where(ls => ls.A > 3) produces no matches.
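A quick way to confirm this is to rebuild the A values for one list and check them directly. The sketch below is only an illustration: the class RangeCheck and the method BuildA are hypothetical names, not part of the benchmark, and the code assumes the same setup as the question (A = y % 4 for y from 0 to n - 1).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RangeCheck
{
    // Hypothetical helper: rebuilds the A values of one list,
    // using the same pattern as the benchmark setup (A = y % 4).
    public static List<decimal> BuildA(int n)
    {
        return Enumerable.Range(0, n).Select(y => (decimal)(y % 4)).ToList();
    }

    public static void Main()
    {
        var values = BuildA(100);

        Console.WriteLine(values.Max());              // prints 3
        Console.WriteLine(values.Count(a => a > 3));  // prints 0
    }
}
```

Since the largest A is 3, the filter A > 3 rejects every element regardless of n and m.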
In the TestJon example, yield return inside SelectMany is hit 10,000 times, because everything is selected prior to filtering. After that, Select uses WhereSelectEnumerableIterator, which finds no matches. The number of times an iterator returns a value across both stages is therefore 10,000 + 0 = 10,000.
TestTym, on the other hand, filters everything out during the first stage. SelectMany gets an IEnumerable of empty IEnumerables, so the combined number of times an iterator returns a value during either stage is 0 + 0 = 0.
I changed the condition in the queries to Where(l => true), and Tym is now slower than Jon. Why?
Now the total number of items returned across both stages is the same for both queries, 10,000 + 10,000 = 20,000. The difference comes down to the way the nested loop of SelectMany operates:
foreach (TResult subElement in selector(element)) {
//                             ^^^^^^^^^^^^^^^^^
    yield return subElement;
}
In Jon's case, selector(element) returns a List<Strc>. It looks like foreach figures this out and iterates over it with less overhead than in Tym's case, where the selector constructs and returns a new iterator object for every list.
Adding Select(v => v) to Jon eliminates the possibility of applying this optimization, so the results in the second update are within the margin of error.
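The difference between what the two selectors hand to SelectMany can be seen without any timing. In this minimal sketch (SelectorShape and its two methods are hypothetical names, and a List<int> stands in for List<Strc>), the first selector returns the list itself, while the second returns a separate iterator object wrapped around it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectorShape
{
    // Jon's selector: returns the list instance itself.
    public static IEnumerable<int> Direct(List<int> list)
    {
        return list;
    }

    // Jon's selector with Select(v => v) added: returns a new iterator object.
    public static IEnumerable<int> Wrapped(List<int> list)
    {
        return list.Select(v => v);
    }

    public static void Main()
    {
        var list = new List<int> { 1, 2, 3 };

        Console.WriteLine(ReferenceEquals(Direct(list), list)); // prints True
        Console.WriteLine(Direct(list) is List<int>);           // prints True
        Console.WriteLine(Wrapped(list) is List<int>);          // prints False
    }
}
```

Both sequences contain the same elements; only the object being enumerated by the inner foreach differs, which is where the extra per-item cost comes from.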