LINQ Joins - Performance

Question

I am curious about how exactly LINQ (not LINQ to SQL) performs its joins behind the scenes, compared with how SQL Server performs joins.

Before executing a query, SQL Server generates an execution plan. The execution plan is basically an expression tree describing what it believes is the best way to execute the query. Each node indicates whether to do a Sort, Scan, Select, Join, etc.

On a 'Join' node in our execution plan, we can see three possible algorithms: Hash Join, Merge Join, and Nested Loops Join. SQL Server will choose which algorithm to use for each Join operation based on the expected number of rows in the inner and outer tables, what type of join we are doing (some algorithms don't support all types of joins), whether we need the data ordered, and probably many other factors.

Join algorithms:

Nested Loops Join: Best for small inputs; can be optimized with an ordered inner table.

Merge Join: Best for medium to large inputs, sorted inputs, or an output that needs to be ordered.

Hash Join: Best for medium to large inputs; can be parallelized to scale linearly.
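To make the first two algorithms concrete, here is a minimal in-memory sketch in C#. It is not how SQL Server implements them; `NestedLoopsJoin`, `MergeJoin`, and the sample `customers`/`orders` data are hypothetical names made up for illustration, and the code assumes C# 9 or later (top-level statements). The merge join additionally assumes both inputs are already sorted by the join key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Nested loops join: compare every outer row with every inner row, O(n * m).
IEnumerable<(TOuter Outer, TInner Inner)> NestedLoopsJoin<TOuter, TInner, TKey>(
    IEnumerable<TOuter> outer, IReadOnlyList<TInner> inner,
    Func<TOuter, TKey> outerKey, Func<TInner, TKey> innerKey)
{
    var comparer = EqualityComparer<TKey>.Default;
    foreach (var o in outer)
        foreach (var i in inner)
            if (comparer.Equals(outerKey(o), innerKey(i)))
                yield return (o, i);
}

// Merge join: ASSUMES both inputs are sorted by key; advances two cursors in step.
IEnumerable<(TOuter Outer, TInner Inner)> MergeJoin<TOuter, TInner, TKey>(
    IReadOnlyList<TOuter> outer, IReadOnlyList<TInner> inner,
    Func<TOuter, TKey> outerKey, Func<TInner, TKey> innerKey)
    where TKey : IComparable<TKey>
{
    int o = 0, i = 0;
    while (o < outer.Count && i < inner.Count)
    {
        int cmp = outerKey(outer[o]).CompareTo(innerKey(inner[i]));
        if (cmp < 0) o++;          // outer key is behind: advance outer
        else if (cmp > 0) i++;     // inner key is behind: advance inner
        else
        {
            // Pair the current outer row with the whole run of equal inner keys.
            for (int j = i; j < inner.Count
                 && outerKey(outer[o]).CompareTo(innerKey(inner[j])) == 0; j++)
                yield return (outer[o], inner[j]);
            o++; // leave i at the start of the run so duplicate outer keys match it too
        }
    }
}

// Hypothetical sample data, already sorted by the join key.
var customers = new List<(int Id, string Name)> { (1, "Ann"), (2, "Bob"), (3, "Cy") };
var orders = new List<(int CustomerId, string Item)> { (1, "pen"), (1, "ink"), (3, "cap") };

var viaLoops = NestedLoopsJoin(customers, orders, c => c.Id, r => r.CustomerId).ToList();
var viaMerge = MergeJoin(customers, orders, c => c.Id, r => r.CustomerId).ToList();

Console.WriteLine(viaLoops.SequenceEqual(viaMerge)); // True
```

On sorted input both produce the same rows in the same order; the difference is cost. The nested loops version does one key comparison per pair of rows, while the merge version makes a single forward pass over the outer input and only revisits a run of equal inner keys when the outer key repeats.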

LINQ query:

DataTable  firstTable, secondTable;

...

var rows = from firstRow in firstTable.AsEnumerable ()
                join secondRow in secondTable.AsEnumerable ()
                    on firstRow.Field<object> (randomObject.Property)
                    equals secondRow.Field<object> (randomObject.Property)
           select new {firstRow, secondRow};

SQL query:

SELECT *
FROM firstTable fT
    INNER JOIN secondTable sT ON fT.Property = sT.Property

SQL Server might use a Nested Loops Join if it knows there are a small number of rows in each table, a Merge Join if it knows one of the tables has an index, and a Hash Join if it knows there are a lot of rows in either table and neither has an index.

Does LINQ choose its algorithm for joins, or does it always use the same one?

Recommended answer

LINQ to SQL does not send join hints to the server. Thus the performance of a join using LINQ to SQL will be identical to the performance of the same join sent "directly" to the server (i.e. using pure ADO or SQL Server Management Studio) without any hints specified.

LINQ to SQL also doesn't allow you to use join hints (as far as I know). So if you want to force a specific type of join, you'll have to do it using a stored procedure or the Execute[Command|Query] method. But unless you specify a join type by writing INNER [HASH|LOOP|MERGE] JOIN, SQL Server always picks the type of join it thinks will be most efficient; it doesn't matter where the query came from.

Other LINQ query providers - such as Entity Framework and NHibernate Linq - will do exactly the same thing as LINQ to SQL. None of these have any direct knowledge of how you've indexed your database, and so none of them send join hints.

LINQ to Objects is a little different - it will (almost?) always perform a "hash join" in SQL Server parlance. That is because it lacks the indexes necessary to do a merge join, and hash joins are usually more efficient than nested loops, unless the number of elements is very small. But determining the number of elements in an IEnumerable<T> might require a full iteration in the first place, so in most cases it's faster just to assume the worst and use a hashing algorithm.
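As a rough illustration of what a "hash join" means for in-memory sequences, the sketch below writes one by hand and compares it with `Enumerable.Join`. The `HashJoin` helper and the sample data are hypothetical and are not the framework's source; the sketch only assumes the documented behavior that `ToLookup` builds a keyed lookup and that a lookup returns an empty sequence for a missing key. It uses C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not the framework source: a hash join written by hand.
IEnumerable<TResult> HashJoin<TOuter, TInner, TKey, TResult>(
    IEnumerable<TOuter> outer, IEnumerable<TInner> inner,
    Func<TOuter, TKey> outerKey, Func<TInner, TKey> innerKey,
    Func<TOuter, TInner, TResult> result)
{
    // Build phase: one full pass over the inner sequence fills a hash-based lookup.
    ILookup<TKey, TInner> table = inner.ToLookup(innerKey);

    // Probe phase: stream the outer sequence and look up each key.
    foreach (var o in outer)
        foreach (var i in table[outerKey(o)]) // empty sequence when the key is absent
            yield return result(o, i);
}

// Hypothetical sample data; note that neither input needs to be sorted.
var left = new[] { (Id: 1, Name: "Ann"), (Id: 2, Name: "Bob"), (Id: 3, Name: "Cy") };
var right = new[] { (OwnerId: 3, Item: "cap"), (OwnerId: 1, Item: "pen"), (OwnerId: 1, Item: "ink") };

var builtIn = left.Join(right, l => l.Id, r => r.OwnerId, (l, r) => (l.Name, r.Item)).ToList();
var byHand = HashJoin(left, right, l => l.Id, r => r.OwnerId, (l, r) => (l.Name, r.Item)).ToList();

Console.WriteLine(builtIn.SequenceEqual(byHand)); // True
```

Neither version needs to know the size of either input up front, which matches the point above: the inner sequence is enumerated once to build the table, and each outer element then costs roughly one hash lookup.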
