应在LINQ查询子句的顺序会影响实体框架的性能? [英] Should the order of LINQ query clauses affect Entity Framework performance?

查看:110
本文介绍了应在LINQ查询子句的顺序会影响实体框架的性能?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用的是实体框架(code第一次)和发现的顺序我在LINQ查询指定的条款是有巨大的性能影响,因此,例如:

I'm using Entity Framework (code first) and finding the order I specify clauses in my LINQ queries is having a huge performance impact, so for example:

using (var db = new MyDbContext())
{
    var mySize = "medium";
    var myColour = "vermilion";
    var list1 = db.Widgets.Where(x => x.Colour == myColour && x.Size == mySize).ToList();
    var list2 = db.Widgets.Where(x => x.Size == mySize && x.Colour == myColour).ToList();
}

凡(罕见)色条precedes的(通用)的大小子句它的速度快,但反过来想这是数量级的速度较慢。该表有几百万行,有问题的两个字段为nvarchar(50),所以不归,但他们每个索引。该字段指定在code第一次时装如下:

Where the (rare) colour clause precedes the (common) size clause it's fast, but the other way round it's orders of magnitude slower. The table has a couple of million rows and the two fields in question are nvarchar(50), so not normalised but they are each indexed. The fields are specified in a code first fashion as follows:

    [StringLength(50)]
    public string Colour { get; set; }

    [StringLength(50)]
    public string Size { get; set; }

我真的应该担心在我的LINQ查询这样的事情,我认为这是数据库的工作?

Am I really supposed to have to worry about such things in my LINQ queries, I thought that was the database's job?

制度规范是:

  • 在Visual Studio 2010中
  • 在.NET 4
  • 的EntityFramework 6.0.0-β1
  • 在SQL Server 2008 R2中的Web(64位)

权,任何馋惩罚的效果可以复制如下。这个问题似乎是极其敏感的一些因素所以请原谅一些这样做作的性质:

Right, to any gluttons for punishment the effect can be replicated as below. The issue seems to be tremendously sensitive to a number of factors so please bear with the contrived nature of some of this:

通过的NuGet安装的EntityFramework 6.0.0-β1,然后生成code首先风格:

Install EntityFramework 6.0.0-beta1 via nuget, then generate code first style with:

public class Widget
{
    [Key]
    public int WidgetId { get; set; }

    [StringLength(50)]
    public string Size { get; set; }

    [StringLength(50)]
    public string Colour { get; set; }
}


public class MyDbContext : DbContext
{
    public MyDbContext()
        : base("DefaultConnection")
    {
    }

    public DbSet<Widget> Widgets { get; set; }
}


生成与以下SQL虚设数据:


Generate the dummy data with the following SQL:

insert into gadget (Size, Colour)
select RND1 + ' is the name is this size' as Size,
RND2 + ' is the name of this colour' as Colour
from (Select top 1000000
CAST(abs(Checksum(NewId())) % 100 as varchar) As RND1,
CAST(abs(Checksum(NewId())) % 10000 as varchar) As RND2
from master..spt_values t1 cross join master..spt_values t2) t3


添加每一个索引颜色和大小,然后用查询:


Add one index each for Colour and Size, then query with:

string mySize = "99 is the name is this size";
string myColour = "9999 is the name of this colour";
using (var db = new WebDbContext())
{
    var list1= db.Widgets.Where(x => x.Colour == myColour && x.Size == mySize).ToList();
}
using (var db = new WebDbContext())
{
    var list2 = db.Widgets.Where(x => x.Size == mySize && x.Colour == myColour).ToList();
}


这个问题似乎与生成的SQL NULL比较的钝收集连接,如下图所示。


The issue seems connected with the obtuse collection of NULL comparisons in the generated SQL, as below.

exec sp_executesql N'SELECT 
[Extent1].[WidgetId] AS [WidgetId], 
[Extent1].[Size] AS [Size], 
[Extent1].[Colour] AS [Colour]
FROM [dbo].[Widget] AS [Extent1]
WHERE ((([Extent1].[Size] = @p__linq__0) 
AND ( NOT ([Extent1].[Size] IS NULL OR @p__linq__0 IS NULL))) 
OR (([Extent1].[Size] IS NULL) AND (@p__linq__0 IS NULL))) 
AND ((([Extent1].[Colour] = @p__linq__1) AND ( NOT ([Extent1].[Colour] IS NULL 
OR @p__linq__1 IS NULL))) OR (([Extent1].[Colour] IS NULL) 
AND (@p__linq__1 IS NULL)))',N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)',
@p__linq__0=N'99 is the name is this size',
@p__linq__1=N'9999 is the name of this colour'
go


更改相等操作中的LINQ to StartWith()使问题消失,因为没有任何变化这两个领域之一是非可空的数据库。


Changing the equality operator in the LINQ to StartWith() makes the problem go away, as does changing either one of the two fields to be non nullable at the database.

我绝望了!

有任何赏金猎人一些帮助,这个问题可以被复制的SQL Server 2008 R2的Web(64位)在一个干净的数据库,如下所示:

Some assistance for any bounty hunters, the issue can be reproduced on SQL Server 2008 R2 Web (64 bit) in a clean database, as follows:

CREATE TABLE [dbo].[Widget](
    [WidgetId] [int] IDENTITY(1,1) NOT NULL,
    [Size] [nvarchar](50) NULL,
    [Colour] [nvarchar](50) NULL,
 CONSTRAINT [PK_dbo.Widget] PRIMARY KEY CLUSTERED 
(
    [WidgetId] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX IX_Widget_Size ON dbo.Widget
    (
    Size
    ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX IX_Widget_Colour ON dbo.Widget
    (
    Colour
    ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


insert into Widget (Size, Colour)
select RND1 + ' is the name is this size' as Size,
RND2 + ' is the name of this colour' as Colour
from (Select top 1000000
CAST(abs(Checksum(NewId())) % 100 as varchar) As RND1,
CAST(abs(Checksum(NewId())) % 10000 as varchar) As RND2
from master..spt_values t1 cross join master..spt_values t2) t3
GO

,然后比较以下两个查询的相对表现(您可能需要调整的参数测试值,为了得到它返回一对夫妇行,以观察疗效的查询,即第二个查询ID慢得多)。

and then compare the relative performance of the following two queries (you may need to adjust the parameter test values in order to get a query which returns a couple of rows in order to observe the effect, i.e. the second query id much slower).

exec sp_executesql N'SELECT 
[Extent1].[WidgetId] AS [WidgetId], 
[Extent1].[Size] AS [Size], 
[Extent1].[Colour] AS [Colour]
FROM [dbo].[Widget] AS [Extent1]
WHERE ((([Extent1].[Colour] = @p__linq__0) 
AND ( NOT ([Extent1].[Colour] IS NULL 
OR @p__linq__0 IS NULL))) 
OR (([Extent1].[Colour] IS NULL) 
AND (@p__linq__0 IS NULL))) 
AND ((([Extent1].[Size] = @p__linq__1) 
AND ( NOT ([Extent1].[Size] IS NULL 
OR @p__linq__1 IS NULL))) 
OR (([Extent1].[Size] IS NULL) AND (@p__linq__1 IS NULL)))',
N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)',
@p__linq__0=N'9999 is the name of this colour',
@p__linq__1=N'99 is the name is this size'
go

exec sp_executesql N'SELECT 
[Extent1].[WidgetId] AS [WidgetId], 
[Extent1].[Size] AS [Size], 
[Extent1].[Colour] AS [Colour]
FROM [dbo].[Widget] AS [Extent1]
WHERE ((([Extent1].[Size] = @p__linq__0) 
AND ( NOT ([Extent1].[Size] IS NULL 
OR @p__linq__0 IS NULL))) 
OR (([Extent1].[Size] IS NULL) 
AND (@p__linq__0 IS NULL))) 
AND ((([Extent1].[Colour] = @p__linq__1) 
AND ( NOT ([Extent1].[Colour] IS NULL 
OR @p__linq__1 IS NULL))) 
OR (([Extent1].[Colour] IS NULL) 
AND (@p__linq__1 IS NULL)))',
N'@p__linq__0 nvarchar(4000),@p__linq__1 nvarchar(4000)',
@p__linq__0=N'99 is the name is this size',
@p__linq__1=N'9999 is the name of this colour'

您还可能会发现,像我一样,如果你重新运行该虚拟数据的插入,使现在有两个百万行,问题消失了。

You may also find, as I do, that if you rerun the dummy data insert so that there are now two million rows, the problem goes away.

推荐答案

问题的核心不是为什么订单此事与LINQ?。 LINQ只是字面翻译没有重新排序。真正的问题是为什么这两个SQL查询有不同的表现?

The core of the question is not "why does the order matter with LINQ?". LINQ just translates literally without reordering. The real question is "why do the two SQL queries have different performance?".

我可以只插入10万行重现该问题。在这种情况下,在优化的弱点被触发:它不承认,它可以做一个寻求颜色由于复杂的情况。在第一个查询优化器会识别模式并创建一个索引查找。

I was able to reproduce the problem by only inserting 100k rows. In that case a weakness in the optimizer is being triggered: it does not recognize that it can do a seek on Colour due to the complex condition. In the first query the optimizer does recognize the pattern and creates an index seek.

没有语义的原因,这应该是。一个寻求一个指标是 NULL 求,即使在可能的情况。这是优化的一个弱点/错误。这里有两个计划:

There is no semantic reason why this should be. A seek on an index is possible even when seeking on NULL. This is a weakness/bug in the optimizer. Here are the two plans:

EF尝试是有益的这里,因为它假定列和过滤器变量可以为空。在这种情况下,它试图给你一个匹配(它根据C#语法是正确的)。

EF tries to be helpful here because it assumes that both the column and the filter variable can be null. In that case it tries to give you a match (which according to C# semantics is the right thing).

我试图撤消通过添加以下过滤器:

I tried undoing that by adding the following filter:

Colour IS NOT NULL AND @p__linq__0 IS NOT NULL
AND Size IS NOT NULL AND @p__linq__1 IS NOT NULL

希望优化现在使用这些知识来简化复杂的EF过滤器前pression。它没能做到这一点。如果这工作相同的过滤器可能被添加到EF查询提供了一个简单的办法。

Hoping that the optimizer now uses that knowledge to simplify the complex EF filter expression. It did not manage to do so. If this had worked the same filter could have been added to the EF query providing an easy fix.

下面是修复了我推荐的顺序,你应该尝试它们:

Here are the fixes the I recommend in the order that you should try them:

  1. 请在数据库列不空数据库
  2. 请列不是空的EF数据模型,希望这将prevent EF从创建复杂的过滤条件
  3. 创建索引:颜色,大小和/或尺寸,颜色。他们还删除它们的问题。
  4. 确保过滤以正确的顺序完成,并留下code注释
  5. 尝试使用 INTERSECT / Queryable.Intersect 组合过滤器。这通常会导致不同的俯视形状。
  6. 创建一个执行过滤内嵌表值函数。 EF可以使用这样的功能作为一个更大的查询的一部分
  7. 在下降到原始的SQL
  8. 使用计划指南改变计划
  1. Make the database columns not-null in the database
  2. Make the columns not-null in the EF data model hoping that this will prevent EF from creating the complex filter condition
  3. Create indexes: Colour, Size and/or Size, Colour. They also remove them problem.
  4. Ensure that the filtering is done in the right order and leave a code comment
  5. Try to use INTERSECT/Queryable.Intersect to combine the filters. This often results in different plan shapes.
  6. Create an inline table-valued function that does the filtering. EF can use such a function as part of a bigger query
  7. Drop down to raw SQL
  8. Use a plan guide to change the plan

所有这些都是解决方法,而不是根本原因修复。

All of these are workarounds, not root cause fixes.

在最后,我很不满意,SQL Server和EF在这里。这两款产品应该是固定的。唉,他们很可能不会,你不能为任何等待。

In the end I am not happy with both SQL Server and EF here. Both products should be fixed. Alas, they likely won't be and you can't wait for that either.

下面是索引脚本:

CREATE NONCLUSTERED INDEX IX_Widget_Colour_Size ON dbo.Widget
    (
    Colour, Size
    ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE NONCLUSTERED INDEX IX_Widget_Size_Colour ON dbo.Widget
    (
   Size, Colour
    ) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

这篇关于应在LINQ查询子句的顺序会影响实体框架的性能?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆