实体框架查询性能与原始SQL执行不同 [英] Entity Framework query performance differs extrem with raw SQL execution

查看:447
本文介绍了实体框架查询性能与原始SQL执行不同的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述



架构



我有一个这样的表格结构:

  CREATE TABLE [dbo]。[DataLogger] 
b $ b [ID] [bigint] IDENTITY(1,1)NOT NULL,
[ProjectID] [bigint] NULL,
CONSTRAINT [PrimaryKey1] PRIMARY KEY CLUSTERED([ID] ASC)


CREATE TABLE [dbo]。[DCDistributionBox]

[ID] [bigint] IDENTITY(1,1)NOT NULL,
[DataLoggerID] [bigint] NOT NULL,
CONSTRAINT [PrimaryKey2] PRIMARY KEY CLUSTERED([ID] ASC)


ALTER TABLE [dbo]。[DCDistributionBox]
添加CONSTRAINT [FK_DCDistributionBox_DataLogger]
FOREIGN KEY([DataLoggerID])REFERENCES [dbo]。[DataLogger]([ID])

CREATE TABLE [dbo]。[DCString]

[ID] [bigint] IDENTITY(1,1)NOT NULL,
[DCDistributionBoxID] [bigint] NOT NULL,
[CurrentMPP] [decimal](18,2)NULL,
CONSTRAINT [PrimaryKey3] PRIMARY KEY CLUSTERED([ID] ASC)


ALTER TABLE [dbo]。[DCString]
添加约束[FK_DCString_DCDistributionBox]
FOREIGN KEY ([DCDistributionBoxID])REFERENCES [dbo]。[DCDistributionBox]([ID])

CREATE TABLE [dbo]。[StringData]

[DCStringID] [bigint] NOT NULL,
[TimeStamp] [datetime] NOT NULL,
[DCCurrent] [decimal](18,2)NULL,
CONSTRAINT [PrimaryKey4] PRIMARY KEY CLUSTERED([TimeStamp] DESC, [DCStringID] ASC)


CREATE NONCLUSTERED INDEX [TimeStamp_DCCurrent-NonClusteredIndex]
ON [dbo]。[StringData]([DCStringID] ASC,[TimeStamp] ASC)
INCLUDE([DCCurrent])

外键上的标准索引也存在(我不'



[StringData] 表具有以下存储空间:数据空间:26,901.86 MB

  • 行数:13分区数:62



  • 使用



    我现在想将数据分组到 [StringData] 表并做一些聚合。



    我创建了一个实体框架查询(详细的信息查询可以找到



    更新6 (与@usr的答案相关):



    我通过SQL Server创建了两个showplan XML Profiler。



    快速运行(SSMS).SQLPlan



    慢跑(EF).SQLPlan



    更新7 (与@VladimirBaranov的评论相关):



    我现在运行一些更多的测试用例与您的意见相关。



    首先,我通过使用新的计算列和匹配的INDEXER来终止订单操作。这减少了与 DATEADD(MINUTE,DATEDIFF(MINUTE,0,[TimeStamp])/ 15 * 15,0)相关的性能滞后。有关如何以及为何可以找到此处的详细信息。



    结果外观如下所示:



    纯实体框架查询:

      for i = 0; i <3; i ++)
    {
    DateTime begin = DateTime.UtcNow;
    var result = model.StringDatas
    .AsNoTracking()
    .Where(p => p.DCString.DCDistributionBox.DataLogger.ProjectID == projectID&& p.TimeStamp15Minutes> = fromDate&& p.TimeStamp15Minutes< tillDate)
    .Select(d => new
    {
    TimeStamp = d.TimeStamp15Minutes,
    DCCurrentMpp = d.DCCurrent / d.DCString.CurrentMPP
    })
    .GroupBy(d => d.TimeStamp)
    .Select(d => new
    {
    TimeStamp = d .Key,
    DCCurrentMppMin = d.Min(v => v.DCCurrentMpp),
    DCCurrentMppMax = d.Max(v => v.DCCurrentMpp),
    DCCurrentMppAvg = d.Average (v => v.DCCurrentMpp),
    DCCurrentMppStDev = DbFunctions.StandardDeviationP(d.Select(v => v.DCCurrentMpp))
    })
    .ToList();

    TimeSpan excecutionTimeSpan = DateTime.UtcNow - begin;
    Debug.WriteLine({0}运行纯EF:{1},我,excecutionTimeSpan.ToString());
    }

    0运行纯EF: 00:00:12.6460624



    第1次运行纯EF: 00:00:11.0258393



    第2次运行纯EF: 00:00:08.4171044



    我现在使用EF生成的SQL作为SQL查询:


    (int i = 0; i <3; i ++)

      
    {
    DateTime begin = DateTime.UtcNow;
    int result = model.Database.ExecuteSqlCommand(@SELECT
    1 AS [C1],
    [GroupBy1]。[K1] AS [TimeStamp15Minutes],
    [GroupBy1]。 [A1] AS [C2],
    [GroupBy1]。[A2] AS [C3],
    [GroupBy1]。[A3] AS [C4],
    [GroupBy1] ] AS [C5]
    FROM(SELECT
    [Project1]。[TimeStamp15Minutes] AS [K1],
    MIN([Project1]。[C1])AS [A1],
    =([Project1]。[C1])AS [A2],
    AVG([Project1]。[C1])AS [A3],
    STDEVP [A4]
    FROM(SELECT
    [Extent1]。[TimeStamp15Minutes] AS [TimeStamp15Minutes],
    [Extent1]。[DCCurrent] / [Extent2]。[CurrentMPP] AS [C1]
    FROM [dbo]。[StringData] AS [Extent1]
    INNER JOIN [dbo]。[DCString] AS [Extent2] ON [Extent1]。[DCStringID] = [Extent2]。[ID]
    INNER JOIN [dbo]。[DCDistributionBox] AS [Extent3 ] ON [Extent2]。[DCDistributionBoxID] = [Extent3]。[ID]
    INNER JOIN [dbo]。[DataLogger] AS [Extent4] ON [Extent3]。[DataLoggerID] = [Extent4]
    WHERE([Extent4]。[ProjectID] = @ p__linq__0)AND([Extent1]。[TimeStamp15Minutes]> = @ p__linq__1)AND([Extent1]。[TimeStamp15Minutes] @ p__linq__2)
    )AS [Project1]
    GROUP BY [Project1]。[TimeStamp15Minutes]
    )AS [GroupBy1];,
    new SqlParameter(p__linq__0,20827) ,
    new SqlParameter(p__linq__1,fromDate),
    new SqlParameter(p__linq__2,tillDate));

    TimeSpan excecutionTimeSpan = DateTime.UtcNow - begin;
    Debug.WriteLine({0} th run:{1},i,excecutionTimeSpan.ToString());
    }

    第0次运行: 00:00:00.8381200



    第1次运行: 00:00: 00.6920736



    第2次运行: 00:00:00.7081006



    OPTION(RECOMPILE)

      for(int i = 0 ; i <3; i ++)
    {
    DateTime begin = DateTime.UtcNow;
    int result = model.Database.ExecuteSqlCommand(@SELECT
    1 AS [C1]
    [GroupBy1]。[K1] AS [TimeStamp15Minutes],
    [GroupBy1]。[A1] AS [C2],
    [A2] AS [C3],
    [GroupBy1]。[A3] AS [C4],
    [GroupBy1]。[A4] AS [C5]
    FROM
    [Project1]。[TimeStamp15Minutes] AS [K1],
    MIN([Project1]。[C1])AS [A1],
    MAX([Project1]。[C1])AS [A2],
    AVG([Project1]。[C1])AS [A3],
    STDEVP([Project1]。[C1])AS [A4]
    FROM(SELECT
    [Extent1]。[TimeStamp15Minutes] AS [TimeStamp15Minutes],
    [Extent1]。[DCCurrent] / [Extent2]。[CurrentMPP] AS [C1]
    FROM [dbo]。[StringData] AS [Extent1]
    INNER JOIN [dbo]。[DCString] AS [Extent2] ON [Extent1]。[DCStringID] = [Extent2]。[ID]
    INNER JOIN [dbo]。[DCDistributionBox] AS [Extent3] ON [Extent2]。[DCDistributionBoxID] = [Extent3]。[ID]
    INNER JOIN [dbo]。[DataLogger] AS [Extent4] ON [Extent3]。[DataLoggerID] = [Extent4] ID]
    WHERE ([Extent4]。[ProjectID] = @ p__linq__0)AND([Extent1]。[TimeStamp15Minutes]> = @ p__linq__1)AND([Extent1]] [TimeStamp15Minutes] @ p__linq__2)
    )AS [Project1]
    GROUP BY [Project1]。[TimeStamp15Minutes]
    )AS [GroupBy1]
    OPTION(RECOMPILE);,
    new SqlParameter(p__linq__0,20827),
    new SqlParameter(p__linq__1,fromDate),
    new SqlParameter(p__linq__2,tillDate));

    TimeSpan excecutionTimeSpan = DateTime .UtcNow - begin;
    Debug.WriteLine({0} th run:{1},i,excecutionTimeSpan.ToString());
    }

    第0次运行RECOMPILE: 00:00:00.8260932



    第1次运行RECOMPILE: 00:00:00.9139730



    第二次运行RECOMPILE: 00:00:01.0680665 / p>

    同样的SQL查询在SSMS(不含RECOMPILE)中显示:



    00:00:01.105



    同样的SQL查询在SSMS中显示(带RECOMPILE):



    00:00: 00.902



    我希望这些都是值您需要。

    解决方案

    在这个答案中,我专注于原始观察:EF生成的查询速度很慢,但是在SSMS中运行相同的查询是快速的。



    这种行为的一个可能的解释是参数嗅探


    当SQL Server使用具有参数的
    存储过程时,使用称为参数嗅探的进程。当
    过程被编译或重新编译时,传递给
    参数的值被计算并用于创建一个执行计划。然后将
    值与执行计划一起存储在计划缓存中。在
    的后续执行中,使用相同的值和相同的计划。


    所以,EF生成一个查询参数很少第一次运行此查询时,服务器将使用第一次运行中生效的参数值创建此查询的执行计划。这个计划通常都不错。但是,稍后您可以使用参数的其他值运行相同的EF查询。有可能对于新的参数值,以前生成的计划不是最佳的,并且查询变慢。服务器继续使用以前的计划,因为它仍然是相同的查询,只是参数的值是不同的。



    如果在这个时候你拿查询文本并尝试直接在SSMS中运行,服务器将创建一个新的执行计划,因为在技术上它与EF应用程序发出的查询不一样。即使一个字符差异就足够了,会话设置的任何更改也足以让服务器将查询视为新的。因此,服务器在其缓存中具有两个似乎相同的查询的计划。因为最初是为不同的参数值而建立的,所以第一个缓慢的计划对于新的参数值来说是缓慢的。第二个快速计划是针对当前参数值构建的,因此它是快速的。



    文章应用程序缓慢,SSMS中的快速由Erland Sommarskog更详细地解释了这个和其他相关领域。



    有几种丢弃缓存计划的方法,并强制服务器重新生成。更改表或更改表索引应该这样做 - 它应该丢弃与该表相关的所有计划,包括slow和fast。然后在EF应用程序中使用新的参数值运行查询并获得新的快速计划。您在SSMS中运行查询,并获得具有新值参数的第二个快速计划。服务器仍然生成两个计划,但现在这两个计划都很快。



    另一个变种是添加 OPTION(RECOMPILE)查询。使用此选项,服务器将不会将生成的计划存储在其缓存中。因此,每次查询运行时,服务器将使用实际的参数值来生成计划(它认为)对于给定的参数值是最佳的。缺点是计划生成的额外开销。



    例如,由于过时的统计信息,服务器仍然可以选择使用此选项的不良计划。但是,至少参数嗅探不会是一个问题。


    I have a question about Entity Framework query execution performance.

    Schema:

    I have a table structure like this:

    CREATE TABLE [dbo].[DataLogger]
    (
        [ID] [bigint] IDENTITY(1,1) NOT NULL,
        [ProjectID] [bigint] NULL,
        CONSTRAINT [PrimaryKey1] PRIMARY KEY CLUSTERED ( [ID] ASC )
    )
    
    CREATE TABLE [dbo].[DCDistributionBox]
    (
        [ID] [bigint] IDENTITY(1,1) NOT NULL,
        [DataLoggerID] [bigint] NOT NULL,
        CONSTRAINT [PrimaryKey2] PRIMARY KEY CLUSTERED ( [ID] ASC )
    )
    
    ALTER TABLE [dbo].[DCDistributionBox]
        ADD CONSTRAINT [FK_DCDistributionBox_DataLogger] 
        FOREIGN KEY([DataLoggerID]) REFERENCES [dbo].[DataLogger] ([ID])
    
    CREATE TABLE [dbo].[DCString] 
    (
        [ID] [bigint] IDENTITY(1,1) NOT NULL,
        [DCDistributionBoxID] [bigint] NOT NULL,
        [CurrentMPP] [decimal](18, 2) NULL,
        CONSTRAINT [PrimaryKey3] PRIMARY KEY CLUSTERED ( [ID] ASC )
    )
    
    ALTER TABLE [dbo].[DCString]
        ADD CONSTRAINT [FK_DCString_DCDistributionBox] 
        FOREIGN KEY([DCDistributionBoxID]) REFERENCES [dbo].[DCDistributionBox] ([ID])
    
    CREATE TABLE [dbo].[StringData]
    (
        [DCStringID] [bigint] NOT NULL,
        [TimeStamp] [datetime] NOT NULL,
        [DCCurrent] [decimal](18, 2) NULL,
        CONSTRAINT [PrimaryKey4] PRIMARY KEY CLUSTERED ( [TimeStamp] DESC, [DCStringID] ASC)
    )
    
    CREATE NONCLUSTERED INDEX [TimeStamp_DCCurrent-NonClusteredIndex] 
    ON [dbo].[StringData] ([DCStringID] ASC, [TimeStamp] ASC)
    INCLUDE ([DCCurrent])
    

    Standard indexes on the foreign keys also exist (I don't want to list them all for space reasons).

    The [StringData] table as has following storage stats:

    • Data space: 26,901.86 MB
    • Row count: 131,827,749
    • Partitioned: true
    • Partition count: 62

    Usage:

    I now want to group the data in the [StringData] table and do some aggregation.

    I created an Entity Framework query (detailed infos to the query can be found here):

    var compareData = model.StringDatas
        .AsNoTracking()
        .Where(p => p.DCString.DCDistributionBox.DataLogger.ProjectID == projectID && p.TimeStamp >= fromDate && p.TimeStamp < tillDate)
        .Select(d => new
        {
            TimeStamp = d.TimeStamp,
            DCCurrentMpp = d.DCCurrent / d.DCString.CurrentMPP
        })
        .GroupBy(d => DbFunctions.AddMinutes(DateTime.MinValue, DbFunctions.DiffMinutes(DateTime.MinValue, d.TimeStamp) / minuteInterval * minuteInterval))
        .Select(d => new
        {
            TimeStamp = d.Key,
            DCCurrentMppMin = d.Min(v => v.DCCurrentMpp),
            DCCurrentMppMax = d.Max(v => v.DCCurrentMpp),
            DCCurrentMppAvg = d.Average(v => v.DCCurrentMpp),
            DCCurrentMppStDev = DbFunctions.StandardDeviationP(d.Select(v => v.DCCurrentMpp))
        })
        .ToList();
    

    The excecution timespan is exceptional long!?

    • Execution result: 92rows
    • Execution time: ~16000ms

    Attempts:

    I now took a look into the Entity Framework generated SQL query and looks like this:

    DECLARE @p__linq__4 DATETIME = 0;
    DECLARE @p__linq__3 DATETIME = 0;
    DECLARE @p__linq__5 INT = 15;
    DECLARE @p__linq__6 INT = 15;
    DECLARE @p__linq__0 BIGINT = 20827;
    DECLARE @p__linq__1 DATETIME = '06.02.2016 00:00:00';
    DECLARE @p__linq__2 DATETIME = '07.02.2016 00:00:00';
    
    SELECT 
    1 AS [C1], 
    [GroupBy1].[K1] AS [C2], 
    [GroupBy1].[A1] AS [C3], 
    [GroupBy1].[A2] AS [C4], 
    [GroupBy1].[A3] AS [C5], 
    [GroupBy1].[A4] AS [C6]
    FROM ( SELECT 
        [Project1].[K1] AS [K1], 
        MIN([Project1].[A1]) AS [A1], 
        MAX([Project1].[A2]) AS [A2], 
        AVG([Project1].[A3]) AS [A3], 
        STDEVP([Project1].[A4]) AS [A4]
        FROM ( SELECT 
            DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Project1].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) AS [K1], 
            [Project1].[C1] AS [A1], 
            [Project1].[C1] AS [A2], 
            [Project1].[C1] AS [A3], 
            [Project1].[C1] AS [A4]
            FROM ( SELECT 
                [Extent1].[TimeStamp] AS [TimeStamp], 
                [Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
                FROM    [dbo].[StringData] AS [Extent1]
                INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
                INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
                INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
                WHERE (([Extent4].[ProjectID] = @p__linq__0) OR (([Extent4].[ProjectID] IS NULL) AND (@p__linq__0 IS NULL))) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
            )  AS [Project1]
        )  AS [Project1]
        GROUP BY [K1]
    )  AS [GroupBy1]
    

    I copied this SQL query into SSMS on the same machine, connected with same connection string as the Entity Framework.

    The result is a very much improved performance:

    • Execution result: 92rows
    • Execution time: 517ms

    I also do some loop runing test and the result is strange. The test looks like this

    for (int i = 0; i < 50; i++)
    {
        DateTime begin = DateTime.UtcNow;
    
        [...query...]
    
        TimeSpan excecutionTimeSpan = DateTime.UtcNow - begin;
        Debug.WriteLine("{0}th run: {1}", i, excecutionTimeSpan.ToString());
    }
    

    The result is very different and looks random(?):

    0th run: 00:00:11.0618580
    1th run: 00:00:11.3339467
    2th run: 00:00:10.0000676
    3th run: 00:00:10.1508140
    4th run: 00:00:09.2041939
    5th run: 00:00:07.6710321
    6th run: 00:00:10.3386312
    7th run: 00:00:17.3422765
    8th run: 00:00:13.8620557
    9th run: 00:00:14.9041528
    10th run: 00:00:12.7772906
    11th run: 00:00:17.0170235
    12th run: 00:00:14.7773750
    

    Question:

    Why is Entity Framework query execution so slow? The resulting row count is really low and the raw SQL query shows a very fast performance.

    Update 1:

    I take care that its not a MetaContext or Model creation delay. Some other queries are executed on the same Model instance right before with good performance.

    Update 2 (related to the answer of @x0007me):

    Thanks for the hint but this can be eliminated by changing the model settings like this:

    modelContext.Configuration.UseDatabaseNullSemantics = true;
    

    The EF generated SQL is now:

    SELECT 
    1 AS [C1], 
    [GroupBy1].[K1] AS [C2], 
    [GroupBy1].[A1] AS [C3], 
    [GroupBy1].[A2] AS [C4], 
    [GroupBy1].[A3] AS [C5], 
    [GroupBy1].[A4] AS [C6]
    FROM ( SELECT 
        [Project1].[K1] AS [K1], 
        MIN([Project1].[A1]) AS [A1], 
        MAX([Project1].[A2]) AS [A2], 
        AVG([Project1].[A3]) AS [A3], 
        STDEVP([Project1].[A4]) AS [A4]
        FROM ( SELECT 
            DATEADD (minute, ((DATEDIFF (minute, @p__linq__4, [Project1].[TimeStamp])) / @p__linq__5) * @p__linq__6, @p__linq__3) AS [K1], 
            [Project1].[C1] AS [A1], 
            [Project1].[C1] AS [A2], 
            [Project1].[C1] AS [A3], 
            [Project1].[C1] AS [A4]
            FROM ( SELECT 
                [Extent1].[TimeStamp] AS [TimeStamp], 
                [Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
                FROM    [dbo].[StringData] AS [Extent1]
                INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
                INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
                INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
                WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
            )  AS [Project1]
        )  AS [Project1]
        GROUP BY [K1]
    )  AS [GroupBy1]
    

    So you can see the problem you described is now solved, but the execution time does not change.

    Also, as you can see in the schema and the raw execution time, I used optimized structure with high optimized indexer.

    Update 3 (related to the answer of @Vladimir Baranov):

    I don't see why this can be related to query plan caching. Because in the MSDN is clearly descripted that the EF6 make use of query plan caching.

    A simple test proof that the huge excecution time differenz is not related to the query plan caching (phseudo code):

    using(var modelContext = new ModelContext())
    {
        modelContext.Query(); //1th run activates caching
    
        modelContext.Query(); //2th used cached plan
    }
    

    As the result, both queries run with the same excecution time.

    Update 4 (related to the answer of @bubi):

    I tried to run the query that is generated by the EF as you descripted it:

    int result = model.Database.ExecuteSqlCommand(@"SELECT 
        1 AS [C1], 
        [GroupBy1].[K1] AS [C2], 
        [GroupBy1].[A1] AS [C3], 
        [GroupBy1].[A2] AS [C4], 
        [GroupBy1].[A3] AS [C5], 
        [GroupBy1].[A4] AS [C6]
        FROM ( SELECT 
            [Project1].[K1] AS [K1], 
            MIN([Project1].[A1]) AS [A1], 
            MAX([Project1].[A2]) AS [A2], 
            AVG([Project1].[A3]) AS [A3], 
            STDEVP([Project1].[A4]) AS [A4]
            FROM ( SELECT 
                DATEADD (minute, ((DATEDIFF (minute, 0, [Project1].[TimeStamp])) / @p__linq__5) * @p__linq__6, 0) AS [K1], 
                [Project1].[C1] AS [A1], 
                [Project1].[C1] AS [A2], 
                [Project1].[C1] AS [A3], 
                [Project1].[C1] AS [A4]
                FROM ( SELECT 
                    [Extent1].[TimeStamp] AS [TimeStamp], 
                    [Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
                    FROM    [dbo].[StringData] AS [Extent1]
                    INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
                    INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
                    INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
                    WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp] >= @p__linq__1) AND ([Extent1].[TimeStamp] < @p__linq__2)
                )  AS [Project1]
            )  AS [Project1]
            GROUP BY [K1]
        )  AS [GroupBy1]",
        new SqlParameter("p__linq__0", 20827),
        new SqlParameter("p__linq__1", fromDate),
        new SqlParameter("p__linq__2", tillDate),
        new SqlParameter("p__linq__5", 15),
        new SqlParameter("p__linq__6", 15));
    

    • Execution result: 92
    • Execution time: ~16000ms

    It took exact as long as the normal EF query!?

    Update 5 (related to the answer of @vittore):

    I create a traced call tree, maybe it helps:

    Update 6 (related to the answer of @usr):

    I created two showplan XML via SQL Server Profiler.

    Fast run (SSMS).SQLPlan

    Slow run (EF).SQLPlan

    Update 7 (related to the comments of @VladimirBaranov):

    I now run some more test case related to your comments.

    First I eleminate time taking order operations by using a new computed column and a matching INDEXER. This reduce the perfomance lag related to DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [TimeStamp] ) / 15* 15, 0). Detail for how and why you can find here.

    The Result look s like this:

    Pure EntityFramework query:

    for (int i = 0; i < 3; i++)
    {
        DateTime begin = DateTime.UtcNow;
        var result = model.StringDatas
            .AsNoTracking()
            .Where(p => p.DCString.DCDistributionBox.DataLogger.ProjectID == projectID && p.TimeStamp15Minutes >= fromDate && p.TimeStamp15Minutes < tillDate)
            .Select(d => new
            {
                TimeStamp = d.TimeStamp15Minutes,
                DCCurrentMpp = d.DCCurrent / d.DCString.CurrentMPP
            })
            .GroupBy(d => d.TimeStamp)
            .Select(d => new
            {
                TimeStamp = d.Key,
                DCCurrentMppMin = d.Min(v => v.DCCurrentMpp),
                DCCurrentMppMax = d.Max(v => v.DCCurrentMpp),
                DCCurrentMppAvg = d.Average(v => v.DCCurrentMpp),
                DCCurrentMppStDev = DbFunctions.StandardDeviationP(d.Select(v => v.DCCurrentMpp))
            })
            .ToList();
    
            TimeSpan excecutionTimeSpan = DateTime.UtcNow - begin;
            Debug.WriteLine("{0}th run pure EF: {1}", i, excecutionTimeSpan.ToString());
    }
    

    0th run pure EF: 00:00:12.6460624

    1th run pure EF: 00:00:11.0258393

    2th run pure EF: 00:00:08.4171044

    I now used the EF generated SQL as a SQL query:

    for (int i = 0; i < 3; i++)
    {
        DateTime begin = DateTime.UtcNow;
        int result = model.Database.ExecuteSqlCommand(@"SELECT 
            1 AS [C1], 
            [GroupBy1].[K1] AS [TimeStamp15Minutes], 
            [GroupBy1].[A1] AS [C2], 
            [GroupBy1].[A2] AS [C3], 
            [GroupBy1].[A3] AS [C4], 
            [GroupBy1].[A4] AS [C5]
            FROM ( SELECT 
                [Project1].[TimeStamp15Minutes] AS [K1], 
                MIN([Project1].[C1]) AS [A1], 
                MAX([Project1].[C1]) AS [A2], 
                AVG([Project1].[C1]) AS [A3], 
                STDEVP([Project1].[C1]) AS [A4]
                FROM ( SELECT 
                    [Extent1].[TimeStamp15Minutes] AS [TimeStamp15Minutes], 
                    [Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
                    FROM    [dbo].[StringData] AS [Extent1]
                    INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
                    INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
                    INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
                    WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp15Minutes] >= @p__linq__1) AND ([Extent1].[TimeStamp15Minutes] < @p__linq__2)
                )  AS [Project1]
                GROUP BY [Project1].[TimeStamp15Minutes]
            )  AS [GroupBy1];",
        new SqlParameter("p__linq__0", 20827),
        new SqlParameter("p__linq__1", fromDate),
        new SqlParameter("p__linq__2", tillDate));
    
        TimeSpan excecutionTimeSpan = DateTime.UtcNow - begin;
        Debug.WriteLine("{0}th run: {1}", i, excecutionTimeSpan.ToString());
    }
    

    0th run: 00:00:00.8381200

    1th run: 00:00:00.6920736

    2th run: 00:00:00.7081006

    and with OPTION(RECOMPILE):

    for (int i = 0; i < 3; i++)
    {
        DateTime begin = DateTime.UtcNow;
        int result = model.Database.ExecuteSqlCommand(@"SELECT 
            1 AS [C1], 
            [GroupBy1].[K1] AS [TimeStamp15Minutes], 
            [GroupBy1].[A1] AS [C2], 
            [GroupBy1].[A2] AS [C3], 
            [GroupBy1].[A3] AS [C4], 
            [GroupBy1].[A4] AS [C5]
            FROM ( SELECT 
                [Project1].[TimeStamp15Minutes] AS [K1], 
                MIN([Project1].[C1]) AS [A1], 
                MAX([Project1].[C1]) AS [A2], 
                AVG([Project1].[C1]) AS [A3], 
                STDEVP([Project1].[C1]) AS [A4]
                FROM ( SELECT 
                    [Extent1].[TimeStamp15Minutes] AS [TimeStamp15Minutes], 
                    [Extent1].[DCCurrent] / [Extent2].[CurrentMPP] AS [C1]
                    FROM    [dbo].[StringData] AS [Extent1]
                    INNER JOIN [dbo].[DCString] AS [Extent2] ON [Extent1].[DCStringID] = [Extent2].[ID]
                    INNER JOIN [dbo].[DCDistributionBox] AS [Extent3] ON [Extent2].[DCDistributionBoxID] = [Extent3].[ID]
                    INNER JOIN [dbo].[DataLogger] AS [Extent4] ON [Extent3].[DataLoggerID] = [Extent4].[ID]
                    WHERE ([Extent4].[ProjectID] = @p__linq__0) AND ([Extent1].[TimeStamp15Minutes] >= @p__linq__1) AND ([Extent1].[TimeStamp15Minutes] < @p__linq__2)
                )  AS [Project1]
                GROUP BY [Project1].[TimeStamp15Minutes]
            )  AS [GroupBy1]
            OPTION(RECOMPILE);",
        new SqlParameter("p__linq__0", 20827),
        new SqlParameter("p__linq__1", fromDate),
        new SqlParameter("p__linq__2", tillDate));
    
        TimeSpan excecutionTimeSpan = DateTime.UtcNow - begin;
        Debug.WriteLine("{0}th run: {1}", i, excecutionTimeSpan.ToString());
    }
    

    0th run with RECOMPILE: 00:00:00.8260932

    1th run with RECOMPILE: 00:00:00.9139730

    2th run with RECOMPILE: 00:00:01.0680665

    Same SQL query excecuted in SSMS (without RECOMPILE):

    00:00:01.105

    Same SQL query excecuted in SSMS (with RECOMPILE):

    00:00:00.902

    I hope this are all values you needed.

    解决方案

    In this answer I'm focusing on the original observation: the query generated by EF is slow, but when the same query is run in SSMS it is fast.

    One possible explanation of this behaviour is Parameter sniffing.

    SQL Server uses a process called parameter sniffing when it executes stored procedures that have parameters. When the procedure is compiled or recompiled, the value passed into the parameter is evaluated and used to create an execution plan. That value is then stored with the execution plan in the plan cache. On subsequent executions, that same value – and same plan – is used.

    So, EF generates a query that has few parameters. The first time you run this query the server creates an execution plan for this query using values of parameters that were in effect in the first run. That plan is usually pretty good. But, later on you run the same EF query using other values for parameters. It is possible that for new values of parameters the previously generated plan is not optimal and the query becomes slow. The server keeps using the previous plan, because it is still the same query, just values of parameters are different.

    If at this moment you take the query text and try to run it directly in SSMS the server will create a new execution plan, because technically it is not the same query that is issued by EF application. Even one character difference is enough, any change in the session settings is also enough for the server to treat the query as a new one. As a result the server has two plans for the seemingly same query in its cache. The first "slow" plan is slow for the new values of parameters, because it was originally built for different parameter values. The second "fast" plan is built for the current parameter values, so it is fast.

    The article Slow in the Application, Fast in SSMS by Erland Sommarskog explains this and other related areas in much more details.

    There are several ways to discard cached plans and force the server to regenerate them. Changing the table or changing the table indexes should do it - it should discard all plans that are related to this table, both "slow" and "fast". Then you run the query in EF application with new values of parameters and get a new "fast" plan. You run the query in SSMS and get a second "fast" plan with new values of parameters. The server still generates two plans, but both plans are fast now.

    Another variant is adding OPTION(RECOMPILE) to the query. With this option the server would not store the generated plan in its cache. So, every time the query runs the server would use actual parameter values to generate the plan that (it thinks) would be optimal for the given parameter values. The downside is an added overhead of the plan generation.

    Mind you, the server still could choose a "bad" plan with this option due to outdated statistics, for example. But, at least, parameter sniffing would not be a problem.

    这篇关于实体框架查询性能与原始SQL执行不同的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆