为什么实体框架在SELECT上生成JOIN [英] Why is Entity Framework generating a JOIN on SELECT

查看:34
本文介绍了为什么实体框架在SELECT上生成JOIN的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在C#应用程序中使用Entity Framework,并且在使用延迟加载.我们注意到一个查询对我们的CPU的影响非常大,它仅计算总和.在调试由Entity Framework生成的查询时,它将创建一个无效的INNER JOIN (SELECT ....当我手动将查询更改为适当的JOIN时,查询时间从1.3秒变为0.03秒.

I am using Entity Framework in a C# application and I am using Lazy Loading. We noticed one query has an extremely high impact on our CPU, which merely calculates a sum. When debugging the query generated by Entity Framework it creates a INNER JOIN (SELECT ... which is not performant. When I manually change the query to a proper JOIN the query time goes from 1.3 sec to 0.03 sec.

让我用我的代码的简化版本进行说明.

Let me illustrate it with a simplified version of my code.

public decimal GetPortfolioValue(Guid portfolioId)
{
   var value = DbContext.Portfolios
        .Where( x => x.Id.Equals(portfolioId) )
        .SelectMany( p => p.Items
            .Where( i => i.Status == ItemStatusConstants.Subscribed 
                && _activeStatuses.Contains( i.Category.Status ) )
        )
        .Select( i => i.Amount )
        .DefaultIfEmpty(0)
        .Sum();

   return value;
}

这将生成一个查询,该查询选择总和,但对连接在一起的两个表的SELECT进行内部联接.我在此处创建了一个pastebin,用于生成的查询不会污染该问题,但缩短的版本是: >

This generates a query that selects the sum but does an inner join on a SELECT of two tables joined together. I created a pastebin here for the generated query not to pollute this question but a shortened version would be:

SELECT ...
FROM `portfolios` AS `Extent1`
INNER JOIN (SELECT 
               `Extent2`.*,
               `Extent3`.*
            FROM `items` AS `Extent2`
            INNER JOIN `categories` AS `Extent3` ON `Extent3`.`id` = 
`Extent2`.`category_id`) AS `Join1`
ON `Extent1`.`id` = `Join1`.`portfolio_id`
    AND ((`Join1`.`status` = @gp1)
    AND (`Join1`.`STATUS1` IN (@gp2, @gp3, @gp4, @gp5, @gp6)))
WHERE ...

我希望它生成的查询(花费0.03秒而不是1.3秒)将类似于

The query I'd expect it to generate (and which takes 0.03 sec instead of 1.3 sec) would be something like

SELECT ...
FROM `portfolios` AS `Extent1`
INNER JOIN `items` AS `Extent2` ON `Extent2`.`portfolio_id` = `Extent1`.`id`
INNER JOIN `categories` AS `Extent3` ON `Extent3`.`id` = `Extent2`.`category_id`
    AND ((`Extent2`.`status` = @gp1)
    AND (`Extent3`.`status` IN (@gp2, @gp3, @gp4, @gp5, @gp6)))
WHERE ...

我怀疑这是由于.SelectMany引起的,但是我看不到如何重写LINQ查询以使其更有效.至于实体,链接属性是虚拟的,并配置了外键:

I suspect it's due to the .SelectMany but I don't see how I should rewrite the LINQ query to make it more efficient. As for the Entities, the linking properties are virtual and have a foreign key configured:

public class Portfolio
{
   public Guid Id { get; set; }
   public virtual ICollection<Item> Items { get; set; }
}

public class Item
{
   public Guid Id { get; set; }
   public Guid PortfolioId { get; set; }
   public Guid CategoryId { get; set; }
   public decimal Amount { get; set; }
   public string Status { get; set; }
   public virtual Portfolio Portfolio { get; set; }
   public virtual Category Category { get; set; }
}

public class Category
{
   public Guid Id { get; set; }
   public string Status { get; set; }
   public virtual ICollection<Item> Items { get; set; }
}

任何帮助将不胜感激!

推荐答案

由于您不需要Portfolio中的任何内容,只需按PortfolioId进行过滤,就可以直接查询PortfolioItems.假设您的DbContext具有一个DbSet,其中包含所有投资组合中的所有项目,则可能是这样的:

As you dont need anything from Portfolio, just filter by PortfolioId, you could query directly PortfolioItems. Assuming your DbContext has a DbSet with all items in all portfolios, could be something like this:

var value = DbContext.PortfolioItems
                     .Where(i => i.PortfolioId == portfolioId && i.Status == ItemStatusConstants.Subscribed && _activeStatuses.Contains(i.Category.Status))
                     .Sum(i=>i.Amount);                 

我相信如果直接使用适当的Queryable.Sum重载,则不需要DefaultIfEmpty或select.

I believe you don't need the DefaultIfEmpty nor the select if you use directly the appropiate Queryable.Sum overload.

已在不公开DbSet的情况下尝试了两个不同的LINQ查询.

EDITED: Tried two different LINQ queries without exposing a DbSet.

第一个查询与您的查询基本相同:

The first query is basically the same as yours:

var value2 = dbContext.Portfolios
    .Where(p => p.Id == portfolioId)
    .SelectMany(p => p.Items)
    .Where(i => i.Status == "A" && _activeStatuses.Contains(i.Category.Status))
    .Select(i=>i.Amount)
    .DefaultIfEmpty()
    .Sum();

在SQL Server中剖析了查询(手头没有MySql)并产生了一个丑陋的句子(替换了参数,并且引号未转义以进行测试):

Profiled the query in SQL Server (don't have MySql at hand) and produces an ugly sentence (parameters replaced and quotes unescaped for testing):

SELECT [GroupBy1].[a1] AS [C1] 
FROM   (SELECT Sum([Join2].[a1_0]) AS [A1] 
    FROM   (SELECT CASE 
                     WHEN ( [Project1].[c1] IS NULL ) THEN Cast( 
                     0 AS DECIMAL(18)) 
                     ELSE [Project1].[amount] 
                   END AS [A1_0] 
            FROM   (SELECT 1 AS X) AS [SingleRowTable1] 
                   LEFT OUTER JOIN 
                   (SELECT [Extent1].[amount] AS [Amount], 
                           Cast(1 AS TINYINT) AS [C1] 
                    FROM   [dbo].[items] AS [Extent1] 
                           INNER JOIN [dbo].[categories] AS 
                                      [Extent2] 
                                   ON [Extent1].[categoryid] = 
                                      [Extent2].[id] 
                    WHERE  ( N'A' = [Extent1].[status] ) 
                           AND ( [Extent1].[portfolioid] = 
                                 'E2CC0CC2-066F-45C9-9D48-543D92C4C92E' ) 
                           AND ( [Extent2].[status] IN ( N'A', N'B', N'C' ) 
                               ) 
                           AND ( [Extent2].[status] IS NOT NULL )) AS 
                   [Project1] 
                                ON 1 = 1) AS [Join2]) AS [GroupBy1] 

如果我们删除"Select"和"DefaultIfEmpty"方法,并以此方式重写查询:

If we remove the "Select" and "DefaultIfEmpty" methods, and rewrite the query as this:

var value = dbContext.Portfolios
    .Where(p => p.Id == portfolioId)
    .SelectMany(p => p.Items)
    .Where(i => i.Status == "A" && _activeStatuses.Contains(i.Category.Status))
    .Sum(i => i.Amount);

生成的句子更简洁:

SELECT [GroupBy1].[a1] AS [C1] 
FROM   (SELECT Sum([Extent1].[amount]) AS [A1] 
    FROM   [dbo].[items] AS [Extent1] 
           INNER JOIN [dbo].[categories] AS [Extent2] 
                   ON [Extent1].[categoryid] = [Extent2].[id] 
    WHERE  ( N'A' = [Extent1].[status] ) 
           AND ( [Extent1].[portfolioid] = 
                 'E2CC0CC2-066F-45C9-9D48-543D92C4C92E' ) 
           AND ( [Extent2].[status] IN ( N'A', N'B', N'C' ) ) 
           AND ( [Extent2].[status] IS NOT NULL )) AS [GroupBy1] 

结论:我们不能依靠LINQ提供程序来创建优化的查询.甚至在思考生成的SQL语句之前,都必须对linq查询进行分析和优化.

Conclussion: We cannot rely on LINQ provider to create optimized queries. The linq query must be analyzed and optimized even before thinking in SQL sentence generated.

这篇关于为什么实体框架在SELECT上生成JOIN的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆