多次使用 Include() 时实体框架代码很慢 [英] Entity-framework code is slow when using Include() many times

查看:28
本文介绍了多次使用 Include() 时实体框架代码很慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在调试一些缓慢的代码,似乎罪魁祸首是下面发布的 EF 代码.在稍后阶段评估查询需要 4-5 秒.我试图让它在 1 秒内运行.

I have been debugging some slow code and it seems that the culprit is the EF code posted below. It takes 4-5 seconds when the query is evaluated at a later stage. I'm trying to get it to run in under 1 second.

我已经使用 SQL Server Profiler 对此进行了测试,似乎执行了一堆 SQL 脚本.它还确认 SQL 服务器完成执行需要 3-4 秒.

I have tested this using the SQL Server Profiler, and it seems that a bunch of SQL scripts are executed. It also confirms that it takes 3-4 seconds before SQL server is done with the executions.

我已经阅读了有关使用 Include() 的其他类似问题,并且在使用它时似乎确实存在性能损失.我尝试将以下代码拆分为几个不同的查询,但没有太大区别.

I have read other similar questions about the use of Include() and it does seem that there is a performance penalty when using it. I've tried to split the below code into several different queries but it's not making much of difference.

知道如何让以下内容更快地执行吗?

Any idea how I can get the below to execute faster?

目前我正在开发的网络应用程序在等待以下完成时只显示一个空的 iframe.如果我无法获得更快的执行时间,我必须将其拆分并使用数据部分加载 iframe,或者使用另一个异步解决方案.这里的任何想法也将不胜感激!

Currently the web app I'm working on is just showing an empty iframe while waiting for the below to complete. If I cannot get faster execution time I have to split it up and partially load the iframe with data or go with another asynchronous solution. Any ideas here would also be appreciated!

using (var scope = new TransactionScope(TransactionScopeOption.Required, new TransactionOptions { IsolationLevel = System.Transactions.IsolationLevel.ReadUncommitted }))
        {
            formInstance = context.FormInstanceSet
                                .Includes(x => x.Include(fi => fi.FormDefinition).Include(fd => fd.FormSectionDefinitions).Include(fs => fs.FormStateDefinitionEditableSections))
                                .Includes(x => x.Include(fi => fi.FormDefinition).Include(fd => fd.FormStateDefinitions))
                                .Includes(x => x.Include(fi => fi.FormSectionInstances).Include(fs => fs.FormFieldInstances).Include(ff => ff.FormFieldDefinition).Include(ffd => ffd.FormFieldMetaDataDefinition).Include(ffmdd => ffmdd.ComplexTypePropertyNames))
                                .Include(x => x.CurrentFormStateInstance)      
                                .Include(x => x.Files)
                                .FirstOrDefault(x => x.FormInstanceIdentifier == formInstanceIdentifier);

            scope.Complete();
        }

推荐答案

tl;dr 多个 Include 会炸毁 SQL 结果集.很快,通过多个数据库调用加载数据而不是运行一个大型语句会变得更便宜.尝试找到 IncludeLoad 语句的最佳组合.

tl;dr Multiple Includes blow up the SQL result set. Soon it becomes cheaper to load data by multiple database calls instead of running one mega statement. Try to find the best mixture of Include and Load statements.

使用 Include 时似乎确实存在性能损失

it does seem that there is a performance penalty when using Include

那是轻描淡写!多个 Include 很快就会在宽度和长度上炸毁 SQL 查询结果.这是为什么?

That's an understatement! Multiple Includes quickly blow up the SQL query result both in width and in length. Why is that?

(此部分适用实体框架经典版、v6 及更早版本)

(This part applies Entity Framework classic, v6 and earlier)

假设我们有

  • 根实体Root
  • 父实体Root.Parent
  • 子实体 Root.Children1Root.Children2
  • 一个 LINQ 语句 Root.Include("Parent").Include("Children1").Include("Children2")

这将构建一个具有以下结构的 SQL 语句:

This builds a SQL statement that has the following structure:

SELECT *, <PseudoColumns>
FROM Root
JOIN Parent
JOIN Children1

UNION

SELECT *, <PseudoColumns>
FROM Root
JOIN Parent
JOIN Children2

这些 由类似CAST(NULL AS int) AS [C2], 的表达式组成,它们用于在所有UNION-ed 查询.第一部分为Child2添加伪列,第二部分为Child1添加伪列.

These <PseudoColumns> consist of expressions like CAST(NULL AS int) AS [C2], and they serve to have the same amount of columns in all UNION-ed queries. The first part adds pseudo columns for Child2, the second part adds pseudo columns for Child1.

这就是SQL结果集大小的含义:

This is what it means for the size of the SQL result set:

  • SELECT子句中的列数是四个表中所有列的总和
  • 行数是包含的子集合中记录的总和
  • Number of columns in the SELECT clause is the sum of all columns in the four tables
  • The number of rows is the sum of records in included child collections

由于数据点的总数为columns * rows,每增加一个Include,结果集中的数据点总数就会呈指数增加.让我通过再次获取 Root 来证明这一点,现在使用一个额外的 Children3 集合.如果所有表都有 5 列和 100 行,我们得到:

Since the total number of data points is columns * rows, each additional Include exponentially increases the total number of data points in the result set. Let me demonstrate that by taking Root again, now with an additional Children3 collection. If all tables have 5 columns and 100 rows, we get:

一个 Include(Root + 1 个子集合):10 列 * 100 行 = 1000 个数据点.
两个 Include(Root + 2 个子集合):15 列 * 200 行 = 3000 个数据点.
三个 Include(Root + 3 个子集合):20 列 * 300 行 = 6000 个数据点.

One Include (Root + 1 child collection): 10 columns * 100 rows = 1000 data points.
Two Includes (Root + 2 child collections): 15 columns * 200 rows = 3000 data points.
Three Includes (Root + 3 child collections): 20 columns * 300 rows = 6000 data points.

如果有 12 个 Includes,这将达到 78000 个数据点!

With 12 Includes this would amount to 78000 data points!

相反,如果分别获取每个表的所有记录而不是 12 个 Includes,则您有 13 * 5 * 100 个数据点:6500,不到 10%!

Conversely, if you get all records for each table separately instead of 12 Includes, you have 13 * 5 * 100 data points: 6500, less than 10%!

现在这些数字有些夸大了,因为这些数据点中有许多是null,因此它们对发送到客户端的结果集的实际大小没有多大贡献.但是查询优化器的查询大小和任务肯定会因 Include 数量的增加而受到负面影响.

Now these numbers are somewhat exaggerated in that many of these data points will be null, so they don't contribute much to the actual size of the result set that is sent to the client. But the query size and the task for the query optimizer certainly get affected negatively by increasing numbers of Includes.

所以使用Includes 是在数据库调用成本和数据量之间的微妙平衡.很难给出一个经验法则,但是现在您可以想象,如果子集合的 Includes 超过 3 个(但相当多),数据量通常很快就会超过额外调用的成本.父 Includes 多一点,只会扩大结果集).

So using Includes is a delicate balance between the cost of database calls and data volume. It's hard to give a rule of the thumb, but by now you can imagine that the data volume generally quickly outgrows the cost of extra calls if there are more than ~3 Includes for child collections (but quite a bit more for parent Includes, that only widen the result set).

Include 的替代方法是在单独的查询中加载数据:

The alternative to Include is to load data in separate queries:

context.Configuration.LazyLoadingEnabled = false;
var rootId = 1;
context.Children1.Where(c => c.RootId == rootId).Load();
context.Children2.Where(c => c.RootId == rootId).Load();
return context.Roots.Find(rootId);

这会将所有需要的数据加载到上下文的缓存中.在此过程中,EF 执行关系修复,通过加载的实体自动填充导航属性(Root.Children 等).最终结果与带有 Include 的语句相同,除了一个重要区别:子集合在实体状态管理器中未标记为已加载,因此如果您访问,EF 将尝试触发延迟加载他们.这就是为什么关闭延迟加载很重要的原因.

This loads all required data into the context's cache. During this process, EF executes relationship fixup by which it auto-populates navigation properties (Root.Children etc.) by loaded entities. The end result is identical to the statement with Includes, except for one important difference: the child collections are not marked as loaded in the entity state manager, so EF will try to trigger lazy loading if you access them. That's why it's important to turn off lazy loading.

实际上,您必须弄清楚IncludeLoad 语句的哪种组合最适合您.

In reality, you will have to figure out which combination of Include and Load statements work best for you.

每个Include 也会增加查询的复杂性,因此数据库的查询优化器将不得不越来越多地努力寻找最佳查询计划.在某些时候,这可能不再成功.此外,当缺少某些重要索引(尤其是外键)时,即使使用最佳查询计划,添加 Include 也可能会影响性能.

Each Include also increases query complexity, so the database's query optimizer will have to make increasingly more effort to find the best query plan. At some point this may no longer succeed. Also, when some vital indexes are missing (esp. on foreign keys) performance may suffer by adding Includes, even with the best query plan.

出于某种原因,上述行为,UNIONed 查询,从 EF 核心 3 开始被放弃.现在它使用连接构建一个查询.当查询是star"时shape1 这导致笛卡尔爆炸(在 SQL 结果集中).我只能找到一个说明宣布这一重大变化,但它没有说为什么.

For some reason, the behavior described above, UNIONed queries, was abandoned as of EF core 3. It now builds one query with joins. When the query is "star" shaped1 this leads to Cartesian explosion (in the SQL result set). I can only find a note announcing this breaking change, but it doesn't say why.

为了应对这种笛卡尔爆炸,Entity Framework core 5 引入了 拆分查询 允许在多个查询中加载相关数据.它可以防止构建一个庞大的、成倍增加的 SQL 结果集.此外,由于较低的查询复杂性,即使多次往返,它也可以减少获取数据所需的时间.但是,并发更新时可能会导致数据不一致.

To counter this Cartesian explosion, Entity Framework core 5 introduced the concept of split queries that enables loading related data in multiple queries. It prevents building one massive, multiplied SQL result set. Also, because of lower query complexity, it may reduce the time it takes to fetch data even with multiple roundtrips. However, it may lead to inconsistent data when concurrent updates occur.

1查询根之外的多个 1:n 关系.

1Multiple 1:n relationships off of the query root.

这篇关于多次使用 Include() 时实体框架代码很慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆