多次使用Include()时,实体框架代码很慢 [英] Entity-framework code is slow when using Include() many times

查看:127
本文介绍了多次使用Include()时,实体框架代码很慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在调试一些慢速代码,似乎罪魁祸首是下面发布的EF代码。在稍后阶段评估查询时需要4-5秒。我试图让它在1秒内运行。

I have been debugging some slow code and it seems that the culprit is the EF code posted below. It takes 4-5 seconds when the query is evaluated at a later stage. I'm trying to get it to run in under 1 second.

我使用SQL Server Profiler对此进行了测试,似乎执行了一堆SQL脚本。它还确认SQL Server完成执行需要3-4秒。

I have tested this using the SQL Server Profiler, and it seems that a bunch of SQL scripts are executed. It also confirms that it takes 3-4 seconds before SQL server is done with the executions.

我已经阅读了有关使用Include()的其他类似问题,它确实似乎在使用它时会有性能损失。我试图将下面的代码分成几个不同的查询,但它没有太大区别。

I have read other similar questions about the use of Include() and it does seem that there is a performance penalty when using it. I've tried to split the below code into several different queries but it's not making much of difference.

我知道如何让下面的代码执行得更快?

Any idea how I can get the below to execute faster?

目前我正在处理的网络应用程序只是在等待以下内容完成时显示一个空的iframe。如果我无法获得更快的执行时间,我必须将其拆分并部分加载iframe数据或使用其他异步解决方案。这里的任何想法也将受到赞赏!

Currently the web app I'm working on is just showing an empty iframe while waiting for the below to complete. If I cannot get faster execution time I have to split it up and partially load the iframe with data or go with another asynchronous solution. Any ideas here would also be appreciated!

using (var scope = new TransactionScope(TransactionScopeOption.Required, new TransactionOptions { IsolationLevel = System.Transactions.IsolationLevel.ReadUncommitted }))
        {
            formInstance = context.FormInstanceSet
                                .Includes(x => x.Include(fi => fi.FormDefinition).Include(fd => fd.FormSectionDefinitions).Include(fs => fs.FormStateDefinitionEditableSections))
                                .Includes(x => x.Include(fi => fi.FormDefinition).Include(fd => fd.FormStateDefinitions))
                                .Includes(x => x.Include(fi => fi.FormSectionInstances).Include(fs => fs.FormFieldInstances).Include(ff => ff.FormFieldDefinition).Include(ffd => ffd.FormFieldMetaDataDefinition).Include(ffmdd => ffmdd.ComplexTypePropertyNames))
                                .Include(x => x.CurrentFormStateInstance)      
                                .Include(x => x.Files)
                                .FirstOrDefault(x => x.FormInstanceIdentifier == formInstanceIdentifier);

            scope.Complete();
        }


推荐答案


使用Include时似乎存在性能损失

it does seem that there is a performance penalty when using Include

这是轻描淡写!多个包含快速扩大SQL查询结果的宽度和长度。这是为什么?

That's an understatement! Multiple Includes quickly blow up the SQL query result both in width and in length. Why is that?

tl; dr 多个包含炸毁SQL结果组。很快,通过多个数据库调用加载数据而不是运行一个mega语句变得更便宜。尝试找到包含加载语句的最佳混合。

tl;dr Multiple Includes blow up the SQL result set. Soon it becomes cheaper to load data by multiple database calls instead of running one mega statement. Try to find the best mixture of Include and Load statements.

假设我们已经


  • root enitity Root

  • 父实体 Root.Parent

  • 子实体 Root.Children1 Root.Children2

  • 一个LINQ语句 Root.Include(Parent)。包含(Children1)。包含(Children2)

  • root enitity Root
  • parent entity Root.Parent
  • child entities Root.Children1 and Root.Children2
  • a LINQ statement Root.Include("Parent").Include("Children1").Include("Children2")

这会构建一个具有以下结构的SQL语句:

This builds a SQL statement that has the following structure:

SELECT *, <PseudoColumns>
FROM Root
JOIN Parent
JOIN Children1

UNION

SELECT *, <PseudoColumns>
FROM Root
JOIN Parent
JOIN Children2

这些< PseudoColumns> CAST(NULL AS int)AS [C2],等表达式组成,它们具有相同的表达式所有 UNION -ed查询中的列数。第一部分为 Child2 添加伪列,第二部分为 Child1 添加伪列。

These <PseudoColumns> consist of expressions like CAST(NULL AS int) AS [C2], and they serve to have the same amount of columns in all UNION-ed queries. The first part adds pseudo columns for Child2, the second part adds pseudo columns for Child1.

这就是SQL结果集大小的含义:

This is what it means for the size of the SQL result set:


  • 的数量 SELECT 子句中的列是四个表中所有列的总和

  • 行的数量是包含子集合中记录的总和

  • Number of columns in the SELECT clause is the sum of all columns in the four tables
  • The number of rows is the sum of records in included child collections

由于数据点总数 columns * rows ,每增加 Include 会以指数方式增加结果集中的数据点总数。让我通过再次获取 Root 来证明,现在使用额外的 Children3 集合。如果所有表格都有5列和100行,我们得到:

Since the total number of data points is columns * rows, each additional Include exponentially increases the total number of data points in the result set. Let me demonstrate that by taking Root again, now with an additional Children3 collection. If all tables have 5 columns and 100 rows, we get:

一个包含 Root + 1子集合):10列* 100行= 1000个数据点。

两个包含 s( Root + 2个子集合):15列* 200行= 3000个数据点。

包含 s( Root + 3个子集合):20列* 300行= 6000个数据点。

One Include (Root + 1 child collection): 10 columns * 100 rows = 1000 data points.
Two Includes (Root + 2 child collections): 15 columns * 200 rows = 3000 data points.
Three Includes (Root + 3 child collections): 20 columns * 300 rows = 6000 data points.

12 包含这相当于78000个数据点!

With 12 Includes this would amount to 78000 data points!

相反,如果你单独得到每个表的所有记录而不是12 包含,你有 13 * 5 * 100 数据点:6500,低于10%!

Conversely, if you get all records for each table separately instead of 12 Includes, you have 13 * 5 * 100 data points: 6500, less than 10%!

现在这些数字有点夸张,因为许多这些数据点都是 null ,因此它们对发送到客户端的结果集的实际大小贡献不大。但查询大小和查询优化器的任务肯定会受到 Include s的增加数量的负面影响。

Now these numbers are somewhat exaggerated in that many of these data points will be null, so they don't contribute much to the actual size of the result set that is sent to the client. But the query size and the task for the query optimizer certainly get affected negatively by increasing numbers of Includes.

所以使用包含是数据库调用和数据量之间的微妙平衡。很难给出一个经验法则,但是现在你可以想象,如果超过~3 包含,那么数据量通常会快速超过额外调用的成本子集合(但是对于父包含,只会扩大结果集)。

So using Includes is a delicate balance between the cost of database calls and data volume. It's hard to give a rule of the thumb, but by now you can imagine that the data volume generally quickly outgrows the cost of extra calls if there are more than ~3 Includes for child collections (but quite a bit more for parent Includes, that only widen the result set).

Include 的替代方法是在单独的查询中加载数据:

The alternative to Include is to load data in separate queries:

context.Configuration.LazyLoadingEnabled = false;
var rootId = 1;
context.Children1.Where(c => c.RootId == rootId).Load();
context.Children2.Where(c => c.RootId == rootId).Load();
return context.Roots.Find(rootId);

这会将所有必需的数据加载到上下文的缓存中。在此过程中,EF执行 relationship fixup ,通过它自动填充已加载实体的导航属性( Root.Children 等)。最终结果与包含 Include 的语句相同,除了一个重要区别:子集合未在实体状态管理器中标记为已加载,因此EF将尝试如果您访问它们,则触发延迟加载。这就是关闭延迟加载很重要的原因。

This loads all required data into the context's cache. During this process, EF executes relationship fixup by which it auto-populates navigation properties (Root.Children etc.) by loaded entities. The end result is identical to the statement with Includes, except for one important difference: the child collections are not marked as loaded in the entity state manager, so EF will try to trigger lazy loading if you access them. That's why it's important to turn off lazy loading.

实际上,你必须找出包含和加载语句最适合您。

In reality, you will have to figure out which combination of Include and Load statements work best for you.

这篇关于多次使用Include()时,实体框架代码很慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆