改善大型EF多级包含的性能 [英] Improving performance of big EF multi-level Include

查看:91
本文介绍了改善大型EF多级包含的性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是EF新手(就像我今天刚开始使用的,我只使用过其他ORM),并且正在经历火的洗礼.

I'm an EF noob (as in I just started today, I've only used other ORMs), and I'm experiencing a baptism of fire.

有人要求我改善另一个开发人员创建的此查询的性能:

I've been asked to improve the performance of this query created by another dev:

      var questionnaires = await _myContext.Questionnaires
            .Include("Sections")
            .Include(q => q.QuestionnaireCommonFields)
            .Include("Sections.Questions")
            .Include("Sections.Questions.Answers")
            .Include("Sections.Questions.Answers.AnswerMetadatas")
            .Include("Sections.Questions.Answers.SubQuestions")
            .Include("Sections.Questions.Answers.SubQuestions.Answers")
            .Include("Sections.Questions.Answers.SubQuestions.Answers.AnswerMetadatas")
            .Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions")
            .Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers")
            .Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.AnswerMetadatas")
            .Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions")
            .Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers")
            .Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.AnswerMetadatas")
            .Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions")
            .Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers")
            .Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.AnswerMetadatas")
            .Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions")
            .Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers")
            .Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.AnswerMetadatas")
        .Where(q => questionnaireIds.Contains(q.Id))
        .ToListAsync().ConfigureAwait(false);

快速的网上冲浪告诉我,如果您深入运行多个级别,则Include()会导致cols *行乘积和较差的性能.

A quick web-surf tells me that Include() results in a cols * rows product and poor performance if you run multiple levels deep.

我已经看到了一些关于SO的有用答案,但是它们仅提供了一些不太复杂的示例,因此我无法找出重写上述示例的最佳方法.

I've seen some helpful answers on SO, but they have limited less complex examples, and I can't figure out the best approach for a rewrite of the above.

该部分的多次重复-"Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers ..."对我来说似乎很可疑,因为可以单独完成然后再发出另一个查询,但是我不知道如何构建它,或者这种方法是否甚至可以提高性能.

The multiple repeat of the part -"Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers..." looks suspicious to me like it could be done separately and then another query issued, but I don't know how to build this up or whether such an approach would even improve performance.

问题:

  1. 如何在确保最终结果集相同的情况下将此查询重写为更明智的方法以提高性能?

  1. How do I rewrite this query to something more sensible to improve performance, while ensuring that the eventual result set is the same?

给出最后一行:.Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.AnswerMetadatas")
为什么需要所有中间线? (我猜这是因为某些联接可能不是左联接?)

Given the last line: .Include("Sections.Questions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.SubQuestions.Answers.AnswerMetadatas")
Why do I need all the intermediate lines? (I guess it's because some of the joins may not be left joins?)

EF版本信息:软件包id ="EntityFramework" version ="6.2.0" targetFramework ="net452"

EF Version info: package id="EntityFramework" version="6.2.0" targetFramework="net452"

我意识到这个问题有点废话,但是我正试图从一无所知的角度来尽快解决.

I realise this question is a bit rubbish, but I'm trying to resolve as fast as I can from a point of no knowledge.

修改

考虑了半天之后,并感谢StuartLC的建议,我提出了一些选择:

After mulling over this for half a day and thanks to StuartLC's suggestions I came up with some options:

差-拆分查询,使其执行多次往返以获取数据.这可能会为用户提供稍慢的体验,但是将停止SQL超时. (这并不比仅增加EF命令超时要好得多.)

Poor - split the query so that it performs multiple round-trips to fetch the data. This is likely to provide a slightly slower experience for the user, but will stop the SQL timing out. (This is not much better than just increasing the EF command timeout).

好-更改子表上的聚集索引以使其父表的外键聚集(假设您没有很多插入操作).

Good - change the clustered indexing on child tables to be clustered by their parent's foreign key (assuming you don't have a lot of insert operations).

好-更改代码以仅查询前几个级别,然后在此级别以下进行延迟加载(单独的数据库命中),即删除除前几个Includes之外的所有内容,然后更改ICollections-Answers.SubQuestions,AnswerMetadatas,和Question.Answers都是虚拟的.大概使这些虚拟化的不利之处在于,如果应用程序中的任何(其他)现有代码都希望这些ICollection属性被急切加载,则您可能必须更新该代码(即,如果希望/需要它们立即在该代码中加载) ).我将进一步研究此选项.进一步编辑-不幸的是,如果由于自引用循环而需要序列化响应,则此方法将无效.

Good - change the code to only query the first few levels and lazy-load (separate db hit) anything below this, i.e. remove all but the top few Includes, then change the ICollections - Answers.SubQuestions, Answers.AnswerMetadatas, and Question.Answers to all be virtual. Presumably the downside to making these virtual is that if any (other) existing code in the app expects those ICollection properties to be eager-loaded, you may have to update that code (i.e. if you want/need them to load immediately within that code). I will be investigating this option further. Further edit - unfortunately this won't work if you need to serialize the response due to self-referencing loop.

不平凡-手动编写一个sql存储的proc/view并建立一个指向它的新EF对象.

Non-trivial - Write a sql stored proc/view manually and build a new EF object pointed at it.

长期条款

最明显,最佳但最耗时的选项-重写应用程序设计,因此它不需要在单个api调用中就包含整个数据树,也不需要使用以下选项:

The obvious, best, but most time-consuming option - rewrite the app design, so it doesn't need the whole data tree in a single api call, or go with the option below:

重写应用程序,以NoSQL方式存储数据(例如,将对象树存储为json,因此不存在联接).正如Stuart所说,如果您需要以其他方式(通过问卷调查表ID以外的方式)过滤数据,则这不是一个好选择,您可能需要这样做.另一种选择是根据需要部分存储NoSQL样式和部分关系.

Rewrite the app to store the data in a NoSQL fashion (e.g. store the object tree as json so there are no joins). As Stuart mentioned this is not a good option if you need to filter the data in other ways (via something other than the questionnaireId), which you might need to do. Another alternative is to partially store NoSQL-style and partially relational as required.

推荐答案

首先,必须说这不是一个简单的查询.看来我们有:

First up, it must be said that this isn't a trivial query. Seemingly we have:

    通过嵌套的问答树实现
  • 6个级别的递归
  • 通过急切加载的.Include
  • 以这种方式连接了总共20个表
  • 6 levels of recursion through a nested question-answer tree
  • A total of 20 tables are joined in this way via eager loaded .Include

我首先要花时间确定此查询在您的应用中的使用位置以及需要使用的频率,尤其要注意最常使用的位置.

I would first take the time to determine where this query is used in your app, and how often it is needed, with particular attention to where it is used most frequently.

YAGNI优化

一个明显的起点是查看查询在应用程序中的使用位置,如果您一直不需要整棵树,那么建议您不要加入嵌套的问题和答案表并非在查询的所有用法中都是必需的.

The obvious place to start is to see where the query is used in your app, and if you don't need the whole tree all the time, then suggest you don't join in the nested question and answer tables if they are not needed in all usages of the query.

此外,还可以动态地在IQueryable上进行组合,因此,如果您的查询有多个用例(例如,从摘要"屏幕中不需要问题和答案,而详细信息树可以需要它们),那么您可以执行以下操作:

Also, it is possible to compose on IQueryable dynamically, so if there are multiple use cases for your query (e.g. from a "Summary" screen which doesn't need the question + answers, and a details tree which does need them), then you can do something like:

var questionnaireQuery = _myContext.Questionnaires
        .Include(q => q.Sections)
        .Include(q => q.QuestionnaireCommonFields);

// Conditionally extend the joins
if (mustIncludeQandA)
{
     questionnaireQuery = questionnaireQuery
       .Include(q => q.Sections.Select(s => s.Questions.Select(q => q.Answers..... etc);
}

// Execute + materialize the query
var questionnaires = await questionnaireQuery
    .Where(q => questionnaireIds.Contains(q.Id))
    .ToListAsync()
    .ConfigureAwait(false);

SQL优化

如果您确实必须始终提取整个树,请查看您的SQL表设计和索引.

If you really have to fetch the whole tree all the time, then look at your SQL table design and indexing.

1)过滤器

.Where(q => questionnaireIds.Contains(q.Id))

(我在这里假设使用SQL Server术语,但是这些概念也适用于大多数其他RDBM.)

(I'm assuming SQL Server terminology here, but the concepts are applicable in most other RDBMs as well.)

我猜想Questionnaires.Id是集群的主键,因此将被索引,但是只需检查是否合理(在SSMS中看起来会是PK_Questionnaires CLUSTERED UNIQUE PRIMARY KEY)

I'm guessing Questionnaires.Id is a clustered primary key, so will be indexed, but just check for sanity (it will look something PK_Questionnaires CLUSTERED UNIQUE PRIMARY KEY in SSMS)

2)确保所有子表在其外键上都具有返回到父表的索引.

2) Ensure all child tables have indexes on their foreign keys back to the parent.

例如q => q.Sections表示表Sections具有返回到Questionnaires.Id的外键-确保该表上至少具有非聚集索引-EF代码首先应自动执行此操作,但再次进行检查以确保.

e.g. q => q.Sections means that table Sections has a foreign key back to Questionnaires.Id - make sure this has at least a non-clustered index on it - EF Code First should do this automagically, but again, check to be sure.

这看起来像列Sections(QuestionairreId)上的IX_QuestionairreId NONCLUSTERED

3)考虑更改子表上的聚簇索引,以通过其父级的外键聚类.将Section集群为Questions.SectionId.这样会将与同一父级相关的所有子行保持在一起,并减少了SQL需要获取的数据页数. 首先在EF代码中实现并非易事,但您的DBA可以帮助您完成此操作,这可能是自定义的步骤.

3) Consider changing the clustered indexing on child tables to be clustered by their parent's foreign key, e.g. Cluster Section by Questions.SectionId. This will keep all child rows related to the same parent together, and reduce the number of pages of data that SQL needs to fetch. It isn't trivial to achieve in EF code first, but your DBA can assist you in doing this, perhaps as a custom step.

其他评论

如果此查询仅用于查询数据,而不用于更新或删除,则添加.AsNoTracking()可以略微减少EF的内存消耗和内存性能.

If this query is only used to query data, not to update or delete, then adding .AsNoTracking() will marginally reduce the memory consumption and in-memory performance of EF.

与性能无关,但是您混合了弱类型("Sections")和强类型.Include语句(q => q.QuestionnaireCommonFields).我建议改用强类型包含,以提高编译时的安全性.

Unrelated to performance, but you've mixed the weakly typed ("Sections") and strongly typed .Include statements (q => q.QuestionnaireCommonFields). I would suggest moving to the strongly typed includes for the additional compile time safety.

请注意,您只需要为渴望加载的最长链指定包含路径-这显然会迫使EF也包含所有更高级别.也就是说,您可以将20个.Include语句减少为2个.这将更有效地完成相同的工作:

Note that you only need to specify the include path for the longest chain(s) which are eager loaded - this will obviously force EF to include all higher levels too. i.e. You can reduce the 20 .Include statements to just 2. This will do the same job more efficiently:

.Include(q => q.QuestionnaireCommonFields)
.Include(q => q.Sections.Select(s => s.Questions.Select(q => q.Answers .... etc))

只要存在1:1:1关系,就需要.Select,但是如果导航为1:1(或N:1),则不需要.Select,例如City c => c.Country

You'll need .Select any time there is a 1:Many relationship, but if the navigation is 1:1 (or N:1) then you don't need the .Select, e.g. City c => c.Country

重新设计

最后但并非最不重要的一点是,如果仅从顶层(即Questionnaires)过滤数据,并且通常整个一次都添加或更新整个Questionairre树"(聚合根),那么您可能会尝试以NoSQL方式对问题和答案树进行数据建模,例如只需将整个树建模为XML或JSON,然后将整个树视为长字符串即可.这将完全避免所有令人讨厌的连接.您将需要在数据层中执行自定义反序列化步骤.如果您需要从树中的节点中进行过滤,则后一种方法将不是很有用(例如,像这样的查询可让我找到所有Questionairre,其中对问题5的SubAnswer为"Foo" 不会是非常适合)

Last but not least, if data is only ever filtered from the top level (i.e. Questionnaires), and if the whole questionairre 'tree' (Aggregate Root) is typically always added or updated all at once, then you might try and approach the data modelling of the question and answer tree in a NoSQL way, e.g. by simply modelling the whole tree as XML or JSON, and then treat the whole tree as a long string. This will avoid all the nasty joins altogether. You would need a custom deserialization step in your data tier. This latter approach won't be very useful if you need to filter from nodes in the tree (i.e. a Query like find me all questionairre's where the SubAnswer to Question 5 is "Foo" won't be a good fit)

这篇关于改善大型EF多级包含的性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆