为什么 .Contains 很慢?通过主键获取多个实体的最有效方法? [英] Why is .Contains slow? Most efficient way to get multiple entities by primary key?

查看:98
本文介绍了为什么 .Contains 很慢?通过主键获取多个实体的最有效方法?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

通过主键选择多个实体的最有效方法是什么?

public IEnumerable<Models.Image> GetImagesById(IEnumerable<int> ids)
{

    //return ids.Select(id => Images.Find(id));       //is this cool?
    return Images.Where( im => ids.Contains(im.Id));  //is this better, worse or the same?
    //is there a (better) third way?

}

我意识到我可以做一些性能测试来比较,但我想知道实际上是否有比两者更好的方法,并且我正在寻找关于这两个查询之间的区别的一些启示(如果有的话)一次它们已被翻译".

I realise that I could do some performance tests to compare, but I am wondering if there is in fact a better way than both, and am looking for some enlightenment on what the difference between these two queries is, if any, once they have been 'translated'.

推荐答案

更新:通过在 EF6 中添加 InExpression,处理 Enumerable.Contains 的性能显着提高.此答案中的分析很棒,但自 2013 年以来已基本过时.

在实体框架中使用 Contains 实际上非常慢.它确实在 SQL 中转换为 IN 子句,并且 SQL 查询本身执行得很快.但问题和性能瓶颈在于从 LINQ 查询到 SQL 的转换.将创建的表达式树被扩展为一长串 OR 连接,因为没有表示 IN 的本机表达式.当创建 SQL 时,这个许多 OR 的表达式被识别并折叠回 SQL IN 子句.

Using Contains in Entity Framework is actually very slow. It's true that it translates into an IN clause in SQL and that the SQL query itself is executed fast. But the problem and the performance bottleneck is in the translation from your LINQ query into SQL. The expression tree which will be created is expanded into a long chain of OR concatenations because there is no native expression which represents an IN. When the SQL is created this expression of many ORs is recognized and collapsed back into the SQL IN clause.

这并不意味着使用 Contains 比对 ids 集合(您的第一个选项)中的每个元素发出一个查询更糟糕.它可能仍然更好 - 至少对于不是太大的集合.但对于大型收藏来说,这真的很糟糕.我记得前段时间我测试了一个 Contains 查询,其中包含大约 12.000 个元素,尽管 SQL 中的查询在不到一秒的时间内执行,但它仍然可以工作但需要大约一分钟.

This does not mean that using Contains is worse than issuing one query per element in your ids collection (your first option). It's probably still better - at least for not too large collections. But for large collections it is really bad. I remember that I had tested some time ago a Contains query with about 12.000 elements which worked but took around a minute even though the query in SQL executed in less than a second.

在每次往返的 Contains 表达式中使用较少数量的元素来测试多次往返数据库的组合的性能可能是值得的.

It might be worth to test the performance of a combination of multiple roundtrips to the database with a smaller number of elements in a Contains expression for each roundtrip.

此处显示并解释了这种方法以及将 Contains 与实体框架一起使用的限制:

This approach and also the limitations of using Contains with Entity Framework is shown and explained here:

为什么contains() 运算符如此显着地降低了实体框架的性能?

在这种情况下,原始 SQL 命令可能会表现最佳,这意味着您调用 dbContext.Database.SqlQuery(sqlString)dbContext.Images.SqlQuery(sqlString) 其中 sqlString 是@Rune 的答案中显示的 SQL.

It's possible that a raw SQL command will perform best in this situation which would mean that you call dbContext.Database.SqlQuery<Image>(sqlString) or dbContext.Images.SqlQuery(sqlString) where sqlString is the SQL shown in @Rune's answer.

编辑

以下是一些测量值:

我在一个有 550000 条记录和 11 列(ID 从 1 开始没有间隙)的表上完成了这项工作,并随机选取了 20000 个 ID:

I have done this on a table with 550000 records and 11 columns (IDs start from 1 without gaps) and picked randomly 20000 ids:

using (var context = new MyDbContext())
{
    Random rand = new Random();
    var ids = new List<int>();
    for (int i = 0; i < 20000; i++)
        ids.Add(rand.Next(550000));

    Stopwatch watch = new Stopwatch();
    watch.Start();

    // here are the code snippets from below

    watch.Stop();
    var msec = watch.ElapsedMilliseconds;
}

测试 1

var result = context.Set<MyEntity>()
    .Where(e => ids.Contains(e.ID))
    .ToList();

结果 -> 毫秒 = 85.5 秒

测试 2

var result = context.Set<MyEntity>().AsNoTracking()
    .Where(e => ids.Contains(e.ID))
    .ToList();

结果 -> 毫秒 = 84.5 秒

AsNoTracking 的这种微小效果非常不寻常.它表明瓶颈不是对象物化(而不是如下所示的 SQL).

This tiny effect of AsNoTracking is very unusual. It indicates that the bottleneck is not object materialization (and not SQL as shown below).

对于这两个测试,可以在 SQL Profiler 中看到 SQL 查询到达数据库的时间很晚.(我没有准确测量,但它晚于 70 秒.)显然,将此 LINQ 查询转换为 SQL 的成本非常高.

For both tests it can be seen in SQL Profiler that the SQL query arrives at the database very late. (I didn't measure exactly but it was later than 70 seconds.) Obviously the translation of this LINQ query into SQL is very expensive.

测试 3

var values = new StringBuilder();
values.AppendFormat("{0}", ids[0]);
for (int i = 1; i < ids.Count; i++)
    values.AppendFormat(", {0}", ids[i]);

var sql = string.Format(
    "SELECT * FROM [MyDb].[dbo].[MyEntities] WHERE [ID] IN ({0})",
    values);

var result = context.Set<MyEntity>().SqlQuery(sql).ToList();

结果 -> 毫秒 = 5.1 秒

测试 4

// same as Test 3 but this time including AsNoTracking
var result = context.Set<MyEntity>().SqlQuery(sql).AsNoTracking().ToList();

结果 -> 毫秒 = 3.8 秒

这次禁用跟踪的效果更加明显.

This time the effect of disabling tracking is more noticable.

测试 5

// same as Test 3 but this time using Database.SqlQuery
var result = context.Database.SqlQuery<MyEntity>(sql).ToList();

结果 -> 毫秒 = 3.7 秒

我的理解是context.Database.SqlQuery(sql)context.Set().SqlQuery(sql).AsNoTracking(),因此测试 4 和测试 5 之间没有预期差异.

My understanding is that context.Database.SqlQuery<MyEntity>(sql) is the same as context.Set<MyEntity>().SqlQuery(sql).AsNoTracking(), so there is no difference expected between Test 4 and Test 5.

(由于随机 id 选择后可能存在重复,结果集的长度并不总是相同,但它总是在 19600 到 19640 个元素之间.)

(The length of the result sets was not always the same due to possible duplicates after the random id selection but it was always between 19600 and 19640 elements.)

编辑 2

测试 6

即使是到数据库的 20000 次往返也比使用 Contains 更快:

Even 20000 roundtrips to the database are faster than using Contains:

var result = new List<MyEntity>();
foreach (var id in ids)
    result.Add(context.Set<MyEntity>().SingleOrDefault(e => e.ID == id));

结果 -> 毫秒 = 73.6 秒

请注意,我使用了 SingleOrDefault 而不是 Find.使用与 Find 相同的代码非常慢(我在几分钟后取消了测试),因为 Find 在内部调用 DetectChanges.禁用自动更改检测 (context.Configuration.AutoDetectChangesEnabled = false) 导致与 SingleOrDefault 大致相同的性能.使用 AsNoTracking 可以将时间减少一到两秒.

Note that I have used SingleOrDefault instead of Find. Using the same code with Find is very slow (I cancelled the test after several minutes) because Find calls DetectChanges internally. Disabling auto change detection (context.Configuration.AutoDetectChangesEnabled = false) leads to roughly the same performance as SingleOrDefault. Using AsNoTracking reduces the time by one or two seconds.

测试是在同一台机器上使用数据库客户端(控制台应用程序)和数据库服务器完成的.由于多次往返,使用远程"数据库的最后结果可能会变得更糟.

Tests were done with database client (console app) and database server on the same machine. The last result might get significantly worse with a "remote" database due to the many roundtrips.

这篇关于为什么 .Contains 很慢?通过主键获取多个实体的最有效方法?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆