Why is .Contains slow? Most efficient way to get multiple entities by primary key?

Question

What is the most efficient way to select multiple entities by primary key?

public IEnumerable<Models.Image> GetImagesById(IEnumerable<int> ids)
{

    //return ids.Select(id => Images.Find(id));       //is this cool?
    return Images.Where( im => ids.Contains(im.Id));  //is this better, worse or the same?
    //is there a (better) third way?

}

I realise that I could do some performance tests to compare, but I am wondering if there is in fact a better way than both, and am looking for some enlightenment on what the difference between these two queries is, if any, once they have been 'translated'.

Recommended answer

Using Contains in Entity Framework is actually very slow. It's true that it translates into an IN clause in SQL and that the SQL query itself is executed fast. But the problem and the performance bottleneck is in the translation from your LINQ query into SQL. The expression tree which will be created is expanded into a long chain of OR concatenations because there is no native expression which represents an IN. When the SQL is created this expression of many ORs is recognized and collapsed back into the SQL IN clause.
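To make the shape of that expanded tree concrete, here is a minimal sketch that builds an equivalent OR chain by hand with System.Linq.Expressions. It only illustrates the structure described above and is not Entity Framework's actual translation code; the OrChain class and its method names are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class OrChain
{
    // Builds the lambda id => id == ids[0] || id == ids[1] || ... as an expression tree.
    public static Expression<Func<int, bool>> BuildOrChain(IReadOnlyList<int> ids)
    {
        if (ids == null || ids.Count == 0)
            throw new ArgumentException("At least one id is required.", nameof(ids));

        ParameterExpression id = Expression.Parameter(typeof(int), "id");
        Expression body = Expression.Equal(id, Expression.Constant(ids[0]));

        // Each further id wraps the previous body in one more OrElse node,
        // so the tree grows one level deeper per element.
        for (int i = 1; i < ids.Count; i++)
            body = Expression.OrElse(body, Expression.Equal(id, Expression.Constant(ids[i])));

        return Expression.Lambda<Func<int, bool>>(body, id);
    }

    // Counts the OrElse nodes along the left spine of the chain.
    public static int CountOrNodes(Expression e)
    {
        int count = 0;
        while (e is BinaryExpression b && b.NodeType == ExpressionType.OrElse)
        {
            count++;
            e = b.Left;
        }
        return count;
    }
}
```

A chain built this way for n ids contains n - 1 nested OrElse nodes, so 20000 ids give a tree 19999 levels deep that has to be walked before any SQL is produced.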

This does not mean that using Contains is worse than issuing one query per element in your ids collection (your first option). It is probably still better, at least for collections that are not too large. But for large collections it is really bad. I remember testing a Contains query with about 12,000 elements some time ago: it worked, but took around a minute even though the SQL query itself executed in less than a second.

It might be worth testing a combination of both approaches: multiple roundtrips to the database, each with a smaller number of elements in its Contains expression.
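A minimal sketch of that combination, assuming a hypothetical IdBatching helper (neither the class nor its method names come from Entity Framework). The query itself is passed in as a delegate, so the helper only handles splitting the ids and collecting the results:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IdBatching
{
    // Splits ids into consecutive chunks of at most batchSize elements.
    public static IEnumerable<List<int>> InBatches(IEnumerable<int> ids, int batchSize)
    {
        if (batchSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<int>(batchSize);
        foreach (int id in ids)
        {
            batch.Add(id);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<int>(batchSize);
            }
        }
        // Emit the final, possibly smaller, chunk.
        if (batch.Count > 0)
            yield return batch;
    }

    // Runs queryBatch once per chunk (one roundtrip each) and concatenates the results.
    public static List<T> QueryInBatches<T>(
        IEnumerable<int> ids, int batchSize, Func<List<int>, IEnumerable<T>> queryBatch)
    {
        var result = new List<T>();
        foreach (List<int> batch in InBatches(ids, batchSize))
            result.AddRange(queryBatch(batch));
        return result;
    }
}
```

With the context from the measurements in this answer, a call could look like IdBatching.QueryInBatches(ids, 100, batch => context.Set<MyEntity>().Where(e => batch.Contains(e.ID)).ToList()). The batch size of 100 is an arbitrary placeholder; the best value would have to be measured.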

This approach, and the limitations of using Contains with Entity Framework, are shown and explained here:

Why does the Contains() operator degrade Entity Framework's performance so dramatically? http://stackoverflow.com/questions/7897630/why-does-the-contains-operator-degrade-entity-frameworks-performance-so-drama/7936350#7936350

It's possible that a raw SQL command will perform best in this situation which would mean that you call dbContext.Database.SqlQuery<Image>(sqlString) or dbContext.Images.SqlQuery(sqlString) where sqlString is the SQL shown in @Rune's answer.

Edit

Here are some measurements:

I ran this on a table with 550000 records and 11 columns (IDs start from 1 without gaps) and picked 20000 ids at random:

using (var context = new MyDbContext())
{
    Random rand = new Random();
    var ids = new List<int>();
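    // Note: rand.Next(550000) yields 0..549999, so the nonexistent ID 0
    // can be drawn and ID 550000 never is.
    for (int i = 0; i < 20000; i++)
        ids.Add(rand.Next(550000));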

    Stopwatch watch = new Stopwatch();
    watch.Start();

    // here are the code snippets from below

    watch.Stop();
    var msec = watch.ElapsedMilliseconds;
}

Test 1

var result = context.Set<MyEntity>()
    .Where(e => ids.Contains(e.ID))
    .ToList();

Result -> msec = 85.5 sec

Test 2

var result = context.Set<MyEntity>().AsNoTracking()
    .Where(e => ids.Contains(e.ID))
    .ToList();

Result -> msec = 84.5 sec

This tiny effect of AsNoTracking is very unusual. It indicates that the bottleneck is not object materialization (and not SQL as shown below).

For both tests it can be seen in SQL Profiler that the SQL query arrives at the database very late. (I didn't measure exactly but it was later than 70 seconds.) Obviously the translation of this LINQ query into SQL is very expensive.

Test 3

var values = new StringBuilder();
values.AppendFormat("{0}", ids[0]);
for (int i = 1; i < ids.Count; i++)
    values.AppendFormat(", {0}", ids[i]);

var sql = string.Format(
    "SELECT * FROM [MyDb].[dbo].[MyEntities] WHERE [ID] IN ({0})",
    values);

var result = context.Set<MyEntity>().SqlQuery(sql).ToList();

Result -> msec = 5.1 sec

Test 4

// same as Test 3 but this time including AsNoTracking
var result = context.Set<MyEntity>().SqlQuery(sql).AsNoTracking().ToList();

Result -> msec = 3.8 sec

This time the effect of disabling tracking is more noticeable.

Test 5

// same as Test 3 but this time using Database.SqlQuery
var result = context.Database.SqlQuery<MyEntity>(sql).ToList();

Result -> msec = 3.7 sec

My understanding is that context.Database.SqlQuery<MyEntity>(sql) is the same as context.Set<MyEntity>().SqlQuery(sql).AsNoTracking(), so there is no difference expected between Test 4 and Test 5.

(The length of the result sets was not always the same due to possible duplicates after the random id selection but it was always between 19600 and 19640 elements.)
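That range is about what the arithmetic predicts: drawing 20000 values uniformly from 550000 possibilities gives roughly 550000 * (1 - (1 - 1/550000)^20000), or about 19640, distinct values on average. The following sketch reproduces the draw so the count can be checked; the DuplicateCheck class and the fixed seed are assumptions added for illustration, not part of the original test.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateCheck
{
    // Draws `count` random values in [0, maxExclusive) and returns how many are distinct.
    public static int DistinctDraws(int count, int maxExclusive, int seed)
    {
        var rand = new Random(seed);
        var seen = new HashSet<int>();
        for (int i = 0; i < count; i++)
            seen.Add(rand.Next(maxExclusive)); // duplicates are ignored by the set
        return seen.Count;
    }
}
```

Calling DuplicateCheck.DistinctDraws(20000, 550000, seed) with different seeds should give counts close to that average, which matches the observed result set sizes.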

Edit 2

Test 6

Even 20000 roundtrips to the database are faster than using Contains:

var result = new List<MyEntity>();
foreach (var id in ids)
    result.Add(context.Set<MyEntity>().SingleOrDefault(e => e.ID == id));

Result -> msec = 73.6 sec

Note that I have used SingleOrDefault instead of Find. Using the same code with Find is very slow (I cancelled the test after several minutes) because Find calls DetectChanges internally. Disabling auto change detection (context.Configuration.AutoDetectChangesEnabled = false) leads to roughly the same performance as SingleOrDefault. Using AsNoTracking reduces the time by one or two seconds.

Tests were done with database client (console app) and database server on the same machine. The last result might get significantly worse with a "remote" database due to the many roundtrips.