Why is .Contains slow? Most efficient way to get multiple entities by primary key?


Question

What's the most efficient way to select multiple entities by primary key?

public IEnumerable<Models.Image> GetImagesById(IEnumerable<int> ids)
{

    //return ids.Select(id => Images.Find(id));       //is this cool?
    return Images.Where( im => ids.Contains(im.Id));  //is this better, worse or the same?
    //is there a (better) third way?

}

I realise that I could do some performance tests to compare, but I am wondering if there is in fact a better way than both, and am looking for some enlightenment on what the difference between these two queries is, if any, once they have been 'translated'.

Solution

Using Contains in Entity Framework is actually very slow. It's true that it translates into an IN clause in SQL and that the SQL query itself is executed fast. But the problem and the performance bottleneck is in the translation from your LINQ query into SQL. The expression tree which will be created is expanded into a long chain of OR concatenations because there is no native expression which represents an IN. When the SQL is created this expression of many ORs is recognized and collapsed back into the SQL IN clause.
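To make the "long chain of OR concatenations" more concrete, here is a minimal sketch that builds such a chain by hand with System.Linq.Expressions. It is only an illustration of the shape of the tree, not Entity Framework's actual internal code; `OrChainDemo` and `BuildOrChain` are hypothetical names introduced for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class OrChainDemo
{
    // Builds the predicate: id => false || id == ids[0] || id == ids[1] || ...
    // The tree gains one OrElse node per id, so it grows with the list.
    public static Expression<Func<int, bool>> BuildOrChain(IReadOnlyList<int> ids)
    {
        var param = Expression.Parameter(typeof(int), "id");
        Expression body = Expression.Constant(false);
        foreach (var value in ids)
        {
            var equals = Expression.Equal(param, Expression.Constant(value));
            body = Expression.OrElse(body, equals);
        }
        return Expression.Lambda<Func<int, bool>>(body, param);
    }
}
```

With thousands of ids the tree becomes thousands of nodes deep, which gives an intuition for why processing it can be far more expensive than running the resulting SQL. Keep the list small if you compile or print the expression from this sketch.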

This does not mean that using Contains is worse than issuing one query per element in your ids collection (your first option). It is probably still better, at least for collections that are not too large. But for large collections it is really bad. I remember testing a Contains query with about 12,000 elements some time ago: it worked, but took around a minute even though the SQL query itself executed in less than a second.

It might be worth testing the performance of a combination of multiple roundtrips to the database, with a smaller number of elements in the Contains expression for each roundtrip.

This approach, and also the limitations of using Contains with Entity Framework, are shown and explained here:

Why does the Contains() operator degrade Entity Framework's performance so dramatically?
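The batching idea could be sketched roughly as follows. This is an assumption-laden illustration rather than code from the linked answer: `BatchHelper`, `Batch` and the batch size of 1000 are hypothetical, and a suitable batch size would have to be found by measurement.

```csharp
using System;
using System.Collections.Generic;

public static class BatchHelper
{
    // Splits a sequence of ids into consecutive lists of at most batchSize elements.
    // This is an iterator, so the argument check runs when enumeration starts.
    public static IEnumerable<List<int>> Batch(IEnumerable<int> ids, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var current = new List<int>(batchSize);
        foreach (var id in ids)
        {
            current.Add(id);
            if (current.Count == batchSize)
            {
                yield return current;
                current = new List<int>(batchSize);
            }
        }
        if (current.Count > 0)
            yield return current; // last, possibly smaller, batch
    }
}

// Hypothetical usage with the Images set from the question (one query per batch):
// var result = BatchHelper.Batch(ids, 1000)
//     .SelectMany(batch => Images.Where(im => batch.Contains(im.Id)).ToList())
//     .ToList();
```

Each batch keeps the Contains list short, at the price of one roundtrip per batch.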

It's possible that a raw SQL command will perform best in this situation which would mean that you call dbContext.Database.SqlQuery<Image>(sqlString) or dbContext.Images.SqlQuery(sqlString) where sqlString is the SQL shown in @Rune's answer.

Edit

Here are some measurements:

I ran this on a table with 550000 records and 11 columns (IDs start from 1 without gaps) and picked 20000 ids at random:

using (var context = new MyDbContext())
{
    Random rand = new Random();
    var ids = new List<int>();
    for (int i = 0; i < 20000; i++)
        ids.Add(rand.Next(550000));

    Stopwatch watch = new Stopwatch();
    watch.Start();

    // here are the code snippets from below

    watch.Stop();
    var msec = watch.ElapsedMilliseconds;
}

Test 1

var result = context.Set<MyEntity>()
    .Where(e => ids.Contains(e.ID))
    .ToList();

Result -> msec = 85.5 sec

Test 2

var result = context.Set<MyEntity>().AsNoTracking()
    .Where(e => ids.Contains(e.ID))
    .ToList();

Result -> msec = 84.5 sec

This tiny effect of AsNoTracking is very unusual. It indicates that the bottleneck is not object materialization (and not SQL as shown below).

For both tests it can be seen in SQL Profiler that the SQL query arrives at the database very late. (I didn't measure exactly but it was later than 70 seconds.) Obviously the translation of this LINQ query into SQL is very expensive.

Test 3

var values = new StringBuilder();
values.AppendFormat("{0}", ids[0]);
for (int i = 1; i < ids.Count; i++)
    values.AppendFormat(", {0}", ids[i]);

var sql = string.Format(
    "SELECT * FROM [MyDb].[dbo].[MyEntities] WHERE [ID] IN ({0})",
    values);

var result = context.Set<MyEntity>().SqlQuery(sql).ToList();

Result -> msec = 5.1 sec

Test 4

// same as Test 3 but this time including AsNoTracking
var result = context.Set<MyEntity>().SqlQuery(sql).AsNoTracking().ToList();

Result -> msec = 3.8 sec

This time the effect of disabling tracking is more noticeable.

Test 5

// same as Test 3 but this time using Database.SqlQuery
var result = context.Database.SqlQuery<MyEntity>(sql).ToList();

Result -> msec = 3.7 sec

My understanding is that context.Database.SqlQuery<MyEntity>(sql) is the same as context.Set<MyEntity>().SqlQuery(sql).AsNoTracking(), so there is no difference expected between Test 4 and Test 5.

(The length of the result sets was not always the same due to possible duplicates after the random id selection but it was always between 19600 and 19640 elements.)
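As a plausibility check on those result sizes, the expected number of distinct ids can be estimated with the standard formula N * (1 - (1 - 1/N)^n) for n uniform draws with replacement from N values. The sketch below assumes perfectly uniform draws; `DuplicateEstimate` and `ExpectedDistinct` are hypothetical names.

```csharp
using System;

public static class DuplicateEstimate
{
    // Expected number of distinct values when drawing `draws` times uniformly
    // (with replacement) from `range` possible values: N * (1 - (1 - 1/N)^n).
    public static double ExpectedDistinct(int range, int draws)
    {
        if (range <= 0) throw new ArgumentOutOfRangeException(nameof(range));
        if (draws < 0) throw new ArgumentOutOfRangeException(nameof(draws));

        // Probability that one particular value is never drawn.
        double missProbability = Math.Pow(1.0 - 1.0 / range, draws);
        return range * (1.0 - missProbability);
    }
}
```

For `ExpectedDistinct(550000, 20000)` this gives a value close to 19,640, which is in line with the observed result set sizes.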

Edit 2

Test 6

Even 20000 roundtrips to the database are faster than using Contains:

var result = new List<MyEntity>();
foreach (var id in ids)
    result.Add(context.Set<MyEntity>().SingleOrDefault(e => e.ID == id));

Result -> msec = 73.6 sec

Note that I have used SingleOrDefault instead of Find. Using the same code with Find is very slow (I cancelled the test after several minutes) because Find calls DetectChanges internally. Disabling auto change detection (context.Configuration.AutoDetectChangesEnabled = false) leads to roughly the same performance as SingleOrDefault. Using AsNoTracking reduces the time by one or two seconds.

Tests were done with the database client (console app) and the database server on the same machine. The last result might get significantly worse with a "remote" database due to the many roundtrips.
