Why is .Contains slow? Most efficient way to get multiple entities by primary key?
What's the most efficient way to select multiple entities by primary key?
public IEnumerable<Models.Image> GetImagesById(IEnumerable<int> ids)
{
    //return ids.Select(id => Images.Find(id)); //is this cool?
    return Images.Where(im => ids.Contains(im.Id)); //is this better, worse or the same?
    //is there a (better) third way?
}
I realise that I could do some performance tests to compare, but I am wondering if there is in fact a better way than both, and am looking for some enlightenment on what the difference between these two queries is, if any, once they have been 'translated'.
Using Contains in Entity Framework is actually very slow. It's true that it translates into an IN clause in SQL and that the SQL query itself executes quickly. But the performance bottleneck is the translation from your LINQ query into SQL. The expression tree that is created is expanded into a long chain of OR concatenations because there is no native expression that represents an IN. When the SQL is created, this expression of many ORs is recognized and collapsed back into the SQL IN clause.
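To make the "long chain of OR concatenations" concrete, here is a minimal sketch that builds such a chain by hand with System.Linq.Expressions. It is only an illustration of the shape of the tree, not Entity Framework's internal code; the sample ids and variable names are made up for the example.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical illustration: build "id == 3 || id == 7 || ..." by hand.
int[] sampleIds = { 3, 7, 11, 42 };
ParameterExpression idParam = Expression.Parameter(typeof(int), "id");

// One Equal node per value, folded together with OrElse nodes.
// The tree grows by one level for every additional id.
Expression orChain = sampleIds
    .Select(v => (Expression)Expression.Equal(idParam, Expression.Constant(v)))
    .Aggregate((left, right) => Expression.OrElse(left, right));

Func<int, bool> matches =
    Expression.Lambda<Func<int, bool>>(orChain, idParam).Compile();

Console.WriteLine(orChain);      // prints the nested OR expression
Console.WriteLine(matches(11));  // True
Console.WriteLine(matches(5));   // False
```

With four ids the tree is trivial; with thousands of ids it becomes a very deep structure that has to be built and then walked again, which is consistent with the translation cost described above.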
This does not mean that using Contains is worse than issuing one query per element in your ids collection (your first option). It's probably still better, at least for collections that are not too large. But for large collections it is really bad. I remember testing a Contains query some time ago with about 12,000 elements: it worked, but it took around a minute even though the query itself executed in SQL in less than a second.
It might be worth testing the performance of a combination of multiple roundtrips to the database with a smaller number of elements in the Contains expression for each roundtrip.
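A minimal sketch of that batching idea follows. The helper SplitIntoBatches, the batch size of 3 and the in-memory "table" are assumptions made for illustration; in real code the stand-in delegate would be an Entity Framework query such as a Where with Contains over the current batch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: split the ids into consecutive chunks of at most batchSize.
List<List<int>> SplitIntoBatches(IReadOnlyList<int> ids, int batchSize)
{
    if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
    var batches = new List<List<int>>();
    for (int start = 0; start < ids.Count; start += batchSize)
    {
        int count = Math.Min(batchSize, ids.Count - start);
        var current = new List<int>(count);
        for (int i = 0; i < count; i++)
            current.Add(ids[start + i]);
        batches.Add(current);
    }
    return batches;
}

// Stand-in for the database (assumption): a "table" holding the ids 1..100.
var table = Enumerable.Range(1, 100).ToList();
Func<List<int>, List<int>> queryBatch =
    chunk => table.Where(id => chunk.Contains(id)).ToList();

var wanted = new List<int> { 5, 17, 23, 42, 64, 81, 99 };
var found = new List<int>();
foreach (var batch in SplitIntoBatches(wanted, 3))
    found.AddRange(queryBatch(batch));   // one roundtrip per batch

Console.WriteLine(string.Join(", ", found)); // 5, 17, 23, 42, 64, 81, 99
```

The right batch size is something to measure: smaller batches keep each Contains expression cheap to translate, but add roundtrips.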
This approach, and the limitations of using Contains with Entity Framework, are shown and explained here:
Why does the Contains() operator degrade Entity Framework's performance so dramatically?
It's possible that a raw SQL command will perform best in this situation, which would mean that you call dbContext.Database.SqlQuery<Image>(sqlString) or dbContext.Images.SqlQuery(sqlString), where sqlString is the SQL shown in @Rune's answer.
Edit
Here are some measurements:
I have done this on a table with 550000 records and 11 columns (IDs start from 1 without gaps) and picked randomly 20000 ids:
using (var context = new MyDbContext())
{
    Random rand = new Random();
    var ids = new List<int>();
    for (int i = 0; i < 20000; i++)
        ids.Add(rand.Next(550000));
    Stopwatch watch = new Stopwatch();
    watch.Start();
    // here are the code snippets from below
    watch.Stop();
    var msec = watch.ElapsedMilliseconds;
}
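The results below are quoted in seconds although the variable is named msec. A small timing helper along the following lines makes the conversion explicit; MeasureSeconds is a hypothetical name and Thread.Sleep merely stands in for one of the test snippets.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: run an action once and return the elapsed time in seconds.
double MeasureSeconds(Action action)
{
    var watch = Stopwatch.StartNew();
    action();
    watch.Stop();
    return watch.ElapsedMilliseconds / 1000.0;
}

// Thread.Sleep is a stand-in for one of the test snippets.
double seconds = MeasureSeconds(() => Thread.Sleep(50));
Console.WriteLine($"{seconds:F1} sec");
```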
Test 1
var result = context.Set<MyEntity>()
    .Where(e => ids.Contains(e.ID))
    .ToList();
Result -> msec = 85.5 sec
Test 2
var result = context.Set<MyEntity>().AsNoTracking()
    .Where(e => ids.Contains(e.ID))
    .ToList();
Result -> msec = 84.5 sec
This tiny effect of AsNoTracking is very unusual. It indicates that the bottleneck is not object materialization (and not SQL, as shown below).
For both tests it can be seen in SQL Profiler that the SQL query arrives at the database very late. (I didn't measure exactly but it was later than 70 seconds.) Obviously the translation of this LINQ query into SQL is very expensive.
Test 3
var values = new StringBuilder();
values.AppendFormat("{0}", ids[0]);
for (int i = 1; i < ids.Count; i++)
    values.AppendFormat(", {0}", ids[i]);
var sql = string.Format(
    "SELECT * FROM [MyDb].[dbo].[MyEntities] WHERE [ID] IN ({0})",
    values);
var result = context.Set<MyEntity>().SqlQuery(sql).ToList();
Result -> msec = 5.1 sec
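The IN list in Test 3 can also be built more compactly. The following is a sketch under the assumption that the ids are plain integers, as in the test; the sample values are made up. It uses string.Join instead of the manual StringBuilder loop and removes duplicate ids first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var ids = new List<int> { 7, 3, 7, 42 };   // hypothetical sample ids

// Distinct() drops repeated ids, OrderBy makes the list deterministic,
// and string.Join replaces the manual StringBuilder loop.
// Only integers are concatenated here; string values would call for
// SQL parameters instead of concatenation.
string inList = string.Join(", ", ids.Distinct().OrderBy(id => id));
string sql = $"SELECT * FROM [MyDb].[dbo].[MyEntities] WHERE [ID] IN ({inList})";

Console.WriteLine(sql);
// SELECT * FROM [MyDb].[dbo].[MyEntities] WHERE [ID] IN (3, 7, 42)
```

Unlike the original loop, this version does not need a special case for ids[0], although an empty id list would still produce an invalid "IN ()" and should be handled before the query is sent.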
Test 4
// same as Test 3 but this time including AsNoTracking
var result = context.Set<MyEntity>().SqlQuery(sql).AsNoTracking().ToList();
Result -> msec = 3.8 sec
This time the effect of disabling tracking is more noticeable.
Test 5
// same as Test 3 but this time using Database.SqlQuery
var result = context.Database.SqlQuery<MyEntity>(sql).ToList();
Result -> msec = 3.7 sec
My understanding is that context.Database.SqlQuery<MyEntity>(sql) is the same as context.Set<MyEntity>().SqlQuery(sql).AsNoTracking(), so there is no difference expected between Test 4 and Test 5.
(The length of the result sets was not always the same due to possible duplicates after the random id selection but it was always between 19600 and 19640 elements.)
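The size of that range can be sanity-checked with a little arithmetic. When k values are drawn uniformly with replacement from n possibilities, the expected number of distinct values is n * (1 - (1 - 1/n)^k). The sketch below plugs in the numbers from the test setup; the variable names are made up for the example.

```csharp
using System;

int tableSize = 550000;  // rand.Next(550000) returns values in 0..549999
int draws = 20000;

// Expected number of distinct values after `draws` uniform picks with replacement:
// n * (1 - (1 - 1/n)^k)
double expectedDistinct = tableSize * (1 - Math.Pow(1 - 1.0 / tableSize, draws));

Console.WriteLine(Math.Round(expectedDistinct)); // 19641
```

That is close to the upper end of the reported range. Note also that rand.Next(550000) can return 0, which matches no row in a table whose IDs start at 1.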
Edit 2
Test 6
Even 20000 roundtrips to the database are faster than using Contains:
var result = new List<MyEntity>();
foreach (var id in ids)
    result.Add(context.Set<MyEntity>().SingleOrDefault(e => e.ID == id));
Result -> msec = 73.6 sec
Note that I have used SingleOrDefault instead of Find. Using the same code with Find is very slow (I cancelled the test after several minutes) because Find calls DetectChanges internally. Disabling auto change detection (context.Configuration.AutoDetectChangesEnabled = false) leads to roughly the same performance as SingleOrDefault. Using AsNoTracking reduces the time by one or two seconds.
Tests were done with database client (console app) and database server on the same machine. The last result might get significantly worse with a "remote" database due to the many roundtrips.