可扩展包含针对SQL后端的LINQ方法 [英] Scalable Contains method for LINQ against a SQL backend

查看:127
本文介绍了可扩展包含针对SQL后端的LINQ方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找一种优雅的方式来执行 以可扩展的方式包含() 语句。



IN 语句



在实体框架和LINQ to SQL中,包含语句被翻译为SQL IN 语句。例如,从这个语句:

  var ids = Enumerable.Range(1,10); 
var courses = Courses.Where(c => ids.Contains(c.CourseID))。ToList();

实体框架将生成

  SELECT 
[Extent1]。[CourseID] AS [CourseID],
[Extent1]。[标题] AS [标题],
[Extent1]。[Credits] AS [Credits],
[Extent1]。[DepartmentID] AS [DepartmentID]
FROM [dbo]。[课程] AS [Extent1]
WHERE [ Extent1]。[CourseID] IN(1,2,3,4,5,6,7,8,9,10)

不幸的是,语句中是不可扩展的。根据 MSDN


在IN子句中包含非常大量的值(数千)可以消耗资源并返回错误8623或8632


这与资源耗尽或超出表情限制有关。



但是,在之前,这些错误发生,越来越多的项目, IN 语句变得越来越慢。我无法找到有关其增长率的文档,但它可以达到数以千计的项目,但是超过这个数量会变得非常慢。 (基于SQL Server体验)。



可扩展性



我们不能总是避免使用此语句。源代码数据替代的 JOIN 通常会执行得更好,但只有当源数据位于相同的上下文中时才可能。在这里我正在处理在断开连接的场景中来自客户端的数据。所以我一直在寻找一个可扩展的解决方案。令人满意的做法是将操作切割成块:

  var courses = ids.ToChunks(1000)
。选择(chunk => Courses.Where(c => chunk.Contains(c.CourseID)))
.SelectMany(x => x).ToList();

(其中 ToChunks 这个小扩展方法)。



这将以1000的大小写执行查询都表现得很好。例如5000个项目,5个查询将一起运行,可能比一个具有5000个项目的查询更快。



但不是DRY



但是当然我不想把这个结构分散在我的代码上。我正在寻找一种扩展方法,通过该方法可以将任何 IQueryable 转换成大块执行语句。理想情况是这样的:

  var courses = Courses.Where(c => ids.Contains(c.CourseID))
.AsChunky(1000)
.ToList();

但也许这个

  var courses = Courses.ChunkyContains(c => c.CourseID,ids,1000)
.ToList();

我给了第一个解决方案:

  public static IEnumerable< TEntity> ChunkyContains< TEntity,TContains>(
这个IQueryable< TEntity>查询,
表达式< Func< TEntity,TContains>&match;
IEnumerable< TContains> containsList,
int chunkSize = 500)
{
return containsList.ToChunks(chunkSize)
.Select(chunk => query.Where(x => chunk.Contains(match)))
.SelectMany(x => x);
}

显然,零件 x => chunk.Contains(match)不编译。但是我不知道如何操作 match 表达式到包含表达式。



也许有人可以帮助我使这个解决方案工作。当然,我可以使用其他方法来使这个语句可扩展。

解决方案

我已经解决了这个问题一个月前看不同的方法。也许这是一个很好的解决方案。



我不希望我的解决方案更改查询本身。所以一个ids.ChunkContains(p.Id)或一个特殊的WhereContains方法是不可行的。解决方案也可以将Contains与另一个过滤器结合使用,并多次使用相同的集合。

  db.TestEntities .Where(p =>(ids.Contains(p.Id)|| ids.Contains(p.ParentId))&& p.Name.StartsWith(Test))

所以我试图将逻辑封装在一个特殊的ToList方法中,可以重写要在块中查询的指定集合的​​Expression。 p>

  var ids = Enumerable.Range(1,11); 
var result = db.TestEntities.Where(p => Ids.Contains(p.Id)&& p.Name.StartsWith(Test))
.ToChunkedList(ids,4 );

要重写表达式树,我发现查询中的本地集合中包含所有Contains方法调用帮助课程

  private class ContainsExpression 
{
public ContainsExpression(MethodCallExpression methodCall)
{
this.MethodCall = methodCall;
}

public MethodCallExpression MethodCall {get;私人集合}

public object GetValue()
{
var parent = MethodCall.Object? MethodCall.Arguments.FirstOrDefault();
return Expression.Lambda&FunC< object>>(parent).Compile()();
}

public bool IsLocalList()
{
表达式parent = MethodCall.Object? MethodCall.Arguments.FirstOrDefault();
while(parent!= null){
if(parent为ConstantExpression)
返回true;
var member = parent作为MemberExpression;
if(member!= null){
parent = member.Expression;
} else {
parent = null;
}
}
return false;
}
}

私有类FindExpressionVisitor< T> :ExpressionVisitor其中T:表达式
{
public List< T> FoundItems {get;私人集合}

public FindExpressionVisitor()
{
this.FoundItems = new List< T>();
}

public override表达式访问(表达式节点)
{
var found = node as T;
if(found!= null){
this.FoundItems.Add(found);
}
return base.Visit(node);
}
}

public static List< T> ToChunkedList< T,TValue>(此IQueryable< T>查询,IEnumerable< TValue> list,int chunkSize)
{
var finder = new FindExpressionVisitor< MethodCallExpression>();
finder.Visit(query.Expression);
var methodCalls = finder.FoundItems.Where(p => p.Method.Name ==Contains)。选择(p => new ContainsExpression(p))其中(p => p。 IsLocalList())ToList();
var localLists = methodCalls.Where(p => p.GetValue()== list).ToList();

如果在查询表达式中找到ToChunkedList方法中传递的本地集合,则替换Contains调用到原始列表,新的调用包含一个批次的ids的临时列表。

  if(localLists.Any() ){
var result = new List< T>();
var valueList = new List< TValue>();

var containsMethod = typeof(Enumerable).GetMethods(BindingFlags.Static | BindingFlags.Public)
.Single(p => p.Name ==包含&& p .GetParameters()。Count()== 2)
.MakeGenericMethod(typeof(TValue));

var queryExpression = query.Expression;

foreach(localLists中的var item){
var parameter = new List< Expression>();
parameter.Add(Expression.Constant(valueList));
if(item.MethodCall.Object == null){
parameter.AddRange(item.MethodCall.Arguments.Skip(1));
} else {
parameter.AddRange(item.MethodCall.Arguments);
}

var call = Expression.Call(containsMethod,parameter.ToArray());

var replacer = new ExpressionReplacer(item.MethodCall,call);

queryExpression = replacer.Visit(queryExpression);
}

var chunkQuery = query.Provider.CreateQuery< T>(queryExpression); (int i = 0; i< Math.Ceiling((decimal)list.Count()/ chunkSize); i ++){
valueList.Clear()


;
valueList.AddRange(list.Skip(i * chunkSize).Take(chunkSize));

result.AddRange(chunkQuery.ToList());
}
返回结果;
}
//如果没有找到集合return query.ToList()
return query.ToList();

表达式替换:

  private class ExpressionReplacer:ExpressionVisitor {

private Expression find,replace;

public ExpressionReplacer(Expression find,Expression replace)
{
this.find = find;
this.replace = replace;
}

public override表达式访问(表达式节点)
{
if(node == this.find)
return this.replace;

返回base.Visit(node);
}
}


I'm looking for an elegant way to execute a Contains() statement in a scalable way. Please allow me to give some background before I come to the actual question.

The IN statement

In Entity Framework and LINQ to SQL the Contains statement is translated as a SQL IN statement. For instance, from this statement:

var ids = Enumerable.Range(1,10);
var courses = Courses.Where(c => ids.Contains(c.CourseID)).ToList();

Entity Framework will generate

SELECT 
    [Extent1].[CourseID] AS [CourseID], 
    [Extent1].[Title] AS [Title], 
    [Extent1].[Credits] AS [Credits], 
    [Extent1].[DepartmentID] AS [DepartmentID]
    FROM [dbo].[Course] AS [Extent1]
    WHERE [Extent1].[CourseID] IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

Unfortunately, the In statement is not scalable. As per MSDN:

Including an extremely large number of values (many thousands) in an IN clause can consume resources and return errors 8623 or 8632

which has to do with running out of resources or exceeding expression limits.

But before these errors occur, the IN statement becomes increasingly slow with growing numbers of items. I can't find documentation about its growth rate, but it performs well up to a few thousands of items, but beyond that it gets dramatically slow. (Based on SQL Server experiences).

Scalable

We can't always avoid this statement. A JOIN with the source data in stead would generally perform much better, but that's only possible when the source data is in the same context. Here I'm dealing with data coming from a client in a disconnected scenario. So I have been looking for a scalable solution. A satisfactory approach turned out to be cutting the operation into chunks:

var courses = ids.ToChunks(1000)
                 .Select(chunk => Courses.Where(c => chunk.Contains(c.CourseID)))
                 .SelectMany(x => x).ToList();

(where ToChunks is this little extension method).

This executes the query in chunks of 1000 that all perform well enough. With e.g. 5000 items, 5 queries will run that together are likely to be faster than one query with 5000 items.

But not DRY

But of course I don't want to scatter this construct all over my code. I am looking for an extension method by which any IQueryable<T> can be transformed into a chunky executing statement. Ideally something like this:

var courses = Courses.Where(c => ids.Contains(c.CourseID))
              .AsChunky(1000)
              .ToList();

But maybe this

var courses = Courses.ChunkyContains(c => c.CourseID, ids, 1000)
              .ToList();

I've given the latter solution a first shot:

public static IEnumerable<TEntity> ChunkyContains<TEntity, TContains>(
    this IQueryable<TEntity> query, 
    Expression<Func<TEntity,TContains>> match, 
    IEnumerable<TContains> containList, 
    int chunkSize = 500)
{
    return containList.ToChunks(chunkSize)
               .Select (chunk => query.Where(x => chunk.Contains(match)))
               .SelectMany(x => x);
}

Obviously, the part x => chunk.Contains(match) doesn't compile. But I don't know how to manipulate the match expression into a Contains expression.

Maybe someone can help me make this solution work. And of course I'm open to other approaches to make this statement scalable.

解决方案

I’ve solved this problem with a little different approach a view month ago. Maybe it’s a good solution for you too.

I didn’t want my solution to change the query itself. So a ids.ChunkContains(p.Id) or a special WhereContains method was unfeasible. Also should the solution be able to combine a Contains with another filter as well as using the same collection multiple times.

db.TestEntities.Where(p => (ids.Contains(p.Id) || ids.Contains(p.ParentId)) && p.Name.StartsWith("Test"))

So I tried to encapsulate the logic in a special ToList method that could rewrite the Expression for a specified collection to be queried in chunks.

var ids = Enumerable.Range(1, 11);
var result = db.TestEntities.Where(p => Ids.Contains(p.Id) && p.Name.StartsWith ("Test"))
                                .ToChunkedList(ids,4);

To rewrite the expression tree I discovered all Contains Method calls from local collections in the query with a view helping classes.

private class ContainsExpression
{
    public ContainsExpression(MethodCallExpression methodCall)
    {
        this.MethodCall = methodCall;
    }

    public MethodCallExpression MethodCall { get; private set; }

    public object GetValue()
    {
        var parent = MethodCall.Object ?? MethodCall.Arguments.FirstOrDefault();
        return Expression.Lambda<Func<object>>(parent).Compile()();
    }

    public bool IsLocalList()
    {
        Expression parent = MethodCall.Object ?? MethodCall.Arguments.FirstOrDefault();
        while (parent != null) {
            if (parent is ConstantExpression)
                return true;
            var member = parent as MemberExpression;
            if (member != null) {
                parent = member.Expression;
            } else {
                parent = null;
            }
        }
        return false;
    }
}

private class FindExpressionVisitor<T> : ExpressionVisitor where T : Expression
{
    public List<T> FoundItems { get; private set; }

    public FindExpressionVisitor()
    {
        this.FoundItems = new List<T>();
    }

    public override Expression Visit(Expression node)
    {
        var found = node as T;
        if (found != null) {
            this.FoundItems.Add(found);
        }
        return base.Visit(node);
    }
}

public static List<T> ToChunkedList<T, TValue>(this IQueryable<T> query, IEnumerable<TValue> list, int chunkSize)
{
    var finder = new FindExpressionVisitor<MethodCallExpression>();
    finder.Visit(query.Expression);
    var methodCalls = finder.FoundItems.Where(p => p.Method.Name == "Contains").Select(p => new ContainsExpression(p)).Where(p => p.IsLocalList()).ToList();
    var localLists = methodCalls.Where(p => p.GetValue() == list).ToList();

If the local collection passed in the ToChunkedList method was found in the query expression, I replace the Contains call to the original list with a new call to a temporary list containing the ids for one batch.

if (localLists.Any()) {
    var result = new List<T>();
    var valueList = new List<TValue>();

    var containsMethod = typeof(Enumerable).GetMethods(BindingFlags.Static | BindingFlags.Public)
                        .Single(p => p.Name == "Contains" && p.GetParameters().Count() == 2)
                        .MakeGenericMethod(typeof(TValue));

    var queryExpression = query.Expression;

    foreach (var item in localLists) {
        var parameter = new List<Expression>();
        parameter.Add(Expression.Constant(valueList));
        if (item.MethodCall.Object == null) {
            parameter.AddRange(item.MethodCall.Arguments.Skip(1));
        } else {
            parameter.AddRange(item.MethodCall.Arguments);
        }

        var call = Expression.Call(containsMethod, parameter.ToArray());

        var replacer = new ExpressionReplacer(item.MethodCall,call);

        queryExpression = replacer.Visit(queryExpression);
    }

    var chunkQuery = query.Provider.CreateQuery<T>(queryExpression);


    for (int i = 0; i < Math.Ceiling((decimal)list.Count() / chunkSize); i++) {
        valueList.Clear();
        valueList.AddRange(list.Skip(i * chunkSize).Take(chunkSize));

        result.AddRange(chunkQuery.ToList());
    }
    return result;
}
// if the collection was not found return query.ToList()
return query.ToList();

Expression Replacer:

private class ExpressionReplacer : ExpressionVisitor {

    private Expression find, replace;

    public ExpressionReplacer(Expression find, Expression replace)
    {
        this.find = find;
        this.replace = replace;
    }

    public override Expression Visit(Expression node)
    {
        if (node == this.find)
            return this.replace;

        return base.Visit(node);
    }
}

这篇关于可扩展包含针对SQL后端的LINQ方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆