检测具有相同的孩子实体 [英] Detect entities which have the same children

查看:147
本文介绍了检测具有相同的孩子实体的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个实体,学生,在许多一对多的关系联系起来。

I have two entities, Class and Student, linked in a many-to-many relationship.

当数据从外部应用程序导入的,不幸的是一些类重复创建。在'重复'阶级有不同的名称,但相同的主题和相同的学生。

When data is imported from an external application, unfortunately some classes are created in duplicate. The 'duplicate' classes have different names, but the same subject and the same students.

例如:

{n = 341,标题='10RS / PE1a',SubjectId = 60,学生= {杰克,比尔,莎拉}}

{ Id = 341, Title = '10rs/PE1a', SubjectId = 60, Students = { Jack, Bill, Sarah } }

{n = 429,标题='10RS / PE1b',SubjectId = 60,学生= {杰克,比尔,莎拉}}

{ Id = 429, Title = '10rs/PE1b', SubjectId = 60, Students = { Jack, Bill, Sarah } }

有匹配这些重复类的名称没有一般的规律,所以唯一的办法,以确定两个类是重复的是,他们有相同的 SubjectId 学生

There is no general rule for matching the names of these duplicate classes, so the only way to identify that two classes are duplicates is that they have the same SubjectId and Students.

我想使用LINQ来检测所有重复(并最终将它们合并)。到目前为止,我已经尝试过:

I'd like to use LINQ to detect all duplicates (and ultimately merge them). So far I have tried:

var sb = new StringBuilder();
using (var ctx = new Ctx()) {
  ctx.CommandTimeout = 10000; // Because the next line takes so long!
  var allClasses = ctx.Classes.Include("Students").OrderBy(o => o.Id);
  foreach (var c in allClasses) {
    var duplicates = allClasses.Where(o => o.SubjectId == c.SubjectId && o.Id != c.Id && o.Students.Equals(c.Students));
    foreach (var d in duplicates)
      sb.Append(d.LongName).Append(" is a duplicate of ").Append(c.LongName).Append("<br />");
  }
}
lblResult.Text = sb.ToString();

这是没有好,因为我得到的错误:

This is no good because I get the error:

NotSupportedException异常:无法创建类型TeachEDM.Student的恒定值。只有原始类型('如的Int32,字符串和GUID')在这方面的支持。

NotSupportedException: Unable to create a constant value of type 'TeachEDM.Student'. Only primitive types ('such as Int32, String, and Guid') are supported in this context.

显然,它不喜欢我尝试匹配 o.SubjectId == c.SubjectId 在LINQ。

Evidently it doesn't like me trying to match o.SubjectId == c.SubjectId in LINQ.

此外,这似乎在一般可怕的方法和很慢。到数据库的通话时间超过5分钟。

Also, this seems a horrible method in general and is very slow. The call to the database takes more than 5 minutes.

我真的AP preciate一些建议。

I'd really appreciate some advice.

推荐答案

SubjectId 是没有问题的,因为 c.SubjectId <的比较/ code>是一个基本类型(<$ C C> INT ,我猜)的值。关于例外抱怨等于(c.Students) c.Students 是一个常数(相对于查询复制),而不是一个基本类型。

The comparison of the SubjectId is not the problem because c.SubjectId is a value of a primitive type (int, I guess). The exception complains about Equals(c.Students). c.Students is a constant (with respect to the query duplicates) but not a primitive type.

我也想尝试做在内存中,而不是在数据库进行比较。您正在加载整个数据到内存无论如何,当你开始你的第一个的foreach 循环:它执行查询 allClasses 。然后循环内您扩展IQueryable的 allClasses 到的IQueryable 复制其中获取内部<$ C执行,那么$ C>的foreach 循环。这是根据你的外循环的元素一个数据库查询!这可以解释的code中的表现不佳。

I would also try to do the comparison in memory and not in the database. You are loading the whole data into memory anyway when you start your first foreach loop: It executes the query allClasses. Then inside of the loop you extend the IQueryable allClasses to the IQueryable duplicates which gets executed then in the inner foreach loop. This is one database query per element of your outer loop! This could explain the poor performance of the code.

所以,我会尝试在内存中执行的第一个的foreach 的内容。对于学生列表的比较有必要通过元素比较元素,而不是因为他们是肯定不同的参考学生的集合。

So I would try to perform the content of the first foreach in memory. For the comparison of the Students list it is necessary to compare element by element, not the references to the Students collections because they are for sure different.

var sb = new StringBuilder();
using (var ctx = new Ctx())
{
    ctx.CommandTimeout = 10000; // Perhaps not necessary anymore
    var allClasses = ctx.Classes.Include("Students").OrderBy(o => o.Id)
        .ToList(); // executes query, allClasses is now a List, not an IQueryable

    // everything from here runs in memory
    foreach (var c in allClasses)
    {
        var duplicates = allClasses.Where(
           o => o.SubjectId == c.SubjectId &&
           o.Id != c.Id &&
           o.Students.OrderBy(s => s.Name).Select(s => s.Name)
            .SequenceEqual(c.Students.OrderBy(s => s.Name).Select(s => s.Name)));

        // duplicates is an IEnumerable, not an IQueryable
        foreach (var d in duplicates)
            sb.Append(d.LongName)
              .Append(" is a duplicate of ")
              .Append(c.LongName)
              .Append("<br />");
    }
}
lblResult.Text = sb.ToString();

按名称排序的序列是必要的,因为,我相信, SequenceEqual 比较序列的长度,然后元素0与元素0,元素1与元素1等上。

Ordering the sequences by name is necessary because, I believe, SequenceEqual compares length of the sequence and then element 0 with element 0, then element 1 with element 1 and so on.

修改以您的评论的第一个查询仍然很慢。

Edit To your comment that the first query is still slow.

如果你有1300个教学班,每30名学生预先加载的性能(包含)可能会遭受数据,这些数据是数据库和客户端之间转移的乘法。这是在这里解释:<一href="http://stackoverflow.com/questions/5521749/how-many-include-i-can-use-on-objectset-in-entityframework-to-retain-performance/5522195#5522195">How很多包括我可以在对象集使用的EntityFramework保留效果?。查询是复杂的,因为它需要一个加入班学生和对象具体化之间的复杂以及由于EF必须过滤掉在创建对象时,重复的数据。

If you have 1300 classes with 30 students each the performance of eager loading (Include) could suffer from the multiplication of data which are transfered between database and client. This is explained here: How many Include I can use on ObjectSet in EntityFramework to retain performance? . The query is complex because it needs a JOIN between classes and students and object materialization is complex as well because EF must filter out the duplicated data when the objects are created.

这是另一种方法是只加载类,而无需学生在第一个查询,然后再加载同学们一个个循环明确地内部。它是这样的:

An alternative approach is to load only the classes without the students in the first query and then load the students one by one inside of a loop explicitely. It would look like this:

var sb = new StringBuilder();
using (var ctx = new Ctx())
{
    ctx.CommandTimeout = 10000; // Perhaps not necessary anymore
    var allClasses = ctx.Classes.OrderBy(o => o.Id).ToList(); // <- No Include!
    foreach (var c in allClasses)
    {
        // "Explicite loading": This is a new roundtrip to the DB
        ctx.LoadProperty(c, "Students");
    }

    foreach (var c in allClasses)
    {
        // ... same code as above
    }
}
lblResult.Text = sb.ToString();

您将在这个例子中,而不是只有一个1 + 1300数据库查询,但你不会有发生与预先加载数据的乘法和查询更简单(没有加入班和学生)之间。

You would have 1 + 1300 database queries in this example instead of only one, but you won't have the data multiplication which occurs with eager loading and the queries are simpler (no JOIN between classes and students).

Explicite装载了说明:

Explicite loading is explained here:

  • http://msdn.microsoft.com/en-us/library/bb896272.aspx
  • For POCOs (works also for EntityObject derived entities): http://msdn.microsoft.com/en-us/library/dd456855.aspx
  • For EntityObject derived entities you can also use the Load method of EntityCollection: http://msdn.microsoft.com/en-us/library/bb896370.aspx

如果您使用延迟加载的第一个的foreach LoadProperty 不会是必要的,因为学生藏品将在第一次访问它被装载。它应该有这样几个explicite加载相同的1300个其他查询。

If you work with Lazy Loading the first foreach with LoadProperty would not be necessary as the Students collections will be loaded the first time you access it. It should result in the same 1300 additional queries like explicite loading.

这篇关于检测具有相同的孩子实体的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆