LINQ groupby with custom IEqualityComparer + combined properties - performance problems


Question

I have a List of data that combines the result of an Entity Framework database query with another IEnumerable of the same type holding in-memory data from other sources. For some of our clients this list amounts to about 200000 entries (about half from the db), which makes the grouping operation take extremely long (up to 30 minutes on our cheap virtual Windows server).

The grouping operation reduces the list to about 10000 objects (a ratio of about 20:1).

The data class of the List is basically just a long row of strings, ints and a few other basic types:

public class ExportData
{
  public string FirstProperty;
  public string StringProperty;
  public string String1;
  ...
  public string String27;
  public int Int1;
  ...
  public int Int15;
  public decimal Mass;
  ...
}

The grouping is done through a custom IEqualityComparer that basically amounts to this:

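  1. Items may be grouped if the custom logic allows it, which means that about half of the properties of the two objects are equal. Those are the only properties we care about from then on, apart from ID, Mass and the special StringProperty, which may still differ even when items are allowed to be grouped.
  2. Each new grouped object should have the relevant properties (the same ones as in step 1), plus the combined IDs of the grouped items (as a string) and the sum of all Masses (decimal) of the grouped items. The special StringProperty should be set depending on whether a special string appears in any of the grouped items.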

List<ExportData> exportData; // in memory list of combined data from database + memory data

exportData = exportData.GroupBy(w => w, new ExportComparer(data)).Select(g =>
{
  ExportData group = g.Key;
  group.Mass = g.Sum(s => s.Mass);

  if (g.Count() > 1)
  {
    group.CombinedIds = string.Join("-", g.Select(a => a.Id.ToString()));
  }

  if (g.Any(s => s.StringProperty.Equals("AB"))) 
  {
    group.StringProperty= "AB";
  }
  else if (g.Any(s => s.StringProperty.Equals("CD"))) 
  {
    group.StringProperty= "CD";
  }
  else
  {
    group.StringProperty= "EF";
  }

  return group;
}).ToList();

And the custom comparer, for completeness:

public class ExportComparer : IEqualityComparer<ExportData>
{
  private CompareData data;

  public ExportComparer()
  {
  }
  public ExportComparer(CompareData comparedata)
  {
    // Additional data needed for comparison logic
    // prefetched from another database
    data = comparedata;
  }
  public bool Equals(ExportData x, ExportData y)
  {
    if (ReferenceEquals(x, y)) return true;

    if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;

    (...) // Rest of the unit-tested and already optimized very long comparison logic
    return equality; // result from the custom comparison
  }

  public int GetHashCode(ExportData obj)
  {
    if (ReferenceEquals(obj, null)) return 0;

    int hash = 17;

    hash = hash * 23 + obj.FirstProperty.GetHashCode();
    (...) // repeated for each property used in the comparison logic
    return hash;
  }
}

What can I do to make this groupby run faster?

Recommended answer

It's hard to suggest optimizations for the comparer, since its code is not shown, but there is an optimization for the Select clause.

Right now you are using Sum, Count, Select and Any (twice) inside that Select. That means the elements of each group are enumerated up to 5 times (at least 3 of those passes cover the whole group). Instead, you can iterate once with a foreach loop and evaluate the conditions yourself:

exportData.GroupBy(w => w, new ExportComparer(data)).Select(g =>
{                
    ExportData group = g.Key;
    decimal mass = 0m;
    var ids = new List<int>();
    bool anyAb = false;
    bool anyCd = false;
    // only one loop
    foreach (var item in g) {
        mass += item.Mass;
        ids.Add(item.Id);
        anyAb = anyAb || item.StringProperty.Equals("AB");
        anyCd = anyCd || item.StringProperty.Equals("CD");
    }
    group.Mass = mass;
    if (ids.Count > 1) {
        group.CombinedIds = string.Join("-", ids);
    }
    if (anyAb)
        group.StringProperty = "AB";
    else if (anyCd)
        group.StringProperty = "CD";
    else
        group.StringProperty = "EF";

    return group;
}).ToList();

Now we loop over each grouping just once, which should be more efficient than doing it 5 times.
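Since the comparer's code is not shown, one way to check whether it, rather than the Select, dominates the runtime is to count how often GroupBy calls into it. The sketch below is only an illustration: CountingComparer is a hypothetical helper that is not part of the question's code, and plain strings with StringComparer.Ordinal stand in for ExportData and ExportComparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not from the question): wraps any comparer and
// counts how often its Equals and GetHashCode methods are invoked.
public sealed class CountingComparer<T> : IEqualityComparer<T>
{
    private readonly IEqualityComparer<T> inner;

    public long EqualsCalls { get; private set; }
    public long HashCalls { get; private set; }

    public CountingComparer(IEqualityComparer<T> inner)
    {
        this.inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    public bool Equals(T x, T y)
    {
        EqualsCalls++;
        return inner.Equals(x, y);
    }

    public int GetHashCode(T obj)
    {
        HashCalls++;
        return inner.GetHashCode(obj);
    }
}

public static class Demo
{
    public static void Main()
    {
        // Assumption: strings stand in for ExportData,
        // StringComparer.Ordinal stands in for ExportComparer.
        var data = new List<string> { "a", "b", "a", "c", "b", "a" };
        var counting = new CountingComparer<string>(StringComparer.Ordinal);

        // ToList forces the grouping to run so the counters are populated.
        var groups = data.GroupBy(s => s, counting).ToList();

        Console.WriteLine($"groups: {groups.Count}");
        Console.WriteLine($"GetHashCode calls: {counting.HashCalls}");
        Console.WriteLine($"Equals calls: {counting.EqualsCalls}");
    }
}
```

If the Equals count comes out far larger than the number of input elements, many non-equal items are producing the same hash code, and the time is being spent in the comparer rather than in the Select projection.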
