Compare two lists that contain a lot of objects

Question

I need to compare two lists, each containing about 60,000 objects. What would be the most efficient way of doing this? I want to select all the items in the source list that do not exist in the destination list.

I am creating a sync application in which C# scans a directory and places the attributes of each file in a list, so there is one list for the source directory and another for the destination directory. Instead of copying all the files, I compare the lists to see which files differ. If both lists contain the same file, I do not copy it. Here is the LINQ query I use. It works when I scan a small folder, but it does not finish when I scan a large one.

// s.lstFiles is the list of the source files
// d.lstFiles is the list of the files contained in the destination folder
var q = from a in s.lstFiles
        from b in d.lstFiles
        where
        a.compareName == b.compareName &&
        a.size == b.size &&
        a.dateCreated == b.dateCreated
        select a;

// create a list to hold the items that are the same; the outer join is selected later
List<Classes.MyPathInfo.MyFile> tempList = new List<Classes.MyPathInfo.MyFile>();

foreach (Classes.MyPathInfo.MyFile file in q)
{
    tempList.Add(file);
}

I don't know why this query takes forever. There are also other things I could take advantage of. For example, I know that if a source file matches a destination file, there cannot be another duplicate of it, because two files cannot have the same name and the same path.

Recommended answer

Create an equality comparer for the type; you can then use it to compare the sets efficiently:

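public class MyFileComparer : IEqualityComparer<MyFile> {

  // Two files are equal when name, size and creation date all match.
  public bool Equals(MyFile a, MyFile b) {
    return
      a.compareName == b.compareName &&
      a.size == b.size &&
      a.dateCreated == b.dateCreated;
  }

  // Combine the hash codes of the same three fields that Equals compares.
  public int GetHashCode(MyFile a) {
    unchecked { // overflow is expected and harmless when mixing hash codes
      return
       (a.compareName.GetHashCode() * 251 + a.size.GetHashCode()) * 251 +
        a.dateCreated.GetHashCode();
    }
  }

}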

Now you can use this with methods like Intersect to get all items that exist in both lists, or Except to get all items that exist in one list but not in the other:

List<MyFile> tempList =
  s.lstFiles.Intersect(d.lstFiles, new MyFileComparer()).ToList();
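
Since the question asks for the items in the source list that are missing from the destination list, the Except variant is probably the one wanted. The following is a minimal sketch, not the asker's actual code: the MyFile class here is a hypothetical stand-in that models only the three compared fields (their types are assumptions), and SyncPlan.FilesToCopy is an illustrative helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the question's MyFile type; only the three
// compared fields are modeled, and their types are assumptions.
public class MyFile
{
    public string compareName;
    public long size;
    public DateTime dateCreated;
}

public class MyFileComparer : IEqualityComparer<MyFile>
{
    public bool Equals(MyFile a, MyFile b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (a is null || b is null) return false;
        return a.compareName == b.compareName
            && a.size == b.size
            && a.dateCreated == b.dateCreated;
    }

    public int GetHashCode(MyFile f)
    {
        unchecked // overflow is fine when mixing hash codes
        {
            int h = f.compareName?.GetHashCode() ?? 0;
            h = h * 251 + f.size.GetHashCode();
            return h * 251 + f.dateCreated.GetHashCode();
        }
    }
}

public static class SyncPlan
{
    // Returns the files that are in source but have no equal entry in destination.
    public static List<MyFile> FilesToCopy(
        IEnumerable<MyFile> source, IEnumerable<MyFile> destination)
    {
        return source.Except(destination, new MyFileComparer()).ToList();
    }
}
```

With the question's variables this would be called as SyncPlan.FilesToCopy(s.lstFiles, d.lstFiles). Note that Except has set semantics, so equal items within the source list are returned only once; that is harmless here because the question states that duplicates cannot occur.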

Because these methods can use the hash code to divide the items into buckets, far fewer comparisons are needed than with a join, which has to compare every item in one list with every item in the other.
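
To put numbers on it: the nested from/from query pairs every source item with every destination item, so two lists of 60,000 entries produce 60,000 × 60,000 = 3.6 billion candidate pairs. A hash-based approach instead does roughly one hash insertion per destination item and one hash lookup per source item. The sketch below spells that idea out by hand with a HashSet; it is only an illustration of the principle, not the actual implementation of Except, and the names HashLookupSketch and MissingFrom are made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class HashLookupSketch
{
    // Returns the items of source that have no equal item in destination.
    // Passing a null comparer makes HashSet use the type's default equality.
    public static List<T> MissingFrom<T>(
        IEnumerable<T> source,
        IEnumerable<T> destination,
        IEqualityComparer<T> comparer = null)
    {
        // One pass over destination: each item is hashed into a bucket.
        var lookup = new HashSet<T>(destination, comparer);

        // One pass over source: each lookup only inspects the matching bucket
        // instead of scanning the whole destination list.
        var missing = new List<T>();
        foreach (var item in source)
        {
            if (!lookup.Contains(item))
            {
                missing.Add(item);
            }
        }
        return missing;
    }
}
```

For the file lists above, the call would be HashLookupSketch.MissingFrom(s.lstFiles, d.lstFiles, new MyFileComparer()). Unlike Except, this version keeps repeated source items rather than collapsing them.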
