Get a list of duplicate files by computing their MD5


Question

I have an array that contains file paths, and I want to make a list of the files that are duplicates of one another, judged by their MD5. I calculate their MD5 like this:

private void calcMD5(Array files)  //Array contains a path of all files
{
    int i=0;
    string[] md5_val = new string[files.Length];
    foreach (string file_name in files)
    {
        using (var md5 = MD5.Create())
        {
            using (var stream = File.OpenRead(file_name))
            {
                md5_val[i] = BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "").ToLower();
                i += 1;
            }
        }
    }                
}

With the code above I am able to calculate their MD5, but how do I get a list of only those files that are duplicates? If there is any other way to do the same thing, please let me know. I am also new to LINQ.

Recommended answer

1. Rewrite your calcMD5 function to take a single file path and return the MD5.
2. Store your file names in a string[] or List<string>, not an untyped Array, if possible.
3. Use the following LINQ to get groups of files with the same hash:

var groupsOfFilesWithSameHash = files
  // or files.Cast<string>() if you're stuck with an Array
   .GroupBy(f => calcMD5(f))
   .Where(g => g.Count() > 1);

4. You can get to the groups with nested foreach loops, for example:

foreach(var group in groupsOfFilesWithSameHash)
{
    Console.WriteLine("Shared MD5: " + group.Key);
    foreach (var file in group)
        Console.WriteLine("    " + file);
}
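
Putting steps 1 to 3 together, a minimal sketch might look like the following. The class name DuplicateFinder and the method names CalcMD5 and FindDuplicates are illustrative choices, not part of the original question or answer. The sketch also assumes the paths are available as an IEnumerable<string>, and it uses ToLowerInvariant where the question used ToLower.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper class; the names here are assumptions for illustration.
public static class DuplicateFinder
{
    // Step 1: hash a single file and return its MD5 as a lowercase hex string.
    public static string CalcMD5(string path)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(path))
        {
            return BitConverter.ToString(md5.ComputeHash(stream))
                               .Replace("-", "")
                               .ToLowerInvariant();
        }
    }

    // Step 3: group the paths by hash and keep only groups with more than one file.
    // The result maps each shared MD5 to the list of files that have it.
    public static Dictionary<string, List<string>> FindDuplicates(IEnumerable<string> files)
    {
        return files
            .GroupBy(f => CalcMD5(f))
            .Where(g => g.Count() > 1)
            .ToDictionary(g => g.Key, g => g.ToList());
    }
}
```

With an untyped Array, the call would be DuplicateFinder.FindDuplicates(files.Cast<string>()). Materializing the result with ToDictionary means each file is hashed once. The deferred LINQ query from step 3 would recompute the hashes every time it is enumerated.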
