Get duplicate file list by computing their MD5
Question
I have an array that contains file paths, and I want to list the files that are duplicates of one another based on their MD5. I calculate their MD5 like this:
private void calcMD5(Array files) // Array contains the paths of all files
{
    int i = 0;
    string[] md5_val = new string[files.Length];
    foreach (string file_name in files)
    {
        using (var md5 = MD5.Create())
        {
            using (var stream = File.OpenRead(file_name))
            {
                md5_val[i] = BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "").ToLower();
                i += 1;
            }
        }
    }
}
With the code above I can calculate each file's MD5, but how do I get a list of only those files that are duplicates? If there is another way to do this, please let me know. I am also new to LINQ.
Recommended answer
1. Rewrite your calcMD5 function to take in a single file path and return the MD5.
2. Store your file names in a string[] or List<string>, not an untyped Array, if possible.
3. Use the following LINQ to get groups of files with the same hash:
var groupsOfFilesWithSameHash = files
    // or files.Cast<string>() if you're stuck with an Array
    .GroupBy(f => calcMD5(f))
    .Where(g => g.Count() > 1);
4. You can get to the groups with nested foreach loops, for example:
foreach (var group in groupsOfFilesWithSameHash)
{
    // The loop variable is "group"; its Key is the MD5 shared by every file in it.
    Console.WriteLine("Shared MD5: " + group.Key);
    foreach (var file in group)
        Console.WriteLine(" " + file);
}
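Putting the four steps together, a minimal sketch might look like the following. The names DuplicateFinder, CalcMD5 and FindDuplicates are illustrative and not from the original post, and reading the paths from the command line is an assumption made only to keep the example self-contained. The hashing line is the same one used in the question, moved into a function that handles a single file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class DuplicateFinder
{
    // Step 1: hash a single file and return its MD5 as a lowercase hex string.
    public static string CalcMD5(string fileName)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(fileName))
        {
            return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "").ToLower();
        }
    }

    // Steps 2 and 3: take typed paths, group them by hash, and keep only
    // the groups that contain more than one file.
    // ToList() materializes the query so that enumerating the result a
    // second time does not hash every file again.
    public static List<IGrouping<string, string>> FindDuplicates(IEnumerable<string> files)
    {
        return files
            .GroupBy(f => CalcMD5(f))
            .Where(g => g.Count() > 1)
            .ToList();
    }
}

static class Program
{
    static void Main(string[] args)
    {
        // Assumption: the file paths to compare are passed as command-line arguments.
        foreach (var group in DuplicateFinder.FindDuplicates(args))
        {
            Console.WriteLine("Shared MD5: " + group.Key);
            foreach (var file in group)
                Console.WriteLine("  " + file);
        }
    }
}
```

Because LINQ queries are deferred, the un-materialized query from step 3 would recompute every hash each time it is enumerated; calling ToList() once is one way to avoid that.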