Why is EnumerateFiles much quicker than calculating the sizes

Question

For my WPF project, I have to calculate the total size of the files in a single directory (which may contain subdirectories).

Sample 1

DirectoryInfo di = new DirectoryInfo(path);
var totalLength = di.EnumerateFiles("*.*", SearchOption.AllDirectories).Sum(fi => fi.Length);

if (totalLength / 1000000 >= size)
    return true;

Sample 2

 var sizeOfHtmlDirectory = Directory.GetFiles(path, "*.*", SearchOption.AllDirectories);
 long totalLength = 0;
 foreach (var file in sizeOfHtmlDirectory)
 {
     totalLength += new FileInfo(file).Length;
     if (totalLength / 1000000 >= size)
         return true;
 }

Both samples work.

Sample 1 completes massively faster. I have not timed this accurately, but on my PC, using the same folder with the same content and file sizes, Sample 1 takes a few seconds and Sample 2 takes a few minutes.

Edit

I should point out that the bottleneck in Sample 2 is inside the foreach loop. The GetFiles call returns quickly, and execution enters the foreach loop quickly.

My question is: how do I find out why this is the case?

Recommended answer

Contrary to what the other answers indicate, the main difference is not EnumerateFiles vs GetFiles; it is DirectoryInfo vs Directory. In the latter case you only have strings and have to create new FileInfo instances separately, which is very costly.
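To check this on your own machine, a rough timing harness along the following lines may help. It is only a sketch: the class and method names (DirectorySizeTiming, SumViaDirectoryInfo, SumViaNewFileInfo, Report) are made up for illustration, and Stopwatch gives a coarse measurement, not a rigorous benchmark.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

public static class DirectorySizeTiming
{
    // Sample 1 style: the FileInfo objects come from the directory enumeration.
    public static long SumViaDirectoryInfo(string path) =>
        new DirectoryInfo(path)
            .EnumerateFiles("*.*", SearchOption.AllDirectories)
            .Sum(fi => fi.Length);

    // Sample 2 style: only path strings are returned, so a FileInfo is
    // constructed for every file before its Length can be read.
    public static long SumViaNewFileInfo(string path)
    {
        long total = 0;
        foreach (string file in Directory.GetFiles(path, "*.*", SearchOption.AllDirectories))
        {
            total += new FileInfo(file).Length;
        }
        return total;
    }

    // Times both approaches on the same directory and prints the results.
    public static void Report(string path)
    {
        var sw = Stopwatch.StartNew();
        long viaDirectoryInfo = SumViaDirectoryInfo(path);
        sw.Stop();
        Console.WriteLine($"DirectoryInfo.EnumerateFiles: {viaDirectoryInfo} bytes, {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        long viaNewFileInfo = SumViaNewFileInfo(path);
        sw.Stop();
        Console.WriteLine($"Directory.GetFiles + new FileInfo: {viaNewFileInfo} bytes, {sw.ElapsedMilliseconds} ms");
    }
}
```

Because the operating system caches file system data, whichever approach runs second may benefit from a warm cache. Running the comparison more than once, and swapping the order, should give a fairer picture.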

DirectoryInfo returns FileInfo instances that use cached information, whereas a FileInfo you create directly does not, so reading its Length has to query the file system for each file. The original answer links to more details.
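If the early exit from Sample 2 matters, it can be kept while still using the FileInfo instances that DirectoryInfo hands back. The following is a minimal sketch, not the answer author's code: DirectorySizeCheck, ReachesSize and thresholdBytes are hypothetical names, and the threshold is taken in bytes rather than in the megabyte units used in the question.

```csharp
using System.IO;

public static class DirectorySizeCheck
{
    // Returns true as soon as the running total reaches thresholdBytes,
    // so the rest of the tree is not enumerated once the answer is known.
    public static bool ReachesSize(string path, long thresholdBytes)
    {
        long total = 0;
        var directory = new DirectoryInfo(path);

        // EnumerateFiles yields FileInfo objects lazily, one at a time.
        foreach (FileInfo fi in directory.EnumerateFiles("*.*", SearchOption.AllDirectories))
        {
            total += fi.Length;
            if (total >= thresholdBytes)
            {
                return true;
            }
        }
        return false;
    }
}
```

To match the question's check, a call such as DirectorySizeCheck.ReachesSize(path, size * 1000000L) would be used, assuming size is a whole number of megabytes as in the original samples.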

Relevant quote (via "The Old New Thing"):

In NTFS, file system metadata is a property not of the directory entry but rather of the file, with some of the metadata replicated into the directory entry as a tweak to improve directory enumeration performance. Functions like FindFirstFile report the directory entry, and by putting the metadata that FAT users were accustomed to getting "for free", they could avoid being slower than FAT for directory listings. The directory-enumeration functions report the last-updated metadata, which may not correspond to the actual metadata if the directory entry is stale.
