Is there a faster way than this to find all the files in a directory and all subdirectories?


Problem description


I'm writing a program that needs to search a directory and all of its subdirectories for files that have a certain extension. It will be used on both local and network drives, so performance is a concern.

Here's the recursive method I'm using now:

private void GetFileList(string fileSearchPattern, string rootFolderPath, List<FileInfo> files)
{
    DirectoryInfo di = new DirectoryInfo(rootFolderPath);

    FileInfo[] fiArr = di.GetFiles(fileSearchPattern, SearchOption.TopDirectoryOnly);
    files.AddRange(fiArr);

    DirectoryInfo[] diArr = di.GetDirectories();

    foreach (DirectoryInfo info in diArr)
    {
        GetFileList(fileSearchPattern, info.FullName, files);
    }
}

I could set the SearchOption to AllDirectories and not use a recursive method, but in the future I'll want to insert some code to notify the user what folder is currently being scanned.

Although I'm creating a list of FileInfo objects right now, all I really care about is the paths to the files. I'll have an existing list of files, which I want to compare to the new list to see which files were added or deleted. Is there a faster way to generate this list of file paths? Is there anything I can do to optimize this search when querying files on a shared network drive?
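The added/deleted comparison described above can be done with a set difference once both lists are plain path strings. This is a minimal sketch, not code from the question: the `FileListDiff` class and `OnlyInSecond` method are hypothetical names, and the case-insensitive comparer is an assumption that fits Windows paths.

```csharp
using System;
using System.Collections.Generic;

public static class FileListDiff
{
    // Returns the paths that appear in 'second' but not in 'first'.
    // Call it as (oldPaths, newPaths) for added files,
    // and as (newPaths, oldPaths) for deleted files.
    public static HashSet<string> OnlyInSecond(IEnumerable<string> first, IEnumerable<string> second)
    {
        // Assumption: paths are compared case-insensitively, as on Windows file systems.
        HashSet<string> result = new HashSet<string>(second, StringComparer.OrdinalIgnoreCase);
        result.ExceptWith(first);
        return result;
    }
}
```

With hash-based sets the comparison is roughly linear in the number of paths, so it should be cheap next to the directory scan itself.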


Update 1

I tried creating a non-recursive method that does the same thing by first finding all the subdirectories and then iteratively scanning each directory for files. Here's the method:

public static List<FileInfo> GetFileList(string fileSearchPattern, string rootFolderPath)
{
    DirectoryInfo rootDir = new DirectoryInfo(rootFolderPath);

    List<DirectoryInfo> dirList = new List<DirectoryInfo>(rootDir.GetDirectories("*", SearchOption.AllDirectories));
    dirList.Add(rootDir);

    List<FileInfo> fileList = new List<FileInfo>();

    foreach (DirectoryInfo dir in dirList)
    {
        fileList.AddRange(dir.GetFiles(fileSearchPattern, SearchOption.TopDirectoryOnly));
    }

    return fileList;
}


Update 2

Alright, so I've run some tests on a local and a remote folder, both of which contain a lot of files (~1200). Here are the methods I tested. The results are below.

  • GetFileListA(): Non-recursive solution in the update above. I think it's equivalent to Jay's solution.
  • GetFileListB(): Recursive method from the original question
  • GetFileListC(): Gets all the directories with the static Directory.GetDirectories() method, then gets all the file paths with the static Directory.GetFiles() method. Populates and returns a List.
  • GetFileListD(): Marc Gravell's solution, which uses a queue and returns an IEnumerable. I populated a List with the resulting IEnumerable.
  • DirectoryInfo.GetFiles: No additional method created. Instantiated a DirectoryInfo from the root folder path and called GetFiles using SearchOption.AllDirectories.
  • Directory.GetFiles: No additional method created. Called the static GetFiles method of Directory using SearchOption.AllDirectories.

Method                       Local Folder       Remote Folder
GetFileListA()               00:00.0781235      05:22.9000502
GetFileListB()               00:00.0624988      03:43.5425829
GetFileListC()               00:00.0624988      05:19.7282361
GetFileListD()               00:00.0468741      03:38.1208120
DirectoryInfo.GetFiles       00:00.0468741      03:45.4644210
Directory.GetFiles           00:00.0312494      03:48.0737459

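...so it looks like Marc's is the fastest on the remote folder, which is the case that matters here. On the local folder all six finish in well under a tenth of a second.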
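The question does not show how these timings were taken. One plausible harness, assuming `System.Diagnostics.Stopwatch`, is sketched below. `ScanTimer` and `Time` are hypothetical names. The important detail is that a lazy result such as the IEnumerable from GetFileListD() has to be fully enumerated inside the timed region, otherwise almost no work is measured.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class ScanTimer
{
    // Runs 'scan' once, forces full enumeration of its result,
    // and reports the elapsed time together with the number of paths found.
    public static TimeSpan Time(Func<IEnumerable<string>> scan, out int count)
    {
        Stopwatch sw = Stopwatch.StartNew();
        // Copying into a List makes lazy iterators do their work before the stopwatch stops.
        List<string> files = new List<string>(scan());
        sw.Stop();
        count = files.Count;
        return sw.Elapsed;
    }
}
```

A call might look like `ScanTimer.Time(() => Directory.GetFiles(root, "*.txt", SearchOption.AllDirectories), out n)`, where `root` and `n` are placeholders. A single run against a network share can be skewed by caching, so repeating each measurement and varying the order is probably worthwhile.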

Solution

Try this iterator block version that avoids recursion and the Info objects:

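// Requires: using System; using System.Collections.Generic; using System.IO;
public static IEnumerable<string> GetFileList(string fileSearchPattern, string rootFolderPath)
{
    // Folders still to be scanned, processed breadth-first.
    Queue<string> pending = new Queue<string>();
    pending.Enqueue(rootFolderPath);
    string[] tmp;
    while (pending.Count > 0)
    {
        rootFolderPath = pending.Dequeue();
        try
        {
            tmp = Directory.GetFiles(rootFolderPath, fileSearchPattern);
        }
        catch (UnauthorizedAccessException)
        {
            // Skip folders we are not allowed to read (their subfolders are skipped too).
            continue;
        }
        for (int i = 0; i < tmp.Length; i++)
        {
            yield return tmp[i];
        }
        // Queue the subfolders so they are scanned on a later pass of the loop.
        tmp = Directory.GetDirectories(rootFolderPath);
        for (int i = 0; i < tmp.Length; i++)
        {
            pending.Enqueue(tmp[i]);
        }
    }
}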

Note also that .NET 4.0 has built-in iterator versions (EnumerateFiles, EnumerateFileSystemEntries) that may be faster (more direct access to the file system; fewer arrays).
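As a rough sketch of how the .NET 4.0 calls could slot into the same queue-based loop, the version below swaps in `Directory.EnumerateFiles` and `Directory.EnumerateDirectories` and adds a per-folder callback for the "notify the user which folder is being scanned" requirement from the question. The `FileScanner` class, the `EnumerateFileList` method and the `onDirectory` parameter are hypothetical names, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FileScanner
{
    // Breadth-first scan that returns matching file paths as an iterator.
    // 'onDirectory' (hypothetical) is invoked once per folder, just before it is scanned.
    public static IEnumerable<string> EnumerateFileList(
        string fileSearchPattern, string rootFolderPath, Action<string> onDirectory)
    {
        Queue<string> pending = new Queue<string>();
        pending.Enqueue(rootFolderPath);
        while (pending.Count > 0)
        {
            string folder = pending.Dequeue();
            if (onDirectory != null) onDirectory(folder);

            // C# does not allow 'yield return' inside a try block that has a catch clause,
            // so each folder's matches are buffered here and yielded after the try.
            List<string> files = new List<string>();
            try
            {
                files.AddRange(Directory.EnumerateFiles(folder, fileSearchPattern));
                foreach (string sub in Directory.EnumerateDirectories(folder))
                {
                    pending.Enqueue(sub);
                }
            }
            catch (UnauthorizedAccessException)
            {
                // Unreadable folder: keep whatever was buffered and move on.
            }

            foreach (string file in files)
            {
                yield return file;
            }
        }
    }
}
```

Because each folder is buffered before being yielded, this keeps the per-folder error handling but gives up some of the laziness of the Enumerate* methods. If inaccessible folders are not a concern, a single `Directory.EnumerateFiles(root, pattern, SearchOption.AllDirectories)` call is simpler, though it offers no hook for per-folder progress and, as far as I know, stops at the first folder it cannot read.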
