Multithreaded Directory Looping in C#

Question

I am trying to loop through all files and folders and perform an action on every file that has a certain extension. This method works fine, but I would like to make it multithreaded: when it runs over tens of thousands of files it is really slow, and I would imagine that multithreading would speed things up. I am just unsure how to use threading in this case.

doStuff reads properties (date modified, etc.) from the files and inserts them into a SQLite database. I start a transaction before the scan method is called, so the database side is already as optimized as it can be.

Answers that explain the theory of how to do it are just as good as answers with full working code.

    private static string[] validTypes = { ".x", ".y", ".z", ".etc" };
    public static void scan(string rootDirectory)
    {
        try
        {

            foreach (string dir in Directory.GetDirectories(rootDirectory))
            {

                if (dir.ToLower().IndexOf("$recycle.bin") == -1)
                    scan(dir);
            }

            foreach (string file in Directory.GetFiles(rootDirectory))
            {

                if (!((IList<string>)validTypes).Contains(Path.GetExtension(file)))
                {
                    continue;
                }


                doStuff(file);
            }
        }
        catch (Exception)
        {
        }
    }

Recommended answer

Assuming that doStuff is thread-safe, and that you don't need to wait for the entire scan to finish, you can call both doStuff and scan on the ThreadPool, like this:

string path = file;
ThreadPool.QueueUserWorkItem(delegate { doStuff(path); });

You need to make a separate local variable because the anonymous method would otherwise capture the file variable itself, and would see changes to it throughout the loop. (In other words, if the ThreadPool only executed the task after the loop had moved on to the next file, it would process the wrong file.) This applies to compilers before C# 5; from C# 5 onward each foreach iteration gets its own variable, so the copy is harmless but no longer required.
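A minimal sketch of the pattern above placed inside a loop, for the case where you do want to know when all queued work has finished. CaptureDemo, Run and the ConcurrentBag are hypothetical stand-ins for the real scan and doStuff; CountdownEvent and ConcurrentBag assume .NET 4.0 or later.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public static class CaptureDemo
{
    // Hypothetical stand-in for the scan loop: queues one work item per file
    // and records which path each work item actually saw.
    public static ConcurrentBag<string> Run(string[] files)
    {
        var seen = new ConcurrentBag<string>();

        // One signal is expected per queued work item.
        using (var done = new CountdownEvent(files.Length))
        {
            foreach (string file in files)
            {
                // Per-iteration copy: the delegate captures 'path', not the loop variable.
                string path = file;
                ThreadPool.QueueUserWorkItem(delegate
                {
                    try { seen.Add(path); }   // a real doStuff(path) call would go here
                    finally { done.Signal(); }
                });
            }

            // Block until every queued work item has signalled completion.
            done.Wait();
        }

        return seen;
    }
}
```

Calling CaptureDemo.Run with three paths should return a bag holding exactly those three paths, in no particular order. Signalling in a finally block keeps Wait from hanging if one work item throws.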

However, reading your comment, the main issue here is disk IO, so I suspect that multithreading will not help much.
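If doStuff turns out not to be thread-safe (for example because all inserts share one database transaction), one option is to read file properties in parallel and hand the results to a single consumer. The sketch below is only an illustration under that assumption: SingleWriterPipeline and Collect are hypothetical names, the list stands in for the database insert, and whether it is any faster depends on the disk.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class SingleWriterPipeline
{
    // Reads the last-write time of each file in parallel, but passes every
    // result to one consumer, which is where a database insert would go.
    public static List<KeyValuePair<string, DateTime>> Collect(IEnumerable<string> files)
    {
        var results = new List<KeyValuePair<string, DateTime>>();

        // Bounded queue so the readers cannot run arbitrarily far ahead of the writer.
        using (var queue = new BlockingCollection<KeyValuePair<string, DateTime>>(1000))
        {
            // Single consumer: the only code that touches 'results'.
            Task consumer = Task.Factory.StartNew(() =>
            {
                foreach (var item in queue.GetConsumingEnumerable())
                    results.Add(item);
            }, TaskCreationOptions.LongRunning);

            try
            {
                // Producers: read file metadata on multiple threads.
                Parallel.ForEach(files, file =>
                    queue.Add(new KeyValuePair<string, DateTime>(
                        file, File.GetLastWriteTimeUtc(file))));
            }
            finally
            {
                // Lets the consumer's loop end once the queue drains.
                queue.CompleteAdding();
            }

            consumer.Wait();
        }

        return results;
    }
}
```

With this shape, only the consumer needs access to the database connection, so doStuff's insert logic does not have to be made thread-safe.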

Note that Directory.GetFiles will perform slowly for directories with large numbers of files, since it has to build an array holding all of the filenames before it returns.
If you're using .NET 4.0 or later, you can make it faster by calling the EnumerateFiles method instead, which uses an iterator to return an IEnumerable<string> that enumerates the directory as you run your loop.
You can also avoid the recursive scan calls with either method by passing the SearchOption parameter, like this:

foreach (string file in Directory.EnumerateFiles(rootDirectory, "*", SearchOption.AllDirectories))

This will recursively scan all subdirectories, so you'll only need a single foreach loop.
Note that this will exacerbate the performance issues with GetFiles, so you may want to avoid it before .NET 4.0.
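Putting the single-loop enumeration together with the extension filter from the question might look roughly like this. ExtensionScanner and FindFiles are hypothetical names, the extension list is the placeholder one from the question, and the case-insensitive comparison is an assumption about what is wanted (the original Contains check is case-sensitive).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class ExtensionScanner
{
    // Placeholder extensions from the question; compared case-insensitively (an assumption).
    private static readonly HashSet<string> ValidTypes =
        new HashSet<string>(new[] { ".x", ".y", ".z", ".etc" }, StringComparer.OrdinalIgnoreCase);

    // Lazily yields every file under rootDirectory whose extension is in ValidTypes.
    public static IEnumerable<string> FindFiles(string rootDirectory)
    {
        return Directory
            .EnumerateFiles(rootDirectory, "*", SearchOption.AllDirectories)
            .Where(file => ValidTypes.Contains(Path.GetExtension(file)));
    }
}
```

A caller would then write foreach (string file in ExtensionScanner.FindFiles(root)) doStuff(file);. One difference from the recursive version is worth keeping in mind: with SearchOption.AllDirectories, a directory that cannot be read can throw UnauthorizedAccessException and end the whole enumeration, whereas the original scan method swallowed the exception per directory and also skipped $recycle.bin explicitly.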
