What's a good strategy for processing a queue in parallel?


Question


I'm writing a program which needs to recursively search through a folder structure, and would like to do so in parallel with several threads.


I've written the rather trivial synchronous method already - adding the root directory to the queue initially, then dequeuing a directory, queuing its subdirectories, etc., until the queue is empty. I'll use a ConcurrentQueue<T> for my queue, but have already realized that my loops will stop prematurely. The first thread will dequeue the root directory, and immediately every other thread could see that the queue is empty and exit, leaving the first thread as the only one running. I would like each thread to loop until the queue is empty, then wait until another thread queues some more directories, and keep going. I need some sort of checkpoint in my loop so that none of the threads will exit until every thread has reached the end of the loop, but I'm not sure the best way to do this without deadlocking when there really are no more directories to process.

Recommended answer

Use the Task Parallel Library (TPL).


Create a Task to process the first folder. Inside that task, create a Task to process each subfolder (recursively) and a Task for each relevant file. Then wait on all the tasks created for this folder.
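The recursion described above might look roughly like the following. This is a minimal sketch, not the answerer's code: the class name `FolderSearch`, the method `ProcessFolder`, and the `onFile` callback are hypothetical names introduced for illustration. It assumes .NET 4.5 or later for `Task.Run` (on .NET 4, `Task.Factory.StartNew` plays the same role), runs the per-file callback inline rather than as its own task, and leaves out error handling for folders that cannot be read.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

public static class FolderSearch
{
    // Hypothetical helper: returns a task that completes once 'path'
    // and every folder below it have been processed.
    // 'onFile' may be called from several threads at once.
    public static Task ProcessFolder(string path, Action<string> onFile)
    {
        return Task.Run(() =>
        {
            var children = new List<Task>();

            // One task per subfolder; each one recurses the same way.
            foreach (string dir in Directory.EnumerateDirectories(path))
            {
                children.Add(ProcessFolder(dir, onFile));
            }

            // Per-file work is assumed to be cheap, so it runs inline here.
            foreach (string file in Directory.EnumerateFiles(path))
            {
                onFile(file);
            }

            // Wait on all the tasks created for this folder.
            Task.WaitAll(children.ToArray());
        });
    }
}
```

A caller would start the search with something like `FolderSearch.ProcessFolder(root, file => { /* work */ }).Wait();`. Because every folder waits for its own children, the root task finishes only when the whole tree is done, so no thread has to guess whether an empty queue means "finished" or "not yet".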


The TPL runtime uses the thread pool, which avoids creating new threads. Thread creation is an expensive operation for small pieces of work.

Notes:

  • If the work per file is trivial, do it inline rather than creating another task (IO performance will be the limiting factor).
  • This approach generally works best if blocking operations are avoided, but if IO performance is the limit then this might not matter anyway. Start simple and measure.
  • Before .NET 4, much of this can be done with the thread pool, but you'll need to use events to wait for tasks to complete, and that waiting will tie up thread pool threads. [1]


[1] As I understand it, when you wait on tasks using a TPL method, the TPL reuses the waiting thread for other tasks until the wait is fulfilled.
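For the pre-.NET 4 case in the last note, one possible shape is sketched below. It is an illustration under stated assumptions, not the answerer's method: the class `PoolFolderSearch` and its members are hypothetical names. Instead of having each folder wait on its own event (which would tie up pool threads), this variant keeps a single counter of outstanding folders and signals one event when the counter reaches zero, so only the calling thread blocks. `Run` is assumed to be called from a thread that is not a pool thread, and an exception inside a work item would go unhandled.

```csharp
using System;
using System.IO;
using System.Threading;

public sealed class PoolFolderSearch
{
    private readonly Action<string> _onFile;
    private readonly ManualResetEvent _done = new ManualResetEvent(false);
    private int _pending; // folders queued but not yet finished

    public PoolFolderSearch(Action<string> onFile)
    {
        _onFile = onFile;
    }

    // Blocks the caller until the whole tree under 'root' has been processed.
    public void Run(string root)
    {
        Enqueue(root);
        _done.WaitOne();
    }

    private void Enqueue(string path)
    {
        // Count the folder before queuing it, so the counter cannot
        // reach zero while work is still on its way to the pool.
        Interlocked.Increment(ref _pending);
        ThreadPool.QueueUserWorkItem(Process, path);
    }

    private void Process(object state)
    {
        string path = (string)state;
        try
        {
            foreach (string dir in Directory.GetDirectories(path))
            {
                Enqueue(dir);
            }
            foreach (string file in Directory.GetFiles(path))
            {
                _onFile(file); // may run on several pool threads at once
            }
        }
        finally
        {
            // Subfolders were counted above, before this folder is released.
            if (Interlocked.Decrement(ref _pending) == 0)
            {
                _done.Set();
            }
        }
    }
}
```

The counter is what resolves the termination problem raised in the question: "nothing left to dequeue" and "nothing left to do" are tracked separately, so workers never exit early and the caller never waits forever once the last folder completes.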
