Python: interdependent process/thread queues
Question
I have four queues, each with multiple processes/threads, that are interdependent in the following way:
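- Queue 1 reads files from disk and copies them into RAM
- Queue 2 takes the files in RAM and performs an operation on them
- Queue 3 takes the results of queue 2 and performs a separate operation on them
- Queue 4 writes the final results back to disk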
I would like these 4 queues to operate in parallel as much as possible, with the caveat that queue 2 has to wait for queue 1 to place at least one item on it (and similarly, queue 2 has to place items on queue 3, and queue 3 on queue 4).
What is the best way in Python to go about implementing this (both for the queue and for the thread/process implementation)?
Will queue 2 and queue 3 block each other due to the GIL if I use threads? I have read that I/O and compute can still happen in parallel, so I am OK with it if queues 1, 2 and 4 can work in parallel while queue 3 runs sequentially with queue 2.
Recommended answer
Is there any particular reason you actually need each of those 4 steps to be separate threads/processes? Personally I'd just implement all 4 steps in one function/callable class, and then use multiprocessing.Pool's map to invoke the function in parallel over the filenames of interest.
A simpler example of this sort of pattern (just reading and processing) is discussed in this Q&A. As the answer there notes, if the work appears to bottleneck on I/O rather than processing, just create more processes in the pool.
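The suggestion above could look roughly like the following. This is only a minimal sketch: the function names process_file and run_all, the ".out" output suffix, and the two placeholder operations (upper-casing and reversing the bytes) are made up for illustration and stand in for whatever steps 2 and 3 really do.

```python
import multiprocessing
import os
import tempfile


def process_file(path):
    """Run all four steps for one file (hypothetical placeholder operations)."""
    # Step 1: read the file from disk into RAM.
    with open(path, "rb") as f:
        data = f.read()
    # Step 2: first operation (placeholder: upper-case the bytes).
    intermediate = data.upper()
    # Step 3: separate operation on the step-2 result (placeholder: reverse).
    result = intermediate[::-1]
    # Step 4: write the final result back to disk.
    out_path = path + ".out"  # assumed naming scheme, not from the question
    with open(out_path, "wb") as f:
        f.write(result)
    return out_path


def run_all(paths, processes=None):
    """Map process_file over paths in a pool of worker processes.

    processes=None lets multiprocessing pick a default based on the CPU count.
    """
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(process_file, paths)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create a few small demo input files.
        paths = []
        for i in range(4):
            p = os.path.join(tmp, "input_%d.txt" % i)
            with open(p, "wb") as f:
                f.write(b"hello %d" % i)
            paths.append(p)
        # If I/O looks like the bottleneck, ask for more workers than cores.
        outputs = run_all(paths, processes=2 * (os.cpu_count() or 1))
        print(outputs)
```

Because each worker is a separate process, the GIL does not serialize steps 2 and 3 across files; while one worker waits on disk, others can compute. The factor of 2 for the pool size is an arbitrary starting point to tune by measurement.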