Python: interdependent process/thread queues

Question

I have four queues that each have multiple processes/threads that are interdependent in the following way:

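  1. Queue 1 reads files from disk and copies them into RAM
  2. Queue 2 takes the files in RAM and performs an operation on them
  3. Queue 3 takes the results of queue 2 and performs a separate operation on them
  4. Queue 4 writes the final results back to disk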

I would like these 4 queues to operate in parallel as much as possible with the caveat that Queue 2 has to wait for Queue 1 to place at least one process/thread on it (and similarly queue 2 has to place items on queue 3, and queue 3 on 4).

What is the best way in Python to go about implementing this (both for the queue and for the thread/process implementation)?

Will queue 2 and queue 3 block each other due to GIL if I use threads? I read that I/O and compute can still happen in parallel so I am ok even if Queue 1/2/4 can work in parallel, and queue 3 is sequential with queue 2.

Recommended answer

Is there any particular reason you actually need each of those 4 steps to be separate threads/processes? Personally I'd just implement all 4 steps in one function/callable class, and then use multiprocessing.Pool's map to invoke the function in parallel over the filenames of interest.
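A minimal sketch of that approach, under stated assumptions: `process_file` and `run_all` are hypothetical names, the two "operations" are placeholders (uppercasing, then reversing the bytes), and the `.out` output naming is invented for illustration. Only `multiprocessing.Pool` and its `map` method come from the answer itself.

```python
import sys
from multiprocessing import Pool
from pathlib import Path


def process_file(filename):
    """Run all four steps for one file and return the output path."""
    data = Path(filename).read_bytes()    # step 1: read the file from disk into RAM
    intermediate = data.upper()           # step 2: placeholder for the first operation
    result = intermediate[::-1]           # step 3: placeholder for the second operation
    out_path = str(filename) + ".out"     # hypothetical output naming scheme
    Path(out_path).write_bytes(result)    # step 4: write the final result back to disk
    return out_path


def run_all(filenames, processes=None):
    """Apply process_file to every filename using a pool of worker processes."""
    # processes=None lets Pool pick its default worker count.
    with Pool(processes=processes) as pool:
        return pool.map(process_file, filenames)


if __name__ == "__main__":
    # In this sketch the filenames of interest come from the command line.
    if len(sys.argv) > 1:
        for out in run_all(sys.argv[1:]):
            print(out)
```

Because each worker handles one file from start to finish, no inter-stage queues are needed: while one worker is waiting on disk, others can be computing, and separate processes are not serialized by a shared GIL.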

A simpler example of this sort of pattern (just reading and processing) is discussed in this Q&A. As the answer notes, if it appears to bottleneck on I/O rather than processing, just create more processes in the pool.
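One way to act on the "more processes" advice is to size the pool above the CPU count when the work is I/O-bound. The helper below is only a sketch: `pool_size` is a hypothetical name, and the default factor of 4 is an arbitrary starting point that would need to be tuned by measurement.

```python
import os


def pool_size(io_bound, oversubscribe=4):
    """Suggest a worker count for multiprocessing.Pool."""
    # os.cpu_count() can return None, so fall back to a single CPU.
    cpus = os.cpu_count() or 1
    # I/O-bound workers spend much of their time waiting, so running more
    # workers than CPUs can help; CPU-bound work gains little from it.
    return cpus * oversubscribe if io_bound else cpus
```

The result would then be passed as `Pool(processes=pool_size(io_bound=True))`.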
