python multiprocessing apply_async仅使用一个进程 [英] python multiprocessing apply_async only uses one process

查看:124
本文介绍了python multiprocessing apply_async仅使用一个进程的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个脚本,其中包括从列表中打开文件,然后对该文件中的文本进行处理.我正在使用python multiprocessing和Pool尝试并行化此操作.脚本的摘要如下:

I have a script that includes opening a file from a list and then doing something to the text within that file. I'm using python multiprocessing and Pool to try to parallelize this operation. A abstraction of the script is below:

import os
from multiprocessing import Pool

results = []
def testFunc(files):
    for file in files:
        print "Working in Process #%d" % (os.getpid())
        #This is just an illustration of some logic. This is not what I'm actually doing.
        for line in file:
            if 'dog' in line:
                results.append(line)

if __name__=="__main__":
    p = Pool(processes=2)
    files = ['/path/to/file1.txt', '/path/to/file2.txt']
    results = p.apply_async(testFunc, args = (files,))
    results2 = results.get()

当我运行此命令时,每次迭代的进程ID的打印均相同.基本上,我想做的是获取输入列表中的每个元素并将其分叉到一个单独的进程中,但是似乎一个进程正在完成所有工作.

When I run this the print out of the process id is the same for each iteration. Basically what I'm trying to do is take each element of the input list and fork it out to a separate process, but it seems like one process is doing all of the work.

推荐答案

  • apply_async将一项任务分配到池中.您需要致电 apply_async多次使用更多处理器.
  • 不允许两个进程尝试写入同一列表, results.由于池工作人员是独立的流程,因此这两个 将不会写入相同的列表.解决此问题的一种方法是使用输出队列.您可以自己设置,也可以使用apply_async的回调为您设置队列.函数完成后,apply_async将调用回调.
  • 您可以使用map_async而不是apply_async,但是随后您会 获取列表列表,然后必须将其展平.
    • apply_async farms out one task to the pool. You would need to call apply_async many times to exercise more processors.
    • Don't allow both processes to try to write to the same list, results. Since the pool workers are separate processes, the two won't be writing to the same list. One way to work around this is to use an ouput Queue. You could set it up yourself, or use apply_async's callback to setup the Queue for you. apply_async will call the callback once the function completes.
    • You could use map_async instead of apply_async, but then you'd get a list of lists, which you'd then have to flatten.
    • 因此,也许尝试改用类似的方法:

      So, perhaps try instead something like:

      import os
      import multiprocessing as mp
      
      results = []   
      
      def testFunc(file):
          result = []
          print "Working in Process #%d" % (os.getpid())
          # This is just an illustration of some logic. This is not what I'm
          # actually doing.
          with open(file, 'r') as f:
              for line in f:
                  if 'dog' in line:
                      result.append(line)
          return result
      
      
      def collect_results(result):
          results.extend(result)
      
      if __name__ == "__main__":
          p = mp.Pool(processes=2)
          files = ['/path/to/file1.txt', '/path/to/file2.txt']
          for f in files:
              p.apply_async(testFunc, args=(f, ), callback=collect_results)
          p.close()
          p.join()
          print(results)
      

      这篇关于python multiprocessing apply_async仅使用一个进程的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆