Python multiprocessing processes sleep after a while


Problem description

I have a script that runs through a directory and searches all files with a given ending (i.e. .xml) for given strings and replaces them. To achieve this I used the Python multiprocessing library.

As an example I am using 1100 .xml files containing around 200 MB of data. The complete execution time is 8 minutes on my MBP '15 15".

But after some minutes, process after process goes to sleep, which I can see in "top" (here after 7 minutes...):

PID   COMMAND      %CPU  TIME     #TH    #WQ  #PORT MEM    PURG   CMPR PGRP PPID STATE    BOOSTS         %CPU_ME %CPU_OTHRS
1007  Python       0.0   07:03.51 1      0    7     5196K  0B     0B   998  998  sleeping *0[1]          0.00000 0.00000
1006  Python       99.8  07:29.07 1/1    0    7     4840K  0B     0B   998  998  running  *0[1]          0.00000 0.00000
1005  Python       0.0   02:10.02 1      0    7     4380K  0B     0B   998  998  sleeping *0[1]          0.00000 0.00000
1004  Python       0.0   04:24.44 1      0    7     4624K  0B     0B   998  998  sleeping *0[1]          0.00000 0.00000
1003  Python       0.0   04:25.34 1      0    7     4572K  0B     0B   998  998  sleeping *0[1]          0.00000 0.00000
1002  Python       0.0   04:53.40 1      0    7     4612K  0B     0B   998  998  sleeping *0[1]          0.00000 0.00000

So now only one process is doing all the work, while the others went to sleep after 4 minutes.

# set cpu pool to cores in computer
pool_size = multiprocessing.cpu_count()

# create pool
pool = multiprocessing.Pool(processes=pool_size)

# give pool function and input data - here for each file in file_list
pool_outputs = pool.map(check_file, file_list)

# if no more tasks are available: close all
pool.close()
pool.join()

So why are all the processes going to sleep?

My guess: the file list is split among all workers in the Pool (the same number of files each), and a few workers are just "lucky" to get the small files and therefore finish earlier. Can this be true? I was thinking it works more like a queue, so that every worker gets a new file whenever it finishes one, until the list is empty.
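That guess matches how `Pool.map` actually behaves: it pre-splits the iterable into chunks (by default roughly `len(iterable) / (4 * processes)`), so a worker that happens to receive a chunk of small files finishes early and then idles. One way to get the queue-like behaviour described above is to pass `chunksize=1`, which hands out one item at a time. A minimal sketch, where `check_file` is a stand-in for the real search-and-replace function:

```python
import multiprocessing

def check_file(path):
    """Stand-in for the real per-file search-and-replace work."""
    return path.endswith(".xml")

def run(file_list):
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    # chunksize=1 hands out one file at a time instead of pre-splitting
    # the list into large per-worker chunks (the default behaviour)
    results = pool.map(check_file, file_list, chunksize=1)
    pool.close()
    pool.join()
    return results

if __name__ == "__main__":
    print(run(["a.xml", "b.xml", "c.txt"]))
```

The trade-off is more inter-process communication per item, which only pays off when individual tasks vary a lot in size, as they do here.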

Answer

As @Felipe-Lema pointed out, it is a classic RTFM.

I reworked the mentioned part of the script using a multiprocessing Queue instead of a Pool and improved the runtime:

from multiprocessing import Process, Queue, cpu_count

def check_files(file_list):
    """Checks and replaces lines in files
    @param file_list: list of files to search
    @return results: list of per-file results"""

    # as many workers as CPUs are available (HT included)
    workers = cpu_count()

    # create two queues: one for files, one for results
    work_queue = Queue()
    done_queue = Queue()
    processes = []

    # add every file to the work queue
    for filename in file_list:
        work_queue.put(filename)

    # start the workers; each also gets one 'STOP' sentinel on the queue
    for w in range(workers):
        p = Process(target=worker, args=(work_queue, done_queue))
        p.start()
        processes.append(p)
        work_queue.put('STOP')

    # wait until all processes have finished
    for p in processes:
        p.join()

    done_queue.put('STOP')

    # beautify results and return them
    results = []
    for status in iter(done_queue.get, 'STOP'):
        if status is not None:
            results.append(status)

    return results
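The `worker` function itself is not shown in the answer. A minimal sketch of what it might look like is below; the per-file counting logic is a hypothetical stand-in for the real search-and-replace work:

```python
def worker(work_queue, done_queue):
    """Pulls filenames off work_queue until the 'STOP' sentinel,
    processes each one, and pushes a result onto done_queue."""
    for filename in iter(work_queue.get, 'STOP'):
        try:
            # hypothetical per-file work: count lines containing ".xml"
            with open(filename) as f:
                count = sum(1 for line in f if ".xml" in line)
            done_queue.put((filename, count))
        except OSError:
            # unreadable file: report nothing for it
            done_queue.put(None)
```

Because every worker pulls its next file only after finishing the previous one, fast workers naturally pick up more files and none of them sits idle while work remains, which is exactly the queue behaviour the question was hoping for.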

