Python: Writing to a single file with queue while using multiprocessing Pool

Question

I have hundreds of thousands of text files that I want to parse in various ways. I want to save the output to a single file without synchronization problems. I have been using a multiprocessing Pool to save time, but I can't figure out how to combine Pool and Queue.

The following code saves the input file name and the maximum number of consecutive "x"s in the file. However, I want all processes to save their results to the same file, not to a separate file per input as in my example. Any help would be greatly appreciated.

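import multiprocessing
import re

with open('infilenamess.txt') as f:
    filenames = f.read().splitlines()

def mp_worker(filename):
    with open(filename, 'r') as f:
        text = f.read()
    # Longest run of consecutive "x" characters (0 if the file has none)
    m = re.findall("x+", text)
    count = len(max(m, key=len, default=''))
    # Each input file gets its own "<name>_results.txt"; this is the part
    # that should become one shared output file
    with open(filename + '_results.txt', 'a') as outfile:
        outfile.write(str(filename) + '|' + str(count) + '\n')

def mp_handler():
    with multiprocessing.Pool(32) as p:
        p.map(mp_worker, filenames)

if __name__ == '__main__':
    mp_handler()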
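If you would rather keep the writing inside the workers, as the question's code does, one alternative (not from the original thread) is to serialize the appends with a single `multiprocessing.Lock`. A Lock cannot be passed as a `Pool.map` argument, but it can be handed to each worker at start-up through `initializer`/`initargs`. This is a minimal sketch assuming Python 3.4+ (for `max(..., default=...)`); `init_worker`, `_lock`, `_out_path`, the `processes=4` default and the demo file contents are hypothetical names chosen for illustration.

```python
import multiprocessing
import os
import re
import tempfile

# Hypothetical names for this sketch; set once per worker by init_worker().
_lock = None
_out_path = None


def init_worker(lock, out_path):
    # Runs once in each worker process. The Lock arrives here via initargs
    # because it cannot be sent as a regular Pool.map argument.
    global _lock, _out_path
    _lock = lock
    _out_path = out_path


def mp_worker(filename):
    with open(filename) as f:
        matches = re.findall("x+", f.read())
    count = len(max(matches, key=len, default=""))
    # Hold the lock for the whole open/write/close so one process's line is
    # fully written and flushed before another process may append.
    with _lock:
        with open(_out_path, "a") as out:
            out.write("%s|%d\n" % (filename, count))


def mp_handler(filenames, out_path, processes=4):
    open(out_path, "w").close()  # start from an empty results file
    lock = multiprocessing.Lock()
    with multiprocessing.Pool(processes, initializer=init_worker,
                              initargs=(lock, out_path)) as p:
        p.map(mp_worker, filenames)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical sample inputs, created only for this demo
        names = []
        for base, text in [("a.txt", "xx-xxxx-x"), ("b.txt", "no match")]:
            path = os.path.join(tmp, base)
            with open(path, "w") as f:
                f.write(text)
            names.append(path)
        out_path = os.path.join(tmp, "results.txt")
        mp_handler(names, out_path)
        with open(out_path) as f:
            print(f.read(), end="")
```

Every result still costs a lock acquisition plus an open/close of the shared file, so with hundreds of thousands of inputs the lock may become a bottleneck; returning results to a single writer avoids that contention.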

Recommended answer

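A multiprocessing Pool already implements a queue for you. Just use a pool method that returns each worker's return value to the caller, and do all of the writing in the parent process: since only one process ever touches the output file, there is nothing to synchronize. imap works well: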

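import multiprocessing
import re

def mp_worker(filename):
    with open(filename) as f:
        text = f.read()
    # All runs of consecutive "x" characters; default='' avoids a ValueError
    # when a file contains no "x" at all (count is then 0)
    m = re.findall("x+", text)
    count = len(max(m, key=len, default=''))
    return filename, count

def mp_handler():
    with open('infilenamess.txt') as f:
        # One filename per line, skipping blank lines
        filenames = [line for line in (l.strip() for l in f) if line]
    with multiprocessing.Pool(32) as p, open('results.txt', 'w') as f:
        # imap hands each worker's return value back to this (parent) process,
        # in input order, so only one process ever writes to results.txt
        for result in p.imap(mp_worker, filenames):
            # (filename, count) tuples from worker
            f.write('%s: %d\n' % result)

if __name__ == '__main__':
    mp_handler()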
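The question asks specifically how to combine Pool and Queue. If you do want an explicit queue (for example, to keep one writer running while workers produce results at different rates), a common pattern is a `multiprocessing.Manager().Queue()` shared with the pool workers plus one dedicated writer process. A plain `multiprocessing.Queue` cannot be passed to Pool workers as an argument, but a Manager queue proxy can. This is a minimal sketch assuming Python 3.4+; `longest_x_run`, `listener`, `write_results`, `SENTINEL` and the demo inputs are hypothetical names, not part of the original answer.

```python
import multiprocessing
import os
import re
import tempfile

# Hypothetical names for this sketch: longest_x_run, listener, write_results.
SENTINEL = None  # tells the writer process that no more results are coming


def longest_x_run(text):
    """Length of the longest run of consecutive "x" characters (0 if none)."""
    return len(max(re.findall("x+", text), key=len, default=""))


def mp_worker(filename, q):
    # Workers only compute and enqueue; they never open the output file.
    with open(filename) as f:
        q.put((filename, longest_x_run(f.read())))


def listener(q, out_path):
    # The single writer: the only process that ever touches out_path.
    with open(out_path, "w") as out:
        for filename, count in iter(q.get, SENTINEL):
            out.write("%s: %d\n" % (filename, count))


def write_results(filenames, out_path, processes=4):
    with multiprocessing.Manager() as manager:
        q = manager.Queue()  # proxy object, safe to pass to Pool workers
        writer = multiprocessing.Process(target=listener, args=(q, out_path))
        writer.start()
        with multiprocessing.Pool(processes) as pool:
            pool.starmap(mp_worker, [(name, q) for name in filenames])
        q.put(SENTINEL)  # every worker has returned; let the writer finish
        writer.join()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical sample inputs, created only for this demo
        names = []
        for base, text in [("a.txt", "xx-xxxx-x"), ("b.txt", "no match")]:
            path = os.path.join(tmp, base)
            with open(path, "w") as f:
                f.write(text)
            names.append(path)
        out_path = os.path.join(tmp, "results.txt")
        write_results(names, out_path)
        with open(out_path) as f:
            print(f.read(), end="")
```

Unlike the imap version, lines here are written in the order workers finish, not in input order. The sentinel is only queued after `starmap` returns, so the writer has already received every result before it stops.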
