Python multiprocessing - How can I split workload to get speed improvement?

Problem description

I am writing a simple piece of code that crops images and saves them.
The problem is that there are about 150,000+ images, and I want to improve the speed.

So at first I wrote a version with a simple for loop, like the following:

import cv2
import sys

textfile = sys.argv[1]
file_list = open(textfile)
files = file_list.read().split('\n')
idx = 0
for eachfile in files:
    image = cv2.imread(eachfile)
    idx += 1
    if image is None:
        continue    # skip unreadable files; a bare 'pass' would fall
                    # through and crash on image.shape below
    outName = eachfile.replace('/data', '/changed_data')
    if image.shape[0] == 256:
        image1 = image[120:170, 120:170]    # crop a 50x50 centre region
    elif image.shape[0] == 50:
        image1 = image                      # already the target size
    else:
        continue    # unexpected size: skip rather than rewrite a stale image1
    cv2.imwrite(outName, image1)
    print idx, outName

This code took about 38 seconds for 90,000 images. But using two cores took more time than a single process: about 48 seconds for the same 90,000 images.

import cv2
import sys
from multiprocessing import Pool

def crop(eachfile):
    image = cv2.imread(eachfile)
    if image is None:
        return    # skip unreadable files
    outName = eachfile.replace('/data', '/changed_data')
    if image.shape[0] == 256:
        image1 = image[120:170, 120:170]
    elif image.shape[0] == 50:
        image1 = image
    else:
        return    # unexpected size: skip
    cv2.imwrite(outName, image1)
    print outName    # the idx counter from the serial version is dropped:
                     # each worker process would only see its own copy


if __name__ == '__main__':
    textfile = sys.argv[1]
    file_list = open(textfile)
    files = [f for f in file_list.read().split('\n') if f]    # drop empty lines
    pool = Pool(2)
    pool.map(crop, files)

Am I doing the right thing to speed up the process? Or should I split the list and send each sub-list to a process?

Any comments regarding my code would be great!

Thanks in advance!

Answer

You should indeed split the task over two cores. Play around with this example code, slightly modified from the original answer, which can be found here. Where you see data, that is your hook for providing your images. Note that defs don't work under a class when using multiprocessing, and if you try to use pathos you'll get errors from cPickle: a nagging issue with the latest 2.7 version that doesn't occur in 3.5 or so. Enjoy!

import multiprocessing
import sys
import time

def mp_worker((inputs, the_time)):    # tuple unpacking in the def is Python 2 only
    print " Process %s\tWaiting %s seconds" % (inputs, the_time)
    time.sleep(int(the_time))
    print " Process %s\tDONE" % inputs
    sys.stdout.flush()

def mp_handler():                     # non-tandem processing: the pool hands out
    p = multiprocessing.Pool(2)       # tasks as workers become free
    p.map(mp_worker, data)

def mp_handler_tandem():              # process the data in fixed pairs instead
    subdata = zip(data[0::2], data[1::2])
    for task1, task2 in subdata:
        p = multiprocessing.Pool(2)
        p.map(mp_worker, (task1, task2))

data = (['a', '2'], ['b', '3'], ['c', '1'], ['d', '4'],
        ['e', '1'], ['f', '2'], ['g', '3'], ['h', '4'])

if __name__ == '__main__':
    print 'mp_handler():'
    mp_handler()
    print '---'
    time.sleep(2)

    print '\nmp_handler_tandem():'
    mp_handler_tandem()

When working inside an editor, use sys.stdout.flush() to push worker output to the screen as it happens.
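
About the cPickle errors mentioned above: here is a minimal sketch of the limitation (the Cropper class is made up for illustration). On Python 2.7, Pool.map on a bound method dies inside cPickle, while a module-level function works fine:

import multiprocessing

class Cropper(object):             # hypothetical class, for illustration only
    def crop(self, filename):
        return filename.upper()

def crop_plain(filename):          # module-level functions pickle fine
    return filename.upper()

if __name__ == '__main__':
    pool = multiprocessing.Pool(2)
    print pool.map(crop_plain, ['a.jpg', 'b.jpg'])    # works: ['A.JPG', 'B.JPG']
    # The next line raises cPickle.PicklingError on Python 2.7:
    # "Can't pickle <type 'instancemethod'>"
    # pool.map(Cropper().crop, ['a.jpg', 'b.jpg'])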

But also check here for more on using your cores and splitting jobs.
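
As for the original question of whether to split the list by hand: usually you don't have to. Pool.map already accepts a chunksize argument that ships each worker a batch of items per message instead of one at a time, which trims the inter-process overhead that can make two workers slower than one on very short tasks. A minimal sketch of that idea follows; the trivial crop stand-in and the chunksize of 64 are illustrative only, not tuned values:

import sys
from multiprocessing import Pool

def crop(eachfile):
    # stand-in for the real crop() from the question
    print eachfile

if __name__ == '__main__':
    files = [f for f in open(sys.argv[1]).read().split('\n') if f]
    pool = Pool(2)
    # chunksize=64 hands each worker 64 filenames per message instead of one,
    # so far fewer pickled messages cross the process boundary
    pool.map(crop, files, chunksize=64)
    pool.close()
    pool.join()

Note that if the job is dominated by disk I/O, as reading and writing 150,000 images may well be, extra processes cannot help much no matter how the list is split.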
