Python multiprocessing - How can I split workload to get speed improvement?

Question
I am writing a simple piece of code that crops images and saves them.
The problem is that there are about 150,000+ images, and I want to improve the speed.
So at first I wrote the code with a simple for loop, like the following:
import cv2
import numpy
import sys

textfile = sys.argv[1]
file_list = open(textfile)
files = file_list.read().split('\n')
idx = 0
for eachfile in files:
    image = cv2.imread(eachfile)
    idx += 1
    if image is None:
        continue  # skip unreadable files instead of falling through to image.shape
    outName = eachfile.replace('/data', '/changed_data')
    if image.shape[0] == 256:
        image1 = image[120:170, 120:170]
    elif image.shape[0] == 50:
        image1 = image
    cv2.imwrite(outName, image1)
    print idx, outName
This code took about 38 seconds for 90,000 images. But using two cores took more time than a single process: about 48 seconds for the same 90,000 images.
import cv2
import sys
import numpy
from multiprocessing import Pool

def crop(eachfile):
    image = cv2.imread(eachfile)
    if image is None:
        return  # skip unreadable files; the global idx counter was dropped,
                # since it would not be shared across worker processes
    outName = eachfile.replace('/data', '/changed_data')
    if image.shape[0] == 256:
        image1 = image[120:170, 120:170]
    elif image.shape[0] == 50:
        image1 = image
    cv2.imwrite(outName, image1)
    print outName

if __name__ == '__main__':
    textfile = sys.argv[1]
    file_list = open(textfile)
    files = file_list.read().split('\n')
    pool = Pool(2)
    pool.map(crop, files)
Am I doing the right thing to speed up the process? Or should I split the list and send each sublist to a separate process?
Any comments regarding my code would be great!!!
Thanks in advance!

Answer
You should indeed split the task over two cores. Play around with this example code (slightly modified; the original answer can be found here). Where you see data, that is your hook for providing your images. The defs don't work under a class when using multiprocessing... and if you try to use pathos, you'll get errors from cPickle: some nagging issue with the latest 2.7 version that doesn't occur in 3.5 or so. Enjoy!
import sys
import time
import multiprocessing

def mp_worker((inputs, the_time)):
    print " Process %s\tWaiting %s seconds" % (inputs, the_time)
    time.sleep(int(the_time))
    print " Process %s\tDONE" % inputs
    sys.stdout.flush()

def mp_handler():  # Non tandem pair processing
    p = multiprocessing.Pool(2)
    p.map(mp_worker, data)

def mp_handler_tandem():
    subdata = zip(data[0::2], data[1::2])
    # print subdata
    for task1, task2 in subdata:
        p = multiprocessing.Pool(2)
        p.map(mp_worker, (task1, task2))

#data = (['a', '1'], ['b', '2'], ['c', '3'], ['d', '4'])
data = (['a', '2'], ['b', '3'], ['c', '1'], ['d', '4'],
        ['e', '1'], ['f', '2'], ['g', '3'], ['h', '4'])

if __name__ == '__main__':
    sys.stdout.flush()
    # print 'mp_handler():'
    # mp_handler()
    # print '---'
    # time.sleep(2)
    # print '\nmp_handler_tandem():'
    # mp_handler_tandem()
    print '---'
    # time.sleep(2)
    mp_handler()
Working within an editor: use sys.stdout.flush() to flush your output to the screen while it happens.
But also check here for using kernels and splitting jobs.