Better way to share memory for multiprocessing in Python?

Question

I have been tackling this problem for a week now, and it's been getting pretty frustrating because every time I implement a simpler but similarly scaled example of what I need to do, it turns out multiprocessing fudges it up. The way it handles shared memory baffles me: it is so limited that it can become useless quite rapidly.

So the basic description of my problem is that I need to create a process that is passed some parameters to open an image and create about 20K patches of size 60x40. These patches are saved into a list two at a time and need to be returned to the main thread, to then be processed again by two other concurrent processes that run on the GPU.

The process and the workflow are mostly taken care of; what I need now, the part that was supposed to be the easiest, is turning out to be the most difficult: I have not been able to save the list of 20K patches and get it back to the main thread.

The first problem was that I was saving these patches as PIL images; I then found out that all data added to a Queue object has to be pickled. The second problem was that I then converted the patches to 60x40 arrays and saved them to a list, and that still doesn't work. Apparently a Queue can only buffer a limited amount of data; beyond that, the program hangs when you call queue_obj.get().
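A minimal standalone sketch of what is usually behind this hang, assuming only documented multiprocessing behavior (this is not the poster's full code): a child process that has put a large object on a Queue cannot exit until that data has been consumed, so the queue must be drained before join() is called.

from multiprocessing import Process, Queue

def worker(q):
    # A large payload: the child's feeder thread blocks until it is read
    q.put([0.0] * 1000000)

if __name__ == '__main__':
    q = Queue()
    p = Process(target=worker, args=(q,))
    p.start()
    data = q.get()  # drain the queue first...
    p.join()        # ...then join; joining before get() can deadlock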

I have tried many other things, and every new thing I try does not work, so I would like to know if anyone can recommend another library I could use to share objects without all the fuss.

Here is a sample implementation of the kind of thing I'm looking at. Keep in mind this works perfectly fine, but the full implementation doesn't. I do have the code print informational messages to verify that the data being saved has the exact same shape and everything, but for some reason it doesn't work. In the full implementation the independent process completes successfully but freezes at q.get().

from PIL import Image
from multiprocessing import Queue, Process
import numpy

img = Image.open("/path/to/image.jpg")
q = Queue()
q2 = Queue()
# MAX Individual Queue limit for 60x40 images in BW is 31,466.
# Multiple individual Queues can be filled to the max limit of 31,466.
# A single Queue can only take up to 31,466, even if split up in different puts.
def rz(patch, qn1, qn2):
    totalPatchCount = 20000
    channels = 1
    patch = patch.resize((60,40), Image.ANTIALIAS)
    patch = patch.convert('L')
    list_im_arr = []
    # ----Create a 4D Array
    # returnImageArray = numpy.zeros(shape=(totalPatchCount, channels, 40, 60))
    imgArray = numpy.asarray(patch, dtype=numpy.float32)
    imgArray = imgArray[numpy.newaxis, ...]
    # ----End 4D array
    for i in xrange(totalPatchCount):
        # returnImageArray[i] = imgArray
        list_im_arr.append(imgArray)
    qn1.put(list_im_arr)
    qn1.cancel_join_thread()
    # qn2.cancel_join_thread()
    print "PROGRAM Done"

# rz(img,q,q2)
# l = q.get()

p = Process(target=rz,args=(img, q, q2,))
p.start()
p.join()
# l = []
# for i in xrange(1000): l.append(q.get())
imdata = q.get()

Answer

Queue is for communication between processes. In your case, you don't really have this kind of communication: you can simply let the process return its result and use the .get() method to collect it. (Remember to add if __name__ == "__main__":, see the programming guidelines.)

from PIL import Image
from multiprocessing import Pool
import numpy

img = Image.open("/path/to/image.jpg")

def rz(patch):
    totalPatchCount = 20000
    imgArray = numpy.asarray(patch, dtype=numpy.float32)
    list_im_arr = [imgArray] * totalPatchCount  # A more elegant way than a for loop
    return list_im_arr

if __name__ == '__main__':
    # patch = img....  Your code to generate the patch goes here
    patch = patch.resize((60,40), Image.ANTIALIAS)
    patch = patch.convert('L')

    pool = Pool(2)
    # Pass patch explicitly; workers may not inherit module globals (e.g. on Windows)
    imdata = [pool.apply_async(rz, (patch,)).get() for x in range(2)]
    pool.close()
    pool.join()
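One caveat about the snippet above: calling .get() inside the list comprehension blocks on each task before the next one is submitted, so the two calls actually run one after the other. A small rearrangement of the same Pool calls, shown here as a sketch, submits both first and then collects:

pool = Pool(2)
results = [pool.apply_async(rz, (patch,)) for x in range(2)]  # submit both tasks first
imdata = [r.get() for r in results]                           # then collect the results
pool.close()
pool.join()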

Now, according to the first answer of this post, multiprocessing only passes objects that are picklable. Pickling is probably unavoidable in multiprocessing because processes don't share memory. They simply don't live in the same universe. (They do inherit memory when they're first spawned, but they cannot reach outside their own universe.) The PIL Image object itself is not picklable, but you can make it picklable by extracting only the image data stored in it, as that post suggested.
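The linked post isn't reproduced here, but the idea is simple enough to sketch: ship the raw pixel buffer plus the metadata needed to rebuild the image, since bytes, strings, and tuples pickle fine. Image.tobytes()/Image.frombytes() are the Pillow names (older PIL called them tostring()/fromstring()):

from PIL import Image

def to_picklable(img):
    # Raw pixel bytes plus the metadata needed to reconstruct the image
    return {'mode': img.mode, 'size': img.size, 'data': img.tobytes()}

def from_picklable(d):
    return Image.frombytes(d['mode'], d['size'], d['data'])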

Since your problem is mostly I/O bound, you can also try multi-threading; it might be even faster for your purpose. Threads share everything, so no pickling is required. If you're using Python 3, ThreadPoolExecutor is a wonderful tool (a sketch follows the example below). For Python 2, you can use ThreadPool. To achieve higher efficiency you'll have to rearrange how you do things: break up the process and let different threads do the work.

from PIL import Image
from multiprocessing.pool import ThreadPool
import numpy

img = Image.open("/path/to/image.jpg")
totalPatchCount = 20000

def rz(x):
    patch = ...  # generate one patch from img here
    return patch

pool = ThreadPool(8)
imdata = [pool.map(rz, range(totalPatchCount)) for i in range(2)]
pool.close()
pool.join()
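For completeness, a minimal Python 3 sketch of the same structure with ThreadPoolExecutor (concurrent.futures is in the standard library); the patch-generation body is elided just as above:

from concurrent.futures import ThreadPoolExecutor

def rz(x):
    patch = ...  # generate one patch here, as above
    return patch

with ThreadPoolExecutor(max_workers=8) as executor:
    imdata = list(executor.map(rz, range(totalPatchCount)))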
