using multiprocessing with theano


Question


I'm trying to use Theano with CPU multiprocessing together with the neural network library Keras.

I use the device=gpu flag and load the Keras model. Then, to extract features for over a million images, I use a multiprocessing pool.

The function looks something like this:

import multiprocessing as mp
import cPickle  # Python 2; use pickle on Python 3

from keras import backend as K

# model and imread are assumed to be defined/imported elsewhere
f = K.function([model.layers[0].input, K.learning_phase()],
               [model.layers[-3].output])

def feature_gen(fl):
    im = imread(fl)
    # Forward pass in test mode (learning_phase=0); pickle the features
    cPickle.dump(f([im, 0])[0][0], open(fl, 'wb'), -1)

pool = mp.Pool(processes=10)
# Pass the filename, not the Keras function f (don't shadow f in the loop)
results = [pool.apply_async(feature_gen, args=(fl,)) for fl in filelist]

This, however, starts creating pools in GPU memory, and my code fails with a memory error. Is it possible to force multiprocessing to create its workers in CPU memory, and use the GPU only for specific parts of the feature extraction, such as f([im, 0])[0][0]?

If not, is there an alternative way to do the same thing in parallel in Python?

Solution

It is possible to use multiple processes if the other processes do not use Keras; to my knowledge, you need to restrict Keras usage to a single process. This seems to include all Keras classes and methods, even those that do not appear to use the GPU, e.g. ImageDataGenerator.

If the workload is GPU-bound, it is also possible to use the threading library, which creates threads instead of processes, e.g. to load data while the GPU processes the previous batch; in that case the restriction does not apply. Due to the global interpreter lock, this is not a solution in CPU-bound environments.
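To make the threading idea concrete, here is a minimal sketch of overlapping data loading with GPU work using a background thread and a bounded queue. The producer/consume names are placeholders, and the `.upper()` call stands in for `imread` and the Keras forward pass, which are not reproduced here:

```python
import threading
import queue  # named Queue on Python 2, which the original post targets

def producer(filelist, q):
    # I/O in this thread releases the GIL, so loading overlaps GPU work
    for fl in filelist:
        q.put(fl.upper())   # stand-in for imread(fl)
    q.put(None)             # sentinel: no more data

def consume(filelist):
    q = queue.Queue(maxsize=8)   # bounded queue caps memory use
    t = threading.Thread(target=producer, args=(filelist, q))
    t.start()
    out = []
    while True:
        item = q.get()
        if item is None:
            break
        out.append(item)    # stand-in for the GPU call f([im, 0])[0][0]
    t.join()
    return out
```

Because the Keras function is only ever called from the main thread, this avoids the multi-process GPU memory problem entirely, at the cost of serializing the GPU work.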

Your situation looks like a parallel [read, do work on GPU, write]. This can be restructured into a pipeline, e.g. some processes reading, the main process performing the GPU work, and some processes writing.

  1. Create Queue objects for input/output (threading.Queue or multiprocessing.Queue)
  2. Create background worker threads/processes which read data from disk and feed it to the input queue
  3. Create background worker threads/processes which write data from the output queue to disk
  4. A main loop which takes data from the input queue, creates batches, processes the data on the GPU and fills the output queue
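The steps above can be sketched as follows. This is a minimal example, not the poster's actual code: the `reader`/`writer`/`main_loop` names are placeholders, `len(fl)` stands in for reading an image, and `data * 2` stands in for the Keras/GPU forward pass:

```python
import multiprocessing as mp

def reader(filelist, in_q):
    # Background worker: read inputs from disk, feed the input queue
    for fl in filelist:
        in_q.put((fl, len(fl)))     # stand-in for (fl, imread(fl))
    in_q.put(None)                  # sentinel: no more input

def writer(out_q, results):
    # Background worker: drain the output queue and persist results
    while True:
        item = out_q.get()
        if item is None:            # sentinel: main loop finished
            break
        results.append(item)        # stand-in for cPickle.dump(...)

def main_loop(filelist):
    in_q = mp.Queue(maxsize=32)     # bounded queues apply backpressure
    out_q = mp.Queue(maxsize=32)
    manager = mp.Manager()
    results = manager.list()

    r = mp.Process(target=reader, args=(filelist, in_q))
    w = mp.Process(target=writer, args=(out_q, results))
    r.start()
    w.start()

    # Main process: the only one that would touch Keras / the GPU
    while True:
        item = in_q.get()
        if item is None:
            break
        fl, data = item
        out_q.put((fl, data * 2))   # stand-in for f([im, 0])[0][0]

    out_q.put(None)                 # tell the writer to stop
    r.join()
    w.join()
    return list(results)
```

Keeping the queues bounded matters here: with over a million images, an unbounded input queue would eventually hold the whole dataset in RAM.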
