Memory error when using Keras ImageDataGenerator

Problem description

I am attempting to predict features in imagery using keras with a TensorFlow backend. Specifically, I am attempting to use a keras ImageDataGenerator. The model is set to run for 4 epochs and runs fine until the 4th epoch where it fails with a MemoryError.

I am running this model on an AWS g2.2xlarge instance running Ubuntu Server 16.04 LTS (HVM), SSD Volume Type.

The training images are 256x256 RGB pixel tiles (8 bit unsigned) and the training mask is 256x256 single band (8 bit unsigned) tiled data where 255 == a feature of interest and 0 == everything else.

The following 3 functions are the ones pertinent to this error.

How can I resolve this MemoryError?

def train_model():
        batch_size = 1
        training_imgs = np.lib.format.open_memmap(filename=os.path.join(data_path, 'data.npy'),mode='r+')
        training_masks = np.lib.format.open_memmap(filename=os.path.join(data_path, 'mask.npy'),mode='r+')
        dl_model = create_model()
        print(dl_model.summary())
        model_checkpoint = ModelCheckpoint(os.path.join(data_path,'mod_weight.hdf5'), monitor='loss',verbose=1, save_best_only=True)
        dl_model.fit_generator(generator(training_imgs, training_masks, batch_size), steps_per_epoch=(len(training_imgs)/batch_size), epochs=4,verbose=1,callbacks=[model_checkpoint])

def generator(train_imgs, train_masks=None, batch_size=None):

        # Create empty arrays to contain a batch of features and labels

        if train_masks is not None:
                train_imgs_batch = np.zeros((batch_size,y_to_res,x_to_res,bands))
                train_masks_batch = np.zeros((batch_size,y_to_res,x_to_res,1))

                while True:
                        for i in range(batch_size):
                                # choose random index in features
                                index= random.choice(range(len(train_imgs)))
                                train_imgs_batch[i] = train_imgs[index]
                                train_masks_batch[i] = train_masks[index]
                        yield train_imgs_batch, train_masks_batch
        else:
                rec_imgs_batch = np.zeros((batch_size,y_to_res,x_to_res,bands))
                while True:
                        for i in range(batch_size):
                                # choose random index in features
                                index= random.choice(range(len(train_imgs)))
                                rec_imgs_batch[i] = train_imgs[index]
                        yield rec_imgs_batch

def train_generator(train_images,train_masks,batch_size):
        data_gen_args=dict(rotation_range=90.,horizontal_flip=True,vertical_flip=True,rescale=1./255)
        image_datagen = ImageDataGenerator()
        mask_datagen = ImageDataGenerator()
        # Provide the same seed and keyword arguments to the fit and flow methods
        seed = 1
        image_datagen.fit(train_images, augment=True, seed=seed)
        mask_datagen.fit(train_masks, augment=True, seed=seed)
        image_generator = image_datagen.flow(train_images,batch_size=batch_size)
        mask_generator = mask_datagen.flow(train_masks,batch_size=batch_size)
        return zip(image_generator, mask_generator)


The following is the output from the model, detailing the epochs and the error message:

Epoch 00001: loss improved from inf to 0.01683, saving model to /home/ubuntu/deep_learn/client_data/mod_weight.hdf5
Epoch 2/4
7569/7569 [==============================] - 3394s 448ms/step - loss: 0.0049 - binary_crossentropy: 0.0027 - jaccard_coef_int: 0.9983  

Epoch 00002: loss improved from 0.01683 to 0.00492, saving model to /home/ubuntu/deep_learn/client_data/mod_weight.hdf5
Epoch 3/4
7569/7569 [==============================] - 3394s 448ms/step - loss: 0.0049 - binary_crossentropy: 0.0026 - jaccard_coef_int: 0.9982  

Epoch 00003: loss improved from 0.00492 to 0.00488, saving model to /home/ubuntu/deep_learn/client_data/mod_weight.hdf5
Epoch 4/4
7569/7569 [==============================] - 3394s 448ms/step - loss: 0.0074 - binary_crossentropy: 0.0042 - jaccard_coef_int: 0.9975  

Epoch 00004: loss did not improve
Traceback (most recent call last):
  File "image_rec.py", line 291, in <module>
    train_model()
  File "image_rec.py", line 208, in train_model
    dl_model.fit_generator(train_generator(training_imgs,training_masks,batch_size),steps_per_epoch=1,epochs=1,workers=1)
  File "image_rec.py", line 274, in train_generator
    image_datagen.fit(train_images, augment=True, seed=seed)
  File "/home/ubuntu/pyvirt_test/local/lib/python2.7/site-packages/keras/preprocessing/image.py", line 753, in fit
    x = np.copy(x)
  File "/home/ubuntu/pyvirt_test/local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 1505, in copy
    return array(a, order=order, copy=True)
MemoryError

Recommended answer

It seems your problem is that the data is too large. I can see two solutions. The first one is to run your code on a distributed system by means of Spark; I guess you do not have that infrastructure available, so let us move on to the other.

The second one is the one I think is viable: slice the data and feed it to the model incrementally. We can do this with Dask. This library can slice the data and save it in objects that you can later retrieve from disk, reading only the part you need.

If you have an image whose size is a 100x100 matrix, we can retrieve each array without needing to load all 100 arrays into memory. We can load them into memory array by array (releasing the previous one), and each one becomes the input to your neural network.

To do this, you can transform your np.array into a dask array and assign the partitions (chunks). For example:

>>> k = np.random.randn(10, 10)  # a 10x10 matrix
>>> import dask.array as da
>>> k2 = da.from_array(k, chunks=3)
>>> k2
dask.array<array, shape=(10, 10), dtype=float64, chunksize=(3, 3)>
>>> k2.to_delayed()
array([[Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 0, 0)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 0, 1)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 0, 2)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 0, 3))],
   [Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 1, 0)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 1, 1)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 1, 2)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 1, 3))],
   [Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 2, 0)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 2, 1)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 2, 2)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 2, 3))],
   [Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 3, 0)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 3, 1)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 3, 2)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 3, 3))]],
  dtype=object)

Here you can see how the data is stored as delayed objects, which you can then retrieve in parts to feed your model.
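For instance (continuing the toy 10x10 example above), a single chunk can be computed on its own, so only that 3x3 block is ever materialised in memory:

>>> k2.to_delayed()[0, 0].compute().shape   # loads only the top-left chunk
(3, 3)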

To implement this solution, you must introduce a loop in your function that computes each partition in turn and feeds it to the neural network for incremental training.
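A minimal sketch of that idea, assuming the same memory-mapped .npy files used in train_model() above (the chunk_rows value, the /255. rescaling, the file paths and number_of_chunks are illustrative assumptions, not part of the original code):

import numpy as np
import dask.array as da

def dask_chunk_generator(imgs_path, masks_path, chunk_rows=64):
        # Wrap the memory-mapped arrays in dask arrays chunked along axis 0 only,
        # so each chunk is a small stack of (256, 256) tiles.
        imgs = da.from_array(np.lib.format.open_memmap(imgs_path, mode='r'),
                             chunks=(chunk_rows, -1, -1, -1))
        masks = da.from_array(np.lib.format.open_memmap(masks_path, mode='r'),
                              chunks=(chunk_rows, -1, -1, -1))
        while True:
                for img_block, mask_block in zip(imgs.to_delayed().ravel(),
                                                 masks.to_delayed().ravel()):
                        # Only the current chunk is materialised in memory;
                        # the previous one can be garbage collected.
                        yield img_block.compute() / 255., mask_block.compute() / 255.

# e.g. dl_model.fit_generator(dask_chunk_generator(img_path, mask_path),
#                             steps_per_epoch=number_of_chunks, epochs=4)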

For more information, see Dask documentation
