Memory error when using Keras ImageDataGenerator


Problem description

I am attempting to predict features in imagery using Keras with a TensorFlow backend. Specifically, I am attempting to use a Keras ImageDataGenerator. The model is set to run for 4 epochs and runs fine until the 4th epoch, where it fails with a MemoryError.

I am running this model on an AWS g2.2xlarge instance running Ubuntu Server 16.04 LTS (HVM), SSD Volume Type.

The training images are 256x256 RGB pixel tiles (8 bit unsigned) and the training mask is 256x256 single band (8 bit unsigned) tiled data where 255 == a feature of interest and 0 == everything else.

The following 3 functions are the ones pertinent to this error.

How can I resolve this MemoryError?


import os
import random

import numpy as np
from keras.callbacks import ModelCheckpoint
from keras.preprocessing.image import ImageDataGenerator

# data_path, create_model, y_to_res, x_to_res and bands are defined elsewhere in the script (not shown).

def train_model():
    batch_size = 1
    # Memory-map the .npy files so the arrays stay on disk until indexed
    training_imgs = np.lib.format.open_memmap(filename=os.path.join(data_path, 'data.npy'), mode='r+')
    training_masks = np.lib.format.open_memmap(filename=os.path.join(data_path, 'mask.npy'), mode='r+')
    dl_model = create_model()
    print(dl_model.summary())
    model_checkpoint = ModelCheckpoint(os.path.join(data_path, 'mod_weight.hdf5'), monitor='loss', verbose=1, save_best_only=True)
    dl_model.fit_generator(generator(training_imgs, training_masks, batch_size), steps_per_epoch=(len(training_imgs) // batch_size), epochs=4, verbose=1, callbacks=[model_checkpoint])

def generator(train_imgs, train_masks=None, batch_size=None):
    # Create empty arrays to contain batch of features and labels
    if train_masks is not None:
        train_imgs_batch = np.zeros((batch_size, y_to_res, x_to_res, bands))
        train_masks_batch = np.zeros((batch_size, y_to_res, x_to_res, 1))

        while True:
            for i in range(batch_size):
                # choose random index in features
                index = random.choice(range(len(train_imgs)))
                train_imgs_batch[i] = train_imgs[index]
                train_masks_batch[i] = train_masks[index]
            yield train_imgs_batch, train_masks_batch
    else:
        rec_imgs_batch = np.zeros((batch_size, y_to_res, x_to_res, bands))
        while True:
            for i in range(batch_size):
                # choose random index in features
                index = random.choice(range(len(train_imgs)))
                rec_imgs_batch[i] = train_imgs[index]
            yield rec_imgs_batch

def train_generator(train_images, train_masks, batch_size):
    data_gen_args = dict(rotation_range=90., horizontal_flip=True, vertical_flip=True, rescale=1./255)
    image_datagen = ImageDataGenerator()
    mask_datagen = ImageDataGenerator()
    # Provide the same seed and keyword arguments to the fit and flow methods
    seed = 1
    image_datagen.fit(train_images, augment=True, seed=seed)
    mask_datagen.fit(train_masks, augment=True, seed=seed)
    image_generator = image_datagen.flow(train_images, batch_size=batch_size)
    mask_generator = mask_datagen.flow(train_masks, batch_size=batch_size)
    return zip(image_generator, mask_generator)


The following is the output from the model detailing the epochs and the error message:

Epoch 00001: loss improved from inf to 0.01683, saving model to /home/ubuntu/deep_learn/client_data/mod_weight.hdf5
Epoch 2/4
7569/7569 [==============================] - 3394s 448ms/step - loss: 0.0049 - binary_crossentropy: 0.0027 - jaccard_coef_int: 0.9983  

Epoch 00002: loss improved from 0.01683 to 0.00492, saving model to /home/ubuntu/deep_learn/client_data/mod_weight.hdf5
Epoch 3/4
7569/7569 [==============================] - 3394s 448ms/step - loss: 0.0049 - binary_crossentropy: 0.0026 - jaccard_coef_int: 0.9982  

Epoch 00003: loss improved from 0.00492 to 0.00488, saving model to /home/ubuntu/deep_learn/client_data/mod_weight.hdf5
Epoch 4/4
7569/7569 [==============================] - 3394s 448ms/step - loss: 0.0074 - binary_crossentropy: 0.0042 - jaccard_coef_int: 0.9975  

Epoch 00004: loss did not improve
Traceback (most recent call last):
  File "image_rec.py", line 291, in <module>
    train_model()
  File "image_rec.py", line 208, in train_model
    dl_model.fit_generator(train_generator(training_imgs,training_masks,batch_size),steps_per_epoch=1,epochs=1,workers=1)
  File "image_rec.py", line 274, in train_generator
    image_datagen.fit(train_images, augment=True, seed=seed)
  File "/home/ubuntu/pyvirt_test/local/lib/python2.7/site-packages/keras/preprocessing/image.py", line 753, in fit
    x = np.copy(x)
  File "/home/ubuntu/pyvirt_test/local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 1505, in copy
    return array(a, order=order, copy=True)
MemoryError

Solution

It seems your problem is that the data is too large to fit in memory: the traceback shows ImageDataGenerator.fit calling np.copy on the whole training array. I can see two solutions. The first is to run your code on a distributed system using Spark. I am guessing you do not have that available, so let us move on to the other.

The second is the one I think is viable. I would slice the data and feed it to the model incrementally. We can do this with Dask. This library can split the data into chunks and keep them as objects, which you can then read back from disk, loading only the part you want.

If you have an image that is a 100x100 matrix, we can retrieve each array without needing to load all 100 arrays into memory. We can load the arrays into memory one at a time (releasing the previous one), and each one becomes the input to your neural network.
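The idea of loading one array at a time can be sketched with NumPy alone. This is a minimal sketch, not the asker's pipeline: the file name demo_data.npy, the tiny 8x8 tile size, and the helper iter_tiles are all hypothetical stand-ins chosen so the example runs quickly.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for data.npy: 100 tiles of 8x8 RGB, uint8.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo_data.npy")
tiles = np.lib.format.open_memmap(path, mode="w+", dtype=np.uint8, shape=(100, 8, 8, 3))
for i in range(len(tiles)):
    tiles[i] = i  # fill tile i with the value i so it is easy to check
tiles.flush()
del tiles


def iter_tiles(npy_path):
    """Yield one tile at a time as a float32 array scaled to [0, 1]."""
    data = np.lib.format.open_memmap(npy_path, mode="r")
    for i in range(len(data)):
        # Indexing the memmap touches only this tile; np.asarray with a new
        # dtype copies just that tile into RAM.
        yield np.asarray(data[i], dtype=np.float32) / 255.0


# Pull only the first three tiles; the remaining 97 are never read.
first_three = [tile for _, tile in zip(range(3), iter_tiles(path))]
```

Only the tile being yielded is held in memory as a regular array, so peak memory is bounded by the tile size rather than by the size of the file.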

To do this, you can convert your np.array to a Dask array and assign the partitions. For example:

>>> import numpy as np
>>> import dask.array as da
>>> k = np.random.randn(10, 10)  # 10x10 matrix
>>> k2 = da.from_array(k, chunks=3)
>>> k2
dask.array<array, shape=(10, 10), dtype=float64, chunksize=(3, 3)>
>>> k2.to_delayed()
array([[Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 0, 0)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 0, 1)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 0, 2)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 0, 3))],
   [Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 1, 0)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 1, 1)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 1, 2)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 1, 3))],
   [Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 2, 0)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 2, 1)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 2, 2)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 2, 3))],
   [Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 3, 0)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 3, 1)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 3, 2)),
    Delayed(('array-a08c1d25b900d497cdcd233a7c5aa108', 3, 3))]],
  dtype=object)

Here you can see how the data is stored as objects, which you can then retrieve piece by piece to feed your model.

To implement this solution, you must introduce a loop in your function that calls each partition and feeds it to the NN for incremental training.
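A minimal sketch of such a loop, assuming the images and masks are arrays indexed by sample along the first axis. The helper iter_chunks and the small random stand-in arrays are hypothetical; in the question's setting the inputs would be the arrays returned by open_memmap, and the training call (for instance Keras's model.train_on_batch) is left as a comment because the model is not shown.

```python
import numpy as np
import dask.array as da


def iter_chunks(images, masks, chunk_size):
    """Yield (images, masks) NumPy chunks of at most chunk_size samples each."""
    # Chunk only along the first (sample) axis so every sample stays whole.
    img_da = da.from_array(images, chunks=(chunk_size,) + images.shape[1:])
    mask_da = da.from_array(masks, chunks=(chunk_size,) + masks.shape[1:])
    for start in range(0, img_da.shape[0], chunk_size):
        stop = start + chunk_size
        # .compute() materialises just this slice as an in-memory ndarray.
        yield img_da[start:stop].compute(), mask_da[start:stop].compute()


# Stand-in data: 10 tiny RGB tiles and matching 0/255 single-band masks.
images = np.random.randint(0, 256, size=(10, 4, 4, 3)).astype(np.uint8)
masks = (np.random.rand(10, 4, 4, 1) > 0.5).astype(np.uint8) * 255

seen = []
for x, y in iter_chunks(images, masks, chunk_size=3):
    # A real loop would train on this chunk here, e.g. (assumption):
    # model.train_on_batch(x / 255.0, y / 255.0)
    seen.append((x.shape[0], y.shape[0]))
```

With 10 samples and a chunk size of 3, the loop sees three chunks of 3 samples and a final chunk of 1, and only one chunk is materialised at a time.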

For more information, see the Dask documentation.
