Memory Issues Using Keras Convolutional Network
Question
I am very new to ML using Big Data and I have played with Keras generic convolutional examples for the dog/cat classification before, however when applying a similar approach to my set of images, I run into memory issues.
My dataset consists of very long images that are 10048 x 1687 pixels in size. To circumvent the memory issues, I am using a batch size of 1, feeding in one image at a time to the model.
The model has two convolutional layers, each followed by max-pooling, which together leave the flattened layer with roughly 290,000 inputs right before the fully-connected layer.
Immediately after running, however, memory usage chokes at its limit (8 GB).
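For a sense of scale (a back-of-the-envelope sketch; the 3-channel and float32 assumptions are mine, not from the original post), a single image of this size already accounts for a sizable share of that limit:

```python
# Rough memory estimate for one image held in memory as float32.
# channels=3 (RGB) and float32 storage are assumptions for illustration.
height, width, channels = 10048, 1687, 3
bytes_per_float32 = 4
image_bytes = height * width * channels * bytes_per_float32
print(round(image_bytes / 1024**2))  # ≈ 194 MB per image, before any activations
```

And that is before counting the activations of every layer, which Keras must also keep in memory for backpropagation.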
So my question is the following:
1) What is the best way to handle computations of this size locally in Python (with no cloud utilization)? Are there additional Python libraries that I need to use?
Answer
Check out what yield does in Python and the idea of generators. You do not need to load all of your data at the beginning. You should make your batch_size just small enough that you do not get memory errors.
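As a quick illustration of the idea (a toy sketch, unrelated to the training data), a generator produces values lazily, one at a time, instead of materializing a whole list in memory:

```python
def count_up_to(n):
    """Yield the integers 1..n one at a time, never building a full list."""
    i = 1
    while i <= n:
        yield i  # execution pauses here until the caller asks for the next value
        i += 1

gen = count_up_to(3)
print(next(gen))   # 1
print(list(gen))   # [2, 3] -- the generator resumes where it paused
```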
Your generator can look like this:
```python
def generator(fileobj, labels, batch_size, memory_one_pic=1024):
    # memory_one_pic is the number of bytes one picture occupies in the file.
    # (batch_size moved before the defaulted parameter -- a non-default
    # argument cannot follow a default one in Python.)
    amount_of_datasets = len(labels)
    start = 0
    end = start + batch_size
    while True:
        # Read exactly one batch worth of raw bytes from the file.
        X_batch = fileobj.read(memory_one_pic * batch_size)
        y_batch = labels[start:end]
        start += batch_size
        end += batch_size
        if not X_batch:
            break
        if start >= amount_of_datasets:
            # Wrap the label indices around for the next epoch.
            start = 0
            end = batch_size
        yield (X_batch, y_batch)
```
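The fileobj.read(memory_one_pic * batch_size) call above pulls a fixed number of bytes per batch. A self-contained sketch of that chunked-reading pattern, with io.BytesIO standing in for the real file (the ten-byte payload is made up for illustration):

```python
import io

fake_file = io.BytesIO(b"abcdefghij")  # stands in for open('traindata.csv', 'rb')
chunk_size = 4                         # plays the role of memory_one_pic * batch_size
chunks = []
while True:
    chunk = fake_file.read(chunk_size)
    if not chunk:  # read() returns b"" at end of file
        break
    chunks.append(chunk)
print(chunks)  # [b'abcd', b'efgh', b'ij']
```

Only one chunk is ever held at a time, which is exactly why the generator keeps memory flat.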
...later when you already have your architecture ready...
```python
train_generator = generator(open('traindata.csv', 'rb'), labels, batch_size)
train_steps = amount_of_datasets // batch_size + 1
model.fit_generator(generator=train_generator,
                    steps_per_epoch=train_steps,
                    epochs=epochs)
```
You should also read about batch_normalization, which basically helps you learn faster and with better accuracy.
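The core computation behind batch normalization can be sketched in a few lines of plain Python (a minimal illustration of the normalization step only; a real BatchNormalization layer also learns a per-feature scale and shift on top of this):

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize a 1-D batch to zero mean and (roughly) unit variance,
    the operation at the heart of a BatchNormalization layer."""
    mean = sum(batch) / len(batch)
    var = sum((x - mean) ** 2 for x in batch) / len(batch)
    return [(x - mean) / math.sqrt(var + eps) for x in batch]

normed = batch_norm([1.0, 2.0, 3.0, 4.0])
# normed is symmetric around 0 with approximately unit variance
```

Keeping every layer's inputs in a standardized range like this is what stabilizes the gradients and lets training converge faster.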