Reducing the size of a pretrained deep learning model for feature generation
Problem description
I am using a pretrained model in Keras to generate features for a set of images:
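from keras.applications.inception_v3 import InceptionV3
model = InceptionV3(weights='imagenet', include_top=False)
# data: preprocessed images, shape (N, 299, 299, 3); predict returns (N, 8, 8, 2048)
train_data = model.predict(data).reshape(data.shape[0], -1)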
However, I have a lot of images, and the ImageNet model outputs 131072 features (columns) for each image.
With 200k images I would get an array of shape (200000, 131072), which is too large to fit into memory.
More importantly, I need to save this array to disk, and it would take about 100 GB of space when saved as .npy or .h5 (via h5py).
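The 100 GB figure follows directly from the array shape. A minimal sketch of the arithmetic, assuming the features are stored as float32 (4 bytes per value, the Keras default float type) and that the 8×8×2048 feature map comes from InceptionV3 without its top at the default 299×299 input size:

```python
# Flattened feature size of InceptionV3 (include_top=False) at 299x299 input:
# the last block outputs an 8 x 8 x 2048 feature map.
height, width, channels = 8, 8, 2048
features_per_image = height * width * channels

n_images = 200_000
bytes_per_value = 4  # float32, assumed storage type

total_bytes = n_images * features_per_image * bytes_per_value
total_gb = total_bytes / 1e9      # decimal gigabytes
total_gib = total_bytes / 2**30   # binary gibibytes

print(features_per_image)   # 131072
print(round(total_gb, 1))   # 104.9
print(round(total_gib, 1))  # 97.7
```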
I could circumvent the memory problem by feeding batches of about 1000 images and saving them to disk, but that does not solve the disk space problem.
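The batching workaround can be sketched with NumPy's `np.lib.format.open_memmap`, which creates a valid .npy file on disk and lets you fill it batch by batch without holding the whole array in RAM. `extract_features` and `save_features_in_batches` are hypothetical helper names; `extract_features` stands in for `model.predict(batch).reshape(len(batch), -1)` and here simply flattens the input so the example runs without Keras. As the question notes, this only addresses memory, not disk space.

```python
import os
import tempfile

import numpy as np


def extract_features(batch):
    """Hypothetical stand-in for model.predict(batch).reshape(len(batch), -1)."""
    # Each "image" is simply flattened; a real model would return CNN features.
    return batch.reshape(len(batch), -1).astype(np.float32)


def save_features_in_batches(images, path, batch_size=1000):
    """Write features for `images` to an .npy file one batch at a time."""
    n = len(images)
    # Run one image through the extractor to learn the feature dimension.
    n_features = extract_features(images[:1]).shape[1]
    # open_memmap creates an .npy file backed by disk rather than RAM.
    out = np.lib.format.open_memmap(
        path, mode="w+", dtype=np.float32, shape=(n, n_features)
    )
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        out[start:stop] = extract_features(images[start:stop])
    out.flush()  # make sure all batches are written to the file
    del out
    return path


# Usage with small random "images" so the example runs quickly.
images = np.random.rand(2500, 4, 4, 3).astype(np.float32)
path = os.path.join(tempfile.mkdtemp(), "features.npy")
save_features_in_batches(images, path, batch_size=1000)
features = np.load(path, mmap_mode="r")  # memory-mapped read, no full load
print(features.shape)  # (2500, 48)
```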
How can I make the model smaller without losing too much information?
Update
As the answer suggested, I included the next layer in the model as well:
from keras.applications.inception_v3 import InceptionV3
from keras.models import Model
base_model = InceptionV3(weights='imagenet')
model = Model(inputs=base_model.input, outputs=base_model.get_layer('avg_pool').output)
This reduces the output to (200000, 2048), 64 times smaller than before (131072 / 2048 = 64).
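In Keras' InceptionV3 the `avg_pool` layer is a global average pooling: it averages each of the 2048 channels over the 8×8 spatial grid. The same output can also be requested directly with `InceptionV3(weights='imagenet', include_top=False, pooling='avg')`. The NumPy sketch below only illustrates the operation itself on a hypothetical batch of feature maps (random numbers stand in for real activations), plus the resulting float32 storage for 200k images.

```python
import numpy as np

# Hypothetical batch of InceptionV3 feature maps (random placeholder values),
# laid out as (batch, height, width, channels), the Keras default.
feature_maps = np.random.rand(16, 8, 8, 2048).astype(np.float32)

# Global average pooling: mean over the two spatial axes, one value per channel.
pooled = feature_maps.mean(axis=(1, 2))

flat = feature_maps.reshape(len(feature_maps), -1)
print(flat.shape)    # (16, 131072)
print(pooled.shape)  # (16, 2048)

# Storage for 200,000 images at float32 (4 bytes per value).
n_images = 200_000
print(n_images * pooled.shape[1] * 4 / 1e9)  # 1.6384 (GB), vs. about 105 GB unpooled
```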
Update 2:
Another interesting solution may be the bcolz package, which stores NumPy arrays in compressed containers: https://github.com/Blosc/bcolz
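bcolz itself is not shown here. As an illustration of the same idea (trading CPU time for disk space) using only NumPy, the sketch below swaps in `np.savez_compressed`, which writes a zlib-compressed .npz archive, and first casts the features to float16, a lossy step that halves the raw size. How much the compression gains on real features depends on the data; the random values here are placeholders.

```python
import os
import tempfile

import numpy as np

# Placeholder features: 1,000 images x 2,048 pooled features, float32.
features = np.random.rand(1000, 2048).astype(np.float32)

# Optional lossy step: float16 halves the raw size (roughly 3 significant digits kept).
features_small = features.astype(np.float16)

path = os.path.join(tempfile.mkdtemp(), "features.npz")
np.savez_compressed(path, features=features_small)  # zlib-compressed .npz archive

with np.load(path) as archive:
    restored = archive["features"]

print(restored.dtype, restored.shape)    # float16 (1000, 2048)
print(features.nbytes, restored.nbytes)  # 8192000 4096000
```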
Recommended answer
I see at least two solutions to your problem:
- Apply an AveragePooling2D((8, 8), strides=(8, 8)) layer (from keras.layers) to the output of model, where model is the InceptionV3 object you loaded (without top):
  pooled = AveragePooling2D((8, 8), strides=(8, 8))(model.output)
  model = Model(inputs=model.input, outputs=pooled)
  This is the next step in the InceptionV3 architecture, so one may reasonably assume that these features still hold plenty of discriminative information.
- Apply some kind of dimensionality reduction (e.g. PCA) fitted on a sample of the data, then use it to reduce the dimensionality of all the data to get a reasonable file size.
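The PCA route from the second point can be sketched with plain NumPy: fit the principal components on a random sample that fits in memory, then project every batch onto them. This is a minimal sketch under stated assumptions: `fit_pca` and `transform` are hypothetical helper names, the sample size and `n_components` are arbitrary, and random data stands in for real features. scikit-learn's `PCA` or `IncrementalPCA` would be the usual library choice.

```python
import numpy as np


def fit_pca(sample, n_components):
    """Fit PCA on a (n_samples, n_features) sample via SVD of the centered data."""
    mean = sample.mean(axis=0)
    centered = sample - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def transform(batch, mean, components):
    """Project a batch onto the fitted principal components."""
    return (batch - mean) @ components.T


rng = np.random.default_rng(0)
n_features = 512  # placeholder; 131072 for the flattened InceptionV3 output
# In practice these rows would be read from disk or produced by the model in batches.
all_features = rng.standard_normal((5000, n_features)).astype(np.float32)

# Fit on a random sample small enough to hold in memory.
sample_idx = rng.choice(len(all_features), size=1000, replace=False)
mean, components = fit_pca(all_features[sample_idx], n_components=64)

# Reduce the full data set one batch at a time.
reduced = np.concatenate([
    transform(all_features[i:i + 1000], mean, components)
    for i in range(0, len(all_features), 1000)
])
print(reduced.shape)  # (5000, 64)
```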