Reduce size of pretrained deep learning model for feature generation

Views: 230 Published: 2020/5/18 21:07:56 Tags: numpy machine-learning neural-network deep-learning keras


I am using a pretrained model in Keras to generate features for a set of images:

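from keras.applications.inception_v3 import InceptionV3
# include_top=False drops the classifier head, so predict() returns the last convolutional feature map
model = InceptionV3(weights='imagenet', include_top=False)
train_data = model.predict(data).reshape(data.shape[0], -1)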

However, I have a lot of images, and the ImageNet-pretrained model outputs 131072 features (columns) for each image.

With 200k images I would get an array of shape (200000, 131072), which is too large to fit into memory.

More importantly, I need to save this array to disk, and it would take about 100 GB of space when saved as .npy or HDF5 (via h5py).
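The 100 GB figure can be checked with a quick calculation. The sketch below assumes the features are stored as float32 (4 bytes per value) and that the 131072 columns come from an 8 x 8 x 2048 feature map; both are assumptions, not something stated in the question.

```python
# Back-of-the-envelope size of the feature array, assuming float32 storage.
n_images = 200_000
n_features = 8 * 8 * 2048          # assumed feature map shape, 131072 values per image
bytes_per_value = 4                # float32 (assumption)

total_bytes = n_images * n_features * bytes_per_value
total_gb = total_bytes / 1e9       # decimal gigabytes
total_gib = total_bytes / 2**30    # binary gibibytes

print(f"{n_features} features per image")
print(f"{total_gb:.1f} GB ({total_gib:.1f} GiB)")
```

Under these assumptions the array comes to roughly 105 GB, which matches the estimate above.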

I could circumvent the memory problem by feeding the model batches of about 1000 images at a time and saving each batch to disk, but that does not solve the disk space problem.
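A minimal sketch of that batch-wise approach, using only NumPy. The helper name `extract_features_in_batches` and its parameters are hypothetical; `predict_fn` stands for any callable that maps a batch of images to a feature array (for example `model.predict`). It writes each batch directly into a disk-backed .npy file, so only one batch of features is in memory at a time.

```python
import numpy as np

def extract_features_in_batches(predict_fn, data, out_path, batch_size=1000):
    """Run predict_fn on slices of data and write the flattened results
    straight into a .npy file on disk (hypothetical helper).

    Assumes data contains at least one sample.
    """
    n = data.shape[0]
    # Probe with one sample to learn the flattened feature width and dtype.
    probe = np.asarray(predict_fn(data[:1])).reshape(1, -1)
    # open_memmap creates a regular .npy file that is backed by disk, not RAM.
    out = np.lib.format.open_memmap(
        out_path, mode="w+", dtype=probe.dtype, shape=(n, probe.shape[1])
    )
    for start in range(0, n, batch_size):
        stop = min(start + batch_size, n)
        feats = np.asarray(predict_fn(data[start:stop]))
        out[start:stop] = feats.reshape(stop - start, -1)
    out.flush()
    return out.shape
```

With the model from the question this would be called as `extract_features_in_batches(model.predict, data, "features.npy")`. As noted, this only addresses memory; the file on disk is still the full size.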

How can I make the model smaller without losing too much information?

Update

As the answer suggested, I included the next layer in the model as well:

from keras.applications.inception_v3 import InceptionV3
from keras.models import Model
base_model = InceptionV3(weights='imagenet')
model = Model(inputs=base_model.input, outputs=base_model.get_layer('avg_pool').output)

This reduces the output to (200000, 2048).
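The reduction comes from the `avg_pool` layer, which is a global average pooling step: each of the 2048 channels is averaged over its spatial grid. The sketch below illustrates this in NumPy on random data, assuming a channels-last feature map of shape (batch, 8, 8, 2048); the array is a stand-in, not real model output.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the last convolutional feature map of 4 images (not real model output).
feature_maps = rng.random((4, 8, 8, 2048), dtype=np.float32)

# Flattening keeps every spatial position: 8 * 8 * 2048 columns per image.
flattened = feature_maps.reshape(feature_maps.shape[0], -1)
# Global average pooling keeps one value per channel.
pooled = feature_maps.mean(axis=(1, 2))

print(flattened.shape)  # (4, 131072)
print(pooled.shape)     # (4, 2048)
print(flattened.shape[1] // pooled.shape[1])  # 64
```

The pooled features are 64 times smaller, so at float32 the full (200000, 2048) array would need about 1.6 GB. In Keras 2, `InceptionV3(weights='imagenet', include_top=False, pooling='avg')` should give the same kind of pooled output without building a separate `Model`.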

Update 2:

Another interesting solution may be the bcolz package, which reduces the size of numpy arrays: https://github.com/Blosc/bcolz
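bcolz itself is not shown here. As a stand-in that needs only NumPy, the sketch below compares an uncompressed .npy file with `np.savez_compressed`, which is a different technique (zip/deflate) with the same goal of a smaller on-disk array. The data is synthetic and deliberately about half zeros, which is an assumption; real features may compress much better or much worse.

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for features: about half the entries are exactly zero (assumption).
features = np.maximum(rng.standard_normal((256, 2048)), 0).astype(np.float32)

tmp_dir = tempfile.mkdtemp()
raw_path = os.path.join(tmp_dir, "features.npy")
packed_path = os.path.join(tmp_dir, "features.npz")

np.save(raw_path, features)                          # uncompressed
np.savez_compressed(packed_path, features=features)  # zip archive with deflate

raw_size = os.path.getsize(raw_path)
packed_size = os.path.getsize(packed_path)
restored = np.load(packed_path)["features"]

print(f"raw: {raw_size} bytes, compressed: {packed_size} bytes")
print("lossless:", np.array_equal(features, restored))
```

The compression is lossless, so the trade-off is extra CPU time when saving and loading rather than any loss of information.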