Build a (pre-trained) CNN+LSTM network with the Keras functional API


Question

I want to build an LSTM on top of a pre-trained CNN (VGG) to classify video sequences. The LSTM will be fed with the features extracted by the last FC layer of VGG.

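The architecture is roughly: each frame goes through VGG, the features from its last FC layer go into the LSTM, and the LSTM output feeds a softmax classifier.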

I wrote this code:

import keras
from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.layers import Dense, Input, Flatten, LSTM, TimeDistributed


def build_LSTM_CNN_net():
    num_classes = 5
    frames = Input(shape=(5, 224, 224, 3))  # never connected to the layers below
    base_in = Input(shape=(224, 224, 3))    # also unused

    # VGG16 here sees one (224, 224, 3) image, not a sequence of frames
    base_model = VGG16(weights='imagenet',
                       include_top=False,
                       input_shape=(224, 224, 3))

    x = Flatten()(base_model.output)        # (None, 25088)
    x = Dense(128, activation='relu')(x)    # (None, 128): no time axis
    x = TimeDistributed(Flatten())(x)       # raises the ValueError below
    x = LSTM(units=256, return_sequences=False, dropout=0.2)(x)
    x = Dense(num_classes, activation='softmax')(x)
    return Model(base_model.input, x)


lstm_cnn = build_LSTM_CNN_net()
keras.utils.plot_model(lstm_cnn, "lstm_cnn.png", show_shapes=True)

But I got this error:

ValueError: `TimeDistributed` Layer should be passed an `input_shape ` with at least 3 dimensions, received: [None, 128]
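The shape rule behind this error can be reproduced in isolation. This is a minimal sketch, assuming a recent Keras (tf.keras under TensorFlow 2.x, or Keras 3); the tensor names `flat`, `seq` and `per_step` are illustrative only:

```python
from keras.layers import Input, Dense, Flatten, TimeDistributed

# A 2D tensor (batch, features) has no time axis, like the Dense(128) output above.
flat = Input(shape=(128,))
try:
    TimeDistributed(Flatten())(flat)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# A 3D tensor (batch, time, features) is accepted: the wrapped layer is
# applied to each of the 5 time steps with shared weights.
seq = Input(shape=(5, 128))
per_step = TimeDistributed(Dense(8))(seq)  # per_step has shape (None, 5, 8)
```

`TimeDistributed` only makes sense when the tensor already carries a time axis, so the sequence dimension has to exist before the wrapped layer, not be recovered after it.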

Why is this happening, and how can I fix it?

Thanks

Recommended answer

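Your `Dense(128)` output has shape (None, 128): VGG16 is applied to a single (224, 224, 3) image, so the tensor has no time axis, while `TimeDistributed` needs an input with at least 3 dimensions (batch, time, ...). The `frames` input is also never connected to the rest of the graph.

Here is the correct way to build a model that classifies video sequences. Note that I wrap a Model instance in TimeDistributed. This model is built first to extract features from each frame individually; in the second part, the LSTM processes the sequence of frame features.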

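from keras.applications.vgg16 import VGG16
from keras.layers import Input, GlobalAveragePooling2D, TimeDistributed, LSTM, Dense
from keras.models import Model

frames, channels, rows, columns = 5, 3, 224, 224

# Input: a batch of clips, each of shape (frames, rows, columns, channels)
video = Input(shape=(frames, rows, columns, channels))

# Per-frame feature extractor: the frozen VGG16 convolutional base
cnn_base = VGG16(input_shape=(rows, columns, channels),
                 weights="imagenet",
                 include_top=False)
cnn_base.trainable = False

# Pool each 7x7x512 feature map to a 512-d vector and wrap it as a Model
cnn_out = GlobalAveragePooling2D()(cnn_base.output)
cnn = Model(cnn_base.input, cnn_out)

# Apply the CNN to every frame: (None, 5, 224, 224, 3) -> (None, 5, 512)
encoded_frames = TimeDistributed(cnn)(video)

# Summarise the sequence of frame features, then classify (10 classes)
encoded_sequence = LSTM(256)(encoded_frames)
hidden_layer = Dense(1024, activation="relu")(encoded_sequence)
outputs = Dense(10, activation="softmax")(hidden_layer)

model = Model(video, outputs)
model.summary()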
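Conceptually, `TimeDistributed(cnn)` runs the same CNN, with shared weights, on each frame independently. The NumPy sketch below illustrates that idea by folding the time axis into the batch axis and unfolding it afterwards. It is not Keras's actual code (Keras versions implement it differently, for example by reshaping or by looping over time steps), and `toy_cnn` is a hypothetical stand-in for the pooled VGG16 features:

```python
import numpy as np


def time_distributed(fn, x):
    """Apply a per-frame function `fn` to every time step of x.

    x has shape (batch, time, *frame_shape). The time axis is folded into the
    batch axis, `fn` sees an ordinary batch of frames, and the result is
    unfolded back to (batch, time, *feature_shape).
    """
    batch, time = x.shape[0], x.shape[1]
    folded = x.reshape((batch * time,) + x.shape[2:])
    out = fn(folded)
    return out.reshape((batch, time) + out.shape[1:])


def toy_cnn(frames):
    # Hypothetical feature extractor: average over rows and columns,
    # mapping (n, rows, columns, channels) -> (n, channels),
    # similar in spirit to GlobalAveragePooling2D.
    return frames.mean(axis=(1, 2))


video = np.random.rand(2, 5, 8, 8, 3)  # (batch, frames, rows, columns, channels)
encoded = time_distributed(toy_cnn, video)
print(encoded.shape)  # (2, 5, 3): one feature vector per frame, ready for an LSTM

# Each time step is processed independently of the others.
assert np.allclose(encoded[1, 3], toy_cnn(video[1:2, 3])[0])
```

The output keeps the (batch, time) axes, which is exactly the 3D shape the LSTM expects.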

If you want to use VGG's 1x4096 embedding instead (the output of a fully connected layer, as described in the question), you can simply do:

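from keras.applications.vgg16 import VGG16
from keras.layers import Input, TimeDistributed, LSTM, Dense
from keras.models import Model

frames, channels, rows, columns = 5, 3, 224, 224

video = Input(shape=(frames, rows, columns, channels))

# include_top=True keeps the FC layers; the input shape must then be (224, 224, 3)
cnn_base = VGG16(input_shape=(rows, columns, channels),
                 weights="imagenet",
                 include_top=True)  # <=== include_top=True
cnn_base.trainable = False

# layers[-3] is "fc1" (4096 units); layers[-2] is "fc2", the last 4096-unit FC layer
cnn = Model(cnn_base.input, cnn_base.layers[-3].output)

# Each frame becomes a 4096-d vector: (None, 5, 224, 224, 3) -> (None, 5, 4096)
encoded_frames = TimeDistributed(cnn)(video)
encoded_sequence = LSTM(256)(encoded_frames)
hidden_layer = Dense(1024, activation="relu")(encoded_sequence)
outputs = Dense(10, activation="softmax")(hidden_layer)

model = Model(video, outputs)
model.summary()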
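To check the model end to end, the sketch below builds the 4096-d variant, selects the FC layer by name, preprocesses a batch of dummy clips and runs a forward pass. Assumptions: a recent Keras (tf.keras under TensorFlow 2.x, or Keras 3); `weights=None` is used only to avoid the ImageNet download in this example (use `weights="imagenet"` in practice); `fc2` is taken as "the last FC layer" from the question; `num_classes = 10` mirrors the answer; the random clips are placeholders, not real video.

```python
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input
from keras.layers import Input, TimeDistributed, LSTM, Dense
from keras.models import Model

frames, rows, columns, channels = 5, 224, 224, 3
num_classes = 10  # assumption: same as in the answer

# weights=None only to skip the download here; use weights="imagenet" for real use
cnn_base = VGG16(input_shape=(rows, columns, channels), weights=None, include_top=True)
cnn_base.trainable = False
print([layer.name for layer in cnn_base.layers[-3:]])  # ['fc1', 'fc2', 'predictions']

# Select the last 4096-unit FC layer by name rather than by index
cnn = Model(cnn_base.input, cnn_base.get_layer("fc2").output)

video = Input(shape=(frames, rows, columns, channels))
encoded_frames = TimeDistributed(cnn)(video)  # (None, 5, 4096)
encoded_sequence = LSTM(256)(encoded_frames)  # (None, 256)
hidden_layer = Dense(1024, activation="relu")(encoded_sequence)
outputs = Dense(num_classes, activation="softmax")(hidden_layer)
model = Model(video, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Dummy clips with pixel values in [0, 255]; VGG16 expects its own preprocessing,
# which works on any array whose last axis is the channel axis
clips = np.random.randint(0, 256, size=(2, frames, rows, columns, channels)).astype("float32")
clips = preprocess_input(clips)

probs = model.predict(clips, verbose=0)
print(probs.shape)  # (2, 10)
```

Because `cnn_base` is frozen, only the LSTM and Dense layers are trained. The per-frame features could therefore also be computed once with `cnn.predict` and cached, which is usually much faster than re-running VGG16 on every epoch.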
