Transfer learning for video classification


Problem Description


How can I use pre-trained models to train a video classification model? My dataset shape is (4000, 10, 150, 150, 1), and I am trying to classify human actions with TimeDistributed Conv2D layers. I can train without transfer learning, but the accuracy is poor. What I have tried:

from keras import models
from keras.layers import (Activation, Conv2D, Dropout,
                          MaxPooling2D, TimeDistributed)
from keras.applications import VGG16

conv_base = VGG16(weights='imagenet',
                  include_top=False,
                  input_shape=(150, 150, 3))

model = models.Sequential()
model.add(conv_base)
model.add(TimeDistributed(Conv2D(96, (3, 3), padding='same',
                                 input_shape=x_train.shape[1:])))
model.add(TimeDistributed(Activation('relu')))
model.add(TimeDistributed(Conv2D(128, (3, 3))))
model.add(TimeDistributed(Activation('relu')))
model.add(TimeDistributed(MaxPooling2D(pool_size=(2, 2))))
model.add(TimeDistributed(Dropout(0.35)))
...


But I get ValueError: strides should be of length 1, 1 or 3 but was 2.
Does anyone have an idea?

Recommended Answer


I'm assuming you have 10 frames for each video. Here is a simple model that extracts VGG16 features (via GlobalAveragePooling) for each frame and uses an LSTM to classify the frame sequence.


You can experiment by adding a few more layers or changing hyperparameters.
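
For example, a common transfer-learning experiment is to freeze the pre-trained weights first, so only the LSTM and the classifier are trained. The following is only a sketch: conv_base refers to the feature extractor defined in the code below, and the 'block5' layer names are an assumption based on the standard Keras VGG16 layer naming.

# Sketch: freeze the pre-trained backbone so only the new layers train.
conv_base.trainable = False

# Optional fine-tuning step (assumption: standard VGG16 layer names
# such as 'block5_conv1'): unfreeze just the last conv block and
# recompile the model with a low learning rate before continuing.
for layer in conv_base.layers:
    if layer.name.startswith('block5'):
        layer.trainable = True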


N.B.: There are many inconsistencies in your model, including passing 5-D data directly to VGG16, which expects 4-D input. That mismatch is most likely what triggers the strides error: with VGG16's 4-D output feeding TimeDistributed(Conv2D), the inner Conv2D receives tensors with one dimension too few.

from tensorflow.keras.layers import *
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.optimizers import Adam
import tensorflow as tf
import numpy as np

from tensorflow.keras.applications import VGG16

IMG_SIZE = (150, 150, 3)
num_class = 3

def create_base():
  # VGG16 backbone with a GlobalAveragePooling head: maps each
  # (150, 150, 3) frame to a 512-d feature vector.
  conv_base = VGG16(weights='imagenet',
                    include_top=False,
                    input_shape=IMG_SIZE)
  x = GlobalAveragePooling2D()(conv_base.output)
  base_model = Model(conv_base.input, x)
  return base_model

conv_base = create_base()

ip = Input(shape=(10, 150, 150, 3))      # (frames, height, width, channels)
t_conv = TimeDistributed(conv_base)(ip)  # VGG16 features per frame -> (None, 10, 512)

t_lstm = LSTM(10, return_sequences=False)(t_conv)  # summarize the frame sequence

f_softmax = Dense(num_class, activation='softmax')(t_lstm)

model = Model(ip, f_softmax)

model.summary()

Model: "model_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_32 (InputLayer)        [(None, 10, 150, 150, 3)] 0         
_________________________________________________________________
time_distributed_4 (TimeDist (None, 10, 512)           14714688  
_________________________________________________________________
lstm_1 (LSTM)                (None, 10)                20920     
_________________________________________________________________
dense (Dense)                (None, 3)                 33        
=================================================================
Total params: 14,735,641
Trainable params: 14,735,641
Non-trainable params: 0
_________________________________________________________________
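
To train this on the dataset described in the question, a minimal sketch might look like the following. The assumptions: x_train has shape (4000, 10, 150, 150, 1) with values scaled to [0, 1], the grayscale channel is tiled to 3 channels to match VGG16's expected input, y_train holds integer class labels, and the learning rate, batch size, and epoch count are placeholders to tune.

# Minimal training sketch (assumptions: x_train is
# (4000, 10, 150, 150, 1) scaled to [0, 1], y_train holds
# integer class ids in [0, num_class)).
x_rgb = np.repeat(x_train, 3, axis=-1)  # tile grayscale to 3 channels for VGG16

model.compile(optimizer=Adam(learning_rate=1e-4),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

model.fit(x_rgb, y_train,
          batch_size=16,        # placeholder hyperparameters to tune
          epochs=10,
          validation_split=0.2)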
