How to setup input shape for 1dCNN+LSTM network (Keras)?


Question

I have the following idea to implement:

                       Input -> CNN-> LSTM -> Dense -> Output

The Input has 100 time steps, each step has a 64-dimensional feature vector

A Conv1D layer will extract features at each time step. The CNN layer contains 64 filters, each has length 16 taps. Then, a maxpooling layer will extract the single maximum value of each convolutional output, so a total of 64 features will be extracted at each time step.

Then, the output of the CNN layer will be fed into an LSTM layer with 64 neurons. Number of recurrence is the same as time step of input, which is 100 time steps. The LSTM layer should return a sequence of 64-dimensional output (the length of sequence == number of time steps == 100, so there should be 100*64=6400 numbers).

mfcc_input = Input(shape=(100, 64), dtype='float', name='mfcc_input')
CNN_out = TimeDistributed(Conv1D(64, 16, activation='relu'))(mfcc_input)
CNN_out = BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True)(CNN_out)
CNN_out = TimeDistributed(MaxPooling1D(pool_size=(64-16+1), strides=None, padding='valid'))(CNN_out)

LSTM_out = LSTM(64,return_sequences=True)(CNN_out)

... (more code) ...

But this doesn't work. The second line reports "list index out of range" and I don't understand what's going on.

I'm new to Keras, so I'd sincerely appreciate any help with this.

This picture illustrates how the CNN is applied at each time step.

Answer

The problem is with your input. Your input has shape (100, 64), where the first dimension is the timesteps. TimeDistributed strips that dimension, so what actually reaches the Conv1D has shape (64).

Now, refer to the Keras Conv1D documentation, which states that the input should be a 3D tensor (batch_size, steps, input_dim). Ignoring the batch_size, your input should be a 2D tensor (steps, input_dim).

So you are providing a 1D tensor where a 2D tensor is expected. For example, if you were feeding natural-language input to the Conv1D as words, with 64 words per sentence and each word encoded as a vector of length 50, the input per time step would be (64, 50).
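The shape bookkeeping here can be checked with plain Python, no Keras required: a 'valid' 1D convolution over steps positions with kernel length k produces steps - k + 1 outputs, which is exactly why the pooling in the code uses pool_size=(64-16+1) to collapse each per-timestep convolution to a single maximum. A minimal sketch (the helper names are illustrative, not Keras API):

```python
def conv1d_valid_len(steps, kernel_size):
    """Output length of a 1D convolution with 'valid' padding and stride 1."""
    return steps - kernel_size + 1

def maxpool1d_valid_len(steps, pool_size):
    """Output length of 1D max pooling with 'valid' padding (stride defaults to pool_size)."""
    return steps // pool_size

# Per time step: 64 input positions, kernel length 16
conv_len = conv1d_valid_len(64, 16)                    # 49 positions per filter
pool_len = maxpool1d_valid_len(conv_len, 64 - 16 + 1)  # pool over all 49 -> 1
print(conv_len, pool_len)  # 49 1
```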

Also, make sure that you are feeding the right input to LSTM as given in the code below.

So, the correct code should be

from keras.layers import (Input, TimeDistributed, Conv1D, BatchNormalization,
                          MaxPooling1D, Reshape, LSTM)

embedding_size = 50  # Set this accordingly
mfcc_input = Input(shape=(100, 64, embedding_size), dtype='float', name='mfcc_input')
CNN_out = TimeDistributed(Conv1D(64, 16, activation='relu'))(mfcc_input)
CNN_out = BatchNormalization(axis=-1, momentum=0.99, epsilon=0.001, center=True, scale=True)(CNN_out)
CNN_out = TimeDistributed(MaxPooling1D(pool_size=(64-16+1), strides=None, padding='valid'))(CNN_out)

# Feeding CNN_out directly to the LSTM would also raise an error: after pooling
# its 3rd dimension is 1, so squeeze it out with a Reshape
CNN_out = Reshape((int(CNN_out.shape[1]), int(CNN_out.shape[3])))(CNN_out)

LSTM_out = LSTM(64, return_sequences=True)(CNN_out)

... (more code) ...
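As a sanity check on the fix, the expected tensor shape after each layer (batch dimension omitted) can be traced with plain-Python bookkeeping. The function below is illustrative, not part of Keras, and embedding_size=50 is just the placeholder value from the snippet above:

```python
def model_shapes(time_steps=100, feat=64, embedding_size=50, filters=64, kernel=16):
    """Expected output shape after each layer, batch dimension omitted."""
    conv_len = feat - kernel + 1  # 'valid' Conv1D applied per time step
    return {
        "input":   (time_steps, feat, embedding_size),
        "conv":    (time_steps, conv_len, filters),   # TimeDistributed(Conv1D)
        "pool":    (time_steps, 1, filters),          # MaxPooling1D over all conv_len positions
        "reshape": (time_steps, filters),             # squeeze the singleton dimension
        "lstm":    (time_steps, 64),                  # return_sequences=True keeps every step
    }

for layer, shape in model_shapes().items():
    print(layer, shape)
```

The trace confirms the Reshape is what restores the (100, 64) sequence the LSTM expects.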
