TimeDistributed with LSTM in keyword spotter


Problem description

I am working on a keyword spotter that processes an audio input and returns the class of the audio based on a list of speech commands similar to what is shown here: https://www.tensorflow.org/tutorials/audio/simple_audio

Instead of processing only 1 second of audio as input, I would like to be able to process multiple frames of audio, say 5 time steps with a 10ms step and feed them into the machine learning model.

In essence, this amounts to adding a TimeDistributed layer on top of my network. The second thing I am trying to do is to add an LSTM layer prior to the dense layer that maps my hidden layers to the output classes.

My question: how can I change the code below to add a TimeDistributed layer that takes in multiple time steps, together with an LSTM layer?

Starting code:

model = models.Sequential([
    layers.Input(shape=input_shape),
    preprocessing.Resizing(32, 32), 
    norm_layer,
    layers.Conv2D(32, 3, activation='relu'),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(num_labels),
])

Model summary:

Input shape: (124, 129, 1)
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
resizing (Resizing)          (None, 32, 32, 1)         0         
_________________________________________________________________
normalization (Normalization (None, 32, 32, 1)         3         
_________________________________________________________________
conv2d (Conv2D)              (None, 30, 30, 32)        320       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 28, 28, 64)        18496     
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 64)        0         
_________________________________________________________________
dropout (Dropout)            (None, 14, 14, 64)        0         
_________________________________________________________________
flatten (Flatten)            (None, 12544)             0         
_________________________________________________________________
dense (Dense)                (None, 128)               1605760   
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 8)                 1032      
=================================================================
Total params: 1,625,611
Trainable params: 1,625,608
Non-trainable params: 3
_________________________________________________________________

Attempt 1: Adding an LSTM layer

model = models.Sequential([
    layers.Input(shape=input_shape),
    preprocessing.Resizing(32, 32), 
    norm_layer,
    layers.Conv2D(32, 3, activation='relu'),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Flatten(),
    layers.LSTM(32, activation='relu', input_shape=(1,128,98)),
    layers.Dense(num_labels),
])

Error: ValueError: Input 0 of layer lstm_5 is incompatible with the layer: expected ndim=3, found ndim=2. Full shape received: [None, 128]

Attempt 2: Adding a TimeDistributed layer:

model = models.Sequential([
    layers.Input(shape=input_shape),
    preprocessing.Resizing(32, 32), 
    norm_layer,
    TimeDistributed(layers.Conv2D(32, 3, activation='relu'), input_shape=(None, 32, 32, 1)),
    TimeDistributed(layers.Conv2D(64, 3, activation='relu'), input_shape=(None, 30, 30, 1)),
    TimeDistributed(layers.MaxPooling2D()),
    TimeDistributed(layers.Dropout(0.25)),
    TimeDistributed(layers.Flatten()),
    TimeDistributed(layers.Dense(128, activation='relu')),
    TimeDistributed(layers.Dropout(0.5)),
    TimeDistributed(layers.Flatten()),
    layers.Dense(num_labels),
])

Error: ValueError: Input 0 of layer conv2d_43 is incompatible with the layer: expected min_ndim=4, found ndim=3. Full shape received: [None, 32, 1]

I understand there is a problem with my dimensions. I am not sure how to proceed.

Answer

An LSTM layer expects its input to be a 3D tensor with shape [batch, timesteps, feature]. Sample code snippet:

import tensorflow as tf
inputs = tf.random.normal([32, 10, 8])  # (batch, timesteps, feature)
lstm = tf.keras.layers.LSTM(4)
output = lstm(inputs)
print(output.shape)  # (32, 4)
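This is exactly what goes wrong in Attempt 1: after Flatten and Dense the tensor is 2D, (batch, 128), with no time axis, so the LSTM raises the ndim error from the question. A minimal reproduction:

```python
import tensorflow as tf

# A 2D tensor (batch, features) has no time axis, so the LSTM
# rejects it with "expected ndim=3, found ndim=2".
lstm = tf.keras.layers.LSTM(4)
try:
    lstm(tf.random.normal([32, 128]))
except ValueError as e:
    print(e)
```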

tf.keras.layers.TimeDistributed expects an input of shape (batch, time, ...).

Working sample code:

inputs = tf.keras.Input(shape=(10, 128, 128, 3))  # (time, height, width, channels)
conv_2d_layer = tf.keras.layers.Conv2D(64, (3, 3))
outputs = tf.keras.layers.TimeDistributed(conv_2d_layer)(inputs)
print(outputs.shape)  # (None, 10, 126, 126, 64)
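Putting the two ideas together for the model in the question: wrap each per-frame layer in TimeDistributed so every time step is processed independently, keep the sequence axis intact, and let the LSTM consume it before the final classifier. A sketch, assuming 5 stacked spectrogram frames of shape (124, 129, 1); `time_steps`, `num_labels`, and the use of `layers.Resizing` (which replaces `preprocessing.Resizing` in newer TF versions) are assumptions to adapt to your setup — your `norm_layer` can be wrapped in TimeDistributed the same way:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

time_steps = 5  # assumption: 5 spectrogram frames per example
num_labels = 8  # assumption: matches the 8 classes in the summary above

model = models.Sequential([
    layers.Input(shape=(time_steps, 124, 129, 1)),      # (time, H, W, C)
    layers.TimeDistributed(layers.Resizing(32, 32)),
    layers.TimeDistributed(layers.Conv2D(32, 3, activation='relu')),
    layers.TimeDistributed(layers.Conv2D(64, 3, activation='relu')),
    layers.TimeDistributed(layers.MaxPooling2D()),
    layers.TimeDistributed(layers.Dropout(0.25)),
    layers.TimeDistributed(layers.Flatten()),           # (batch, time, 12544)
    layers.TimeDistributed(layers.Dense(128, activation='relu')),
    layers.LSTM(32),                                    # consumes the time axis
    layers.Dense(num_labels),
])
model.summary()
```

Because the Flatten and Dense layers are wrapped in TimeDistributed, their output stays 3D, (batch, time, features), which is exactly the shape the LSTM expects.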
