Using TensorFlow's Connectionist Temporal Classification (CTC) implementation

Views: 29  Published: 2021/9/5 19:08:31  Tags: tensorflow speech-recognition end-to-end ctc


Question

I'm trying to use TensorFlow's CTC implementation from the contrib package (tf.contrib.ctc.ctc_loss), without success.

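First of all, does anyone know where I can read a good step-by-step tutorial? TensorFlow's documentation is very poor on this topic.
Do I have to provide ctc_loss with labels that have the blank label interleaved, or not?
I cannot overfit my network, even with a training dataset of length 1 over 200 epochs. :(
How can I calculate the label error rate using tf.edit_distance?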

Here is my code:

with graph.as_default():

  max_length = X_train.shape[1]
  frame_size = X_train.shape[2]
  max_target_length = y_train.shape[1]

  # Batch size x time steps x data width
  data = tf.placeholder(tf.float32, [None, max_length, frame_size])
  data_length = tf.placeholder(tf.int32, [None])

  #  Batch size x max_target_length
  target_dense = tf.placeholder(tf.int32, [None, max_target_length])
  target_length = tf.placeholder(tf.int32, [None])

  #  Generating sparse tensor representation of target
  target = ctc_label_dense_to_sparse(target_dense, target_length)

  # Applying LSTM, returning output for each timestep (y_rnn1, 
  # [batch_size, max_time, cell.output_size]) and the final state of shape
  # [batch_size, cell.state_size]
  y_rnn1, h_rnn1 = tf.nn.dynamic_rnn(
    tf.nn.rnn_cell.LSTMCell(num_hidden, state_is_tuple=True, num_proj=num_classes),  # project outputs to num_classes
    data,
    dtype=tf.float32,
    sequence_length=data_length,
  )

  #  For sequence labelling, we want a prediction for each timestep.
  #  However, we share the weights for the softmax layer across all timesteps.
  #  How do we do that? By flattening the first two dimensions of the output tensor.
  #  This way time steps look the same as examples in the batch to the weight matrix.
  #  Afterwards, we reshape back to the desired shape.
  #  (Here num_proj on the LSTMCell already applies that shared projection.)

  # Transpose to time-major [max_time, batch_size, num_classes] for ctc_loss
  logits = tf.transpose(y_rnn1, perm=(1, 0, 2))

  #  Get the loss by calculating ctc_loss, which also calculates the gradient.
  #  ctc_loss performs the softmax operation for you, so the inputs should be
  #  unnormalized, e.g. linear projections of the outputs of an LSTM.
  loss = tf.reduce_mean(tf.contrib.ctc.ctc_loss(logits, target, data_length))

  #  Define our optimizer with learning rate
  optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(loss)

  #  Decoding using beam search
  decoded, log_probabilities = tf.contrib.ctc.ctc_beam_search_decoder(logits, data_length, beam_width=10, top_paths=1)

Thanks!
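On the label error rate question: the quantity is usually taken to be the edit (Levenshtein) distance between the decoded label sequence and the target, divided by the target length, averaged over sequences. tf.edit_distance works on SparseTensor inputs and, with normalize=True, divides by the length of the truth sequence. The following is only a pure-Python sketch of that same quantity for checking results by hand, not the TensorFlow op itself. The function names edit_distance and label_error_rate are hypothetical, and it assumes every reference sequence is non-empty.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two label sequences (lists of ints)."""
    # prev[j] holds the distance between the processed prefix of hyp and ref[:j]
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a label from hyp
                            curr[j - 1] + 1,      # insert a label from ref
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def label_error_rate(hypotheses, references):
    """Mean over sequences of edit distance divided by reference length.

    Assumes every reference sequence is non-empty.
    """
    rates = [edit_distance(h, r) / len(r)
             for h, r in zip(hypotheses, references)]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    decoded = [[1, 2, 3], [4, 5]]   # hypothetical decoder outputs
    targets = [[1, 3], [4, 5]]      # hypothetical ground-truth labels
    print(label_error_rate(decoded, targets))
```

Comparing this number against the mean of tf.edit_distance(..., normalize=True) on a small batch is one way to sanity-check how the sparse tensors were built.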

Update (06/29/2016)

Thank you, @jihyeon-seo! So, at the input of the RNN we have something like [num_batch, max_time_step, num_features]. We use dynamic_rnn to perform the recurrent calculations on that input, producing a tensor of shape [num_batch, max_time_step, num_hidden]. After that, we need to apply an affine projection at each timestep with shared weights, so we have to reshape to [num_batch*max_time_step, num_hidden], multiply by a weight matrix of shape [num_hidden, num_classes], add a bias, undo the reshape, and transpose (giving [max_time_steps, num_batch, num_classes]); this result is the input to the ctc_loss function. Did I do everything correctly?

Here is the code:

    cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)

    h_rnn1, self.last_state = tf.nn.dynamic_rnn(cell, self.input_data, self.sequence_length, dtype=tf.float32)

    #  Reshaping to share weights across timesteps
    x_fc1 = tf.reshape(h_rnn1, [-1, num_hidden])

    self._logits = tf.matmul(x_fc1, self._W_fc1) + self._b_fc1

    #  Reshaping
    self._logits = tf.reshape(self._logits, [max_length, -1, num_classes])

    #  Calculating loss
    loss = tf.contrib.ctc.ctc_loss(self._logits, self._targets, self.sequence_length)

    self.cost = tf.reduce_mean(loss)

Update (07/11/2016)

Thank you, @Xiv. Here is the code after the bug fix:

    cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)

    h_rnn1, self.last_state = tf.nn.dynamic_rnn(cell, self.input_data, self.sequence_length, dtype=tf.float32)

    #  Reshaping to share weights across timesteps
    x_fc1 = tf.reshape(h_rnn1, [-1, num_hidden])

    self._logits = tf.matmul(x_fc1, self._W_fc1) + self._b_fc1

    #  Undo the flattening, then transpose to time-major [max_length, batch, num_classes]
    self._logits = tf.reshape(self._logits, [-1, max_length, num_classes])
    self._logits = tf.transpose(self._logits, (1,0,2))

    #  Calculating loss
    loss = tf.contrib.ctc.ctc_loss(self._logits, self._targets, self.sequence_length)

    self.cost = tf.reduce_mean(loss)
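The two versions above differ only in how the flattened logits are brought back to three dimensions. A reshape never reorders elements, so reshaping batch-major data straight to [max_length, -1, num_classes] yields the right shape but mixes frames from different time steps. Reshaping back to [-1, max_length, num_classes] and then transposing is what actually produces time-major data. The sketch below illustrates this in pure Python with nested lists; the helpers reshape_3d and transpose_102 are hypothetical stand-ins for tf.reshape and tf.transpose, and the tiny sizes are chosen only for readability.

```python
def reshape_3d(flat, d0, d1, d2):
    """Row-major reshape of a flat list into nested lists of shape [d0, d1, d2]."""
    assert len(flat) == d0 * d1 * d2
    return [[flat[(i * d1 + j) * d2:(i * d1 + j + 1) * d2] for j in range(d1)]
            for i in range(d0)]


def transpose_102(x):
    """Swap the first two axes: [a, b, c] -> [b, a, c]."""
    return [[x[i][j] for i in range(len(x))] for j in range(len(x[0]))]


batch, max_time, num_classes = 2, 3, 1

# Batch-major "logits": entry [b][t] is tagged with the string "b{b}t{t}"
batch_major = [[["b%dt%d" % (b, t)] for t in range(max_time)]
               for b in range(batch)]

# Flatten to [batch * max_time, num_classes], as done before the matmul
flat = [v for seq in batch_major for step in seq for v in step]

# First version: reshape straight to [max_time, batch, num_classes]
wrong = reshape_3d(flat, max_time, batch, num_classes)

# Fixed version: reshape to [batch, max_time, num_classes], then transpose
right = transpose_102(reshape_3d(flat, batch, max_time, num_classes))

print(wrong[1])  # [['b0t2'], ['b1t0']]: frames from two different time steps
print(right[1])  # [['b0t1'], ['b1t1']]: time step 1 of every sequence
```

Both results have shape [max_time, batch, num_classes], which is why the first version runs without a shape error even though the data passed to ctc_loss is scrambled.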

Update (07/25/16)

I published part of my code on GitHub, working with one utterance. Feel free to use it! :)
