Tensorflow: jointly training CNN + LSTM
Question
There are quite a few examples on how to use LSTMs alone in TF, but I couldn't find any good examples on how to train CNN + LSTM jointly. From what I see, it is not quite straightforward how to do such training, and I can think of a few options here:
- First, I believe the simplest (or most primitive) solution would be to train the CNN independently to learn features, and then train the LSTM on those CNN features without updating the CNN part, since one would probably have to extract and save the features in numpy format and then feed them to the LSTM in TF. But in that scenario, one would probably have to use a differently labeled dataset for pretraining the CNN, which eliminates the advantage of end-to-end training, i.e. learning features for the final objective targeted by the LSTM (besides the fact that one would have to have these additional labels in the first place).
- The second option would be to concatenate all time slices in the batch dimension (a 4-D tensor), feed that to the CNN, then somehow repack those features into a 5-D tensor as needed for training the LSTM, and then apply a cost function. My main concern is whether it is possible to do this at all. Also, handling variable-length sequences becomes a bit tricky; for example, in a prediction scenario you would only feed a single frame at a time. So if this is the right way to do joint training, I would be really happy to see some examples. Beyond that, this solution looks more like a hack, so if there is a better way to do it, it would be great if somebody could share it.
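The tensor bookkeeping behind the second option can be sketched with numpy shapes (a minimal sketch: `cnn_features` is a stand-in for a real CNN forward pass, per-frame features are assumed to be flat vectors rather than feature maps, and all names here are illustrative):

```python
import numpy as np

# Illustrative shapes: a batch of 2 videos, 5 frames each, 32x32 RGB.
B, T, H, W, C = 2, 5, 32, 32, 3
F = 128  # size of the feature vector the CNN would produce per frame

video_batch = np.zeros((B, T, H, W, C), dtype=np.float32)

# Fold the time axis into the batch axis: 5-D video batch -> 4-D image batch.
image_batch = video_batch.reshape(B * T, H, W, C)

# Stand-in for the CNN forward pass: one feature vector per image.
def cnn_features(images):
    return np.zeros((images.shape[0], F), dtype=np.float32)

features = cnn_features(image_batch)          # shape (B*T, F)

# Unfold back into a per-video sequence tensor for the LSTM.
feature_sequence = features.reshape(B, T, F)  # shape (B, T, F)
```

Since reshape is row-major, frames of the same video stay contiguous, so folding and unfolding round-trips cleanly; the same two reshapes can be done in TF with `tf.reshape`.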
Thanks!
Recommended answer
For joint training, you can consider using tf.map_fn as described in the documentation https://www.tensorflow.org/api_docs/python/tf/map_fn.
Let's assume that the CNN is built along similar lines as described here https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10.py.
def joint_inference(sequence):
    # sequence: [time, batch, height, width, channels]; tf.map_fn
    # maps the CNN's inference() over the leading (time) axis.
    logit_sequence = tf.map_fn(inference, sequence,
                               dtype=tf.float32, swap_memory=True)
    lstm_cell = tf.contrib.rnn.LSTMCell(128)
    # dtype is required when no initial_state is given; time_major=True
    # matches the [time, batch, ...] layout produced above.
    outputs, final_state = tf.nn.dynamic_rnn(
        cell=lstm_cell, inputs=logit_sequence,
        dtype=tf.float32, time_major=True)
    # Project each per-step LSTM output to num_classes scores.
    projection_fn = lambda state: tf.contrib.layers.linear(
        state, num_outputs=num_classes, activation_fn=tf.nn.sigmoid)
    projection_logits = tf.map_fn(projection_fn, outputs)
    return projection_logits
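The key behavior being relied on is that tf.map_fn unstacks its input along axis 0 and applies the function to each slice, then restacks the results. A simplified numpy analogue (the `per_image` function is a toy stand-in for the CNN's inference, not the real model):

```python
import numpy as np

def map_fn(fn, elems):
    # Simplified numpy analogue of tf.map_fn: apply fn to each
    # slice along axis 0 and stack the results back together.
    return np.stack([fn(e) for e in elems])

# A sequence of 5 "images", each 4x4 with 3 channels.
sequence = np.ones((5, 4, 4, 3), dtype=np.float32)

# Toy per-image function standing in for the CNN: global average
# over the spatial dims, giving one 3-vector per image.
per_image = lambda img: img.mean(axis=(0, 1))

logit_sequence = map_fn(per_image, sequence)  # shape (5, 3)
```

This is also why the layout comment in joint_inference matters: whatever axis 0 of `sequence` represents is the axis the CNN gets mapped over.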
Warning: You might have to look into device placement as described here https://www.tensorflow.org/tutorials/using_gpu if your model is larger than the memory the GPU can allocate.
An alternative would be to flatten the video batch into an image batch, do a forward pass through the CNN, and reshape the features for the LSTM.