Different results while training with CudnnLSTM compared to regular LSTMCell in Tensorflow

Problem description

I'm training an LSTM network with Tensorflow in Python and wanted to switch to tf.contrib.cudnn_rnn.CudnnLSTM for faster training. What I did was replace

cells = tf.nn.rnn_cell.LSTMCell(self.num_hidden)
initial_state = cells.zero_state(self.batch_size, tf.float32)
rnn_outputs, _ = tf.nn.dynamic_rnn(cells, my_inputs, initial_state=initial_state)

with

lstm = tf.contrib.cudnn_rnn.CudnnLSTM(1, self.num_hidden)
rnn_outputs, _ = lstm(my_inputs)

I'm experiencing a significant training speedup (more than 10x), but at the same time my performance metric goes down. AUC on a binary classification task is 0.741 when using LSTMCell and 0.705 when using CudnnLSTM. I'm wondering whether I'm doing something wrong or whether it's down to a difference in implementation between the two, and if that's the case, how to get my performance back while continuing to use CudnnLSTM.

The training dataset has 15,337 sequences of varying length (up to a few hundred elements) that are padded with zeros to the same length within each batch. All the code is the same, including the TF Dataset API pipeline and all evaluation metrics. I ran each version a few times, and in all cases it converges around those values.
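
For reference, a minimal sketch of that kind of zero padding via the tf.data API; the element structure and names here (embedding_size, a (sequence, label) layout) are assumptions for illustration, not taken from the question:

# Hypothetical sketch: each dataset element is a (sequence, label) pair,
# where sequence has shape (seq_len, embedding_size) and seq_len varies.
# padded_batch zero-pads every sequence to the longest one in its batch.
dataset = dataset.padded_batch(
    self.batch_size,
    padded_shapes=([None, embedding_size], []))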

Moreover, I have a few other datasets that can be plugged into exactly the same model, and the problem persists on all of them.

In the Tensorflow code for cudnn_rnn I found a sentence saying:

"Cudnn LSTM and GRU are mathematically different from their tf counterparts."

But there's no explanation of what those differences really are...

Recommended answer

It seems tf.contrib.cudnn_rnn.CudnnLSTM is time-major, so it should be fed sequences of shape (seq_len, batch_size, embedding_size) instead of (batch_size, seq_len, embedding_size), meaning you would have to transpose your input (I think; one can't be sure when it comes to the messy Tensorflow documentation, but you may want to test that yourself; see the links below if you wish to check it).
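
As a minimal sketch of that transpose (assuming my_inputs is batch-major, i.e. (batch_size, seq_len, embedding_size), as in the question's code):

# CudnnLSTM expects time-major input: (seq_len, batch_size, embedding_size).
time_major_inputs = tf.transpose(my_inputs, perm=[1, 0, 2])

lstm = tf.contrib.cudnn_rnn.CudnnLSTM(num_layers=1, num_units=self.num_hidden)
rnn_outputs, _ = lstm(time_major_inputs)

# The outputs come back time-major as well; transpose back if the rest of
# the model expects batch-major tensors.
rnn_outputs = tf.transpose(rnn_outputs, perm=[1, 0, 2])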

More information on the topic here (that page contains another link pointing towards the math differences), except one thing there seems to be wrong: not only is the GRU time-major, the LSTM is as well (as pointed out by this issue).

I would advise against using tf.contrib, as it's even messier (and will finally be left out of Tensorflow 2.0 releases), and would stick to keras if possible (as it will be the main front-end of the upcoming Tensorflow 2.0) or tf.nn, as it's going to be part of the tf.Estimator API (though it's far less readable, IMO).
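
For the keras route, a hedged sketch (assuming a TF 1.x version where tf.keras.layers.CuDNNLSTM is available; self.num_hidden and my_inputs carry over from the question):

# Assumption: the Keras CuDNN layer takes batch-major input by default,
# (batch_size, seq_len, embedding_size), so no transpose is needed here.
lstm_layer = tf.keras.layers.CuDNNLSTM(self.num_hidden, return_sequences=True)
rnn_outputs = lstm_layer(my_inputs)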

... or consider using PyTorch to save yourself the hassle; there, at the very least, input shapes (and their meaning) are provided in the documentation.
