Converting state-parameters of PyTorch LSTM to Keras LSTM

Question

I was trying to port an existing trained PyTorch model into Keras.

During the porting, I got stuck at the LSTM layer.

The Keras implementation of the LSTM network seems to have three state matrices, while the PyTorch implementation has four.

For example, for a bidirectional LSTM with hidden_layers=64, input_size=512 & output size=128, the state parameters are as follows:

State parameters of Keras LSTM

[<tf.Variable 'bidirectional_1/forward_lstm_1/kernel:0' shape=(512, 256) dtype=float32_ref>,
 <tf.Variable 'bidirectional_1/forward_lstm_1/recurrent_kernel:0' shape=(64, 256) dtype=float32_ref>,
 <tf.Variable 'bidirectional_1/forward_lstm_1/bias:0' shape=(256,) dtype=float32_ref>,
 <tf.Variable 'bidirectional_1/backward_lstm_1/kernel:0' shape=(512, 256) dtype=float32_ref>,
 <tf.Variable 'bidirectional_1/backward_lstm_1/recurrent_kernel:0' shape=(64, 256) dtype=float32_ref>,
 <tf.Variable 'bidirectional_1/backward_lstm_1/bias:0' shape=(256,) dtype=float32_ref>]

State parameters of PyTorch LSTM

 ['rnn.0.rnn.weight_ih_l0', torch.Size([256, 512])],
 ['rnn.0.rnn.weight_hh_l0', torch.Size([256, 64])],
 ['rnn.0.rnn.bias_ih_l0', torch.Size([256])],
 ['rnn.0.rnn.bias_hh_l0', torch.Size([256])],
 ['rnn.0.rnn.weight_ih_l0_reverse', torch.Size([256, 512])],
 ['rnn.0.rnn.weight_hh_l0_reverse', torch.Size([256, 64])],
 ['rnn.0.rnn.bias_ih_l0_reverse', torch.Size([256])],
 ['rnn.0.rnn.bias_hh_l0_reverse', torch.Size([256])],

I tried to look into the code of both implementations, but I was not able to understand much.

Can someone please help me transform the 4-set of state params from PyTorch into the 3-set of state params in Keras?

Answer

They are really not that different. If you sum up the two bias vectors in PyTorch, the equations will be the same as what's implemented in Keras.

Here is the LSTM formula from the PyTorch documentation:
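$$
\begin{aligned}
i_t &= \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\
f_t &= \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\
g_t &= \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\
o_t &= \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$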

PyTorch uses two separate bias vectors, one for the input transformation (with a subscript starting with i) and one for the recurrent transformation (with a subscript starting with h).
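Since both biases enter additively in every gate, each pair can be folded into a single vector. For the input gate, for example:

$i_t = \sigma(W_{ii} x_t + W_{hi} h_{t-1} + (b_{ii} + b_{hi}))$

so setting $b = b_{ii} + b_{hi}$ (and likewise for the other gates) reproduces Keras' single-bias formulation exactly.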

In the Keras LSTMCell:

        # input transformation: one kernel slice per gate
        x_i = K.dot(inputs_i, self.kernel_i)
        x_f = K.dot(inputs_f, self.kernel_f)
        x_c = K.dot(inputs_c, self.kernel_c)
        x_o = K.dot(inputs_o, self.kernel_o)
        # a single bias per gate is added, on the input transformation only
        if self.use_bias:
            x_i = K.bias_add(x_i, self.bias_i)
            x_f = K.bias_add(x_f, self.bias_f)
            x_c = K.bias_add(x_c, self.bias_c)
            x_o = K.bias_add(x_o, self.bias_o)

        if 0 < self.recurrent_dropout < 1.:
            h_tm1_i = h_tm1 * rec_dp_mask[0]
            h_tm1_f = h_tm1 * rec_dp_mask[1]
            h_tm1_c = h_tm1 * rec_dp_mask[2]
            h_tm1_o = h_tm1 * rec_dp_mask[3]
        else:
            h_tm1_i = h_tm1
            h_tm1_f = h_tm1
            h_tm1_c = h_tm1
            h_tm1_o = h_tm1
        # recurrent transformation: note that no second bias is added here
        i = self.recurrent_activation(x_i + K.dot(h_tm1_i,
                                                  self.recurrent_kernel_i))
        f = self.recurrent_activation(x_f + K.dot(h_tm1_f,
                                                  self.recurrent_kernel_f))
        c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1_c,
                                                        self.recurrent_kernel_c))
        o = self.recurrent_activation(x_o + K.dot(h_tm1_o,
                                                  self.recurrent_kernel_o))

There's only one bias added in the input transformation. However, the equations would be equivalent if we sum up the two biases in PyTorch.
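As a concrete illustration (not from the original answer), here is a minimal conversion sketch for the shapes in the question. It assumes the usual gate ordering (input, forget, cell, output), which both PyTorch's nn.LSTM and Keras' LSTM use, so only a transpose and a bias sum are needed; pytorch_model and keras_bidirectional are hypothetical placeholder names for your two models.

    def pytorch_lstm_to_keras(weight_ih, weight_hh, bias_ih, bias_hh):
        # PyTorch stores (4*hidden, input) and (4*hidden, hidden); Keras
        # expects the transposed layouts (input, 4*hidden) and (hidden, 4*hidden)
        kernel = weight_ih.T                # e.g. (256, 512) -> (512, 256)
        recurrent_kernel = weight_hh.T      # e.g. (256, 64)  -> (64, 256)
        bias = bias_ih + bias_hh            # fold PyTorch's two biases into one
        return [kernel, recurrent_kernel, bias]

    # hypothetical: the trained PyTorch model's state dict, as numpy arrays
    sd = {k: v.detach().numpy() for k, v in pytorch_model.state_dict().items()}

    forward = pytorch_lstm_to_keras(sd['rnn.0.rnn.weight_ih_l0'],
                                    sd['rnn.0.rnn.weight_hh_l0'],
                                    sd['rnn.0.rnn.bias_ih_l0'],
                                    sd['rnn.0.rnn.bias_hh_l0'])
    backward = pytorch_lstm_to_keras(sd['rnn.0.rnn.weight_ih_l0_reverse'],
                                     sd['rnn.0.rnn.weight_hh_l0_reverse'],
                                     sd['rnn.0.rnn.bias_ih_l0_reverse'],
                                     sd['rnn.0.rnn.bias_hh_l0_reverse'])

    # the Bidirectional(LSTM(64)) layer expects the forward weights first,
    # matching the variable list shown in the question
    keras_bidirectional.set_weights(forward + backward)

If the ported model's outputs disagree, the first things to check are the gate ordering and whether the Keras layer was built with the same input size.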

The two-bias LSTM is what's implemented in cuDNN (see the developer guide). I'm really not that familiar with PyTorch, but I guess that's why they use two bias parameters. In Keras, the CuDNNLSTM layer also has two bias weight vectors.
