Converting state-parameters of Pytorch LSTM to Keras LSTM
Question
I was trying to port an existing trained PyTorch model into Keras.
During the porting, I got stuck at the LSTM layer.
The Keras implementation of an LSTM network seems to have three kinds of state matrices, while the PyTorch implementation has four.
For example, for a bidirectional LSTM with hidden_layers=64, input_size=512 & output size=128, the state parameters are as follows.
State parameters of Keras LSTM:
[<tf.Variable 'bidirectional_1/forward_lstm_1/kernel:0' shape=(512, 256) dtype=float32_ref>,
<tf.Variable 'bidirectional_1/forward_lstm_1/recurrent_kernel:0' shape=(64, 256) dtype=float32_ref>,
<tf.Variable 'bidirectional_1/forward_lstm_1/bias:0' shape=(256,) dtype=float32_ref>,
<tf.Variable 'bidirectional_1/backward_lstm_1/kernel:0' shape=(512, 256) dtype=float32_ref>,
<tf.Variable 'bidirectional_1/backward_lstm_1/recurrent_kernel:0' shape=(64, 256) dtype=float32_ref>,
<tf.Variable 'bidirectional_1/backward_lstm_1/bias:0' shape=(256,) dtype=float32_ref>]
State parameters of PyTorch LSTM:
['rnn.0.rnn.weight_ih_l0', torch.Size([256, 512])],
['rnn.0.rnn.weight_hh_l0', torch.Size([256, 64])],
['rnn.0.rnn.bias_ih_l0', torch.Size([256])],
['rnn.0.rnn.bias_hh_l0', torch.Size([256])],
['rnn.0.rnn.weight_ih_l0_reverse', torch.Size([256, 512])],
['rnn.0.rnn.weight_hh_l0_reverse', torch.Size([256, 64])],
['rnn.0.rnn.bias_ih_l0_reverse', torch.Size([256])],
['rnn.0.rnn.bias_hh_l0_reverse', torch.Size([256])],
I tried to look into the code of both implementations but was not able to understand much.
Can someone please help me transform the 4-set of state params from PyTorch into the 3-set of state params in Keras?
Answer
They are really not that different. If you sum up the two bias vectors in PyTorch, the equations will be the same as what's implemented in Keras.
PyTorch uses two separate bias vectors: one for the input transformation (with a subscript starting with i) and one for the recurrent transformation (with a subscript starting with h).
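To make the correspondence concrete, here are the input-gate equations in both conventions (the other three gates are analogous; the symbol names on the Keras side are my own shorthand, not identifiers from either codebase):

```latex
% PyTorch input gate: two bias terms
i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi})
% Keras input gate: a single bias term
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
% The two are identical when
b_i = b_{ii} + b_{hi}
```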
In Keras' LSTMCell:
x_i = K.dot(inputs_i, self.kernel_i)
x_f = K.dot(inputs_f, self.kernel_f)
x_c = K.dot(inputs_c, self.kernel_c)
x_o = K.dot(inputs_o, self.kernel_o)
if self.use_bias:
    x_i = K.bias_add(x_i, self.bias_i)
    x_f = K.bias_add(x_f, self.bias_f)
    x_c = K.bias_add(x_c, self.bias_c)
    x_o = K.bias_add(x_o, self.bias_o)

if 0 < self.recurrent_dropout < 1.:
    h_tm1_i = h_tm1 * rec_dp_mask[0]
    h_tm1_f = h_tm1 * rec_dp_mask[1]
    h_tm1_c = h_tm1 * rec_dp_mask[2]
    h_tm1_o = h_tm1 * rec_dp_mask[3]
else:
    h_tm1_i = h_tm1
    h_tm1_f = h_tm1
    h_tm1_c = h_tm1
    h_tm1_o = h_tm1
i = self.recurrent_activation(x_i + K.dot(h_tm1_i, self.recurrent_kernel_i))
f = self.recurrent_activation(x_f + K.dot(h_tm1_f, self.recurrent_kernel_f))
c = f * c_tm1 + i * self.activation(x_c + K.dot(h_tm1_c, self.recurrent_kernel_c))
o = self.recurrent_activation(x_o + K.dot(h_tm1_o, self.recurrent_kernel_o))
There's only one bias added in the input transformation. However, the equations would be equivalent if we sum up the two biases in PyTorch.
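A minimal conversion sketch for one direction of the layer, based on the above. The function name torch_lstm_to_keras and the use of raw NumPy arrays are my own choices for illustration; it assumes both frameworks keep the same gate ordering (input, forget, cell, output), transposes the PyTorch weight matrices (Keras stores a kernel as (input_size, 4*hidden) while PyTorch stores weight_ih as (4*hidden, input_size)), and sums the two bias vectors:

```python
import numpy as np

def torch_lstm_to_keras(weight_ih, weight_hh, bias_ih, bias_hh):
    """Map one direction of a PyTorch LSTM layer to the Keras weight layout.

    weight_ih: (4*hidden, input_size) -> Keras kernel (input_size, 4*hidden)
    weight_hh: (4*hidden, hidden)     -> Keras recurrent_kernel (hidden, 4*hidden)
    bias_ih, bias_hh: (4*hidden,)     -> summed into the single Keras bias
    """
    kernel = weight_ih.T
    recurrent_kernel = weight_hh.T
    bias = bias_ih + bias_hh
    return kernel, recurrent_kernel, bias

# Shapes from the question: hidden=64, input_size=512
w_ih = np.random.randn(256, 512).astype(np.float32)
w_hh = np.random.randn(256, 64).astype(np.float32)
b_ih = np.random.randn(256).astype(np.float32)
b_hh = np.random.randn(256).astype(np.float32)

k, rk, b = torch_lstm_to_keras(w_ih, w_hh, b_ih, b_hh)
print(k.shape, rk.shape, b.shape)  # (512, 256) (64, 256) (256,)
```

For the bidirectional case you would repeat this for the `_reverse` tensors and pass all six arrays (forward kernel, recurrent kernel, bias, then the backward three) to the wrapped layer's set_weights.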
The two-bias LSTM is what's implemented in cuDNN (see the developer guide). I'm really not that familiar with PyTorch, but I guess that's why they use two bias parameters. In Keras, the CuDNNLSTM layer also has two bias weight vectors.