TensorFlow:获取相对于输入的RNN隐藏状态梯度 [英] tensorflow: Obtain RNN hidden states gradients with respect to input

查看:0
本文介绍了TensorFlow:获取相对于输入的RNN隐藏状态梯度的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的模型由嵌入层和SimpleRNN层组成。我已经用model.predict获得了所有步骤的隐藏状态,并将它们与这些步骤进行了对比。我发现隐藏的状态收敛到零,但我不确定我是否能从中推断出什么。因此,绘制它们相对于模型输入的梯度可能会为我提供一些进一步的见解。我需要一些帮助来获取这些渐变。

我的型号:

batch_size = 9600   # batch size can take a smaller value, e.g. 100
inp= Input(batch_shape= (batch_size, input_length), name= 'input') 
emb_out= Embedding(input_dim, output_dim, input_length= input_length, 
                         weights= [Emat], trainable= False, name= 'embedding')(inp)
rnn= SimpleRNN(200, return_sequences= True, return_state= False, stateful= True,
               batch_size= (batch_size, input_length, 100), name= 'simpleRNN')

h0 = tf.random.uniform((batch_size, 200))
rnn_allstates = rnn(emb_out, initial_state=h0)
print(rnn_allstates.shape)   # (9600, 1403, 200)
model_rnn = Model(inputs=inp, outputs= rnn_allstates, name= 'model_rnn')
model_rnn.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])
model_rnn.summary()

>>>
Layer (type)                 Output Shape              Param #   
=================================================================
input (InputLayer)           [(9600, 1403)]            0         
_________________________________________________________________
embedding (Embedding)        (9600, 1403, 100)         4348900   
_________________________________________________________________
simpleRNN (SimpleRNN)        (9600, 1403, 200)         60200     
=================================================================

获取隐藏状态:

rnn_ht = model_rnn.predict(xtr_pad)   # xtr_pad.shape = (9600,1403)
rnn_ht_red= np.mean(rnn_ht, 2)
rnn_ht_red= np.mean(rnn_ht_red,0)
steps= [t for t in range(1403)]
plt.plot(steps, rnn_ht_red, linestyle= 'dotted')

尝试获取渐变:

sess= k.get_session()
# The hidden states tf.Variable shaped (n_samples = 9600, n_steps = 1403, n_units = 200):
states_var= model_rnn.output  
# A list of hidden states variable for all time steps, aggregated over samples and RNN units:
ht_vars= [states_var[:, t, :] for t in range(1403)]        # each item in list has shape (9600, 200)
ht_vars_agg= [tf.reduce_mean(ht,[0,1]) for ht in ht_vars]  # each item in list has shape (), because I wish to obtain a SINGLE gradient value at each time step.

# Create gradient function and feed data:
dhtdx_vars= [k.gradients(ht, model_rnn.input) for ht in ht_vars_agg]
dhtdx= [sess.run(pd, feed_dict={model_rnn.input: xtr_pad} ) for pd in dhtdx_vars  ]

以下错误指向上面的最后一行

TypeError: Fetch argument None has invalid type <class 'NoneType'>

dhtdx_vars中的每个后端渐变项为[None]。当我删除聚合行时,相同的错误仍然存在。

尝试使用渐变磁带也不会在计算的渐变中返回错误。

with tf.GradientTape() as tape:
    x= model_rnn.input
    ht = model_rnn(x)
grad = tape.gradient(ht, model_rnn.input)

提前感谢您的帮助。

推荐答案

问题是tf.GradientTape()不通过整数输入传播渐变。这可能是您获得None渐变的原因。您可以做的是计算关于Embedding层的输出的渐变,如下所示:

import tensorflow as tf

input_length = 1403
inp= tf.keras.layers.Input(shape= (input_length,)) 
emb_out= tf.keras.layers.Embedding(500, 100, input_length= input_length, trainable= False)(inp)
rnn_out = tf.keras.layers.SimpleRNN(200,  return_sequences = True)(emb_out)

model = tf.keras.Model(inputs=inp, outputs=rnn_out)
model.summary()


xte_pad = tf.random.uniform((10, 1403), maxval=500, dtype=tf.int32)
y = tf.random.normal((10, 1403, 200))
ds = tf.data.Dataset.from_tensor_slices((xte_pad, y)).batch(5)

embedding_layer = model.layers[1]
rnn_layer = model.layers[2]
epochs = 1
for epoch in range(epochs):
  for step, (x_batch_train, y_batch_train) in enumerate(ds):
    with tf.GradientTape() as tape:
        embedded_x = embedding_layer(x_batch_train)
        tape.watch(embedded_x)
        y = rnn_layer(embedded_x)

    grads = tape.gradient(y, embedded_x)
    tf.print(grads.shape)
Model: "model_10"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_13 (InputLayer)       [(None, 1403)]            0         
                                                                 
 embedding_12 (Embedding)    (None, 1403, 100)         50000     
                                                                 
 simple_rnn_12 (SimpleRNN)   (None, 1403, 200)         60200     
                                                                 
=================================================================
Total params: 110,200
Trainable params: 60,200
Non-trainable params: 50,000
_________________________________________________________________
TensorShape([5, 1403, 100])
TensorShape([5, 1403, 100])

这篇关于TensorFlow:获取相对于输入的RNN隐藏状态梯度的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆