GRU with the same configuration but built in two different ways produces two different outputs in tensorflow


Problem description

I would like to do some sequence prediction in tensorflow using a GRU, so I have created the same model in 2 different ways as follows:

In model 1 I have 2 GRUs, one after the other; that is, new_state1, the final hidden state of the first GRU, acts as the initial state of the second GRU. Therefore, the model outputs new_state1 and new_state2 sequentially. Note that this is not a 2-layer model, but only 1 layer. In the code below, I divided the input and the output into 2 parts, where GRU1 takes the first part and the second GRU takes the second part.

Also, the random_seed is set and fixed for both models so that the results are comparable.

import tensorflow as tf
import numpy as np

cell_size = 32

seq_length = 1000

time_steps1 = 500
time_steps2 = seq_length - time_steps1

x_t = np.arange(1, seq_length + 1)    
x_t_plus_1 = np.arange(2, seq_length + 2)

tf.set_random_seed(123)

m_dtype = tf.float32

input_1 = tf.placeholder(dtype=m_dtype, shape=[None, time_steps1, 1], name="input_1")
input_2 = tf.placeholder(dtype=m_dtype, shape=[None, time_steps2, 1], name="input_2")

labels1 = tf.placeholder(dtype=m_dtype, shape=[None, time_steps1, 1], name="labels_1")
labels2 = tf.placeholder(dtype=m_dtype, shape=[None, time_steps2, 1], name="labels_2")

labels = tf.concat([labels1, labels2], axis=1, name="labels")

initial_state = tf.placeholder(shape=[None, cell_size], dtype=m_dtype, name="initial_state")

def model(input_feat1, input_feat2):
    with tf.variable_scope("GRU"):
        cell1 = tf.nn.rnn_cell.GRUCell(cell_size)
        cell2 = tf.nn.rnn_cell.GRUCell(cell_size)

        with tf.variable_scope("First50"):
            # output1: shape=[1, time_steps1, 32]
            output1, new_state1 = tf.nn.dynamic_rnn(cell1, input_feat1, dtype=m_dtype, initial_state=initial_state)

        with tf.variable_scope("Second50"):
            # output2: shape=[1, time_steps2, 32]
            output2, new_state2 = tf.nn.dynamic_rnn(cell2, input_feat2, dtype=m_dtype, initial_state=new_state1)

        with tf.variable_scope("output"):
            # output shape: [1, time_steps1 + time_steps2, 32] => [1, 1000, 32]
            output = tf.concat([output1, output2], axis=1)

            output = tf.reshape(output, shape=[-1, cell_size])
            output = tf.layers.dense(output, units=1)
            output = tf.reshape(output, shape=[1, time_steps1 + time_steps2, 1])

        with tf.variable_scope("outputs_1_2_reshaped"):
            output1 = tf.slice(input_=output, begin=[0, 0, 0], size=[-1, time_steps1, -1])
            output2 = tf.slice(input_=output, begin=[0, time_steps1, 0], size=[-1, time_steps2, 1])

            print(output.get_shape().as_list(), "1")
            print(output1.get_shape().as_list(), "2")
            print(output2.get_shape().as_list(), "3")

            return output, output1, output2, initial_state, new_state1, new_state2

output, output1, output2, initial_state, new_state1, new_state2 = model(input_1, input_2)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    to_run_list = [new_state1, new_state2]

    in1 = np.reshape(x_t[:time_steps1], newshape=(1, time_steps1, 1))
    in2 = np.reshape(x_t[time_steps1:], newshape=(1, time_steps2, 1))
    l1 = np.reshape(x_t_plus_1[:time_steps1], newshape=(1, time_steps1, 1))
    l2 = np.reshape(x_t_plus_1[time_steps1:], newshape=(1, time_steps2, 1))
    i_s = np.zeros([1, cell_size])

    new_s1, new_s2 = sess.run(to_run_list, feed_dict={input_1: in1,
                                                              input_2: in2,
                                                              labels1: l1,
                                                              labels2: l2,
                                                              initial_state: i_s})

    print(np.shape(new_s1), np.shape(new_s2))

    print(np.mean(new_s1), np.mean(new_s2))
    print(np.sum(new_s1), np.sum(new_s2))

In this model, instead of having 2 different GRUs, I created one, divided the input and labels into 2 different parts as well, and used a for loop to iterate over my input dataset. The final state is then taken and fed back into the same model as the initial state.

Note that in both model 1 and model 2 the very first initial state is zeros.

import tensorflow as tf
import numpy as np

cell_size = 32

seq_length = 1000

time_steps = 500

x_t = np.arange(1, seq_length + 1)    
x_t_plus_1 = np.arange(2, seq_length + 2)

tf.set_random_seed(123)

m_dtype = tf.float32

inputs = tf.placeholder(dtype=m_dtype, shape=[None, time_steps, 1], name="inputs")

labels = tf.placeholder(dtype=m_dtype, shape=[None, time_steps, 1], name="labels")

initial_state = tf.placeholder(shape=[None, cell_size], dtype=m_dtype, name="initial_state")

grads_initial_state = tf.placeholder(dtype=m_dtype, shape=[None, cell_size], name="prev_grads")

this_is_last_batch = tf.placeholder(dtype=tf.bool, name="this_is_last_batch")

def model(input_feat):
    with tf.variable_scope("GRU"):
        cell = tf.nn.rnn_cell.GRUCell(cell_size)

        with tf.variable_scope("cell"):
            # output: shape=[1, time_steps, 32]
            output, new_state = tf.nn.dynamic_rnn(cell, input_feat, dtype=m_dtype, initial_state=initial_state)

        with tf.variable_scope("output"):

            output = tf.reshape(output, shape=[-1, cell_size])
            output = tf.layers.dense(output, units=1)
            output = tf.reshape(output, shape=[1, time_steps, 1])

            print(output.get_shape().as_list(), "1")

            return output, new_state

output, new_state = model(inputs)

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)

    # 1000 // 500 = 2
    num_iterations = seq_length // time_steps
    print("num_iterations:", num_iterations)

    final_states = []
    to_run_list = [new_state]

    for i in range(num_iterations):

        current_xt = x_t[i * time_steps: (i + 1)*time_steps]
        current_xt_plus_1 = x_t_plus_1[i*time_steps: (i + 1)*time_steps]

        in1 = np.reshape(current_xt, newshape=(1, time_steps, 1))
        l1 = np.reshape(current_xt_plus_1, newshape=(1, time_steps, 1))
        i_s = np.zeros([1, cell_size])

        if i == 0:
            new_s = sess.run(new_state, feed_dict={inputs: in1,
                                                   labels: l1,
                                                   initial_state: i_s})
            final_states.append(new_s)
            print("---->", np.mean(final_states[-1]), np.sum(final_states[-1]), i)
        else:
            new_s = sess.run(new_state, feed_dict={inputs: in1,
                                                   labels: l1,
                                                   initial_state: final_states[-1]})
            final_states.append(new_s)
            print("---->", np.mean(final_states[-1]), np.sum(final_states[-1]), i)

Finally, after printing out the statistics of new_state1 and new_state2 in model 1, they were different from new_state after each iteration in model 2.
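
(As an aside, comparing the full arrays rather than their means and sums avoids coincidental matches. Below is a minimal sketch, assuming the final states are saved from each script with np.save and compared in a separate one; the file names are made up for illustration.)

import numpy as np

# In model 1's script, after sess.run: np.save("states_model1.npy", np.stack([new_s1, new_s2]))
# In model 2's script, after the loop: np.save("states_model2.npy", np.stack(final_states))
states1 = np.load("states_model1.npy")
states2 = np.load("states_model2.npy")

# Element-wise comparison of the two models' final states, plus the largest difference.
print(np.allclose(states1, states2), np.max(np.abs(states1 - states2)))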

I would like to know how to fix this problem and why it is happening.

I found that the weight values of the GRU in the two files are different.
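
(One way to confirm that, as a minimal sketch: the dump_weights helper below is not part of the original scripts; calling it right after sess.run(init) in each file prints per-variable statistics for the GRU kernels/biases and the dense layer, so the two files can be compared line by line.)

def dump_weights(sess):
    # Iterate over every trainable variable in the current graph.
    for var in tf.trainable_variables():
        val = sess.run(var)
        print(var.name, val.shape, np.mean(val), np.sum(val))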

Now, how can I reproduce the same results in the 2 different files even after setting the random seed?

Any help is much appreciated!

Answer

So, to reproduce the same results in different files, tf.set_random_seed() is not enough. I figured out that we also need to set the seed for the initializers of the GRU cells as well as the initializers of the weights in the dense layer at the output (this is at least according to my model); so the definition of the cell is now:

cell1 = tf.nn.rnn_cell.GRUCell(cell_size, kernel_initializer=tf.glorot_normal_initializer(seed=123, dtype=m_dtype))

And for the dense layer:

output = tf.layers.dense(output, units=1, kernel_initializer=tf.glorot_uniform_initializer(seed=123, dtype=m_dtype))

Note that any other initializer could be used as long as we set the seed and the dtype for it.
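
For reference, here is a minimal self-contained sketch (assuming TensorFlow 1.x, the same API the question uses) that applies both seeded initializers and prints the initialized weight statistics; running it from two different files should print identical numbers:

import tensorflow as tf
import numpy as np

cell_size = 32
m_dtype = tf.float32

tf.set_random_seed(123)

# Seed the GRU cell's kernel initializer in addition to the graph-level seed.
cell = tf.nn.rnn_cell.GRUCell(
    cell_size,
    kernel_initializer=tf.glorot_normal_initializer(seed=123, dtype=m_dtype))

inputs = tf.placeholder(dtype=m_dtype, shape=[None, 10, 1])
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=m_dtype)

# Seed the dense output layer's kernel initializer as well.
outputs = tf.reshape(outputs, shape=[-1, cell_size])
outputs = tf.layers.dense(
    outputs, units=1,
    kernel_initializer=tf.glorot_uniform_initializer(seed=123, dtype=m_dtype))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for var in tf.trainable_variables():
        val = sess.run(var)
        print(var.name, np.mean(val), np.sum(val))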

