Default Initialization for Tensorflow LSTM states and weights?


Question

I am using the LSTM cell in Tensorflow.

lstm_cell = tf.contrib.rnn.BasicLSTMCell(lstm_units)

I was wondering how the weights and states are initialized, or rather, what the default initializer is for LSTM cells (states and weights) in Tensorflow?

And is there an easy way to manually set an initializer?

Note: For tf.get_variable() the glorot_uniform_initializer is used, as far as I could find out from the documentation.

Answer

First of all, there is a difference between the weights of an LSTM (the usual parameter set of an ANN) and its states. The weights are by default initialized with the Glorot initializer, also known as the Xavier initializer (as mentioned in the question).
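
If you want to override this default for the weights: tf.contrib.rnn.LSTMCell accepts an initializer argument for its weight matrices, while BasicLSTMCell does not, but it picks up the default initializer of the enclosing variable scope. A minimal sketch, assuming hypothetical sizes (lstm_units, num_steps, input_dim) and an orthogonal initializer as an example choice:

import tensorflow as tf

lstm_units, num_steps, input_dim = 128, 20, 50  # hypothetical sizes
inputs = tf.placeholder(tf.float32, [None, num_steps, input_dim])

# Option A: LSTMCell takes an explicit weight initializer.
lstm_cell = tf.contrib.rnn.LSTMCell(
    lstm_units, initializer=tf.orthogonal_initializer())

# Option B: BasicLSTMCell creates its variables via tf.get_variable,
# so it inherits the scope's default initializer. The variables are
# created lazily on the first call, so the scope must also cover
# the dynamic_rnn call.
with tf.variable_scope('rnn', initializer=tf.orthogonal_initializer()):
    basic_cell = tf.contrib.rnn.BasicLSTMCell(lstm_units)
    outputs, state = tf.nn.dynamic_rnn(basic_cell, inputs, dtype=tf.float32)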

A different aspect is the cell state and the initial recurrent (hidden) state fed into the LSTM. Those are initialized by tensors usually denoted as initial_state.
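
Concretely, a BasicLSTMCell's state is an LSTMStateTuple(c, h), and cell.zero_state() builds the all-zeros version that tf.nn.dynamic_rnn uses when no initial_state is passed. A minimal sketch with hypothetical sizes:

import tensorflow as tf

batch_size, num_steps, input_dim, lstm_units = 32, 20, 50, 128  # hypothetical
inputs = tf.placeholder(tf.float32, [batch_size, num_steps, input_dim])
cell = tf.contrib.rnn.BasicLSTMCell(lstm_units)

# zero_state returns LSTMStateTuple(c, h), each of shape
# [batch_size, lstm_units], filled with zeros. Passing it explicitly
# is equivalent to dynamic_rnn's default behaviour.
initial_state = cell.zero_state(batch_size, tf.float32)
outputs, final_state = tf.nn.dynamic_rnn(
    cell, inputs, initial_state=initial_state)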

That leaves us with the question of how to initialize this initial_state:

  1. Zero-state initialization is good practice if the impact of initialization is small

The default approach to initializing the state of an RNN is to use a zero state. This often works well, particularly for sequence-to-sequence tasks like language modeling, where the proportion of outputs that are significantly impacted by the initial state is small.

  2. Zero-state initialization for every batch can lead to overfitting

Zero initialization for each batch will lead to the following: losses at the early steps of a sequence-to-sequence model (i.e., those immediately after a state reset) will be larger than those at later steps, because there is less history. Thus, their contribution to the gradient during learning will be relatively higher. But if all state resets are associated with a zero state, the model can (and will) learn how to compensate for precisely this. As the ratio of state resets to total observations increases, the model parameters will become increasingly tuned to this zero state, which may hurt performance on later time steps.

  3. Do we have better options?

One simple solution is to make the initial state noisy (to decrease the loss for the first time step); see the sketch below. Look here for details and other ideas.
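
A minimal sketch of the noisy-state idea, again with hypothetical sizes; instead of zeros, c and h are drawn from a small Gaussian so the model cannot over-specialize to an all-zero reset state (stddev=0.1 is an assumed value to be tuned):

import tensorflow as tf

batch_size, num_steps, input_dim, lstm_units = 32, 20, 50, 128  # hypothetical
inputs = tf.placeholder(tf.float32, [batch_size, num_steps, input_dim])
cell = tf.contrib.rnn.BasicLSTMCell(lstm_units)

# Replace the all-zero LSTMStateTuple with small Gaussian noise,
# resampled on every session run.
noisy_state = tf.contrib.rnn.LSTMStateTuple(
    c=tf.random_normal([batch_size, lstm_units], stddev=0.1),
    h=tf.random_normal([batch_size, lstm_units], stddev=0.1))
outputs, final_state = tf.nn.dynamic_rnn(
    cell, inputs, initial_state=noisy_state)

A related option is to make the initial state trainable (e.g., a tf.get_variable tiled across the batch dimension), so the network learns a good starting state instead of compensating for zeros.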

