Delayed echo of sin - cannot reproduce Tensorflow result in Keras


Problem description

I am experimenting with LSTMs in Keras with little to no luck. At some point I decided to scale back to the most basic problems in order to finally achieve some positive result.
However, even with the simplest problems I find that Keras is unable to converge, while an implementation of the same problem in Tensorflow gives a stable result.

I am unwilling to just switch to Tensorflow without understanding why Keras keeps diverging on any problem I attempt.

My problem is a many-to-many sequence prediction of a delayed sin echo, example below:
Blue line is the network input sequence, red dotted line is the expected output.
The experiment was inspired by this repo, and a workable Tensorflow solution was also created from it. The relevant excerpts from my code are below, and the full version of my minimal reproducible example is available here.
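
The full data_generation code lives in the linked example; purely as an illustration, a delayed-echo batch might be produced by something like the sketch below (the echo_step delay, the frequency, and the shapes are assumptions, not values taken from the original):

import numpy as np

def make_batch(batch_size, n_steps, n_input=1, echo_step=3):
    # Hypothetical generator: the input is a sine wave with a random phase,
    # the target is the same wave delayed by echo_step steps (zero-padded at the start)
    t = np.arange(n_steps)
    phases = np.random.uniform(0, 2 * np.pi, size=(batch_size, 1))
    x = np.sin(0.2 * t + phases)                 # (batch_size, n_steps)
    y = np.zeros_like(x)
    y[:, echo_step:] = x[:, :-echo_step]         # delayed echo of the input
    return x.reshape(batch_size, n_steps, n_input), y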

Keras model:

import keras
from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

# Many-to-many model: one linear prediction per timestep
model = Sequential()
model.add(LSTM(n_hidden,
               input_shape=(n_steps, n_input),
               return_sequences=True))
model.add(TimeDistributed(Dense(n_input, activation='linear')))
model.compile(loss=custom_loss,
              optimizer=keras.optimizers.Adam(lr=learning_rate),
              metrics=[])

Tensorflow model:

import tensorflow as tf
from tensorflow.contrib import rnn  # TF 1.x RNN cells

x = tf.placeholder(tf.float32, [None, n_steps, n_input])
y = tf.placeholder(tf.float32, [None, n_steps])

# Output projection: maps the last hidden state to the whole n_steps window
weights = {
    'out': tf.Variable(tf.random_normal([n_hidden, n_steps], seed=SEED))
}
biases = {
    'out': tf.Variable(tf.random_normal([n_steps], seed=SEED))
}
lstm = rnn.LSTMCell(n_hidden, forget_bias=1.0)
outputs, states = tf.nn.dynamic_rnn(lstm, inputs=x,
                                    dtype=tf.float32,
                                    time_major=False)

# Take the activations of the last timestep and project them to n_steps outputs
h = tf.transpose(outputs, [1, 0, 2])
pred = tf.nn.bias_add(tf.matmul(h[-1], weights['out']), biases['out'])

# Per-window sum of squared errors, averaged over the batch
individual_losses = tf.reduce_sum(tf.squared_difference(pred, y),
                                  reduction_indices=1)
loss = tf.reduce_mean(individual_losses)
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate) \
  .minimize(loss)
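
The Keras excerpt above compiles with a custom_loss that is not shown in either snippet; a sketch of a Keras loss that mirrors the per-window summed squared error defined here (an assumption about the actual function, not code from the original) could be:

import keras.backend as K

def custom_loss(y_true, y_pred):
    # Sum of squared differences over the last axis; Keras then averages over
    # the batch, analogous to individual_losses followed by tf.reduce_mean above
    return K.sum(K.square(y_pred - y_true), axis=-1)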

I claim that the other parts of the code (data_generation, training) are completely identical. But learning progress with Keras stalls early and yields unsatisfactory predictions. Graphs of logloss for both libraries and example predictions are attached below:

Logloss for Tensorflow-trained model:

Logloss for Keras-trained model: It's not easy to read from the graph, but Tensorflow reaches target_loss=0.15 and stops early after about 10k batches, while Keras uses up all 13k batches and only reaches a loss of about 1.5. In a separate experiment where Keras ran for 100k batches, it went no further, stalling around 1.0.

The figures below contain: black line - model input signal, green dotted line - ground-truth output, red line - model output.

Predictions of Tensorflow-trained model:
Predictions of Keras-trained model: Thank you for your suggestions and insights, dear colleagues!

Answer

OK, I have managed to solve this. The Keras implementation now converges steadily to a sensible solution too:

The models were in fact not identical. If you inspect the Tensorflow model version from the question with extra care, you can verify for yourself that the actual Keras equivalent is the one listed below, and not what was stated in the question:

import keras
from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# return_sequences=False: keep only the last timestep's activations,
# matching h[-1] in the Tensorflow graph
model.add(LSTM(n_hidden,
               input_shape=(n_steps, n_input),
               return_sequences=False))
# A single dense layer predicts the whole n_steps window from that state
model.add(Dense(n_steps, input_shape=(n_hidden,), activation='linear'))
model.compile(loss=custom_loss,
              optimizer=keras.optimizers.Adam(lr=learning_rate),
              metrics=[])

I will elaborate. The workable solution here uses the last column of size n_hidden spat out by the LSTM as an intermediate activation, which is then fed to the Dense layer.
So, in a way, the actual prediction here is made by a regular perceptron.
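
To make the correspondence concrete, a small numpy check (sizes here are illustrative assumptions) shows that h[-1] in the Tensorflow graph is exactly the final-timestep activation that LSTM(return_sequences=False) returns in Keras:

import numpy as np

# Illustrative sizes, not taken from the original experiment
batch, n_steps, n_hidden = 4, 20, 32
outputs = np.random.randn(batch, n_steps, n_hidden)   # stand-in for dynamic_rnn outputs

h = np.transpose(outputs, (1, 0, 2))   # (n_steps, batch, n_hidden), as in the TF graph
last_tf = h[-1]                        # (batch, n_hidden): fed to the output projection
last_keras = outputs[:, -1, :]         # what LSTM(return_sequences=False) would return
assert np.allclose(last_tf, last_keras)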

One extra takeaway note - the source of the mistake in the original Keras solution is already evident from the inference examples attached to the question. We see there that the earlier timestamps fail utterly, while the later timestamps are near perfect. These earlier timestamps correspond to the states of the LSTM when it has just been initialized on a new window and has no context yet.
