Training loss increases after 12 epochs


Question

I have a model that learns to classify (binary classification) at almost 100% accuracy after 7-14 epochs, but after reaching the minimum loss of 0.0004, in the next epoch the loss jumps to as much as 7.5 (which means it has a 50% chance of classifying correctly, the same as pure chance) and then stays near 7 for all subsequent epochs.

I use the Adam optimiser, which should take care of the learning rate.

How can I prevent the training loss from increasing?

This huge jump doesn't happen with the SGD optimiser.

from keras.layers import Input, Dense
from keras.models import Model

# 22 hidden Dense layers of 32 units each: a very deep fully connected stack
inputs = Input(shape=(X_train.shape[1],))
Dx = Dense(32, activation="relu")(inputs)
Dx = Dense(32, activation="relu")(Dx)
for i in range(20):
    Dx = Dense(32, activation="relu")(Dx)
Dx = Dense(1, activation="sigmoid")(Dx)  # binary classification output
D = Model(inputs=[inputs], outputs=[Dx])
D.compile(loss="binary_crossentropy", optimizer="adam")

D.fit(X_train, y_train, epochs=20)
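
For reference, a minimal sketch (not part of the original post) of how the two optimiser setups being compared could be written out explicitly; the learning rates shown are Keras' documented defaults, not settings from the question:

from keras.optimizers import Adam, SGD

# Adam with its default learning rate made explicit
D.compile(loss="binary_crossentropy", optimizer=Adam(learning_rate=0.001))

# ...or plain SGD, the optimiser for which the poster reports no jump
D.compile(loss="binary_crossentropy", optimizer=SGD(learning_rate=0.01))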

Answer

Your network is quite deep for a fully connected architecture. Most likely you have been hit by vanishing or exploding gradients, i.e. numerical problems caused by repeatedly multiplying very small or very large numbers. I'd recommend a shallower but wider network; with dense layers, 2-3 layers is often enough in my experience. If you prefer working with the deeper architecture, you could try out something like skip connections (sketched below).
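
For illustration, here is a minimal sketch (not from the original answer) of how skip connections could be added with the Keras functional API: each block's output is added back to its input with an Add layer, giving gradients a short path through the deep stack. The layer widths simply mirror the question's model and are otherwise arbitrary.

from keras.layers import Input, Dense, Add
from keras.models import Model

inputs = Input(shape=(X_train.shape[1],))
x = Dense(32, activation="relu")(inputs)

# Residual blocks: add each block's input back to its output so the gradient
# has a direct path instead of being multiplied through every layer.
for i in range(10):
    h = Dense(32, activation="relu")(x)
    x = Add()([x, h])

outputs = Dense(1, activation="sigmoid")(x)
model = Model(inputs=[inputs], outputs=[outputs])
model.compile(loss="binary_crossentropy", optimizer="adam")

The shallower alternative suggested above is simply the original model with two or three of the 32-unit Dense layers (possibly wider) instead of twenty-two.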
