Training loss increases after 12 epochs
Problem description
I have a model that learns to classify (binary classification) with almost 100% accuracy after 7-14 epochs, but after reaching the minimum loss of 0.0004, in the next epoch the loss jumps to as much as 7.5 (which means it has a 50% chance of classifying correctly, the same as pure chance) and then stays near 7 for all subsequent epochs.
I use the Adam optimiser, which should take care of the learning rate.
How can I prevent the training loss from increasing?
This huge jump doesn't happen with the SGD optimiser.
from keras.layers import Input, Dense
from keras.models import Model

inputs = Input(shape=(X_train.shape[1],))
Dx = Dense(32, activation="relu")(inputs)
Dx = Dense(32, activation="relu")(Dx)
for i in range(20):
    Dx = Dense(32, activation="relu")(Dx)
Dx = Dense(1, activation="sigmoid")(Dx)
D = Model(inputs=[inputs], outputs=[Dx])
D.compile(loss="binary_crossentropy", optimizer="adam")
D.fit(X_train, y_train, epochs=20)
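(One mitigation worth trying, not mentioned in the original question: even with Adam, you can lower the initial learning rate and clip the gradient norm so that a single bad update cannot blow the weights up. A minimal sketch, using random stand-in data in place of the real `X_train`/`y_train`:)

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

# Dummy data standing in for the real X_train / y_train
# (assumption: 20 input features, binary labels).
X_train = np.random.rand(64, 20).astype("float32")
y_train = np.random.randint(0, 2, size=(64, 1))

inputs = Input(shape=(X_train.shape[1],))
x = Dense(32, activation="relu")(inputs)
x = Dense(1, activation="sigmoid")(x)
D = Model(inputs=inputs, outputs=x)

# Smaller learning rate plus gradient-norm clipping: each update
# step is bounded, which makes a sudden loss explosion less likely.
D.compile(loss="binary_crossentropy",
          optimizer=Adam(learning_rate=1e-4, clipnorm=1.0))
D.fit(X_train, y_train, epochs=2, verbose=0)
```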
Recommended answer
Your network is quite deep for a fully connected architecture. Most likely you have been hit by vanishing or exploding gradients, i.e. numerical problems caused by repeatedly multiplying very small or very large numbers. I'd recommend a shallower but wider network; with dense layers, something like 2-3 layers is often enough in my experience. If you prefer working with the deeper architecture, you could try out something like skip connections.
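(A minimal sketch of what skip connections could look like for this kind of deep dense stack; the layer count and feature size are illustrative, not from the original post:)

```python
from tensorflow.keras.layers import Input, Dense, Add
from tensorflow.keras.models import Model

n_features = 20  # assumption: stand-in for X_train.shape[1]
inputs = Input(shape=(n_features,))
x = Dense(32, activation="relu")(inputs)
for _ in range(10):
    residual = x                        # remember the block's input
    x = Dense(32, activation="relu")(x)
    x = Add()([x, residual])            # skip connection: add it back
out = Dense(1, activation="sigmoid")(x)
model = Model(inputs=inputs, outputs=out)
model.compile(loss="binary_crossentropy", optimizer="adam")
```

Because each block's output is its input plus a learned correction, gradients can flow back through the identity path even when the dense layers themselves saturate, which is what makes deep stacks trainable.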