Is the cross-entropy loss of PyTorch different from "categorical_crossentropy" of Keras?


Problem description

I am trying to mimic a pytorch neural network in keras.

I am confident that my keras version of the neural network is very close to the one in pytorch, but during training I see that the loss values of the pytorch network are much lower than the loss values of the keras network. I wonder if this is because I have not properly copied the pytorch network in keras, or whether the loss computation is different in the two frameworks.

Pytorch loss definition:

loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=0.9, weight_decay=5e-4)

Keras loss definition:

sgd = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)
resnet.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['categorical_accuracy'])

Note that all the layers in the keras network have been implemented with L2 regularization kernel_regularizer=regularizers.l2(5e-4); I also used he_uniform initialization, which I believe is the default in pytorch, according to the source code.
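
For concreteness, a minimal sketch of how such a Keras layer might be declared (the layer type and sizes here are illustrative, not taken from the question):

from tensorflow.keras import layers, regularizers

conv = layers.Conv2D(
    64, 3, padding='same',
    kernel_regularizer=regularizers.l2(5e-4),  # L2 penalty on the kernel weights
    kernel_initializer='he_uniform')           # He-uniform init, as described above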

The batch size for the two networks is the same: 128.

In the pytorch version, I get loss values around 4.1209 which decrease to around 0.5. In keras the loss starts around 30 and decreases to 2.5.

Answer

PyTorch CrossEntropyLoss accepts unnormalized scores for each class, i.e. not probabilities (source). Keras categorical_crossentropy uses from_logits=False by default, which means it assumes y_pred contains probabilities, not raw scores (source).
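
In other words, Keras by default treats the network output as a probability distribution, while PyTorch's loss expects raw logits and applies log-softmax internally. A small numerical check of that difference, sketched in PyTorch only (the numbers are made up):

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0, 0.5]])   # raw, unnormalized scores for 3 classes
target = torch.tensor([0])                  # index of the true class

# nn.CrossEntropyLoss / F.cross_entropy applies log_softmax internally,
# so it must be fed the raw logits:
loss_from_logits = F.cross_entropy(logits, target)

# The same quantity computed explicitly: softmax first, then negative log-likelihood.
# The softmax output is what Keras' categorical_crossentropy expects when
# from_logits=False (its default), i.e. an already-normalized probability vector.
probs = F.softmax(logits, dim=1)
loss_from_probs = F.nll_loss(torch.log(probs), target)

print(loss_from_logits.item(), loss_from_probs.item())  # both print the same value (≈ 0.2413 here)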

In PyTorch, if you use CrossEntropyLoss, you should not add a softmax/sigmoid layer at the end. In keras you can use it or not, but you must set from_logits accordingly.
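
As a rough sketch of the two equivalent Keras setups (the model, layer sizes and num_classes below are placeholders, not the asker's actual network):

import tensorflow as tf
from tensorflow.keras import layers, optimizers

num_classes = 10  # illustrative

# Option 1: the last layer emits raw logits; tell the loss via from_logits=True.
model_logits = tf.keras.Sequential([layers.Dense(num_classes)])
model_logits.compile(
    optimizer=optimizers.SGD(learning_rate=0.1, momentum=0.9, nesterov=True),
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    metrics=['categorical_accuracy'])

# Option 2: keep the softmax activation and the default from_logits=False.
model_probs = tf.keras.Sequential([layers.Dense(num_classes, activation='softmax')])
model_probs.compile(
    optimizer=optimizers.SGD(learning_rate=0.1, momentum=0.9, nesterov=True),
    loss='categorical_crossentropy',
    metrics=['categorical_accuracy'])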
