Is the cross-entropy loss of PyTorch different than "categorical_crossentropy" of Keras?
Problem description
I am trying to mimic a PyTorch neural network in Keras.
I am confident that my Keras version of the neural network is very close to the one in PyTorch, but during training I see that the loss values of the PyTorch network are much lower than those of the Keras network. I wonder whether this is because I have not properly copied the PyTorch network in Keras, or whether the loss computation differs between the two frameworks.
PyTorch loss definition:
loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=args.lr, momentum=0.9, weight_decay=5e-4)
Keras loss definition:
sgd = optimizers.SGD(lr=.1, momentum=0.9, nesterov=True)
resnet.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['categorical_accuracy'])
Note that all the layers in the Keras network have been implemented with L2 regularization (kernel_regularizer=regularizers.l2(5e-4)). I also used he_uniform initialization, which I believe is the default in PyTorch, according to the source code.
The batch size for the two networks is the same: 128.
In the PyTorch version, I get loss values around 4.1209, which decrease to around 0.5. In Keras the loss starts around 30 and decreases to 2.5.
Recommended answer
PyTorch's CrossEntropyLoss accepts unnormalized scores (logits) for each class, i.e. not probabilities (source). Keras' categorical_crossentropy by default uses from_logits=False, which means it assumes y_pred contains probabilities, not raw scores (source).
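The two conventions compute the same number once softmax is applied consistently. A minimal framework-free sketch (pure Python, hypothetical helper names) of the two loss forms for a single example:

```python
import math

def softmax(logits):
    # subtract the max for numerical stability
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ce_from_logits(logits, target):
    # logit form (what PyTorch's CrossEntropyLoss computes):
    # -log_softmax(logits)[target], with log-sum-exp folded into the loss
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

def ce_from_probs(probs, target):
    # probability form (what categorical_crossentropy with
    # from_logits=False expects): the inputs are already probabilities
    return -math.log(probs[target])

logits = [2.0, 1.0, 0.1]
target = 0
loss_a = ce_from_logits(logits, target)
loss_b = ce_from_probs(softmax(logits), target)
# loss_a == loss_b: the forms agree only when the probabilities
# were produced by a softmax over the same logits
```

Passing raw scores where probabilities are expected (or vice versa) breaks this equivalence, which is one way the two frameworks can report very different loss values for the same model.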
In PyTorch, if you use CrossEntropyLoss, you should not add a softmax/sigmoid layer at the end of the model. In Keras you can use one or not, but you must set from_logits accordingly.
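For example, to keep the model's final layer free of a softmax activation (matching the PyTorch setup), the Keras loss can be told to expect logits. This is a sketch of the compile call, reusing the resnet and sgd names from the question and assuming the tf.keras loss-class API; it is a configuration fragment, not run here:

```python
import tensorflow as tf

# the model's last Dense layer has no softmax activation, so it
# outputs raw scores (logits); the loss normalizes them itself
resnet.compile(
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
    optimizer=sgd,
    metrics=['categorical_accuracy'],
)
```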