Need help understanding the Caffe code for SigmoidCrossEntropyLossLayer for multi-label loss
Question
I need help in understanding the Caffe function, SigmoidCrossEntropyLossLayer, which is the cross-entropy error with logistic activation.
Basically, the cross-entropy error for a single example with N independent targets is denoted as:
- sum-over-N( t[i] * log(x[i]) + (1 - t[i]) * log(1 - x[i]) )
where t is the target, 0 or 1, and x is the output, indexed by i. x, of course, goes through a logistic activation.
An algebraic trick for quicker cross-entropy calculation reduces the computation to:
-t[i] * x[i] + log(1 + exp(x[i]))
and you can verify that from Section 3 here.
The question is, how is the above translated to the loss calculating code below:
loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
Thanks.
The function is reproduced below for convenience.
template <typename Dtype>
void SigmoidCrossEntropyLossLayer<Dtype>::Forward_cpu(
    const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
  // The forward pass computes the sigmoid outputs.
  sigmoid_bottom_vec_[0] = bottom[0];
  sigmoid_layer_->Forward(sigmoid_bottom_vec_, sigmoid_top_vec_);
  // Compute the loss (negative log likelihood)
  // Stable version of loss computation from input data
  const Dtype* input_data = bottom[0]->cpu_data();
  const Dtype* target = bottom[1]->cpu_data();
  int valid_count = 0;
  Dtype loss = 0;
  for (int i = 0; i < bottom[0]->count(); ++i) {
    const int target_value = static_cast<int>(target[i]);
    if (has_ignore_label_ && target_value == ignore_label_) {
      continue;
    }
    loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
        log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
    ++valid_count;
  }
  normalizer_ = get_normalizer(normalization_, valid_count);
  top[0]->mutable_cpu_data()[0] = loss / normalizer_;
}
Answer
In the expression log(1 + exp(x[i])) you might encounter numerical instability in case x[i] is very large. To overcome this numerical instability, one scales the sigmoid function like this:
sig(x) = exp(x) / (1 + exp(x))
       = [exp(x) * exp(-x*(x>=0))] / [(1 + exp(x)) * exp(-x*(x>=0))]
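Spelling that substitution out (an added intermediate step; this algebra is mine, not from the original answer): for x >= 0 the scaling factor is exp(-x), giving

sig(x) = 1 / (exp(-x) + 1)

and correspondingly log(1 + exp(x)) = x + log(1 + exp(-x)), so the per-element loss -t*x + log(1 + exp(x)) becomes -x*(t - 1) + log(1 + exp(-x)). For x < 0 the factor is exp(0) = 1 and the loss stays -t*x + log(1 + exp(x)). The two cases combine into the single branch-free expression

-x*(t - (x>=0)) + log(1 + exp(x - 2*x*(x>=0)))

which is exactly what the loop accumulates via loss -= .... Note that x - 2*x*(x>=0) equals -|x|, so the argument of exp() is never positive and cannot overflow.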
Now, if you plug the new and stable expression for sig(x) into the loss you'll end up with the same expression as Caffe is using.
Enjoy!