Need help understanding the Caffe code for SigmoidCrossEntropyLossLayer for multi-label loss

Question

I need help understanding the Caffe function SigmoidCrossEntropyLossLayer, which computes the cross-entropy error with logistic activation.

Basically, the cross-entropy error for a single example with N independent targets is denoted as:

 - sum-over-N( t[i] * log(x[i]) + (1 - t[i]) * log(1 - x[i]) )

where t is the target, 0 or 1, and x is the output, indexed by i. x, of course, goes through a logistic activation.
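For example, with a single target (N = 1), t = 1 and sigmoid output x = 0.8, the loss is

 -[1 * log(0.8) + 0 * log(0.2)] = -log(0.8) ≈ 0.223

and it grows without bound as x approaches 0.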

An algebraic trick for quicker cross-entropy calculation reduces the computation to:

 -t[i] * x[i] + log(1 + exp(x[i])) 

and you can verify that from Section 3 here.
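To see where the trick comes from, note that in the reduced expression x[i] is the raw input (the logit, before the sigmoid), whereas in the first formula x[i] was the sigmoid output. Writing p = sig(x) = 1/(1 + exp(-x)) for one element and dropping the index:

 log(p)     = -log(1 + exp(-x)) = x - log(1 + exp(x))
 log(1 - p) = -log(1 + exp(x))

so the per-element loss becomes

 -[t*log(p) + (1 - t)*log(1 - p)] = -t*x + log(1 + exp(x))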

The question is, how is the above translated into the loss-calculating code below:

   loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
        log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));

Thanks.

The function is reproduced below for convenience.

   template <typename Dtype>
   void SigmoidCrossEntropyLossLayer<Dtype>::Forward_cpu(
       const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
     // The forward pass computes the sigmoid outputs.
     sigmoid_bottom_vec_[0] = bottom[0];
     sigmoid_layer_->Forward(sigmoid_bottom_vec_, sigmoid_top_vec_);
     // Compute the loss (negative log likelihood)
     // Stable version of loss computation from input data
     const Dtype* input_data = bottom[0]->cpu_data();
     const Dtype* target = bottom[1]->cpu_data();
     int valid_count = 0;
     Dtype loss = 0;
     for (int i = 0; i < bottom[0]->count(); ++i) {
       const int target_value = static_cast<int>(target[i]);
       if (has_ignore_label_ && target_value == ignore_label_) {
         continue;
       }
       loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
           log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));
       ++valid_count;
     }
     normalizer_ = get_normalizer(normalization_, valid_count);
     top[0]->mutable_cpu_data()[0] = loss / normalizer_;
   }
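
To make the comparison concrete, here is a minimal standalone C++ check (my own sketch, not Caffe code) that evaluates both the naive formula and the stable expression from the loop for a few inputs:

   #include <cmath>
   #include <cstdio>

   // Naive per-element loss: push x through the sigmoid, then apply
   // -[t*log(p) + (1 - t)*log(1 - p)]. Breaks down for large |x|,
   // where p rounds to exactly 0 or 1 and the log blows up.
   double naive_loss(double x, double t) {
     double p = 1.0 / (1.0 + std::exp(-x));
     return -(t * std::log(p) + (1.0 - t) * std::log(1.0 - p));
   }

   // Stable per-element loss: the expression accumulated by the Caffe
   // loop, with s = (x >= 0) playing the role of (input_data[i] >= 0).
   double stable_loss(double x, double t) {
     double s = (x >= 0.0) ? 1.0 : 0.0;
     return -(x * (t - s) - std::log(1.0 + std::exp(x - 2.0 * x * s)));
   }

   int main() {
     const double xs[] = {-800.0, -2.0, 0.0, 2.0, 800.0};
     const double ts[] = {0.0, 1.0};
     for (double t : ts) {
       for (double x : xs) {
         // For |x| = 800 the naive version yields inf or nan, while the
         // stable one stays finite (|x| when the target mismatches).
         std::printf("x = %7.1f  t = %.0f  naive = %10.6f  stable = %10.6f\n",
                     x, t, naive_loss(x, t), stable_loss(x, t));
       }
     }
     return 0;
   }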

Answer

In the expression log(1 + exp(x[i])) you might encounter numerical instability when x[i] is very large. To overcome this numerical instability, one scales the sigmoid function like this:

 sig(x) = exp(x) / (1 + exp(x))
        = [exp(x) * exp(-x * (x >= 0))] / [(1 + exp(x)) * exp(-x * (x >= 0))]

Now, if you plug this new, stable expression for sig(x) into the loss, you end up with exactly the expression Caffe uses.
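
Concretely, the scaling splits into two cases. With s = (x >= 0):

 x >= 0 (s = 1):  -t*x + log(1 + exp(x)) = x*(1 - t) + log(1 + exp(-x))
 x <  0 (s = 0):  -t*x + log(1 + exp(x))   (already stable, since exp(x) < 1)

Both cases collapse into the single expression

 -x*(t - s) + log(1 + exp(x - 2*x*s)),   s = (x >= 0)

so the argument of exp is always non-positive, and this is exactly what the loop accumulates through loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) - log(...).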

Enjoy!
