Float Multi-label Regression in Caffe - loss results


Problem description

I have trained a NN for a regression problem. My data is of type HDF5_DATA, made of .jpg images (3×256×256) and float-label arrays (3 labels per image). Dataset creation script:

import h5py, os
import caffe
import numpy as np

SIZE = 256 # images size
with open( '/home/path/trainingTintText.txt', 'r' ) as T :
    lines = T.readlines()

X = np.zeros( (len(lines), 3, SIZE, SIZE), dtype='f4' )
labels = np.zeros( (len(lines),3), dtype='f4' )

for i,l in enumerate(lines):
    sp = l.split(' ')
    img = caffe.io.load_image( sp[0] )
    img = caffe.io.resize( img, (SIZE, SIZE, 3) )
    transposed_img = img.transpose((2,0,1))[::-1,:,:] # RGB->BGR
    X[i] = transposed_img*255
    print X[i]
    labels[i,0] = float(sp[1])
    labels[i,1] = float(sp[2])
    labels[i,2] = float(sp[3])

with h5py.File('/home/path/train.h5','w') as H:
    H.create_dataset('data', data=X)
    H.create_dataset('label', data=labels)

with open('/home/path/train_h5_list.txt','w') as L:
    L.write( '/home/path/train.h5' )
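
As a quick sanity check (a minimal sketch, assuming the file was written to /home/path/train.h5 by the script above), the HDF5 file can be read back to confirm the stored value ranges of data and label:

import h5py

# Minimal sketch: read the HDF5 file back and print the stored shapes and ranges,
# assuming it was written to /home/path/train.h5 by the script above.
with h5py.File('/home/path/train.h5', 'r') as f:
    data = f['data'][:]
    labels = f['label'][:]
    print('data  shape %s, range [%.1f, %.1f]' % (str(data.shape), data.min(), data.max()))
    print('label shape %s, range [%.1f, %.1f]' % (str(labels.shape), labels.min(), labels.max()))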

This is the (partial) architecture:

name: "NN"

layers {
  name: "NNd"
  top: "data"
  top: "label"
  type: HDF5_DATA
  hdf5_data_param {
   source: "/home/path/train_h5_list.txt"
   batch_size: 64
  }
    include: { phase: TRAIN }

}

layers {
  name: "data"
  type: HDF5_DATA
  top: "data"
  top: "label"
  hdf5_data_param {
    source: "/home/path/train_h5_list.txt"
    batch_size: 100

  }
  include: { phase: TEST }
}

layers {
  name: "conv1"
  type: CONVOLUTION
  bottom: "data"
  top: "conv1"
  convolution_param {
    num_output: 32
    kernel_size: 11
    stride: 2

    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}


layers {
  name: "ip2"
  type: INNER_PRODUCT
  bottom: "ip1"
  top: "ip2"
  inner_product_param {
    num_output: 3

    bias_filler {
      type: "constant"
      value: 0.1
    }
  }
}

layers {
  name: "relu22"
  type: RELU
  bottom: "ip2"
  top: "ip2"
}

layers {
  name: "loss"
  type: EUCLIDEAN_LOSS
  bottom: "ip2"
  bottom: "label"
  top: "loss"
}

When I train the NN, I get very high loss values:

I1117 08:15:57.707001  2767 solver.cpp:337] Iteration 0, Testing net (#0)
I1117 08:15:57.707033  2767 net.cpp:684] Ignoring source layer fkp
I1117 08:15:59.111842  2767 solver.cpp:404]     Test net output #0: loss = 256.672 (* 1 = 256.672 loss)
I1117 08:15:59.275205  2767 solver.cpp:228] Iteration 0, loss = 278.909
I1117 08:15:59.275255  2767 solver.cpp:244]     Train net output #0: loss = 278.909 (* 1 = 278.909 loss)
I1117 08:15:59.275276  2767 sgd_solver.cpp:106] Iteration 0, lr = 0.01
I1117 08:16:57.115145  2767 solver.cpp:337] Iteration 100, Testing net (#0)
I1117 08:16:57.115486  2767 net.cpp:684] Ignoring source layer fkp
I1117 08:16:58.884704  2767 solver.cpp:404]     Test net output #0: loss = 238.257 (* 1 = 238.257 loss)
I1117 08:16:59.026926  2767 solver.cpp:228] Iteration 100, loss = 191.836
I1117 08:16:59.026971  2767 solver.cpp:244]     Train net output #0: loss = 191.836 (* 1 = 191.836 loss)
I1117 08:16:59.026993  2767 sgd_solver.cpp:106] Iteration 100, lr = 0.01
I1117 08:17:56.890614  2767 solver.cpp:337] Iteration 200, Testing net (#0)
I1117 08:17:56.890880  2767 net.cpp:684] Ignoring source layer fkp
I1117 08:17:58.665057  2767 solver.cpp:404]     Test net output #0: loss = 208.236 (* 1 = 208.236 loss)
I1117 08:17:58.809150  2767 solver.cpp:228] Iteration 200, loss = 136.422
I1117 08:17:58.809248  2767 solver.cpp:244]     Train net output #0: loss = 136.422 (* 1 = 136.422 loss)

When I divide the images and the label arrays by 255, I get very low loss values (near 0). What is the reason for those loss results? Am I doing something wrong? Thanks

Solution

With the Euclidean loss this is only to be expected. If you divide all of the labels by 256 and re-train, the Euclidean loss should be smaller by a factor of 256. That doesn't mean the network has become any better at predicting the labels; you've just changed the "scale" (the "units").

In particular, the Euclidean loss is roughly L = sqrt((x1 - y1)^2 + (x2 - y2)^2), where x is the correct answer and y is the output of the neural network. Suppose you divide every x by 256, then re-train. The neural network will learn to divide its output y by 256 as well. How does this affect the Euclidean loss L? If you work through the math, you'll find that L shrinks by a factor of 256.
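
For example, here is a small numerical sketch (the label and prediction values below are made up) showing that dividing both x and y by 256 divides the Euclidean distance by exactly 256:

import numpy as np

# Made-up values: a 3-element label vector x and a network prediction y.
x = np.array([200.0, 150.0, 90.0])   # correct answer, on the original 0-255 scale
y = np.array([190.0, 160.0, 80.0])   # network output on the same scale

loss_full   = np.sqrt(np.sum((x - y) ** 2))               # loss in the original units
loss_scaled = np.sqrt(np.sum((x / 256 - y / 256) ** 2))   # loss after rescaling by 1/256

print('%.4f %.6f %.1f' % (loss_full, loss_scaled, loss_full / loss_scaled))  # ratio is exactly 256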

It's like the difference between predicting a distance in feet versus a distance in yards. The latter involves dividing by 3. Conceptually, the overall accuracy of the network stays the same, but the Euclidean loss is divided by three, because you've changed the units from feet to yards. An average error of 0.1 feet corresponds to an average error of 0.0333 yards; conceptually that is the "same" accuracy, even though 0.0333 looks like a smaller number than 0.1.

Dividing the images by 256 should be irrelevant. It's dividing the labels by 256 that caused the reduction in the loss value.
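
If you prefer the loss to be reported on a 0-1 scale, a hypothetical variant of the dataset script (the helper name below is made up for illustration) would rescale only the labels before writing the HDF5 file:

import numpy as np

def normalise_labels(labels, scale=255.0):
    # Hypothetical helper: rescale labels from [0, 255] to [0, 1].
    # Only the label scale changes the units of the Euclidean loss;
    # rescaling the images changes the network input, not the loss units.
    return np.asarray(labels, dtype='f4') / scale

# Usage in the dataset script, just before writing the HDF5 file:
# labels = normalise_labels(labels)
# H.create_dataset('label', data=labels)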

