Multi-class and multi-label image classification using Caffe

Question

I'm trying to create a single multi-class and multi-label net configuration in Caffe.

Take the classification of dogs as an example: Is the dog small or large? (class) What color is it? (class) Does it have a collar? (label)

Is this possible using Caffe? What is the proper way to do it?

I'm just trying to understand the practical way. I created two .txt files (one for training and one for validation) containing all the tags of the images, for example:

/train/img/1.png 0 4 18
/train/img/2.png 1 7 17 33
/train/img/3.png 0 4 17

Then I run this Python script:

import h5py, os
import caffe
import numpy as np

SIZE = 227 # fixed size to all images
with open( 'train.txt', 'r' ) as T :
    lines = T.readlines()
# If you do not have enough memory split data into
# multiple batches and generate multiple separate h5 files
X = np.zeros( (len(lines), 3, SIZE, SIZE), dtype='f4' ) 
y = np.zeros( (len(lines),1), dtype='f4' )
for i,l in enumerate(lines):
    sp = l.split(' ')
    img = caffe.io.load_image( sp[0] )
    img = caffe.io.resize( img, (SIZE, SIZE, 3) ) # resize to fixed size
    # you may apply other input transformations here...
    # Note that the transformation should take img from size-by-size-by-3 and transpose it to 3-by-size-by-size
    # for example
    transposed_img = img.transpose((2,0,1))[::-1,:,:] # RGB->BGR
    X[i] = transposed_img
    y[i] = float(sp[1])
with h5py.File('train.h5','w') as H:
    H.create_dataset( 'X', data=X ) # note the name X given to the dataset!
    H.create_dataset( 'y', data=y ) # note the name y given to the dataset!
with open('train_h5_list.txt','w') as L:
    L.write( 'train.h5' ) # list all h5 files you are going to use

This creates train.h5 and val.h5 (does the X dataset contain the images and the y dataset the labels?).

Then I replace my network's input layers, from:

layers { 
 name: "data" 
 type: DATA 
 top:  "data" 
 top:  "label" 
 data_param { 
   source: "/home/gal/digits/digits/jobs/20181010-191058-21ab/train_db" 
   backend: LMDB 
   batch_size: 64 
 } 
 transform_param { 
    crop_size: 227 
    mean_file: "/home/gal/digits/digits/jobs/20181010-191058-21ab/mean.binaryproto" 
    mirror: true 
  } 
  include: { phase: TRAIN } 
} 
layers { 
 name: "data" 
 type: DATA 
 top:  "data" 
 top:  "label" 
 data_param { 
   source: "/home/gal/digits/digits/jobs/20181010-191058-21ab/val_db"  
   backend: LMDB 
   batch_size: 64
 } 
 transform_param { 
    crop_size: 227 
    mean_file: "/home/gal/digits/digits/jobs/20181010-191058-21ab/mean.binaryproto" 
    mirror: true 
  } 
  include: { phase: TEST } 
}

to:

layer {
  type: "HDF5Data"
  top: "X" # same name as given in create_dataset!
  top: "y"
  hdf5_data_param {
    source: "train_h5_list.txt" # do not give the h5 files directly, but the list.
    batch_size: 32
  }
  include { phase:TRAIN }
}

layer {
  type: "HDF5Data"
  top: "X" # same name as given in create_dataset!
  top: "y"
  hdf5_data_param {
    source: "val_h5_list.txt" # do not give the h5 files directly, but the list.
    batch_size: 32
  }
  include { phase:TEST }
}

I guess HDF5 doesn't need a mean.binaryproto?

Next, how should the output layer change in order to output multiple label probabilities? I guess I need a cross-entropy layer instead of softmax? These are the current output layers:

layers {
  bottom: "prob"
  bottom: "label"
  top: "loss"
  name: "loss"
  type: SOFTMAX_LOSS
  loss_weight: 1
}
layers {
  name: "accuracy"
  type: ACCURACY
  bottom: "prob"
  bottom: "label"
  top: "accuracy"
  include: { phase: TEST }
}

Recommended answer

Mean subtraction

While the LMDB input data layer can handle various input transformations for you, the "HDF5Data" layer does not support this functionality.
Therefore, you must take care of all input transformations (in particular mean subtraction) yourself when you create your HDF5 files.
See where your code says

# you may apply other input transformations here...
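
As a rough illustration of what could go at that point, here is a minimal sketch of per-channel mean subtraction with NumPy. It assumes the images are already stacked in an array of shape (N, 3, SIZE, SIZE), as X is in your script. The helper name subtract_channel_mean and the toy data are hypothetical, and a per-channel mean is only one possible choice (a full per-pixel mean image is another). As far as I know, caffe.io.load_image returns values in [0, 1], so a mean computed elsewhere on a 0-255 scale would not be directly comparable.

```python
import numpy as np

def subtract_channel_mean(X, mean=None):
    """Subtract a per-channel mean from X of shape (N, 3, H, W).

    If mean is None it is computed from X itself; for validation data,
    pass in the mean that was computed on the training data.
    """
    if mean is None:
        # Average over images, rows and columns; keep one value per channel.
        mean = X.mean(axis=(0, 2, 3), keepdims=True)
    return X - mean, mean

# Toy stand-ins for the real image arrays (hypothetical data).
rng = np.random.default_rng(0)
X_train = rng.random((4, 3, 8, 8)).astype('f4')
X_val = rng.random((2, 3, 8, 8)).astype('f4')

X_train_centered, train_mean = subtract_channel_mean(X_train)
X_val_centered, _ = subtract_channel_mean(X_val, train_mean)
```

The centered arrays would then be what gets written to the 'X' dataset in place of the raw images.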

Multiple labels

Although your .txt file lists several labels for each image, you only save the first one to the HDF5 file. If you want to use these labels, you have to feed them to the net.
An issue that immediately arises from your example is that you do not have a fixed number of labels for each training image. Why? What does it mean?
Assuming you have three labels for each image (in the .txt files):

<filename> <dog size> <dog color> <has collar>

Then you can have y_size, y_color and y_collar (instead of a single y) in your HDF5 file.

# sp is the split line from your script: [filename, size, color, collar]
y_size[i] = float(sp[1])
y_color[i] = float(spl[2]) if False else float(sp[2])
y_collar[i] = float(sp[3])
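
The parsing step can be sketched as follows, assuming every line carries exactly three labels in the order above. The function name parse_label_lines is hypothetical, and the sample lines are taken from your example file, where the meaning of each column is only assumed here. A line with a different number of labels raises an error instead of being silently truncated.

```python
def parse_label_lines(lines, names=("y_size", "y_color", "y_collar")):
    """Split '<filename> <label> <label> <label>' lines into per-label columns."""
    filenames = []
    labels = {name: [] for name in names}
    for lineno, line in enumerate(lines, start=1):
        sp = line.split()
        if not sp:
            continue  # skip blank lines
        if len(sp) != len(names) + 1:
            raise ValueError("line %d: expected %d labels, got %d"
                             % (lineno, len(names), len(sp) - 1))
        filenames.append(sp[0])
        for name, value in zip(names, sp[1:]):
            labels[name].append(float(value))
    return filenames, labels

# Two lines from the example file that do have three labels each.
sample = ["/train/img/1.png 0 4 18\n", "/train/img/3.png 0 4 17\n"]
filenames, labels = parse_label_lines(sample)
```

Each column can then be turned into an array, for example np.array(labels["y_size"], dtype='f4').reshape(-1, 1), and written with H.create_dataset("y_size", data=...) in the same way your script writes y.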

Your input data layer will have more "top"s accordingly:

layer {
  type: "HDF5Data"
  top: "X" # same name as given in create_dataset!
  top: "y_size"
  top: "y_color"
  top: "y_collar"
  hdf5_data_param {
    source: "train_h5_list.txt" # do not give the h5 files directly, but the list.
    batch_size: 32
  }
  include { phase:TRAIN }
}

Prediction

Currently your net predicts only a single label (the layer with top: "prob"). You need your net to predict all three labels, so you need to add layers that compute top: "prob_size", top: "prob_color" and top: "prob_collar" (a different layer for each "prob_*").
Once you have a prediction for each label, you need a loss (again, one loss per label).
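
To make this concrete, here is a small Python sketch that prints one prediction layer and one loss layer per label in prototxt form. Several things in it are assumptions rather than facts about your net: the shared feature blob is called "fc7", the class counts (2 sizes, 5 colors, collar yes/no) are made up, and "SoftmaxWithLoss" is used for all three labels, although a sigmoid-based loss may suit the binary collar label just as well. It also assumes the new-style layer { type: "..." } syntax used by the HDF5Data layers above.

```python
# Hypothetical (label name, number of classes) pairs; adjust to your data.
HEADS = [("size", 2), ("color", 5), ("collar", 2)]
FEATURE_BLOB = "fc7"  # assumed name of the last shared feature blob

# One InnerProduct layer producing class scores, plus one loss, per label.
# SoftmaxWithLoss applies the softmax itself, so it is fed the raw scores.
HEAD_TEMPLATE = """layer {{
  name: "prob_{name}"
  type: "InnerProduct"
  bottom: "{bottom}"
  top: "prob_{name}"
  inner_product_param {{ num_output: {k} }}
}}
layer {{
  name: "loss_{name}"
  type: "SoftmaxWithLoss"
  bottom: "prob_{name}"
  bottom: "y_{name}"
  top: "loss_{name}"
}}
"""

def make_heads(heads, bottom):
    """Return prototxt text with a prediction and a loss layer per label."""
    return "\n".join(
        HEAD_TEMPLATE.format(name=name, k=k, bottom=bottom) for name, k in heads
    )

prototxt = make_heads(HEADS, FEATURE_BLOB)
print(prototxt)
```

The bottom names y_size, y_color and y_collar match the tops of the HDF5Data layer, which in turn match the dataset names in the HDF5 file.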
