修改多个输入的Caffe C ++预测代码 [英] Modifying the Caffe C++ prediction code for multiple inputs

查看:92
本文介绍了修改多个输入的Caffe C ++预测代码的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我实现了 Caffe C ++示例的修改版本,效果非常好,速度非常慢,因为它只能一张一张地接受图像.理想情况下,我想将200张图像的向量传递给Caffe,并为每张图像返回最佳预测.我从Fanglin Wang那里得到了伟大的帮助,并实施了他的一些建议,但是仍难以解决如何从每个图像中检索最佳结果的问题.

现在将Classify方法传递给cv::Mat对象(变量input_channels)的向量,该对象是灰度浮点图像的向量.我消除了代码中的预处理方法,因为我不需要将这些图像转换为浮点数或减去均值图像.我也一直试图摆脱N变量,因为我只想返回每个图像的最高预测值和概率.<​​/p>

#include "Classifier.h"
using namespace caffe;
using std::string;

Classifier::Classifier(const string& model_file, const string& trained_file, const string& label_file) {
#ifdef CPU_ONLY
  Caffe::set_mode(Caffe::CPU);
#else
  Caffe::set_mode(Caffe::GPU);
#endif

  /* Load the network. */
  net_.reset(new Net<float>(model_file, TEST));
  net_->CopyTrainedLayersFrom(trained_file);

  Blob<float>* input_layer = net_->input_blobs()[0];
  num_channels_ = input_layer->channels();
  input_geometry_ = cv::Size(input_layer->width(), input_layer->height());

  /* Load labels. */
  std::ifstream labels(label_file.c_str());
  CHECK(labels) << "Unable to open labels file " << label_file;
  string line;
  while (std::getline(labels, line))
    labels_.push_back(string(line));

  Blob<float>* output_layer = net_->output_blobs()[0];
  CHECK_EQ(labels_.size(), output_layer->channels())
    << "Number of labels is different from the output layer dimension.";
}

static bool PairCompare(const std::pair<float, int>& lhs, const std::pair<float, int>& rhs) {
  return lhs.first > rhs.first;
}

/* Return the indices of the top N values of vector v. */
static std::vector<int> Argmax(const std::vector<float>& v, int N) {
  std::vector<std::pair<float, int> > pairs;
  for (size_t i = 0; i < v.size(); ++i)
    pairs.push_back(std::make_pair(v[i], i));
  std::partial_sort(pairs.begin(), pairs.begin() + N, pairs.end(), PairCompare);

  std::vector<int> result;
  for (int i = 0; i < N; ++i)
    result.push_back(pairs[i].second);
  return result;
}

/* Return the top N predictions. */
std::vector<Prediction> Classifier::Classify(const std::vector<cv::Mat> &input_channels) {
  std::vector<float> output = Predict(input_channels);

    std::vector<int> maxN = Argmax(output, 1);
    int idx = maxN[0];
    predictions.push_back(std::make_pair(labels_[idx], output[idx]));
    return predictions;
}

std::vector<float> Classifier::Predict(const std::vector<cv::Mat> &input_channels, int num_images) {
  Blob<float>* input_layer = net_->input_blobs()[0];
  input_layer->Reshape(num_images, num_channels_,
                       input_geometry_.height, input_geometry_.width);
  /* Forward dimension change to all layers. */
  net_->Reshape();

  WrapInputLayer(&input_channels);

  net_->ForwardPrefilled();

  /* Copy the output layer to a std::vector */
  Blob<float>* output_layer = net_->output_blobs()[0];
  const float* begin = output_layer->cpu_data();
  const float* end = begin + num_images * output_layer->channels();
  return std::vector<float>(begin, end);
}

/* Wrap the input layer of the network in separate cv::Mat objects (one per channel). This way we save one memcpy operation and we don't need to rely on cudaMemcpy2D. The last preprocessing operation will write the separate channels directly to the input layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>* input_channels) {
  Blob<float>* input_layer = net_->input_blobs()[0];

  int width = input_layer->width();
  int height = input_layer->height();
  float* input_data = input_layer->mutable_cpu_data();
  for (int i = 0; i < input_layer->channels() * num_images; ++i) {
    cv::Mat channel(height, width, CV_32FC1, input_data);
    input_channels->push_back(channel);
    input_data += width * height;
  }
}

更新

非常感谢您的帮助,Shai,我进行了您建议的更改,但似乎遇到了一些我无法解决的奇怪编译问题(我设法解决了一些问题).

这些是我所做的更改:

头文件:

#ifndef __CLASSIFIER_H__
#define __CLASSIFIER_H__

#include <caffe/caffe.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <algorithm>
#include <iosfwd>
#include <memory>
#include <string>
#include <utility>
#include <vector>


using namespace caffe;  // NOLINT(build/namespaces)
using std::string;

/* Pair (label, confidence) representing a prediction. */
typedef std::pair<string, float> Prediction;

class Classifier {
 public:
  Classifier(const string& model_file,
             const string& trained_file,
             const string& label_file);

  std::vector< std::pair<int,float> > Classify(const std::vector<cv::Mat>& img);

 private:

  std::vector< std::vector<float> > Predict(const std::vector<cv::Mat>& img, int nImages);

  void WrapInputLayer(std::vector<cv::Mat>* input_channels, int nImages);

  void Preprocess(const std::vector<cv::Mat>& img,
                  std::vector<cv::Mat>* input_channels, int nImages);

 private:
  shared_ptr<Net<float> > net_;
  cv::Size input_geometry_;
  int num_channels_;
  std::vector<string> labels_;
};

#endif /* __CLASSIFIER_H__ */

类文件:

#define CPU_ONLY
#include "Classifier.h"

using namespace caffe;  // NOLINT(build/namespaces)
using std::string;

Classifier::Classifier(const string& model_file,
                       const string& trained_file,
                       const string& label_file) {
#ifdef CPU_ONLY
  Caffe::set_mode(Caffe::CPU);
#else
  Caffe::set_mode(Caffe::GPU);
#endif

  /* Load the network. */
  net_.reset(new Net<float>(model_file, TEST));
  net_->CopyTrainedLayersFrom(trained_file);

  CHECK_EQ(net_->num_inputs(), 1) << "Network should have exactly one input.";
  CHECK_EQ(net_->num_outputs(), 1) << "Network should have exactly one output.";

  Blob<float>* input_layer = net_->input_blobs()[0];
  num_channels_ = input_layer->channels();
  CHECK(num_channels_ == 3 || num_channels_ == 1)
    << "Input layer should have 1 or 3 channels.";
  input_geometry_ = cv::Size(input_layer->width(), input_layer->height());

  /* Load labels. */
  std::ifstream labels(label_file.c_str());
  CHECK(labels) << "Unable to open labels file " << label_file;
  string line;
  while (std::getline(labels, line))
    labels_.push_back(string(line));

  Blob<float>* output_layer = net_->output_blobs()[0];
  CHECK_EQ(labels_.size(), output_layer->channels())
    << "Number of labels is different from the output layer dimension.";
}

static bool PairCompare(const std::pair<float, int>& lhs,
                        const std::pair<float, int>& rhs) {
  return lhs.first > rhs.first;
}

/* Return the indices of the top N values of vector v. */
static std::vector<int> Argmax(const std::vector<float>& v, int N) {
  std::vector<std::pair<float, int> > pairs;
  for (size_t i = 0; i < v.size(); ++i)
    pairs.push_back(std::make_pair(v[i], i));
  std::partial_sort(pairs.begin(), pairs.begin() + N, pairs.end(), PairCompare);

  std::vector<int> result;
  for (int i = 0; i < N; ++i)
    result.push_back(pairs[i].second);
  return result;
}

std::vector< std::pair<int,float> > Classifier::Classify(const std::vector<cv::Mat>& img) {
  std::vector< std::vector<float> > output = Predict(img, img.size());

  std::vector< std::pair<int,float> > predictions;
  for ( int i = 0 ; i < output.size(); i++ ) {
    std::vector<int> maxN = Argmax(output[i], 1);
    int idx = maxN[0];
    predictions.push_back(std::make_pair(labels_[idx], output[idx]));
  }
  return predictions;
}

std::vector< std::vector<float> > Classifier::Predict(const std::vector<cv::Mat>& img, int nImages) {
  Blob<float>* input_layer = net_->input_blobs()[0];
  input_layer->Reshape(nImages, num_channels_,
                       input_geometry_.height, input_geometry_.width);
  /* Forward dimension change to all layers. */
  net_->Reshape();

  std::vector<cv::Mat> input_channels;
  WrapInputLayer(&input_channels, nImages);

  Preprocess(img, &input_channels, nImages);

  net_->ForwardPrefilled();

  /* Copy the output layer to a std::vector */

  Blob<float>* output_layer = net_->output_blobs()[0];
  std::vector <std::vector<float> > ret;
  for (int i = 0; i < nImages; i++) {
    const float* begin = output_layer->cpu_data() + i*output_layer->channels();
    const float* end = begin + output_layer->channels();
    ret.push_back( std::vector<float>(begin, end) );
  }
  return ret;
}

/* Wrap the input layer of the network in separate cv::Mat objects
 * (one per channel). This way we save one memcpy operation and we
 * don't need to rely on cudaMemcpy2D. The last preprocessing
 * operation will write the separate channels directly to the input
 * layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>* input_channels, int nImages) {
  Blob<float>* input_layer = net_->input_blobs()[0];

  int width = input_layer->width();
  int height = input_layer->height();
  float* input_data = input_layer->mutable_cpu_data();
  for (int i = 0; i < input_layer->channels()* nImages; ++i) {
    cv::Mat channel(height, width, CV_32FC1, input_data);
    input_channels->push_back(channel);
    input_data += width * height;
  }
}

void Classifier::Preprocess(const std::vector<cv::Mat>& img,
                            std::vector<cv::Mat>* input_channels, int nImages) {
  for (int i = 0; i < nImages; i++) {
      vector<cv::Mat> channels;
      cv::split(img[i], channels);
      for (int j = 0; j < channels.size(); j++){
           channels[j].copyTo((*input_channels)[i*num_channels_[0]+j]);
      }
  }
}

解决方案

如果我正确理解了您的问题,则输入n图像,期望n(label, prob),但是只能得到一对. >

我相信这些修改可以为您解决问题:

  1. Classifier::Predict应该返回一个vector< vector<float> >,它是每个输入图像的概率的 vector .这是大小为output_layer->channels()的向量的大小为nvector:

    std::vector< std::vecot<float> > 
    Classifier::Predict(const std::vector<cv::Mat> &input_channels, 
                        int num_images) {
      // same code here...
    
      /* changes here: Copy the output layer to a std::vector */
      Blob<float>* output_layer = net_->output_blobs()[0];
      std::vector< std::vector<float> > ret;
      for ( int i = 0 ; i < num_images ; i++ ) {
          const float* begin = output_layer->cpu_data() + i*output_layer->channels();
          const float* end = begin + output_layer->channels();
          ret.push_back( std::vector<float>(begin, end) );
      }
      return ret;
    }
    

  2. Classifier::Classify中,您需要独立处理每个vector<float>Argmax:

     std::vector< std::pair<int,float> > 
     Classifier::Classify(const std::vector<cv::Mat> &input_channels) {
    
       std::vector< std::vector<float> > output = Predict(input_channels);
    
       std::vector< std::pair<int,float> > predictions;
       for ( int i = 0 ; i < output.size(); i++ ) {
           std::vector<int> maxN = Argmax(output[i], 1);
           int idx = maxN[0];
           predictions.push_back(std::make_pair(labels_[idx], output[idx]));
       }
       return predictions;
     }
    

I implemented a modified version of the Caffe C++ example and while it works really well, it's incredibly slow because it only accepts images one by one. Ideally I'd like to pass Caffe a vector of 200 images and return the best prediction for each one. I received some great help from Fanglin Wang and implemented some of his recommendations, but am still having some trouble working out how to retrieve the best result from each image.

The Classify method is now passed a vector of cv::Mat objects (variable input_channels) which is a vector of grayscale floating point images. I've eliminated the preprocessing method in the code because I don't need to convert these images to floating point or subtract the mean image. I've also been trying to get rid of the N variable because I only want to return the top prediction and probability for each image.

#include "Classifier.h"
using namespace caffe;
using std::string;

Classifier::Classifier(const string& model_file, const string& trained_file, const string& label_file) {
#ifdef CPU_ONLY
  Caffe::set_mode(Caffe::CPU);
#else
  Caffe::set_mode(Caffe::GPU);
#endif

  /* Load the network. */
  net_.reset(new Net<float>(model_file, TEST));
  net_->CopyTrainedLayersFrom(trained_file);

  Blob<float>* input_layer = net_->input_blobs()[0];
  num_channels_ = input_layer->channels();
  input_geometry_ = cv::Size(input_layer->width(), input_layer->height());

  /* Load labels. */
  std::ifstream labels(label_file.c_str());
  CHECK(labels) << "Unable to open labels file " << label_file;
  string line;
  while (std::getline(labels, line))
    labels_.push_back(string(line));

  Blob<float>* output_layer = net_->output_blobs()[0];
  CHECK_EQ(labels_.size(), output_layer->channels())
    << "Number of labels is different from the output layer dimension.";
}

static bool PairCompare(const std::pair<float, int>& lhs, const std::pair<float, int>& rhs) {
  return lhs.first > rhs.first;
}

/* Return the indices of the top N values of vector v. */
static std::vector<int> Argmax(const std::vector<float>& v, int N) {
  std::vector<std::pair<float, int> > pairs;
  for (size_t i = 0; i < v.size(); ++i)
    pairs.push_back(std::make_pair(v[i], i));
  std::partial_sort(pairs.begin(), pairs.begin() + N, pairs.end(), PairCompare);

  std::vector<int> result;
  for (int i = 0; i < N; ++i)
    result.push_back(pairs[i].second);
  return result;
}

/* Return the top N predictions. */
std::vector<Prediction> Classifier::Classify(const std::vector<cv::Mat> &input_channels) {
  std::vector<float> output = Predict(input_channels);

    std::vector<int> maxN = Argmax(output, 1);
    int idx = maxN[0];
    predictions.push_back(std::make_pair(labels_[idx], output[idx]));
    return predictions;
}

std::vector<float> Classifier::Predict(const std::vector<cv::Mat> &input_channels, int num_images) {
  Blob<float>* input_layer = net_->input_blobs()[0];
  input_layer->Reshape(num_images, num_channels_,
                       input_geometry_.height, input_geometry_.width);
  /* Forward dimension change to all layers. */
  net_->Reshape();

  WrapInputLayer(&input_channels);

  net_->ForwardPrefilled();

  /* Copy the output layer to a std::vector */
  Blob<float>* output_layer = net_->output_blobs()[0];
  const float* begin = output_layer->cpu_data();
  const float* end = begin + num_images * output_layer->channels();
  return std::vector<float>(begin, end);
}

/* Wrap the input layer of the network in separate cv::Mat objects (one per channel). This way we save one memcpy operation and we don't need to rely on cudaMemcpy2D. The last preprocessing operation will write the separate channels directly to the input layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>* input_channels) {
  Blob<float>* input_layer = net_->input_blobs()[0];

  int width = input_layer->width();
  int height = input_layer->height();
  float* input_data = input_layer->mutable_cpu_data();
  for (int i = 0; i < input_layer->channels() * num_images; ++i) {
    cv::Mat channel(height, width, CV_32FC1, input_data);
    input_channels->push_back(channel);
    input_data += width * height;
  }
}

UPDATE

Thank-you so much for your help Shai, I made the changes you recommended but seem to be getting some strange compilation issues I can't work out (I managed to sort out a few of the issues).

These are the changes I made:

Header File:

#ifndef __CLASSIFIER_H__
#define __CLASSIFIER_H__

#include <caffe/caffe.hpp>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <algorithm>
#include <iosfwd>
#include <memory>
#include <string>
#include <utility>
#include <vector>


using namespace caffe;  // NOLINT(build/namespaces)
using std::string;

/* Pair (label, confidence) representing a prediction. */
typedef std::pair<string, float> Prediction;

class Classifier {
 public:
  Classifier(const string& model_file,
             const string& trained_file,
             const string& label_file);

  std::vector< std::pair<int,float> > Classify(const std::vector<cv::Mat>& img);

 private:

  std::vector< std::vector<float> > Predict(const std::vector<cv::Mat>& img, int nImages);

  void WrapInputLayer(std::vector<cv::Mat>* input_channels, int nImages);

  void Preprocess(const std::vector<cv::Mat>& img,
                  std::vector<cv::Mat>* input_channels, int nImages);

 private:
  shared_ptr<Net<float> > net_;
  cv::Size input_geometry_;
  int num_channels_;
  std::vector<string> labels_;
};

#endif /* __CLASSIFIER_H__ */

Class File:

#define CPU_ONLY
#include "Classifier.h"

using namespace caffe;  // NOLINT(build/namespaces)
using std::string;

Classifier::Classifier(const string& model_file,
                       const string& trained_file,
                       const string& label_file) {
#ifdef CPU_ONLY
  Caffe::set_mode(Caffe::CPU);
#else
  Caffe::set_mode(Caffe::GPU);
#endif

  /* Load the network. */
  net_.reset(new Net<float>(model_file, TEST));
  net_->CopyTrainedLayersFrom(trained_file);

  CHECK_EQ(net_->num_inputs(), 1) << "Network should have exactly one input.";
  CHECK_EQ(net_->num_outputs(), 1) << "Network should have exactly one output.";

  Blob<float>* input_layer = net_->input_blobs()[0];
  num_channels_ = input_layer->channels();
  CHECK(num_channels_ == 3 || num_channels_ == 1)
    << "Input layer should have 1 or 3 channels.";
  input_geometry_ = cv::Size(input_layer->width(), input_layer->height());

  /* Load labels. */
  std::ifstream labels(label_file.c_str());
  CHECK(labels) << "Unable to open labels file " << label_file;
  string line;
  while (std::getline(labels, line))
    labels_.push_back(string(line));

  Blob<float>* output_layer = net_->output_blobs()[0];
  CHECK_EQ(labels_.size(), output_layer->channels())
    << "Number of labels is different from the output layer dimension.";
}

static bool PairCompare(const std::pair<float, int>& lhs,
                        const std::pair<float, int>& rhs) {
  return lhs.first > rhs.first;
}

/* Return the indices of the top N values of vector v. */
static std::vector<int> Argmax(const std::vector<float>& v, int N) {
  std::vector<std::pair<float, int> > pairs;
  for (size_t i = 0; i < v.size(); ++i)
    pairs.push_back(std::make_pair(v[i], i));
  std::partial_sort(pairs.begin(), pairs.begin() + N, pairs.end(), PairCompare);

  std::vector<int> result;
  for (int i = 0; i < N; ++i)
    result.push_back(pairs[i].second);
  return result;
}

std::vector< std::pair<int,float> > Classifier::Classify(const std::vector<cv::Mat>& img) {
  std::vector< std::vector<float> > output = Predict(img, img.size());

  std::vector< std::pair<int,float> > predictions;
  for ( int i = 0 ; i < output.size(); i++ ) {
    std::vector<int> maxN = Argmax(output[i], 1);
    int idx = maxN[0];
    predictions.push_back(std::make_pair(labels_[idx], output[idx]));
  }
  return predictions;
}

std::vector< std::vector<float> > Classifier::Predict(const std::vector<cv::Mat>& img, int nImages) {
  Blob<float>* input_layer = net_->input_blobs()[0];
  input_layer->Reshape(nImages, num_channels_,
                       input_geometry_.height, input_geometry_.width);
  /* Forward dimension change to all layers. */
  net_->Reshape();

  std::vector<cv::Mat> input_channels;
  WrapInputLayer(&input_channels, nImages);

  Preprocess(img, &input_channels, nImages);

  net_->ForwardPrefilled();

  /* Copy the output layer to a std::vector */

  Blob<float>* output_layer = net_->output_blobs()[0];
  std::vector <std::vector<float> > ret;
  for (int i = 0; i < nImages; i++) {
    const float* begin = output_layer->cpu_data() + i*output_layer->channels();
    const float* end = begin + output_layer->channels();
    ret.push_back( std::vector<float>(begin, end) );
  }
  return ret;
}

/* Wrap the input layer of the network in separate cv::Mat objects
 * (one per channel). This way we save one memcpy operation and we
 * don't need to rely on cudaMemcpy2D. The last preprocessing
 * operation will write the separate channels directly to the input
 * layer. */
void Classifier::WrapInputLayer(std::vector<cv::Mat>* input_channels, int nImages) {
  Blob<float>* input_layer = net_->input_blobs()[0];

  int width = input_layer->width();
  int height = input_layer->height();
  float* input_data = input_layer->mutable_cpu_data();
  for (int i = 0; i < input_layer->channels()* nImages; ++i) {
    cv::Mat channel(height, width, CV_32FC1, input_data);
    input_channels->push_back(channel);
    input_data += width * height;
  }
}

void Classifier::Preprocess(const std::vector<cv::Mat>& img,
                            std::vector<cv::Mat>* input_channels, int nImages) {
  for (int i = 0; i < nImages; i++) {
      vector<cv::Mat> channels;
      cv::split(img[i], channels);
      for (int j = 0; j < channels.size(); j++){
           channels[j].copyTo((*input_channels)[i*num_channels_[0]+j]);
      }
  }
}

解决方案

If I understand your problem correctly, you input n images, expecting n pairs of (label, prob), but getting only one such pair.

I believe these modifications should do the trick for you:

  1. Classifier::Predict should return a vector< vector<float> >, that is a vector of probabilities per input image. That is a vector of size n of vectors of size output_layer->channels():

    std::vector< std::vecot<float> > 
    Classifier::Predict(const std::vector<cv::Mat> &input_channels, 
                        int num_images) {
      // same code here...
    
      /* changes here: Copy the output layer to a std::vector */
      Blob<float>* output_layer = net_->output_blobs()[0];
      std::vector< std::vector<float> > ret;
      for ( int i = 0 ; i < num_images ; i++ ) {
          const float* begin = output_layer->cpu_data() + i*output_layer->channels();
          const float* end = begin + output_layer->channels();
          ret.push_back( std::vector<float>(begin, end) );
      }
      return ret;
    }
    

  2. In Classifier::Classify you need to process each vector<float> through Argmax independantly:

     std::vector< std::pair<int,float> > 
     Classifier::Classify(const std::vector<cv::Mat> &input_channels) {
    
       std::vector< std::vector<float> > output = Predict(input_channels);
    
       std::vector< std::pair<int,float> > predictions;
       for ( int i = 0 ; i < output.size(); i++ ) {
           std::vector<int> maxN = Argmax(output[i], 1);
           int idx = maxN[0];
           predictions.push_back(std::make_pair(labels_[idx], output[idx]));
       }
       return predictions;
     }
    

这篇关于修改多个输入的Caffe C ++预测代码的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆