How to detect object from video using SVM


Problem Description

This is my code for training a dataset of, for example, vehicles. Once it is fully trained, I want it to predict the data (vehicles) from a video (.avi). How do I predict the trained data from a video, and how do I add that part to the code? I want the count to become 1 and a message printed when a vehicle appears in the video, and the count to increment to 2 when a second vehicle appears.

    // Note: bowTrainer, bowDE, detector, dictionarySize, ch, i and j are
    // assumed to be declared elsewhere in the full program.
    IplImage *img2;
    cout<<"Vector quantization..."<<endl;
    collectclasscentroids();
    vector<Mat> descriptors = bowTrainer.getDescriptors();
    int count=0;
    for(vector<Mat>::iterator iter=descriptors.begin();iter!=descriptors.end();iter++)
    {
       count += iter->rows;
    }
    cout<<"Clustering "<<count<<" features"<<endl;
    //choosing cluster's centroids as dictionary's words
    Mat dictionary = bowTrainer.cluster();
    bowDE.setVocabulary(dictionary);
    cout<<"extracting histograms in the form of BOW for each image "<<endl;
    Mat labels(0, 1, CV_32FC1);
    Mat trainingData(0, dictionarySize, CV_32FC1);
    int k = 0;
    vector<KeyPoint> keypoint1;
    Mat bowDescriptor1;
    //extracting histogram in the form of bow for each image 
   for(j = 1; j <= 4; j++)
    for(i = 1; i <= 60; i++)
            {
              sprintf( ch,"%s%d%s%d%s","train/",j," (",i,").jpg");
              const char* imageName = ch;
              img2 = cvLoadImage(imageName, 0); 
              detector.detect(img2, keypoint1);
              bowDE.compute(img2, keypoint1, bowDescriptor1);
              trainingData.push_back(bowDescriptor1);
              labels.push_back((float) j);
             }
    //Setting up SVM parameters
    CvSVMParams params;
    params.kernel_type = CvSVM::RBF;
    params.svm_type = CvSVM::C_SVC;
    params.gamma = 0.50625000000000009;
    params.C = 312.50000000000000;
    params.term_crit = cvTermCriteria(CV_TERMCRIT_ITER, 100, 0.000001);
    CvSVM svm;



    printf("%s\n", "Training SVM classifier");

    bool res = svm.train(trainingData, labels, cv::Mat(), cv::Mat(), params);

    cout<<"Processing evaluation data..."<<endl;


    Mat groundTruth(0, 1, CV_32FC1);
    Mat evalData(0, dictionarySize, CV_32FC1);
    k = 0;
    vector<KeyPoint> keypoint2;
    Mat bowDescriptor2;


    Mat results(0, 1, CV_32FC1);
    for(j = 1; j <= 4; j++)
      for(i = 1; i <= 60; i++)
         {
           sprintf( ch, "%s%d%s%d%s", "eval/", j, " (",i,").jpg");
           const char* imageName = ch;
           img2 = cvLoadImage(imageName,0);
           detector.detect(img2, keypoint2);
           bowDE.compute(img2, keypoint2, bowDescriptor2);
           evalData.push_back(bowDescriptor2);
           groundTruth.push_back((float) j);
           float response = svm.predict(bowDescriptor2);
           results.push_back(response);
         }



    //error rate: fraction of evaluation samples whose predicted class
    //does not match the ground truth
    double errorRate = (double) countNonZero(groundTruth - results) / evalData.rows;

The problem is that this code does not predict from video. I want to know how to predict from a video: I want to detect vehicles in a movie, and the code should show 1 when it finds a vehicle in the movie.

For those who didn't understand the question:

I want to play a movie in the above code:

VideoCapture cap("movie.avi"); //movie.avi is with deleted background

Suppose I have trained data that contains vehicles, and "movie.avi" contains 5 vehicles; the code should then detect those vehicles in movie.avi and give me 5 as output.

How can I do this part in the above code?

Solution

From looking at your code setup

params.svm_type = CvSVM::C_SVC;

it appears that you train your classifier with more than two classes. A typical example in a traffic scenario could be cars/pedestrians/bikes/... However, you were asking for a way to detect cars only. Without a description of your training data and your video, it's hard to tell whether your idea makes sense. I guess what the previous answers assume is the following:

You loop through each frame and want to output the number of cars in that frame. A frame may therefore contain multiple cars, say 5. If you take the whole frame as input to the classifier, it might respond "car", even though the setup is conceptually a little off. You cannot reliably retrieve the number of cars with this approach.
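For concreteness, here is a minimal sketch of that per-frame setup, reusing the detector, bowDE and svm objects trained in the question's code (the filename "movie.avi" is taken from the question; the rest of the surrounding pipeline is assumed):

    // Sketch only: classify each whole frame with the trained SVM.
    // Assumes detector, bowDE and svm are the trained objects from above.
    VideoCapture cap("movie.avi");
    Mat frame, gray, bowDescriptor;
    vector<KeyPoint> keypoints;
    while (cap.read(frame))
    {
        cvtColor(frame, gray, CV_BGR2GRAY);  // training images were loaded as grayscale
        detector.detect(gray, keypoints);
        bowDE.compute(gray, keypoints, bowDescriptor);
        if (!bowDescriptor.empty())
            cout << "predicted class: " << svm.predict(bowDescriptor) << endl;
    }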

Instead, the suggestion is to try a sliding-window approach. This means, for example, that you loop over each pixel of the frame and take the region around that pixel (called a sub-window or region of interest) as input to the classifier. Assuming a fixed scale, the sub-window could have a size of 150x50px, matching your training data. You might fix the scale of the cars in your training data, but in real-world videos the cars will be of different sizes. In order to find a car of a different scale, let's say twice as large as in the training data, the typical approach is to scale the image (say by a factor of 2) and repeat the sliding-window approach.
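A sketch of one such pass at a fixed scale; the window size, the step of 10px and the CAR_CLASS label are illustrative choices, not values from your setup:

    // Sketch only: slide a fixed-size window over the grayscale frame and
    // classify each sub-window with the trained BoW + SVM pipeline.
    const int winW = 150, winH = 50, step = 10;  // illustrative values
    const float CAR_CLASS = 1.0f;                // hypothetical car-class label
    for (int y = 0; y + winH <= gray.rows; y += step)
        for (int x = 0; x + winW <= gray.cols; x += step)
        {
            Mat roi = gray(Rect(x, y, winW, winH));
            vector<KeyPoint> kp;
            detector.detect(roi, kp);
            Mat desc;
            bowDE.compute(roi, kp, desc);
            if (!desc.empty() && svm.predict(desc) == CAR_CLASS)
            {
                // (x, y, winW, winH) is a candidate car detection
            }
        }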

By repeating this for all relevant scales you end up with an algorithm that gives you, for each pixel location and each scale, the result of your classifier. This means you have three loops, or, in other words, three dimensions (image width, image height, scale). This is best understood as a three-dimensional pyramid. "Why a pyramid?" you might ask. Because each time the image is scaled (say by a factor of 2) it gets smaller (or larger), so the next scale is an image of a different size (for example, half the size).
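The pyramid loop itself can look as follows; slideWindow() is a hypothetical helper wrapping the sliding-window pass above, and the factor of 2 is just the example used here:

    // Sketch only: run the sliding window at successively halved scales.
    double scale = 1.0;
    Mat level = gray.clone();
    while (level.cols >= winW && level.rows >= winH)
    {
        slideWindow(level, scale);               // hypothetical helper: the loop above
        resize(level, level, Size(), 0.5, 0.5);  // next pyramid level is half the size
        scale *= 2.0;                            // detections here mean 2x larger cars
    }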

The pixel location indicates the position of the car and the scale indicates its size. Now, if you have an N-class classifier, each slot in this pyramid will contain a number (1,...,N) indicating the class. If you had a binary classifier (car/no car), each slot would contain 0 or 1. Even in this simple case, where you would be tempted to simply count the 1s and output that count as the number of cars, you still have the problem that there could be multiple responses for the same car. It would therefore be better to have a car detector that gives continuous responses between 0 and 1; you could then find maxima in this pyramid, each of which indicates a single car. This kind of detection is successfully used with corner features, where you detect corners of interest in a so-called scale-space pyramid.
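With the CvSVM interface used above, such a continuous score can be obtained for a two-class SVM by requesting the decision-function value instead of the class label; the thresholding and non-maximum-suppression steps are only outlined:

    // Sketch only: continuous response for a binary car/no-car SVM.
    // With returnDFVal=true, predict() returns the signed distance to the
    // decision boundary instead of the class label.
    float score = svm.predict(desc, true);
    // Store score at (x, y, scale) in a response pyramid, then keep only
    // local maxima above a threshold; each surviving maximum is one car.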

To summarize: whether you simplify the problem to a binary classification problem ("car"/"no car") or stick to the more difficult task of distinguishing between multiple classes ("car"/"animal"/"pedestrian"/...), you still have to solve the problem of scale and location in each frame.
