Parallelizing a for loop using openmp & replacing push_back


Question


I'd like to parallelize the following piece of code but am new to openmp and creating parallel code.

std::vector<DMatch> good_matches;
for (int i = 0; i < descriptors_A.rows; i++) {
   if (matches_RM[i].distance < 3 * min_dist) {
      good_matches.push_back(matches_RM[i]);
   }
}

I have tried

std::vector<DMatch> good_matches;
#pragma omp parallel for
for (int i = 0; i < descriptors_A.rows; i++) {
   if (matches_RM[i].distance < 3 * min_dist) {
      good_matches[i] = matches_RM[i];
   }
}

and

std::vector<DMatch> good_matches;
cv::DMatch temp;
#pragma omp parallel for
for (int i = 0; i < descriptors_A.rows; i++) {
   if (matches_RM[i].distance < 3 * min_dist) {
      temp = matches_RM[i];
      good_matches[i] = temp;
      // AND ALSO good_matches.push_back(temp);
   }
}

I have also tried

#pragma omp critical
good_matches.push_back(matches_RM[i]);

This clause works but does not speed anything up. It may be that this for loop simply cannot be sped up, but it would be great if it could. I'd also like to speed up the following:

std::vector<Point2f> obj, scene;
for (int i = 0; i < good_matches.size(); i++) {
   obj.push_back(keypoints_A[good_matches[i].queryIdx].pt);
   scene.push_back(keypoints_B[good_matches[i].trainIdx].pt);
}

Apologies if this question has been answered, and thank you very much to anyone who can help.

Solution

One possibility may be to use private vectors for each thread and combine them in the end:

#include <omp.h>

#include <algorithm>
#include <iterator>
#include <iostream>
#include <vector>

using namespace std;

int main()
{
  vector<int> global_vector;  
  vector< vector<int> > buffers;

  #pragma omp parallel
  {
    auto nthreads = omp_get_num_threads();
    auto id = omp_get_thread_num();
    //
    // Correctly set the number of buffers
    //
  #pragma omp single
    {
      buffers.resize( nthreads );
    }
    //
    // Each thread works on its chunk
    // If order is important maintain schedule static
    //
  #pragma omp for schedule(static)
    for(size_t ii = 0; ii < 100; ++ii) {      
      if( ii % 2 != 0 ) { // Any other condition will do
          buffers[id].push_back(ii);
      }
    }
    //
    // Combine buffers together
    //
    #pragma omp single
    {
      for( auto & buffer : buffers) {
        move(buffer.begin(),buffer.end(),back_inserter(global_vector));
      }
    }
  }
  //
  // Print the result
  //
  for( auto & x : global_vector) {
    cout << x << endl;
  }    
  return 0;
}

The actual speed-up depends on the amount of work done inside each loop iteration relative to the overhead of the parallel region.
