Thrust equivalent of OpenMP code


Question


The code I'm trying to parallelize in OpenMP is a Monte Carlo simulation that boils down to something like this:

int seed = 0;
std::mt19937 rng(seed); 
double result = 0.0;
int N = 1000;

#pragma omp parallel for
for(int i = 0; i < N; i++)
{
    result += rng();
}
std::cout << result << std::endl;

I want to make sure that the state of the random number generator is shared across threads, and the addition to the result is atomic.

Is there a way of replacing this code with something from thrust::omp? From the research I have done so far, it looks like thrust::omp is more of a directive to use multiple CPU threads rather than the GPU for some standard Thrust operations.

Solution

Yes, it's possible to use thrust to do something similar, with (parallel) execution on the host CPU using OMP threads underneath the thrust OMP backend. Here's one example:

$ cat t535.cpp
#include <random>
#include <iostream>
#include <cstdlib>
#include <thrust/system/omp/execution_policy.h>
#include <thrust/system/omp/vector.h>
#include <thrust/generate.h>
#include <thrust/reduce.h>

int main(int argc, char *argv[]){
  unsigned N = 1;
  int seed = 0;
  if (argc > 1)  N = atoi(argv[1]);
  if (argc > 2)  seed = atoi(argv[2]);
  std::mt19937 rng(seed);
  unsigned long result = 0;

  thrust::omp::vector<unsigned long> vec(N);
  thrust::generate(thrust::omp::par, vec.begin(), vec.end(), rng);
  result = thrust::reduce(thrust::omp::par, vec.begin(), vec.end());
  std::cout << result << std::endl;
  return 0;
}
$ g++ -std=c++11 -O2 -I/usr/local/cuda/include -o t535 t535.cpp -fopenmp -lgomp
$ time ./t535 100000000
214746750809749347

real    0m0.700s
user    0m2.108s
sys     0m0.600s
$

For this test I used Fedora 20, with CUDA 6.5RC, running on a 4-core Xeon CPU (netting about a 3x speedup based on time results). There are probably some further "optimizations" that could be made for this particular code, but I think they will unnecessarily clutter the idea, and I assume that your actual application is more complicated than just summing random numbers.

Much of what I show here was lifted from the thrust direct system access page, but there are several comparable methods to access the OMP backend, depending on whether you want flexible, retargetable code or code that specifically uses the OMP backend (this example specifically targets the OMP backend).
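
For instance, a minimal retargetable sketch (not the code I timed above) never names thrust::omp in the source; instead, the device backend is selected at build time with Thrust's THRUST_DEVICE_SYSTEM macro. Passing a std::mt19937 functor like this only makes sense when the selected backend runs on the host, so the caveat below about its multithreaded behavior still applies.

// Retargetable sketch: the source never mentions thrust::omp. Build for the
// OMP backend by defining THRUST_DEVICE_SYSTEM at compile time, e.g.:
//   g++ -std=c++11 -O2 -I/usr/local/cuda/include \
//       -DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP \
//       -o retarget retarget.cpp -fopenmp -lgomp
#include <random>
#include <iostream>
#include <cstdlib>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/reduce.h>

int main(int argc, char *argv[]){
  unsigned N = 1;
  int seed = 0;
  if (argc > 1)  N = atoi(argv[1]);
  if (argc > 2)  seed = atoi(argv[2]);
  std::mt19937 rng(seed);
  // With THRUST_DEVICE_SYSTEM_OMP, device_vector lives in host memory and
  // algorithms on its iterators dispatch to the OMP backend automatically.
  thrust::device_vector<unsigned long> vec(N);
  thrust::generate(vec.begin(), vec.end(), rng);
  unsigned long result = thrust::reduce(vec.begin(), vec.end());
  std::cout << result << std::endl;
  return 0;
}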

The thrust::reduce operation guarantees the "atomicity" you are looking for. Specifically, it guarantees that two threads are not trying to update a single location at the same time. However, the use of std::mt19937 in a multithreaded OMP app is outside the scope of my answer, I think. If I create an ordinary OMP app using the code you provided, I observe variability in the results, due (I think) to some interaction arising from the use of the std::mt19937 rng across multiple OMP threads. This is not something thrust can sort out for you.

Thrust also has random number generators, which are designed to work with it.
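
For example, a rough sketch along those lines could use thrust::default_random_engine inside a functor, with each element seeded from its own (hashed) index, so that no generator state is shared between threads. Combining a thrust::counting_iterator with thrust::transform_reduce also fuses the generation and the reduction, avoiding the intermediate vector from the example above. The seeding scheme here is only illustrative.

#include <iostream>
#include <thrust/system/omp/execution_policy.h>
#include <thrust/iterator/counting_iterator.h>
#include <thrust/transform_reduce.h>
#include <thrust/functional.h>
#include <thrust/random.h>

// Mix the element index into a seed so that neighbouring elements do not get
// trivially correlated engines (an illustrative scheme, not a statement
// about statistical quality).
unsigned int hash_index(unsigned int a)
{
  a = (a ^ 61) ^ (a >> 16);
  a = a + (a << 3);
  a = a ^ (a >> 4);
  a = a * 0x27d4eb2d;
  a = a ^ (a >> 15);
  return a;
}

// Each call builds its own engine from the index: no shared RNG state,
// so there is nothing for the OMP threads to race on.
struct draw_one
{
  unsigned long operator()(unsigned long i) const
  {
    thrust::default_random_engine eng(hash_index(i));
    return eng();
  }
};

int main()
{
  const unsigned long N = 100000000;
  unsigned long result =
      thrust::transform_reduce(thrust::omp::par,
                               thrust::counting_iterator<unsigned long>(0),
                               thrust::counting_iterator<unsigned long>(N),
                               draw_one(),
                               0ul,
                               thrust::plus<unsigned long>());
  std::cout << result << std::endl;
  return 0;
}

Because each element derives its engine from its own index, the result is reproducible regardless of how the OMP threads split the work, which the shared std::mt19937 version cannot guarantee.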
