Thrust equivalent of OpenMP code
Question
The code I'm trying to parallelize with OpenMP is a Monte Carlo simulation that boils down to something like this:
int seed = 0;
std::mt19937 rng(seed);
double result = 0.0;
int N = 1000;
#pragma omp parallel for
for (int i = 0; i < N; i++)
{
    result += rng();
}
std::cout << result << std::endl;
I want to make sure that the state of the random number generator is shared across threads, and that the addition to the result is atomic.
Is there a way of replacing this code with something from thrust::omp? From the research I have done so far, it looks like thrust::omp is more of a directive to run some standard Thrust operations on multiple CPU threads rather than on the GPU.
Answer

Yes, it's possible to do something similar with Thrust, with (parallel) execution on the host CPU using OpenMP threads underneath the Thrust OMP backend.

Much of what I show here was lifted from the Thrust direct system access page, but there are several comparable methods to access the OMP backend, depending on whether you want flexible, retargettable code or code that specifically uses the OMP backend (this example specifically targets the OMP backend). The thrust::reduce operation guarantees the "atomicity" you are looking for: specifically, it guarantees that two threads never try to update a single location at the same time.

However, the behavior of std::mt19937 in a multithreaded OMP app is, I think, outside the scope of my answer. If I create an ordinary OMP app from the code you provided, I observe variability in the results, due (I think) to some interaction between the multiple OMP threads and their shared use of the std::mt19937 rng. This is not something Thrust can sort out for you. Note that Thrust also has its own random number generators, which are designed to work with it.

Here's one example:
$ cat t535.cpp
#include <cstdlib>
#include <random>
#include <iostream>
#include <thrust/system/omp/execution_policy.h>
#include <thrust/system/omp/vector.h>
#include <thrust/generate.h>
#include <thrust/reduce.h>

int main(int argc, char *argv[]){

  unsigned N = 1;
  int seed = 0;
  if (argc > 1) N = atoi(argv[1]);
  if (argc > 2) seed = atoi(argv[2]);
  std::mt19937 rng(seed);
  unsigned long result = 0;
  thrust::omp::vector<unsigned long> vec(N);
  thrust::generate(thrust::omp::par, vec.begin(), vec.end(), rng);
  result = thrust::reduce(thrust::omp::par, vec.begin(), vec.end());
  std::cout << result << std::endl;
  return 0;
}
$ g++ -std=c++11 -O2 -I/usr/local/cuda/include -o t535 t535.cpp -fopenmp -lgomp
$ time ./t535 100000000
214746750809749347
real 0m0.700s
user 0m2.108s
sys 0m0.600s
$
(For this test I used Fedora 20 with CUDA 6.5RC, running on a 4-core Xeon CPU, netting about a 3x speedup based on the real vs. user time results.) There are probably some further "optimizations" that could be made for this particular code, but I think they would unnecessarily clutter the idea, and I assume that your actual application is more complicated than just summing random numbers.