Use memcpy for device arrays in OpenACC


Question

Please help. 1) I need to use memcpy to move arrays allocated on the GPU. I cannot use std::memcpy because it "has no acc routine" (compiler output). My code is:

const int GL=100000;
Particle particles[GL];
int cp01[2][GL];
#pragma acc declare create(particles,cp01)
...

I read that cudaMemcpy can be used with OpenACC. In function_device() (which is not able to fill the array allocated on the GPU), I call the following from the host:

#pragma acc data copy(cp)
{
  cudaMemcpy(&particles[cp01[0][0]],&particles[cp01[1][0]],cp*sizeof(Particle),cudaMemcpyDeviceToDevice);
}

I use the header

#include <cuda_runtime.h>

to use CUDA, and I build the project with:

 cmake ../src -DCMAKE_CXX_COMPILER=pgc++ -DCMAKE_CXX_FLAGS="-acc -Minfo=all -Mcuda=llvm"

The program compiles but does not work: it hangs with no output in the console. How can I move arrays allocated on the device (using cudaMemcpy or some other way)? Is that one include enough to use CUDA? Am I building the project correctly (is -Mcuda=llvm necessary or not)? 2) I also have another question. If one writes

#pragma acc parallel loop
for(int i=0; i<N; ++i)
{...}

must the variable N be allocated only on the host, or can it also be on the GPU?
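On question 2, a minimal sketch assuming only the default data rules of the OpenACC specification; the function fill_squares and its arguments are hypothetical. The loop bound can stay an ordinary host variable: in a parallel construct, a scalar that appears in no data clause is treated as firstprivate, so the region gets a copy of its host value on entry. The explicit firstprivate(n) clause only spells out that default.

```cpp
#include <cassert>

// Hypothetical example: fills a[0..n) with i*i in an OpenACC parallel region.
// `n` plays the role of N in the question. It is a plain host variable with no
// device allocation; as a scalar in a parallel construct it is firstprivate by
// default, which the explicit clause below merely makes visible.
// Without OpenACC enabled, the pragma is ignored and the loop runs on the host.
void fill_squares(int *a, int n) {
  assert(n >= 0);  // precondition checked on the host, before the region
  #pragma acc parallel loop firstprivate(n) copyout(a[0:n])
  for (int i = 0; i < n; ++i) {
    a[i] = i * i;
  }
}
```

Note that this default applies to parallel constructs; in a kernels construct, scalars are handled differently, so an explicit clause is the safer habit there.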

Recommended answer

Since "cudaMemcpy" is a host-side call and you want to pass it device pointers, you'll want to use a "host_data" directive. There is no need to copy "cp", since you'll want to use its host value. Also make sure the host values of "cp01" are current, because the pointer offsets are computed from them on the host.

Something like:

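#pragma acc host_data use_device(particles)
{
  // Inside host_data, "particles" names the device copy, so both pointers are device addresses.
  // The indices cp01[0][0] and cp01[1][0] are read from the host copy of cp01.
  cudaMemcpy(&particles[cp01[0][0]], &particles[cp01[1][0]], cp*sizeof(Particle), cudaMemcpyDeviceToDevice);
}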
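Another way to move the particles, sketched here as an assumption and not as part of the answer above: do the copy in an OpenACC compute region instead of calling cudaMemcpy. GL, particles and cp01 follow the question's declarations; the Particle fields and the function name move_particles_on_device are hypothetical. Because the loop runs on the device, it reads the start indices from the device copy of cp01, so the host values of cp01 do not need to be refreshed first.

```cpp
#include <cassert>

// Hypothetical stand-in for the question's Particle type.
struct Particle {
  double x, y, z;
};

const int GL = 100000;
Particle particles[GL];
int cp01[2][GL];
#pragma acc declare create(particles, cp01)

// Moves `cp` particles from index cp01[1][0] to index cp01[0][0] (same
// destination/source order as the cudaMemcpy call) without any memcpy:
// the copy is a plain loop in a compute region over the device copies
// created by `declare create`. Only `cp` comes from the host.
// Assumes both ranges lie inside the array and do not overlap, because
// loop iterations may execute in any order.
void move_particles_on_device(int cp) {
  assert(cp >= 0 && cp <= GL);
  #pragma acc parallel loop present(particles, cp01)
  for (int i = 0; i < cp; ++i) {
    particles[cp01[0][0] + i] = particles[cp01[1][0] + i];
  }
}
```

Called from the host as move_particles_on_device(cp), this sketch needs no CUDA header and no host_data region. Whether it matches cudaMemcpy's speed for large copies depends on the compiler and device and is not measured here.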
