Correct use of device_type in OpenACC
Question
I have a for loop that I want to parallelize with OpenACC if the target hardware is NVIDIA, or run serially when the target hardware is AMD. I tried the following:
#pragma acc loop \
device_type(tesla) parallel \
device_type(radeon) seq
for (int z = 0; z < size_z; ++z)
{
// do stuff...
}
Compiled with: pgc++ -std=c++11 -O4 -ta=tesla -Minfo:accel main.cpp
But in the parallelization report I get: <line_number>, #pragma acc loop seq
It appears that the compiler only takes the last line of the directive into account. Any idea why this is happening?
Running pgc++ --version gives the following:
pgc++ 16.10-0 64-bit target on x86-64 Linux -tp sandybridge
Recommended answer
You're using "device_type" correctly, but we (PGI) are still missing a few OpenACC features, including defining multiple loop schedules via the "device_type" clause. The current limitations are listed in section 4.4 of the PGI release notes: http://www.pgroup.com/doc/pgirn-x64.pdf
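Until a compiler supports per-device loop schedules, one possible workaround is to choose the schedule at compile time with the preprocessor. The following is only a minimal sketch under assumptions, not something taken from the PGI documentation: TARGET_NVIDIA is a hypothetical macro that you would define yourself on the command line (for example with -DTARGET_NVIDIA), and scale_in_place is an illustrative stand-in for the loop body in the question.

```cpp
#include <vector>

// Hypothetical helper standing in for the "do stuff" loop in the question.
// TARGET_NVIDIA is an assumed, user-defined macro (e.g. -DTARGET_NVIDIA);
// no compiler sets it automatically.
void scale_in_place(std::vector<float>& data, float factor)
{
    float* p = data.data();
    const int size_z = static_cast<int>(data.size());

#if defined(TARGET_NVIDIA)
    // NVIDIA build: offload the loop and let the compiler pick the schedule.
    #pragma acc parallel loop copy(p[0:size_z])
#endif
    // Any other build: no directive is emitted, so this is a plain serial loop.
    for (int z = 0; z < size_z; ++z)
    {
        p[z] *= factor;
    }
}
```

The trade-off is that you need one build per target instead of a single binary, which is what "device_type" is meant to avoid. When compiled without OpenACC enabled, the directive is ignored and the loop simply runs serially on the host.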