Loop unrolling in CUDA

Question

I have the following code that uses loop unrolling:

#pragma unroll
for (int i=0;i<n;i++)
{
    ....
}

Here, if n is a defined constant, everything works fine. However, if n is a variable, performance drops dramatically: I noticed roughly three times as many instructions being issued and executed. I guess I am looking for a way to do loop unrolling at run time, but maybe that's just not feasible.

Answer

CUDA is a compiled language. Loop unrolling is a compiler optimization. Runtime loop unrolling would imply some sort of runtime interpreter or dynamic code generation. That clearly can't happen.

It would make sense for the unrolled case to execute at least as many instructions as the naïve loop, because the compiler replaces the loop with repetitions of the loop contents. If the unrolled case executes fewer instructions, that implies the compiler is pre-calculating some or all of the loop contents and replacing code with a constant result.

It all depends on what is contained in the loop.
