Why does the performance of my #pragma-unrolled loop degrade if the trip count is not constant?


Question

I have the following code, which uses loop unrolling:

#pragma unroll
for (int i=0;i<n;i++)
{
    ....
}

Here, if n is a defined constant, everything works fine. However, if n is a variable, performance drops dramatically: I noticed that roughly three times as many instructions are issued and executed. I guess I am looking for a way to do loop unrolling at run time, but maybe that is just not feasible.

Recommended answer

CUDA is a compiled language. Loop unrolling is a compiler optimization. Runtime loop unrolling would imply some sort of runtime interpreter or dynamic code generation. That clearly can't happen.
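Since unrolling is decided at compile time, one possible workaround is to make the trip count a compile-time constant for a handful of expected values and select among them at run time. The following is a minimal host-side C++ sketch of that idea, not something taken from the question: the function names (sum_fixed, sum_runtime, sum_dispatch), the summing loop body, and the chosen trip counts 4, 8 and 16 are all hypothetical. In CUDA code the same pattern would apply to a templated kernel or device function, with #pragma unroll placed above the loop.

```cpp
// Hypothetical loop body: sums the first N elements of data.
// N is a template parameter, so each instantiation has a constant trip
// count that the compiler is free to unroll fully.
template <int N>
float sum_fixed(const float* data) {
    float acc = 0.0f;
    // In CUDA code, `#pragma unroll` would go directly above this loop.
    for (int i = 0; i < N; ++i) {
        acc += data[i];
    }
    return acc;
}

// Generic fallback for trip counts that were not anticipated at compile time.
float sum_runtime(const float* data, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; ++i) {
        acc += data[i];
    }
    return acc;
}

// Run-time dispatch: picks a constant-trip-count instantiation when one
// exists (4, 8 and 16 are assumed values), otherwise uses the generic loop.
float sum_dispatch(const float* data, int n) {
    switch (n) {
        case 4:  return sum_fixed<4>(data);
        case 8:  return sum_fixed<8>(data);
        case 16: return sum_fixed<16>(data);
        default: return sum_runtime(data, n);
    }
}
```

This only helps when n takes a small set of values known in advance, and each extra instantiation increases code size. Whether it recovers the constant-n performance depends on the loop body.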

It would make sense that the unrolled case executes as many or more instructions than the naïve loop, because the compiler will replace the loop with repetitions of the loop contents. If the unrolled case executes fewer instructions, that would imply that the compiler is pre-calculating some or all of the loop contents and replacing code with a constant result.
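To make "repetitions of the loop contents" concrete, here is a hand-written C++ sketch of a loop unrolled by a factor of 4 when the trip count n is only known at run time. It is an illustration under assumptions, not actual compiler output: the function names, the sum-of-squares body, and the factor 4 are hypothetical. Because n may not be a multiple of 4, the unrolled version needs a remainder loop and additional bounds logic, which is one way extra instructions can appear compared with a constant trip count.

```cpp
// Reference version: plain loop with a run-time trip count.
int sum_squares_rolled(const int* data, int n) {
    int acc = 0;
    for (int i = 0; i < n; ++i) {
        acc += data[i] * data[i];
    }
    return acc;
}

// Hand-unrolled by 4 (assumed factor). The main loop repeats the body
// four times per iteration; the second loop handles the 0 to 3 elements
// left over when n is not a multiple of 4.
int sum_squares_unrolled4(const int* data, int n) {
    int acc = 0;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        acc += data[i]     * data[i];
        acc += data[i + 1] * data[i + 1];
        acc += data[i + 2] * data[i + 2];
        acc += data[i + 3] * data[i + 3];
    }
    for (; i < n; ++i) {  // remainder loop
        acc += data[i] * data[i];
    }
    return acc;
}
```

With a constant n, the compiler can drop the remainder loop and the loop-bound checks entirely, and may fold parts of the body into constants, which is consistent with the lower instruction count reported in the question.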

It all depends on what is contained in the loop.
