CUDA - Visual Profiler and Control Flow Divergence


Question

I'm profiling my CUDA application, and I've come across something I don't understand about the "Control Flow Divergence" metric in the Visual Profiler.

According to the User Guide:

"Control flow divergence gives the percentage of thread instructions that were not executed"

I've got the following code in my CUDA kernel:

int var;
var = tex2D(texture, x, y); // texture fetch
if(var < 0) {
    var *= -1;
    results[(blockIdx.x*blockDim.x) + threadIdx.x] = var; // global memory array
}

Here's what happens: not a single thread enters the branch (I checked the values in global memory), yet the profiler reports a control flow divergence of 34%. If I insert a printf in that same branch, the value jumps to 43% (and, oddly, the execution time increases as well), even though nothing is printed to stdout. Does this mean that the metric takes into account all of the kernel's instructions, even those not executed by any thread (so that there is effectively no warp divergence)?

In both cases, the Divergent Branches metric is 0%.

Answer

What version are you using? It sounds like you're on an old version, so it may be worth updating to a more recent one (e.g. 4.2 or 5.0; the latter is currently a release candidate).

If you're able to update to the CUDA 5.0 Visual Profiler, then by analysing the specific kernel you can have the profiler highlight the specific lines in your kernel that suffer from divergence (the same goes for non-coalesced memory accesses). For this to work, you'll need to compile your code with either debug (-G) or, if you want to profile release code, with line info (-lineinfo).
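To try the line-level analysis on something small before profiling the full application, a stand-alone version of the branch from the question can help. The sketch below is only an illustration under stated assumptions: it reads from a plain global array instead of a texture, the kernel name `negateNegatives` and the host helper `countBranchEntries` are hypothetical, and the `atomicAdd` counter is an addition that lets you confirm from the host whether any thread entered the branch. It does not reproduce how the profiler computes its metric.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Same branch as in the question, with two changes that are assumptions of
// this sketch: the input comes from a plain global array rather than a
// texture fetch, and a counter records how many threads take the branch.
__global__ void negateNegatives(const int *input, int *results,
                                unsigned int *branchEntries, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;              // guard for a partially filled last block

    int var = input[idx];
    if (var < 0) {
        var *= -1;
        results[idx] = var;            // global memory array
        atomicAdd(branchEntries, 1u);  // one increment per thread in the branch
    }
}

// Hypothetical host helper: runs the kernel over hostInput and returns the
// number of threads that entered the branch, or -1 if a CUDA call failed.
int countBranchEntries(const std::vector<int> &hostInput)
{
    const int n = static_cast<int>(hostInput.size());
    if (n == 0) return 0;

    const size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *dInput = nullptr;
    int *dResults = nullptr;
    unsigned int *dCount = nullptr;
    unsigned int hCount = 0;

    // Allocate and initialise device buffers; stop at the first failure.
    bool ok = cudaMalloc(&dInput, bytes) == cudaSuccess
           && cudaMalloc(&dResults, bytes) == cudaSuccess
           && cudaMalloc(&dCount, sizeof(unsigned int)) == cudaSuccess
           && cudaMemcpy(dInput, hostInput.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemset(dResults, 0, bytes) == cudaSuccess
           && cudaMemset(dCount, 0, sizeof(unsigned int)) == cudaSuccess;

    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        negateNegatives<<<blocks, threads>>>(dInput, dResults, dCount, n);
        // The blocking cudaMemcpy also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(&hCount, dCount, sizeof(unsigned int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree on a null pointer is a no-op, so this is safe after failures.
    cudaFree(dInput);
    cudaFree(dResults);
    cudaFree(dCount);
    return ok ? static_cast<int>(hCount) : -1;
}
```

Called from a `main` of your own with an all-positive input, `countBranchEntries` should return 0, which mirrors the situation described in the question. Assuming the file is saved as `divergence.cu` (a hypothetical name), it can be built for profiling with `nvcc -lineinfo -o divergence divergence.cu`, or with `-G` in place of `-lineinfo` for a debug build.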
