Compute Prof's fields for incoherent and coherent gst/gld? (CUDA/OpenCL)

Question

I am using Compute Prof 3.2 with a GeForce GTX 280, so I believe I have compute capability 1.3.

This file seems to show that I should be able to see these fields, since I am using a 1.x compute device. However, I don't see them, and the User Guide for the 3.2 toolkit says I can't see them, but calls them gst_uncoalesced and gst_coalesced.

To sum up, I am confused about how to tell from the profiler whether I am making non-coalesced reads from global memory. It doesn't look like the profiler reports this for Fermi cards either, but I am not worried about those for now. If anybody can elaborate on the situation, I would appreciate it.

Also, I've been told to look at the assembly of my kernels to figure this out, so any explanation of how to do that would be appreciated too. I am just starting to work that out as well :)

Recommended answer

I had similar problems with the profiling output. On an 8600 (compute capability 1.0) it showed both coalesced and uncoalesced reads/writes, but on the GTX 280 it showed only coalesced ones. I assumed that was due to the better coalescing on the GTX 280 making the distinction less clear (is a memory read uncoalesced if all but one of the transferred words are unneeded?). However, you can just look at the summary table. There you will find a load efficiency and a store efficiency for each kernel. If all accesses are coalesced, that efficiency should be 1; otherwise it is less than 1 (0.5 means that only half of the loaded bytes are used).
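As a rough worked example of that ratio: the formula below is an assumption inferred from the "half of the loaded bytes are used" description, not a documented definition of the profiler's counter. Suppose a half-warp of 16 threads reads 4-byte floats with a stride of two elements, so the 64 requested bytes are spread across a 128-byte region that has to be transferred in full:

```latex
\text{efficiency}
  = \frac{\text{bytes requested}}{\text{bytes transferred}}
  = \frac{16 \times 4}{128}
  = 0.5
```

With a stride of one element, the same 64 bytes would fit in a single 64-byte access and the ratio would be 1.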

Of course, that doesn't help much in figuring out where exactly the uncoalesced accesses are inside your kernel. In the end, the best way is still to know how coalescing works (the addresses of each half-warp are gathered into 32-, 64- and 128-byte accesses, and values inside that region that are not accessed are transferred anyway) and to analyse your access patterns.
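To make that kind of access-pattern analysis concrete, here is a minimal host-side sketch in C++ that models how the addresses of one half-warp might be grouped into 32-, 64- and 128-byte accesses. It is a simplified approximation of the compute capability 1.2/1.3 behaviour, not the hardware's exact algorithm: it assumes 4-byte words, word-aligned addresses, and distinct addresses per thread. The names `HalfWarpCost`, `simulateHalfWarp` and `stridedAddresses` are hypothetical and exist only for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Cost of servicing one half-warp under the simplified model (hypothetical type).
struct HalfWarpCost {
    int transactions;        // number of memory accesses issued
    std::size_t bytesMoved;  // total bytes transferred by those accesses
    double efficiency;       // requested bytes / transferred bytes
};

// Simplified model (an assumption, not the exact hardware rule):
// 1. Take the 128-byte segment holding the first unserved thread's address.
// 2. Serve every unserved thread whose address falls in that segment.
// 3. Shrink the access to 64 or 32 bytes if only one half is touched.
// 4. Repeat until every thread has been served.
HalfWarpCost simulateHalfWarp(const std::vector<std::uint64_t>& addrs,
                              std::size_t wordSize = 4) {
    std::vector<bool> served(addrs.size(), false);
    HalfWarpCost cost{0, 0, 0.0};

    for (std::size_t i = 0; i < addrs.size(); ++i) {
        if (served[i]) continue;

        std::uint64_t base = addrs[i] / 128 * 128;  // aligned segment start
        std::uint64_t lo = std::numeric_limits<std::uint64_t>::max();
        std::uint64_t hi = 0;  // lowest and highest byte touched in the segment

        for (std::size_t j = i; j < addrs.size(); ++j) {
            assert(addrs[j] % wordSize == 0);  // model assumes aligned words
            if (!served[j] && addrs[j] / 128 * 128 == base) {
                served[j] = true;
                lo = std::min(lo, addrs[j]);
                hi = std::max(hi, addrs[j] + wordSize - 1);
            }
        }

        // Halve the access while all touched bytes sit in one half of it.
        std::size_t size = 128;
        while (size > 32) {
            std::size_t half = size / 2;
            if (hi < base + half) {
                size = half;                 // only the lower half is used
            } else if (lo >= base + half) {
                base += half;                // only the upper half is used
                size = half;
            } else {
                break;                       // both halves are used
            }
        }

        ++cost.transactions;
        cost.bytesMoved += size;
    }

    if (cost.bytesMoved > 0) {
        cost.efficiency =
            static_cast<double>(addrs.size() * wordSize) / cost.bytesMoved;
    }
    return cost;
}

// Byte addresses for 16 threads reading 4-byte words, where thread i reads
// element i * strideWords (hypothetical helper for this illustration).
std::vector<std::uint64_t> stridedAddresses(std::uint64_t strideWords) {
    std::vector<std::uint64_t> addrs;
    for (std::uint64_t i = 0; i < 16; ++i) {
        addrs.push_back(i * strideWords * 4);
    }
    return addrs;
}
```

Calling `simulateHalfWarp(stridedAddresses(1))` models the fully coalesced case, while larger strides show how the number of accesses grows and the efficiency drops. The result is only as good as the model's assumptions, so treat it as a way to reason about a pattern rather than as a prediction of what the profiler reports.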
