Using `overlap`, `kernel time` and `utilization` to optimize one's kernels

Question

My kernel achieves 100% utilization, but the kernel time is only 3% and there is no time overlap between memory copies and kernels.

The combination of high utilization and low kernel time in particular doesn't make sense to me.

So how should I proceed in optimizing my kernel?

I already made sure that I only have coalesced and pinned memory accesses, as the profiler recommended.

`Quadro FX 580 utilization = 100.00% (62117.00/62117.00)`

Kernel time = 3.05 % of total GPU time 
Memory copy time = 0.9 % of total GPU time
Kernel taking maximum time = Pinned (0.7% of total GPU time)
Memory copy taking maximum time = memcpyHtoD (0.5% of total GPU time)
There is no time overlap between memory copies and kernels on GPU

Furthermore, I have no warp serialization, no divergent branches, and no occupancy limiting factor.

Kernel details: Grid size: [4 1 1], Block size: [256 1 1]
Register Ratio: 0.9375 ( 7680 / 8192 ) [10 registers per thread]
Shared Memory Ratio: 0.09375 ( 1536 / 16384 ) [60 bytes per Block]
Active Blocks per SM: 3 (Maximum Active Blocks per SM: 8)
Active threads per SM: 768 (Maximum Active threads per SM: 768)
Potential Occupancy: 1 ( 24 / 24 )
Achieved occupancy: 0.333333 (on 4 SMs)
Occupancy limiting factor: None
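
The gap between potential and achieved occupancy in the listing above can be reproduced with simple arithmetic. The following is a minimal sketch, not profiler code: the function name `achieved_occupancy` is made up for illustration, and it assumes the blocks of the grid are spread evenly over the SMs.

```python
# Figures copied from the profiler output above.
NUM_SMS = 4                # "Achieved occupancy: 0.333333 (on 4 SMs)"
MAX_THREADS_PER_SM = 768   # "Maximum Active threads per SM: 768"


def achieved_occupancy(grid_size, block_size):
    """Fraction of one SM's thread slots that are in use.

    Assumption of this sketch: blocks are distributed evenly over the SMs.
    """
    blocks_per_sm = -(-grid_size // NUM_SMS)  # ceiling division
    threads_per_sm = min(blocks_per_sm * block_size, MAX_THREADS_PER_SM)
    return threads_per_sm / MAX_THREADS_PER_SM


# Grid size [4 1 1] and block size [256 1 1]: one 256-thread block per SM.
print(round(achieved_occupancy(4, 256), 6))
```

With a grid of 4 blocks, each SM holds a single block of 256 threads out of 768 possible, which is consistent with the reported achieved occupancy of one third.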

P.S. I don't claim to have written wundercode; I just don't know how to proceed from here.

Recommended answer

It seems the grid size of your kernel is too small to make full use of the SMs. Why not decrease the block size and increase the grid size? I think that will help.
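
To get a feel for how large the grid would have to be, the limits reported by the profiler in the question can be turned into a rough estimate. This is only a sketch under stated assumptions: the helper names are hypothetical, the problem size of 1024 elements (one thread per element, equal to the current 4 x 256 threads) is assumed and not given in the question, and register and shared-memory limits are ignored.

```python
# Limits taken from the profiler output in the question.
NUM_SMS = 4
MAX_THREADS_PER_SM = 768
MAX_BLOCKS_PER_SM = 8


def blocks_to_fill_device(block_size):
    """Grid size at which every SM holds as many blocks as the thread
    and block limits allow (registers and shared memory are ignored)."""
    blocks_per_sm = min(MAX_THREADS_PER_SM // block_size, MAX_BLOCKS_PER_SM)
    return NUM_SMS * blocks_per_sm


def grid_size_for(n_elements, block_size):
    """Blocks needed to cover n_elements with one thread per element."""
    return (n_elements + block_size - 1) // block_size


N_ELEMENTS = 1024  # hypothetical problem size, assumed for this example

for block_size in (256, 128, 64):
    print(block_size,
          blocks_to_fill_device(block_size),
          grid_size_for(N_ELEMENTS, block_size))
```

Under these assumptions, filling the device takes 12 blocks of 256 threads or 24 blocks of 128 threads, while 1024 elements yield only 4 or 8 blocks. So the grid probably needs more total work per launch in addition to smaller blocks before the SMs are fully used.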
