Interpreting the verbose output of ptxas, part I

Views: 855 | Posted: 2020/5/8 19:04:22 | Tags: memory, cuda, gpu-constant-memory, ptxas


Question

I am trying to understand resource usage for each of my CUDA threads for a hand-written kernel.

I compiled my kernel.cu file to a kernel.o file with nvcc -arch=sm_20 -ptxas-options=-v and I got the following output (passed through c++filt):

ptxas info    : Compiling entry function 'searchkernel(octree, int*, double, int, double*, double*, double*)' for 'sm_20'
ptxas info    : Function properties for searchkernel(octree, int*, double, int, double*, double*, double*)
    72 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 46 registers, 176 bytes cmem[0], 16 bytes cmem[14]

Looking at the output above:

Is each CUDA thread using 46 registers?
Is there no register spilling to local memory?

I am also having some issues with understanding the output.

My kernel is calling a whole lot of __device__ functions. Is 72 bytes the sum total of the memory for the stack frames of the __global__ and __device__ functions?

What is the difference between 0 bytes spill stores and 0 bytes spill loads?

Why is the information for cmem (which I am assuming is constant memory) repeated twice with different figures? Within the kernel I am not using any constant memory. Does that mean the compiler is, under the hood, going to tell the GPU to use some constant memory?
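
For reference, the figures being asked about can be pulled out of the ptxas lines mechanically. The following is a minimal Python sketch and not part of the original build; the function name `parse_ptxas`, the dictionary keys, and the regular expressions are assumptions based only on the four output lines quoted above, so other ptxas versions or architectures may print lines this sketch does not handle.

```python
import re

# The four lines quoted in the question (after c++filt).
PTXAS_OUTPUT = """\
ptxas info    : Compiling entry function 'searchkernel(octree, int*, double, int, double*, double*, double*)' for 'sm_20'
ptxas info    : Function properties for searchkernel(octree, int*, double, int, double*, double*, double*)
    72 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 46 registers, 176 bytes cmem[0], 16 bytes cmem[14]
"""


def parse_ptxas(text):
    """Collect the numeric fields of the quoted ptxas output into a dict.

    Hypothetical helper: the patterns below match the wording shown in
    the question and nothing else.
    """
    info = {"cmem": {}}

    # Target architecture, e.g. "for 'sm_20'".
    m = re.search(r"for '(sm_\d+)'", text)
    if m:
        info["arch"] = m.group(1)

    # Stack frame and spill figures share one line.
    m = re.search(
        r"(\d+) bytes stack frame, (\d+) bytes spill stores, "
        r"(\d+) bytes spill loads",
        text,
    )
    if m:
        frame, stores, loads = (int(g) for g in m.groups())
        info["stack_frame"] = frame
        info["spill_stores"] = stores
        info["spill_loads"] = loads

    # Register count.
    m = re.search(r"Used (\d+) registers", text)
    if m:
        info["registers"] = int(m.group(1))

    # Every "N bytes cmem[B]" entry, keyed by the bank index B.
    for size, bank in re.findall(r"(\d+) bytes cmem\[(\d+)\]", text):
        info["cmem"][int(bank)] = int(size)

    return info


info = parse_ptxas(PTXAS_OUTPUT)
print(info)
```

The sketch only extracts the numbers; it keeps the two cmem entries separate by bank index because the output reports them that way, and it says nothing about what each bank holds.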

Continued in the following question: Interpreting the verbose output of ptxas, part II
