How many registers per thread does an OpenCL kernel use on an Nvidia GPU?


Question

My first question is how to get register usage information for an OpenCL kernel on an Nvidia GPU. For CUDA kernels, the nvcc compiler reports this when invoked with the --ptxas-options=-v flag.

On an AMD GPU I got the same information for an OpenCL kernel from the .isa file generated while running the program, after exporting GPU_DUMP_DEVICE_KERNEL=3. I tried the same thing on an Nvidia GPU, but no .isa file was produced. My second question is: why does the Nvidia GPU not generate an .isa file?

After some searching, I found that the way to get register and shared memory usage for an OpenCL kernel on an Nvidia GPU is to pass the -cl-nv-verbose option string to the clBuildProgram() call and then read the "binaries" of the compiled kernel code. My third question is: is this the correct way to get register usage information on an Nvidia GPU? What other ways are there to get it?

// Building the program...

clBuildProgram(program, 1, &device_id, "-cl-nv-verbose", NULL, NULL);

After building the program, I passed the two constants CL_PROGRAM_BINARY_SIZES and CL_PROGRAM_BINARIES to clGetProgramInfo() to get the binaries of the compiled kernel code.

// Printing binaries of the compiled kernel code...

cl_uint program_num_devices;
cl_int ret;
size_t t;
ret = clGetProgramInfo(program, CL_PROGRAM_NUM_DEVICES, sizeof(cl_uint), &program_num_devices, NULL);
if (ret != CL_SUCCESS || program_num_devices == 0) {
    printf("No valid device was found\n");
    return;
}
size_t binary_sizes[program_num_devices];
char **binaries = (char **) malloc(program_num_devices * sizeof(char *));
// First call: get the size of each device binary.
ret = clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES, program_num_devices * sizeof(size_t), binary_sizes, NULL);
for (t = 0; t < program_num_devices; t++) {
    // One extra byte so the binary can be NUL-terminated for printing.
    binaries[t] = (char *) malloc((binary_sizes[t] + 1) * sizeof(char));
}
// Second call: get the binaries; the buffer is an array of pointers, one per device.
ret = clGetProgramInfo(program, CL_PROGRAM_BINARIES, program_num_devices * sizeof(char *), binaries, NULL);
for (t = 0; t < program_num_devices; t++) {
    binaries[t][binary_sizes[t]] = '\0';
    printf("Binary ISA Info (%zu bytes):\n%s\n", binary_sizes[t], binaries[t]);
}
printf("ProgramNumDevices:: %u\n", program_num_devices);
for (t = 0; t < program_num_devices; t++) {
    free(binaries[t]);
}
free(binaries);

This prints the "binaries" of my compiled OpenCL kernel code, but it does not show the register and shared memory usage. Why?

Please share any useful information.

Thanks in advance!

Recommended answer

From a quick search, it looks like after building the program with -cl-nv-verbose, you get the verbose output by calling clGetProgramBuildInfo(..., CL_PROGRAM_BUILD_LOG, ...). In other words, the register and shared memory figures should appear in the build log rather than in the program binaries, which would explain why printing the binaries does not show them.
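Once the build log has been fetched as a string, the register count can be picked out of it. The following is a minimal sketch in plain C, not a definitive implementation. It assumes the log contains ptxas-style lines of the form "ptxas info : Used N registers, ...", as nvcc prints with --ptxas-options=-v; the exact log format may differ between driver versions. The function name max_registers_in_log is hypothetical and only for illustration.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scans a build log for "Used <N> registers" and returns the largest N found,
 * or -1 if no such line is present.
 * ASSUMPTION: the log uses ptxas-style wording ("Used 10 registers, ...");
 * this is not guaranteed by the OpenCL specification.
 */
int max_registers_in_log(const char *log)
{
    int max_regs = -1;
    const char *p = log;

    while ((p = strstr(p, "Used ")) != NULL) {
        int regs = 0;
        int consumed = 0;
        /* %n is only stored if the literal "registers" matched as well,
         * so "Used 48 bytes" is rejected (consumed stays 0). */
        if (sscanf(p, "Used %d registers%n", &regs, &consumed) == 1 &&
            consumed > 0 && regs > max_regs) {
            max_regs = regs;
        }
        p += 5; /* step past this "Used " and keep searching */
    }
    return max_regs;
}
```

To use it, pass the NUL-terminated string returned for CL_PROGRAM_BUILD_LOG. When a program contains several kernels, the log may hold one such line per kernel, and this sketch reports only the maximum.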
