nvprof not picking up any API calls or kernels
Question
I'm trying to get some benchmark timings for my CUDA program with nvprof, but it doesn't seem to be profiling any API calls or kernels. To make sure I was using it correctly, I looked for a simple beginner's example and found one on the Nvidia dev blog:
https://devblogs.nvidia.com/parallelforall/how-optimize-data-transfers-cuda-cc/
Code:
#include <cstdlib>   // malloc
#include <cstring>   // memset

int main()
{
    const unsigned int N = 1048576;
    const unsigned int bytes = N * sizeof(int);

    // Host and device buffers of the same size
    int *h_a = (int*)malloc(bytes);
    int *d_a;
    cudaMalloc((int**)&d_a, bytes);

    // Copy host -> device, then device -> host
    memset(h_a, 0, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost);

    return 0;
}
Command line:
-bash-4.2$ nvcc profile.cu -o profile_test
-bash-4.2$ nvprof ./profile_test
I copied it line for line and ran the same commands. Unfortunately, the result was the same as with my own program, with nothing profiled:
-bash-4.2$ nvprof ./profile_test
==85454== NVPROF is profiling process 85454, command: ./profile_test
==85454== Profiling application: ./profile_test
==85454== Profiling result:
No kernels were profiled.
==85454== API calls:
No API activities were profiled.
I am running the Nvidia CUDA Toolkit 7.5.
If anyone knows what I'm doing wrong, I'd be grateful for an answer.
-----EDIT-----
So I modified the code to:
#include <cstdlib>   // malloc
#include <cstring>   // memset
#include <cuda_profiler_api.h>

int main()
{
    // Explicitly mark the start of the region to profile
    cudaProfilerStart();

    const unsigned int N = 1048576;
    const unsigned int bytes = N * sizeof(int);

    int *h_a = (int*)malloc(bytes);
    int *d_a;
    cudaMalloc((int**)&d_a, bytes);

    memset(h_a, 0, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost);

    // Explicitly mark the end of the profiled region
    cudaProfilerStop();
    return 0;
}
Unfortunately, that did not change anything.
Recommended answer
This is a bug in unified memory profiling. Running nvprof with that feature turned off:
nvprof --unified-memory-profiling off ./profile_test
resolved all the problems for me.
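Before blaming the profiler, it can also help to confirm that the CUDA calls in the test program actually succeed, since the original code ignores every return value. The following is only a minimal sketch of the same program with error checking added. The CHECK macro is a hypothetical helper written for this example and is not part of the CUDA API. The final cudaDeviceReset() is included on the assumption that tearing down the context before exit helps flush buffered profile data; it is not the fix reported above.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical helper for this sketch (not part of CUDA):
// abort with a readable message if a runtime call fails.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s failed: %s\n", #call,             \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

int main()
{
    const unsigned int N = 1048576;
    const unsigned int bytes = N * sizeof(int);

    // Host buffer, zero-filled as in the original example
    int *h_a = (int*)malloc(bytes);
    if (h_a == NULL) {
        fprintf(stderr, "malloc failed\n");
        return EXIT_FAILURE;
    }
    memset(h_a, 0, bytes);

    // Device buffer; every runtime call is now checked
    int *d_a = NULL;
    CHECK(cudaMalloc((void**)&d_a, bytes));
    CHECK(cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(h_a, d_a, bytes, cudaMemcpyDeviceToHost));

    CHECK(cudaFree(d_a));
    free(h_a);

    // Assumption: destroying the context before exit gives the
    // profiler a chance to flush any buffered data.
    CHECK(cudaDeviceReset());
    return 0;
}
```

It builds and runs with the same nvcc and nvprof commands as in the question. If it exits without printing an error and nvprof still reports nothing, the runtime calls themselves are fine, which points back at the profiler-side workaround given in the answer.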