cudaMemcpyToSymbol performance
Question
I have some functions that each load a variable into constant device memory and launch a kernel. I noticed that the first time a function loads a variable into constant memory, the load takes 0.6 seconds, but subsequent loads into constant memory are very fast (0.0008 seconds). This behaviour occurs regardless of which function is called first in main. Below is an example:
__constant__ double res1;
__global__ void kernel1(...) {...}
void function1() {
    double resHost = 255 / ((double) size);
    CUDA_CHECK_RETURN(cudaMemcpyToSymbol(res1, &resHost, sizeof(double)));
    // prepare and launch kernel
}
__constant__ double res2;
__global__ void kernel2(...) {...}
void function2() {
    double resHost = 255 / ((double) size);
    CUDA_CHECK_RETURN(cudaMemcpyToSymbol(res2, &resHost, sizeof(double)));
    // prepare and launch kernel
}
int main() {
    function1(); // takes 0.6 seconds for loading
    function2(); // takes 0.0008 seconds for loading
    function1(); // takes 0.0008 seconds for loading
    return 0;
}
Why is this happening? Can I avoid it?
Recommended answer
Why is this happening?
Lazy context establishment: the CUDA runtime API creates and sets up its context on first use rather than at program start.
Can I avoid it?
No. The first runtime API call that requires a context incurs significant setup latency. In your case, that is the first cudaMemcpyToSymbol call.
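The setup cost cannot be removed, but it can be moved out of the code you are timing. The following is a minimal sketch, not part of the original answer, that forces context creation with an early `cudaFree(0)` call (a common warm-up idiom) and then times two uploads to constant memory. The helper name `upload_seconds` and the symbol `res1` reuse are illustrative assumptions; actual timings depend on the GPU, driver, and system.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

__constant__ double res1;

// Hypothetical helper: time one upload to constant memory, in seconds.
// Returns a negative value if the copy fails.
static double upload_seconds(double value) {
    auto t0 = std::chrono::steady_clock::now();
    cudaError_t err = cudaMemcpyToSymbol(res1, &value, sizeof(double));
    auto t1 = std::chrono::steady_clock::now();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemcpyToSymbol: %s\n", cudaGetErrorString(err));
        return -1.0;
    }
    return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
    // Warm-up: cudaFree(0) frees nothing, but it is a runtime API call that
    // needs a context, so the lazy setup cost is paid here.
    auto t0 = std::chrono::steady_clock::now();
    cudaError_t err = cudaFree(0);
    auto t1 = std::chrono::steady_clock::now();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaFree(0): %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("context setup: %f s\n",
                std::chrono::duration<double>(t1 - t0).count());

    // With the context already established, neither upload includes setup.
    std::printf("first upload:  %f s\n", upload_seconds(255.0 / 1024.0));
    std::printf("second upload: %f s\n", upload_seconds(255.0 / 2048.0));
    return 0;
}
```

Built with `nvcc`, this should report most of the delay on the warm-up line rather than on the first upload. The total run time is unchanged; only the place where the latency is paid moves.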