cudaMemcpyToSymbol performance


Question

I have some functions that load a variable into constant device memory and launch a kernel. I noticed that the first time a function loads a variable into constant memory, it takes 0.6 seconds, but subsequent loads into constant memory are very fast (0.0008 seconds). This behaviour occurs regardless of which function is called first in main. Example code below:

        __constant__ double res1;

        __global__ void kernel1(...) {...}

        void function1() {
            double resHost = 255 / ((double) size);
            // copy the host value into the __constant__ symbol res1
            CUDA_CHECK_RETURN(cudaMemcpyToSymbol(res1, &resHost, sizeof(double)));

            // prepare and launch kernel
        }

        __constant__ double res2;

        __global__ void kernel2(...) {...}

        void function2() {
            double resHost = 255 / ((double) size);
            // copy the host value into the __constant__ symbol res2
            CUDA_CHECK_RETURN(cudaMemcpyToSymbol(res2, &resHost, sizeof(double)));

            // prepare and launch kernel
        }

        int main() {
            function1(); // takes 0.6 seconds for loading
            function2(); // takes 0.0008 seconds for loading
            function1(); // takes 0.0008 seconds for loading

            return 0;
        }

Why is this happening? Can I avoid it?

Recommended answer


Why is this happening?

Lazy runtime API context establishment and setup: the CUDA runtime creates its context on first use rather than at program start.


Can I avoid it?

No. The first runtime API call that requires a context incurs significant setup latency; in your case, that is the first cudaMemcpyToSymbol call.
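The setup cost itself cannot be removed, but one way to confirm that it belongs to context creation rather than to cudaMemcpyToSymbol is to trigger the context with a cheap call first and time each step separately. The following is only a sketch and is not part of the original answer: the CHECK macro is a hypothetical stand-in for the question's CUDA_CHECK_RETURN, the seconds helper is illustrative, and 1024 is an arbitrary placeholder for the question's size. Exact timings depend on the GPU, driver, and CUDA version, and some one-time cost may still appear in the first copy.

```cuda
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical error check, standing in for the question's CUDA_CHECK_RETURN.
#define CHECK(call)                                                  \
    do {                                                             \
        cudaError_t err_ = (call);                                   \
        if (err_ != cudaSuccess) {                                   \
            std::fprintf(stderr, "CUDA error: %s\n",                 \
                         cudaGetErrorString(err_));                  \
            std::exit(EXIT_FAILURE);                                 \
        }                                                            \
    } while (0)

__constant__ double res1;

// Illustrative helper: wall-clock seconds spent in a callable.
template <typename F>
double seconds(F f) {
    auto t0 = std::chrono::steady_clock::now();
    f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

int main() {
    double resHost = 255.0 / 1024.0;  // 1024 is a placeholder for `size`

    // Warm-up: freeing a null pointer does no real work but needs a
    // context, so the one-time setup should be paid here.
    double warm = seconds([] { CHECK(cudaFree(nullptr)); });

    // Both copies now run with the context already established.
    double first = seconds([&] {
        CHECK(cudaMemcpyToSymbol(res1, &resHost, sizeof(double)));
    });
    double second = seconds([&] {
        CHECK(cudaMemcpyToSymbol(res1, &resHost, sizeof(double)));
    });

    std::printf("warm-up: %.6f s\nfirst copy: %.6f s\nsecond copy: %.6f s\n",
                warm, first, second);
    return 0;
}
```

If the explanation above holds, most of the 0.6 seconds should show up in the warm-up line. In a real program the same warm-up call can be placed at startup so that the delay falls outside any timed or latency-sensitive region.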
