Calculate run time of kernel code in OpenCL C


Problem description

I want to measure the performance (that is, the run time) of my kernel code on various devices, namely the CPU and the GPUs. The kernel code that I wrote is:

__kernel void dataParallel(__global int* A)
{
    sleep(10);
    A[0] = 2;
    A[1] = 3;
    A[2] = 5;
    int pnp;    // pnp = probable next prime
    int pprime; // previous prime
    int i, j;
    for (i = 3; i < 500; i++)
    {
        j = 0;
        pprime = A[i-1];
        pnp = pprime + 2;
        while ((j < i) && A[j] <= sqrt((float)pnp))
        {
            if (pnp % A[j] == 0)
            {
                pnp += 2;
                j = 0;
            }
            j++;
        }
        A[i] = pnp;
    }
}

However, I have been told that it is not possible to use sleep() in kernel code. If that is true, can someone explain why? If it isn't, please tell me how to do it.

Also, as I said, I wish to compare the performance of my CPU and the GPUs. One way to achieve that is to compute the run time of the kernel code on each device. Alternatively, if there were a way to get the code to start executing on all the devices at the same time, I would just have to list the corresponding end times of execution, and that would serve the purpose as well. Is that possible?

Hardware Details:

GPU: AMD FirePro W7000, NVIDIA Tesla C2075
CPU: Intel(R) Xeon(R) CPU X5660 @ 2.80GHz

Solution

"However, I have been told that it is not possible to use sleep() in kernel code."

It's not that it's not possible; it might be. I don't know. That's not really specified in C. Having said that, it's simply not a good idea to block execution of a kernel until a period of time has elapsed. Even in general purpose programming, that doesn't seem like a good idea. Your function should finish processing as soon as possible, or pass control back to the kernel so that it can find something else to do while it's waiting on idle tasks.
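For what it's worth, the OpenCL C specification does not make the C standard library headers available inside kernels, so a call like sleep() has nothing to resolve to there. The timing question does not need it anyway. Below is a minimal sketch of the kernel body with the sleep() call removed, written as a plain C host function so it can be compiled and checked without an OpenCL device. The name fill_primes and the parameter n are assumptions made for illustration, and the integer test A[j] * A[j] <= pnp is swapped in for the original A[j] <= sqrt((float)pnp) to keep the sketch free of floating point.

```c
/* Host-side reference of the kernel body with sleep() removed.
   Fills A[0..n-1] with the first n primes; the caller must pass n >= 3.
   fill_primes is a hypothetical name, not part of the original post. */
void fill_primes(int *A, int n)
{
    A[0] = 2;
    A[1] = 3;
    A[2] = 5;
    for (int i = 3; i < n; i++) {
        int pnp = A[i - 1] + 2;  /* probable next prime: next odd candidate */
        int j = 0;
        /* Trial division by the primes found so far, up to sqrt(pnp).
           A[j] * A[j] <= pnp stands in for A[j] <= sqrt((float)pnp). */
        while (j < i && A[j] * A[j] <= pnp) {
            if (pnp % A[j] == 0) {
                pnp += 2;        /* composite: move to the next odd number */
                j = 0;           /* restart; the j++ below skips A[0] == 2,
                                    which is harmless because pnp is odd */
            }
            j++;
        }
        A[i] = pnp;
    }
}
```

Running this once on the host gives a known-good array to compare against what each device writes into A, which is worth checking before comparing any run times.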

"Also, as I said, I wish to compare the performance of my CPU and the GPUs. One way to achieve that is to compute the run time of the kernel code on each device. Alternatively, if there were a way to get the code to start executing on all the devices at the same time, I would just have to list the corresponding end times of execution, and that would serve the purpose as well. Is that possible?"

Sure, something like that... but... I'm not even sure why you think injecting sleep(10) into each task will help you; you haven't explained that here. It doesn't seem like a requirement for profiling your code (e.g. checking its speed). Have you ever heard of the XY problem? I think sleep is your Y variable, in this case.

I mentioned profiling just now. Have you learnt about profilers? They do exactly what it is you're aiming to do, except that they do it without you having to write any code. Here's a tutorial on using perf to profile the Linux kernel...
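OpenCL also has its own timing facility: a command queue created with CL_QUEUE_PROFILING_ENABLE records timestamps for each command, and clGetEventProfilingInfo reports them (in nanoseconds) through CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END. Where that is not convenient, a rougher alternative is to measure wall-clock time on the host around "enqueue the kernel, then wait for it to finish". The sketch below shows only that host-side part in standard C11. The names time_call_ms, work_fn and count_primes_below are assumptions made for illustration, and the workload is a stand-in so that the example runs without an OpenCL device.

```c
#include <time.h>

/* Signature of the work to be timed. In an OpenCL host program this
   callback would enqueue the kernel and then block until the device
   has finished (for example with clFinish). */
typedef void (*work_fn)(void *arg);

/* Hypothetical helper: wall-clock milliseconds spent inside fn(arg).
   Uses the C11 calendar clock, so a system clock adjustment during
   the call would distort the result. */
double time_call_ms(work_fn fn, void *arg)
{
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    fn(arg);
    timespec_get(&t1, TIME_UTC);
    return (double)(t1.tv_sec - t0.tv_sec) * 1e3
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}

/* Stand-in workload: counts the primes below *(int *)arg by trial
   division and writes the count back through the same pointer. */
void count_primes_below(void *arg)
{
    int *n = (int *)arg;
    int count = 0;
    for (int c = 2; c < *n; c++) {
        int is_prime = 1;
        for (int d = 2; d * d <= c; d++) {
            if (c % d == 0) {
                is_prime = 0;
                break;
            }
        }
        count += is_prime;
    }
    *n = count;
}
```

A call such as time_call_ms(count_primes_below, &n) returns the elapsed milliseconds and leaves the prime count in n. Host-side timing of this kind includes launch and synchronization overhead, so the event timestamps are the better basis for comparing the devices themselves; repeating the measurement several times and discarding the first run is a common way to reduce noise.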
