My OpenCL test does not run much faster than the CPU

Views: 46 · Posted: 2020/5/20 18:52:25 · Tags: c++ parallel-processing opencl gpu


Problem description

I am trying to measure the execution time of my GPU and compare it with the CPU. I wrote a simple_add function that adds two vectors of short ints element by element: each 32-bit word holds two 16-bit values, and the low and high halves are added separately. The kernel code is:

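    kernel void simple_add(global const uint* A, global const uint* B, global uint* C)
    {
        ///------------------------------------------------
        /// Add the low and high 16 bits of each word separately
        const size_t i = get_global_id(0);
        uint AA = A[i];
        uint BB = B[i];
        uint AH = 0xFFFF0000u & AA;   // high half of A
        uint AL = 0x0000FFFFu & AA;   // low half of A
        uint BH = 0xFFFF0000u & BB;
        uint BL = 0x0000FFFFu & BB;
        uint CL = (AL + BL) & 0x0000FFFFu;   // drop the carry out of the low half
        uint CH = (AH + BH) & 0xFFFF0000u;   // unsigned, so overflow wraps instead of being undefined
        C[i] = CH | CL;
    }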
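For reference, the kernel's per-word arithmetic can be mirrored on the CPU. This is only a sketch of what a CPU version might compute (the question does not show `AddImagesCPU`); `packed_add16` and `add_images_cpu_sketch` are hypothetical names, and the data are assumed to be stored as 32-bit words that each hold two 16-bit values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Adds the low 16-bit halves and the high 16-bit halves of two words
// independently; a carry out of either half is discarded, so each half
// wraps around modulo 2^16 without affecting the other.
std::uint32_t packed_add16(std::uint32_t a, std::uint32_t b)
{
    const std::uint32_t lo = ((a & 0x0000FFFFu) + (b & 0x0000FFFFu)) & 0x0000FFFFu;
    const std::uint32_t hi = ((a & 0xFFFF0000u) + (b & 0xFFFF0000u)) & 0xFFFF0000u;
    return hi | lo;
}

// Hypothetical CPU counterpart of the kernel: one packed_add16 per element,
// i.e. what each OpenCL work-item does for its get_global_id(0).
std::vector<std::uint32_t> add_images_cpu_sketch(const std::vector<std::uint32_t>& a,
                                                 const std::vector<std::uint32_t>& b)
{
    assert(a.size() == b.size());
    std::vector<std::uint32_t> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i)
        c[i] = packed_add16(a[i], b[i]);
    return c;
}
```

Because the two halves never exchange carries, `packed_add16(0x0001FFFF, 0x00010001)` yields `0x00020000`: the low half wraps to zero and the high half is unaffected. This also makes the GPU and CPU outputs directly comparable word for word.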

I also wrote a CPU version of this function and measured the execution time of 100 runs of each:

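    clock_t before_GPU = clock();
    for (int i = 0; i < 100; i++)
    {
        // The 2nd argument is the global work offset, not the number of
        // dimensions; cl::NullRange means no offset (a literal 1 would be
        // converted to NDRange(1) and shift every get_global_id(0) by one).
        queue.enqueueNDRangeKernel(kernel_add, cl::NullRange,
                                   cl::NDRange((size_t)(NumberOfAllElements / 4)),
                                   cl::NDRange(64));
        queue.finish();   // wait for this launch to complete
    }
    clock_t after_GPU = clock();

    clock_t before_CPU = clock();
    for (int i = 0; i < 100; i++)
        AddImagesCPU(A, B, C);
    clock_t after_CPU = clock();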
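`clock()` returns processor time on POSIX systems but elapsed wall-clock time with the Microsoft C runtime, so what it measures depends on the platform. Below is a minimal sketch of a portable wall-clock harness built on `std::chrono::steady_clock`. `time_runs_us` is a hypothetical helper, and the body passed to it would be the kernel launch plus `queue.finish()`, or the `AddImagesCPU(A, B, C)` call from the question.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: calls `body` `iterations` times and returns the total
// elapsed wall-clock time in microseconds. steady_clock is monotonic, so the
// result is unaffected by adjustments to the system clock.
double time_runs_us(int iterations, const std::function<void()>& body)
{
    assert(iterations >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        body();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count();
}
```

For example, `double cpu_us = time_runs_us(100, [&] { AddImagesCPU(A, B, C); });` gives the total for 100 runs. Dividing by the iteration count gives a per-call figure.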

After calling the whole measurement function 10 times, the results were as follows (differences of clock() values, 100 iterations each):

        CPU time: 1359
        GPU time: 1372
        ----------------
        CPU time: 1336
        GPU time: 1269
        ----------------
        CPU time: 1436
        GPU time: 1255
        ----------------
        CPU time: 1304
        GPU time: 1266
        ----------------
        CPU time: 1305
        GPU time: 1252
        ----------------
        CPU time: 1313
        GPU time: 1255
        ----------------
        CPU time: 1313
        GPU time: 1253
        ----------------
        CPU time: 1384
        GPU time: 1254
        ----------------
        CPU time: 1300
        GPU time: 1254
        ----------------
        CPU time: 1322
        GPU time: 1254
        ----------------
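Averaging the ten runs above makes the comparison concrete. This is a small sketch; the helper name `mean` is hypothetical, and the numbers are copied from the output above.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of measurements.
double mean(const std::vector<double>& v)
{
    assert(!v.empty());
    return std::accumulate(v.begin(), v.end(), 0.0) / v.size();
}

// clock() totals for 100 iterations, from the ten runs listed above.
const std::vector<double> cpu_ticks = {1359, 1336, 1436, 1304, 1305,
                                       1313, 1313, 1384, 1300, 1322};
const std::vector<double> gpu_ticks = {1372, 1269, 1255, 1266, 1252,
                                       1255, 1253, 1254, 1254, 1254};
```

Here `mean(cpu_ticks)` is 1337.2 and `mean(gpu_ticks)` is 1268.4, a ratio of about 1.054. That means the GPU loop is only about 5% faster. Assuming the Microsoft C runtime, where `CLOCKS_PER_SEC` is 1000, this works out to roughly 13.4 ms versus 12.7 ms per launch.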

The problem is that I expected the GPU to be much faster than the CPU, but it was not, and I can't understand why. Is there a problem in my code? Here are my GPU properties:

        -----------------------------------------------------
        ------------- Selected Platform Properties-------------:
        NAME:   AMD Accelerated Parallel Processing
        EXTENSION:      cl_khr_icd cl_amd_event_callback cl_amd_offline_devices cl_khr_d3d10_sharing
        VENDOR:         Advanced Micro Devices, Inc.
        VERSION:        OpenCL 1.2 AMD-APP (937.2)
        PROFILE:        FULL_PROFILE
        -----------------------------------------------------
        ------------- Selected Device Properties-------------:
        NAME :  ATI RV730
        TYPE :  4
        VENDOR :        Advanced Micro Devices, Inc.
        PROFILE :       FULL_PROFILE
        VERSION :       OpenCL 1.0 AMD-APP (937.2)
        EXTENSIONS :    cl_khr_gl_sharing cl_amd_device_attribute_query cl_khr_d3d10_sharing
        MAX_COMPUTE_UNITS :     8
        MAX_WORK_GROUP_SIZE :   128
        OPENCL_C_VERSION :      OpenCL C 1.0
        DRIVER_VERSION:         CAL 1.4.1734
        ==========================================================

And, for comparison, here are my CPU specifications:

        ------------- CPU Properties-------------:
        NAME :          Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz
        TYPE :  2
        VENDOR :        GenuineIntel
        PROFILE :       FULL_PROFILE
        VERSION :       OpenCL 1.2 AMD-APP (937.2)
        MAX_COMPUTE_UNITS :     4
        MAX_WORK_GROUP_SIZE :   1024
        OPENCL_C_VERSION :      OpenCL C 1.2
        DRIVER_VERSION:         2.0 (sse2,avx)
        ==========================================================

I also measured the wall-clock time using QueryPerformanceCounter; here are the results:

            CPU time: 1304449.6  micro-sec
            GPU time: 1401740.82  micro-sec
            ----------------------
            CPU time: 1620076.55  micro-sec
            GPU time: 1310317.64  micro-sec
            ----------------------
            CPU time: 1468520.44  micro-sec
            GPU time: 1317153.63  micro-sec
            ----------------------
            CPU time: 1304367.29  micro-sec
            GPU time: 1251865.14  micro-sec
            ----------------------
            CPU time: 1301589.17  micro-sec
            GPU time: 1252889.4  micro-sec
            ----------------------
            CPU time: 1294750.21  micro-sec
            GPU time: 1257017.41  micro-sec
            ----------------------
            CPU time: 1297506.93  micro-sec
            GPU time: 1252768.9  micro-sec
            ----------------------
            CPU time: 1293511.29  micro-sec
            GPU time: 1252019.88  micro-sec
            ----------------------
            CPU time: 1320753.54  micro-sec
            GPU time: 1248895.73  micro-sec
            ----------------------
            CPU time: 1296486.95  micro-sec
            GPU time: 1255207.91  micro-sec
            ----------------------

I also tried OpenCL profiling to measure the kernel execution time:

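    // Profiling info is only available if the command queue was created
    // with CL_QUEUE_PROFILING_ENABLE.
    cl::Event ev;
    queue.enqueueNDRangeKernel(kernel_add, cl::NullRange,   // no global offset
                               cl::NDRange((size_t)(NumberOfAllElements / 4)),
                               cl::NDRange(64), NULL, &ev);
    ev.wait();
    queue.finish();
    // Device timestamps in nanoseconds
    cl_ulong time_start = ev.getProfilingInfo<CL_PROFILING_COMMAND_START>();
    cl_ulong time_end = ev.getProfilingInfo<CL_PROFILING_COMMAND_END>();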
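`getProfilingInfo<CL_PROFILING_COMMAND_START>()` and `getProfilingInfo<CL_PROFILING_COMMAND_END>()` return device timestamps in nanoseconds as `cl_ulong`, so the kernel's elapsed time is their difference. Below is a minimal sketch of the conversion to micro-seconds, the unit used in the results. `profiling_us` is a hypothetical helper that takes `std::uint64_t`, assumed here to match `cl_ulong` (a 64-bit unsigned type), so that it compiles without the OpenCL headers.

```cpp
#include <cassert>
#include <cstdint>

// Converts a pair of OpenCL profiling timestamps (nanoseconds) into an
// elapsed time in microseconds. In the question, start_ns and end_ns would
// be the time_start and time_end values read from the event.
double profiling_us(std::uint64_t start_ns, std::uint64_t end_ns)
{
    assert(end_ns >= start_ns);
    return static_cast<double>(end_ns - start_ns) / 1000.0;
}
```

This interval covers only the kernel's execution on the device. It excludes the enqueue and scheduling overhead that the `clock()` and `QueryPerformanceCounter` loops also include.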

The results for a single execution were roughly the same:

            CPU time: 13335.1815  micro-sec
            GPU time: 11865.111  micro-sec
            ----------------------
            CPU time: 13884.0235  micro-sec
            GPU time: 11663.889  micro-sec
            ----------------------
            CPU time: 19724.7296  micro-sec
            GPU time: 14548.222  micro-sec
            ----------------------
            CPU time: 19945.3199  micro-sec
            GPU time: 15331.111  micro-sec
            ----------------------
            CPU time: 17973.5055  micro-sec
            GPU time: 11641.444  micro-sec
            ----------------------
            CPU time: 12652.6683  micro-sec
            GPU time: 11632  micro-sec
            ----------------------
            CPU time: 18875.292  micro-sec
            GPU time: 14783.111  micro-sec
            ----------------------
            CPU time: 32782.033  micro-sec
            GPU time: 11650.444  micro-sec
            ----------------------
            CPU time: 20462.2257  micro-sec
            GPU time: 11647.778  micro-sec
            ----------------------
            CPU time: 14529.6618  micro-sec
            GPU time: 11860.112  micro-sec
