GPU gives no performance improvement in Julia set computation


Problem description

I am trying to compare performance on the CPU and the GPU. I have

  • CPU: Intel® Core™ i5 CPU M 480 @ 2.67GHz × 4
  • GPU: NVIDIA GeForce GT 420M

I can confirm that the GPU is configured and works correctly with CUDA.

I am implementing the Julia set computation (http://en.wikipedia.org/wiki/Julia_set). Basically, for every pixel, if the coordinate is in the set it is painted red; otherwise it is painted white.

Although I get identical answers from both the CPU and the GPU, instead of a performance improvement I get a performance penalty when using the GPU.

Run times

  • CPU: 0.052 s
  • GPU: 0.784 s

I am aware that transferring data from the device to the host can take up some time. But still, how do I know whether using the GPU is actually beneficial?

Here is the relevant GPU code:

    #include <stdio.h>
    #include <cuda.h>

    __device__ bool isJulia( float x, float y, float maxX_2, float maxY_2 )
    {
        float z_r = 0.8 * (float) (maxX_2 - x) / maxX_2;
        float z_i = 0.8 * (float) (maxY_2 - y) / maxY_2;

        float c_r = -0.8;
        float c_i = 0.156;
        for( int i=1 ; i<100 ; i++ )
        {
        float tmp_r = z_r*z_r - z_i*z_i + c_r;
        float tmp_i = 2*z_r*z_i + c_i;

        z_r = tmp_r;
        z_i = tmp_i;

        if( sqrt( z_r*z_r + z_i*z_i ) > 1000 )
            return false;
        }
        return true;
    }

    __global__ void kernel( unsigned char * im, int dimx, int dimy )
    {
        //int tid = blockIdx.y*gridDim.x + blockIdx.x;
        int tid = blockIdx.x*blockDim.x + threadIdx.x;
        tid *= 3;
        if( isJulia((float)blockIdx.x, (float)threadIdx.x, (float)dimx/2, (float)dimy/2)==true )
        {
        im[tid] = 255;
        im[tid+1] = 0;
        im[tid+2] = 0;
        }
        else
        {
        im[tid] = 255;
        im[tid+1] = 255;
        im[tid+2] = 255;
        }

    }

    int main()
    {
        int dimx=768, dimy=768;

        //on cpu
        unsigned char * im = (unsigned char*) malloc( 3*dimx*dimy );

        //on GPU
        unsigned char * im_dev;

        //allocate mem on GPU
        cudaMalloc( (void**)&im_dev, 3*dimx*dimy ); 

        //launch kernel. 
        for( int z=0 ; z<10000 ; z++ ) // loop for multiple computations
        {
            kernel<<<dimx,dimy>>>(im_dev, dimx, dimy);
        }

        cudaMemcpy( im, im_dev, 3*dimx*dimy, cudaMemcpyDeviceToHost );

        writePPMImage( im, dimx, dimy, 3, "out_gpu.ppm" ); //assume this writes a ppm file

        free( im );
        cudaFree( im_dev );
    }

Here is the CPU code:

    bool isJulia( float x, float y, float maxX_2, float maxY_2 )
    {
        float z_r = 0.8 * (float) (maxX_2 - x) / maxX_2;
        float z_i = 0.8 * (float) (maxY_2 - y) / maxY_2;

        float c_r = -0.8;
        float c_i = 0.156;
        for( int i=1 ; i<100 ; i++ )
        {
        float tmp_r = z_r*z_r - z_i*z_i + c_r;
        float tmp_i = 2*z_r*z_i + c_i;

        z_r = tmp_r;
        z_i = tmp_i;

        if( sqrt( z_r*z_r + z_i*z_i ) > 1000 )
            return false;
        }
        return true;
    }


    #include <stdlib.h>
    #include <stdio.h>

    int main(void)
    {
      const int dimx = 768, dimy = 768;
      int i, j;

      unsigned char * data = new unsigned char[dimx*dimy*3];

      for( int z=0 ; z<10000 ; z++ ) // loop for multiple computations
      {
      for (j = 0; j < dimy; ++j)
      {
        for (i = 0; i < dimx; ++i)
        {
          if( isJulia(i,j,dimx/2,dimy/2) == true )
          {
          data[3*j*dimx + 3*i + 0] = (unsigned char)255;  /* red */
          data[3*j*dimx + 3*i + 1] = (unsigned char)0;  /* green */
          data[3*j*dimx + 3*i + 2] = (unsigned char)0;  /* blue */
          }
          else
          {
          data[3*j*dimx + 3*i + 0] = (unsigned char)255;  /* red */
          data[3*j*dimx + 3*i + 1] = (unsigned char)255;  /* green */
          data[3*j*dimx + 3*i + 2] = (unsigned char)255;  /* blue */
          }
        }
      }
      }

      writePPMImage( data, dimx, dimy, 3, "out_cpu.ppm" ); //assume this writes a ppm file
      delete [] data;


      return 0;
    }

Further, following suggestions from @hyde, I have looped the computation-only part to generate 10,000 images. I am not bothering to write all those images, though; the computation alone is what I am timing.

Here are the run times:

  • CPU: more than 10 minutes, and the code was still running
  • GPU: 1m 14.765s

Accepted answer

Turning comments into an answer:

To get relevant figures, you need to calculate more than one image, so that execution time is at least seconds or tens of seconds. Also, including file-saving time in the results adds noise and hides the actual CPU vs GPU difference.

Another way to get real results is to select a Julia set that has a lot of points belonging to the set, then raise the iteration count so high that it takes many seconds to calculate just one image. Then there is only one single calculation setup, so this is likely to be the most advantageous scenario for GPU/CUDA.

To measure how much overhead there is, change the image size to 1x1 and the iteration limit to 1, and then calculate enough images that it takes at least a few seconds. In this scenario, the GPU is likely to be significantly slower.

To get the most relevant timings for your use case, select the image size and iteration count you are really going to use, and then measure the image count at which both versions are equally fast. That will give you a rough rule of thumb to decide which you should use when.

An alternative approach for practical results, if you are going to compute just one image: find the iteration limit for a single worst-case image at which the CPU and GPU are equally fast. If that many or more iterations would be advantageous, choose the GPU; otherwise choose the CPU.
