memset cuArray for surface memory

Problem description

Say you have a cuArray for binding a surface object.

Something of the form:

// These are inputs to a function really.
cudaArray* d_cuArrSurf;
cudaSurfaceObject_t * surfImage;

const cudaExtent extent = make_cudaExtent(width, height, depth);
cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
cudaMalloc3DArray(&d_cuArrSurf, &channelDesc, extent);

// Bind to Surface
cudaResourceDesc    surfRes;
memset(&surfRes, 0, sizeof(cudaResourceDesc));
surfRes.resType = cudaResourceTypeArray;
surfRes.res.array.array  = d_cuArrSurf;

cudaCreateSurfaceObject(surfImage, &surfRes);

Now, I want to initialize this cuArray to zero. Apparently there is no memset for cuArray objects. What would be the best way to do this? There may be several options, some with better or worse trade-offs. What are these options?

I can think of:

  1. Allocate and zero host memory, then copy it with cudaMemcpy3D().
  2. Create an initialization kernel and write zeros with surf3Dwrite().
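
The kernel option could look roughly like the sketch below. It is not part of the original post: the names zeroSurfaceKernel and zeroWithKernel are hypothetical, the array is assumed to hold single-channel float data, and it is allocated here with the cudaArraySurfaceLoadStore flag. The helper also copies the array back to the host purely to check the result.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: writes 0.0f to every element of a 3D float surface.
__global__ void zeroSurfaceKernel(cudaSurfaceObject_t surf,
                                  unsigned width, unsigned height, unsigned depth)
{
    unsigned x = blockIdx.x * blockDim.x + threadIdx.x;
    unsigned y = blockIdx.y * blockDim.y + threadIdx.y;
    unsigned z = blockIdx.z * blockDim.z + threadIdx.z;
    if (x >= width || y >= height || z >= depth) return;  // guard partial blocks
    // The x coordinate of a surface access is a byte offset, hence sizeof(float).
    surf3Dwrite(0.0f, surf, (int)(x * sizeof(float)), (int)y, (int)z);
}

// Hypothetical helper: allocates an array, zeroes it through a surface object,
// copies it back, and returns the number of non-zero elements (-1 on a CUDA error).
int zeroWithKernel(unsigned width, unsigned height, unsigned depth)
{
    cudaArray_t arr = nullptr;
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaExtent extent = make_cudaExtent(width, height, depth);
    if (cudaMalloc3DArray(&arr, &desc, extent, cudaArraySurfaceLoadStore) != cudaSuccess)
        return -1;

    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeArray;
    res.res.array.array = arr;
    cudaSurfaceObject_t surf = 0;
    if (cudaCreateSurfaceObject(&surf, &res) != cudaSuccess) {
        cudaFreeArray(arr);
        return -1;
    }

    // Round the grid up so sizes that are not multiples of 8 are still covered.
    dim3 block(8, 8, 8);
    dim3 grid((width  + block.x - 1) / block.x,
              (height + block.y - 1) / block.y,
              (depth  + block.z - 1) / block.z);
    zeroSurfaceKernel<<<grid, block>>>(surf, width, height, depth);
    cudaError_t err = cudaDeviceSynchronize();

    // Verification only: copy the array to a host buffer pre-filled with 1.0f.
    std::vector<float> host((size_t)width * height * depth, 1.0f);
    if (err == cudaSuccess) {
        cudaMemcpy3DParms back = {};
        back.srcArray = arr;
        back.dstPtr = make_cudaPitchedPtr(host.data(), width * sizeof(float), width, height);
        back.extent = extent;  // in elements, because a CUDA array takes part in the copy
        back.kind = cudaMemcpyDeviceToHost;
        err = cudaMemcpy3D(&back);
    }
    cudaDestroySurfaceObject(surf);
    cudaFreeArray(arr);
    if (err != cudaSuccess) return -1;

    int nonZero = 0;
    for (float v : host) if (v != 0.0f) ++nonZero;
    return nonZero;
}
```

Calling zeroWithKernel(256, 256, 256) from main should return 0 if every element was zeroed. In real code only the kernel and its launch would be needed, since the array and surface object already exist.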

Recommended answer

Would it be possible for you to show an example of those lines?

Here is a rough example, roughly extending the previous rough example:

$ cat t1648.cu
// Includes, system
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>


__device__ float my_common(float *d, int width, unsigned int x, unsigned int y){

// 200 lines of common code...
  return d[y *width +x];
}




////////////////////////////////////////////////////////////////////////////////
// Kernels
////////////////////////////////////////////////////////////////////////////////
//! Write to a cuArray using surface writes
//! @param gIData input data in global memory
////////////////////////////////////////////////////////////////////////////////
__global__ void WriteKernel(float *gIData, int width, int height,
                                       cudaSurfaceObject_t outputSurface)
{
    // calculate surface coordinates
    unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;
    unsigned int z = blockIdx.z*blockDim.z + threadIdx.z;
    // read from global memory and write to the cuArray (via the surface object)
    surf3Dwrite(my_common(gIData, width, x, y),
                outputSurface, x*4, y, z, cudaBoundaryModeTrap);
}

__global__ void WriteKernel(float *gIData, int width, int height,
                                       float *out)
{
    // calculate coordinates
    unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;

    // read from global memory and write to global memory
    out[y*width+x] = my_common(gIData, width, x, y);
}

__global__ void ReadKernel(float tval, cudaSurfaceObject_t outputSurface)
{
    // calculate surface coordinates
    unsigned int x = blockIdx.x*blockDim.x + threadIdx.x;
    unsigned int y = blockIdx.y*blockDim.y + threadIdx.y;
    unsigned int z = blockIdx.z*blockDim.z + threadIdx.z;
    // read from the cuArray (via the surface object) and compare with the expected value
    float val;
    surf3Dread(&val,
                outputSurface, x*4, y, z, cudaBoundaryModeTrap);
    if (val != tval) printf("oops\n");
}


////////////////////////////////////////////////////////////////////////////////
// Program main
////////////////////////////////////////////////////////////////////////////////
int main(int argc, char **argv)
{
    printf("starting...\n");


    unsigned width = 256;
    unsigned height = 256;
    unsigned depth = 256;
    unsigned int size = depth*width * height * sizeof(float);

    // Allocate device memory for result
    float *dData = NULL;
    cudaMalloc((void **) &dData, size);

    // Allocate array and copy image data
    float *out, *h_out;
    h_out = new float[height*width*depth];
    float tval = 1.0f;
    for (int i = 0; i < height*width*depth; i++) h_out[i] = tval;
    cudaArray* d_cuArrSurf;
    cudaSurfaceObject_t  surfImage;

    const cudaExtent extent = make_cudaExtent(width, height, depth);
    cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc<float>();
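    // cudaArraySurfaceLoadStore is the documented flag for arrays accessed through surfaces
    cudaMalloc3DArray(&d_cuArrSurf, &channelDesc, extent, cudaArraySurfaceLoadStore);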

    // Bind to Surface
    cudaResourceDesc    surfRes;
    memset(&surfRes, 0, sizeof(cudaResourceDesc));
    surfRes.resType = cudaResourceTypeArray;
    surfRes.res.array.array  = d_cuArrSurf;

    cudaCreateSurfaceObject(&surfImage, &surfRes);

    cudaMalloc(&out, size);
    cudaMemcpy(out, h_out, size, cudaMemcpyHostToDevice);
    dim3 dimBlock(8, 8, 8);
    dim3 dimGrid(width / dimBlock.x, height / dimBlock.y, depth / dimBlock.z); // cover every z-slice
    // initialize array
    cudaMemcpy3DParms p = {0};
    p.srcPtr = make_cudaPitchedPtr(out, width*sizeof(out[0]), width, height);
    p.srcPos = make_cudaPos(0,0,0);
    p.dstArray = d_cuArrSurf;
    p.dstPos = make_cudaPos(0,0,0);
    p.extent = make_cudaExtent(width, height, 1);
    p.kind   = cudaMemcpyDefault;
    // copy the same source slice into every z-slice of the array
    for (unsigned int i = 0; i < depth; i++){
      p.dstPos = make_cudaPos(0, 0, i);
      cudaMemcpy3D(&p);
    }

    ReadKernel<<<dimGrid, dimBlock>>>(tval, surfImage);
    WriteKernel<<<dimGrid, dimBlock>>>(dData, width, height, surfImage);
    WriteKernel<<<dimGrid, dimBlock>>>(dData, width, height, out);
    cudaDeviceSynchronize();
}
$ nvcc -o t1648 t1648.cu
$ cuda-memcheck ./t1648
========= CUDA-MEMCHECK
starting...
========= ERROR SUMMARY: 0 errors
$

The (total) extent above is 256x256x256. So I chose to do a 256x256 transfer (per-transfer extent) (basically each z-slice) over 256 iterations of cudaMemcpy3D. It seems to pass the sniff test.

I used 1 as my initializing value for device memory here "just because". If you wanted to make this faster and initialize to zero, skip the host->device copy and just use cudaMemset to initialize the linear memory (source for 3D transfer) to zero.
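
The cudaMemset variant described above might be sketched as follows. The helper names zeroArrayWithMemset and countNonZeroAfterMemset are hypothetical, and float data is assumed. Because the source position of the copy never advances, this sketch allocates only one zeroed slice and reuses it for every z-slice; that is a choice made here, not something the original example does.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: zeroes every z-slice of `arr` from one zeroed linear slice.
cudaError_t zeroArrayWithMemset(cudaArray_t arr, size_t width, size_t height, size_t depth)
{
    float *slice = nullptr;
    size_t sliceBytes = width * height * sizeof(float);
    cudaError_t err = cudaMalloc(&slice, sliceBytes);
    if (err != cudaSuccess) return err;
    err = cudaMemset(slice, 0, sliceBytes);  // all-zero bytes are 0.0f for float

    cudaMemcpy3DParms p = {};
    p.srcPtr   = make_cudaPitchedPtr(slice, width * sizeof(float), width, height);
    p.dstArray = arr;
    p.extent   = make_cudaExtent(width, height, 1);  // one z-slice per transfer
    p.kind     = cudaMemcpyDeviceToDevice;
    for (size_t z = 0; z < depth && err == cudaSuccess; ++z) {
        p.dstPos = make_cudaPos(0, 0, z);
        err = cudaMemcpy3D(&p);
    }
    cudaFree(slice);
    return err;
}

// Hypothetical check: returns the number of non-zero elements after zeroing
// (-1 on a CUDA error). The host buffer starts at 1.0f so stale data would show up.
int countNonZeroAfterMemset(size_t width, size_t height, size_t depth)
{
    cudaArray_t arr = nullptr;
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaExtent extent = make_cudaExtent(width, height, depth);
    if (cudaMalloc3DArray(&arr, &desc, extent) != cudaSuccess) return -1;

    cudaError_t err = zeroArrayWithMemset(arr, width, height, depth);
    std::vector<float> host(width * height * depth, 1.0f);
    if (err == cudaSuccess) {
        cudaMemcpy3DParms back = {};
        back.srcArray = arr;
        back.dstPtr = make_cudaPitchedPtr(host.data(), width * sizeof(float), width, height);
        back.extent = extent;
        back.kind = cudaMemcpyDeviceToHost;
        err = cudaMemcpy3D(&back);
    }
    cudaFreeArray(arr);
    if (err != cudaSuccess) return -1;

    int nonZero = 0;
    for (float v : host) if (v != 0.0f) ++nonZero;
    return nonZero;
}
```

No host allocation or host-to-device transfer is involved, which is the saving the answer points to. An array that will also be bound to a surface object would be allocated with cudaArraySurfaceLoadStore instead of the default flags used in this check.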
