Casting float* to char* while looping over a 2-D array in linear memory on device


Problem description


On page 21 of the CUDA 4.0 Programming Guide there is an example (given below) that illustrates looping over the elements of a 2D array of floats in device memory. The dimensions of the 2D array are width * height.

// Host code
int width = 64, height = 64;
float* devPtr;
size_t pitch;
cudaMallocPitch(&devPtr, &pitch,
                width * sizeof(float), height);
MyKernel<<<100, 512>>>(devPtr, pitch, width, height);

// Device code
__global__ void MyKernel(float* devPtr, size_t pitch, int width, int height)
{
    for (int r = 0; r < height; ++r)
    {
        float* row = (float*)((char*)devPtr + r * pitch);
        for (int c = 0; c < width; ++c)
        {
            float element = row[c];
        }
    }
}

Why has the devPtr device memory pointer been cast to a character pointer, char*, in the global kernel function? Can someone explain that line, please? It looks a bit weird.

Solution

This is due to the way pointer arithmetic works in C. When you add an integer x to a pointer p, it doesn't always add x bytes. It adds x times sizeof([type that p points to]).

float* row = (float*)((char*)devPtr + r * pitch);

By casting devPtr to a char*, the offset that is applied (r * pitch) is counted in 1-byte increments, because a char is one byte. That is what the code needs, since cudaMallocPitch returns pitch in bytes. Had the cast not been there, the offset applied to devPtr would be r * pitch times 4 bytes, as a float is four bytes.
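The row lookup described above can be sketched on the host in plain C. This is only an illustration under stated assumptions: `pitched_row` and `pitched_get` are hypothetical helper names that do not appear in the CUDA guide, and an ordinary host array stands in for the memory that cudaMallocPitch would return.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: address of row r in a pitched 2-D float array.
   pitch is the row stride in BYTES, so the base pointer is cast to
   char* before the offset is added, then cast back to float*. */
float *pitched_row(float *base, size_t pitch, int r)
{
    assert(base != NULL && r >= 0);
    return (float *)((char *)base + (size_t)r * pitch);
}

/* Hypothetical helper: read element (r, c) through the row pointer. */
float pitched_get(float *base, size_t pitch, int r, int c)
{
    return pitched_row(base, pitch, r)[c];
}
```

Without the char* cast, `base + r * pitch` would advance by `r * pitch * sizeof(float)` bytes and land far beyond the intended row.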

For example, if we have:

float* devPtr = (float*)1000;  // pretend the array starts at byte address 1000
int r = 4;

Now, let's leave out the cast:

float* result1 = (devPtr + r);
// address in result1 = 1000 + (r * sizeof(float)) = 1016

Now, if we include the cast:

float* result2 = (float*)((char*)devPtr + r);
// address in result2 = 1000 + (r * sizeof(char)) = 1004
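The two cases above can also be checked with real pointers rather than a made-up address. The following is a small sketch in plain C; the function names `offset_without_cast` and `offset_with_cast` are hypothetical, and the pointer passed in is assumed to point into an array with at least r elements so that the arithmetic stays in bounds.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes moved when r is added directly to a float*:
   the compiler scales r by sizeof(float). */
ptrdiff_t offset_without_cast(float *p, int r)
{
    assert(p != NULL);
    return (char *)(p + r) - (char *)p;
}

/* Bytes moved when r is added after casting to char*:
   the compiler scales r by sizeof(char), which is 1. */
ptrdiff_t offset_with_cast(float *p, int r)
{
    assert(p != NULL);
    return ((char *)p + r) - (char *)p;
}
```

With r = 4, the first function reports `4 * sizeof(float)` bytes (16 where a float is four bytes) and the second reports 4 bytes, matching the 1016 and 1004 addresses in the example.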
