Pitch alignment for 2D textures

Question

2D textures are a useful feature of CUDA in image processing applications. To bind pitch-linear memory to a 2D texture, the memory has to be aligned. cudaMallocPitch is a good option for aligned memory allocation. On my device, the pitch returned by cudaMallocPitch is a multiple of 512, i.e. each row starts on a 512-byte boundary.

The actual alignment requirement for the device is given by cudaDeviceProp::texturePitchAlignment, which is 32 bytes on my device.

My question is:

If the actual alignment requirement for 2D textures is 32 bytes, then why does cudaMallocPitch return 512 byte aligned memory?

Isn't it a waste of memory? For example, if I create an 8-bit image of size 513 x 100, it will occupy 1024 x 100 bytes.

I get this behaviour on the following systems:

1: Asus G53JW + Windows 8 x64 + GeForce GTX 460M + CUDA 5 + Core i7 740QM + 4GB RAM

2: Dell Inspiron N5110 + Windows 7 x64 + GeForce GT525M + CUDA 4.2 + Core i7 2630QM + 6GB RAM

Recommended answer

This is a slightly speculative answer, but keep in mind that there are two alignment properties which the pitch of an allocation must satisfy for textures: one for the texture pointer and one for the texture rows. I suspect that cudaMallocPitch is honouring the former, defined by cudaDeviceProp::textureAlignment. For example:

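#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    const int ncases = 12;
    // Requested row widths, in bytes (cudaMallocPitch takes the width in bytes)
    const size_t widths[ncases] = { 5, 10, 20, 50, 70, 90, 100,
        200, 500, 700, 900, 1000 };
    const size_t height = 10;

    float *vals[ncases];
    size_t pitches[ncases];

    // Report the base-address alignment required for texture binding
    struct cudaDeviceProp p;
    cudaGetDeviceProperties(&p, 0);
    fprintf(stdout, "Texture alignment = %zu bytes\n",
            p.textureAlignment);
    cudaSetDevice(0);
    cudaFree(0); // establish context

    // Allocate each case and print the pitch the runtime chose
    for(int i=0; i<ncases; i++) {
        if (cudaMallocPitch((void **)&vals[i], &pitches[i],
                widths[i], height) != cudaSuccess) {
            fprintf(stderr, "cudaMallocPitch failed for width %zu\n",
                    widths[i]);
            return 1;
        }
        fprintf(stdout, "width = %zu <=> pitch = %zu \n",
                widths[i], pitches[i]);
    }

    // Release the allocations
    for(int i=0; i<ncases; i++) {
        cudaFree(vals[i]);
    }

    return 0;
}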

This gives the following on a GT320M:

Texture alignment = 256 bytes
width = 5 <=> pitch = 256 
width = 10 <=> pitch = 256 
width = 20 <=> pitch = 256 
width = 50 <=> pitch = 256 
width = 70 <=> pitch = 256 
width = 90 <=> pitch = 256 
width = 100 <=> pitch = 256 
width = 200 <=> pitch = 256 
width = 500 <=> pitch = 512 
width = 700 <=> pitch = 768 
width = 900 <=> pitch = 1024 
width = 1000 <=> pitch = 1024 

I would guess that cudaDeviceProp::texturePitchAlignment applies to arrays.
