Pitch alignment for 2D textures
Question
2D textures are a useful feature of CUDA for image processing applications. To bind pitch-linear memory to a 2D texture, the memory has to be aligned. cudaMallocPitch is a good option for aligned memory allocation. On my device, the pitch returned by cudaMallocPitch is a multiple of 512, i.e. the memory is 512-byte aligned.
The actual alignment requirement for the device is given by cudaDeviceProp::texturePitchAlignment, which is 32 bytes on my device.
My question is:
If the actual alignment requirement for 2D textures is 32 bytes, why does cudaMallocPitch return 512-byte aligned memory?
Isn't this a waste of memory? For example, if I create an 8-bit image of size 513 x 100, it will occupy 1024 x 100 bytes.
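The arithmetic behind that example can be sketched in host-side code. This is only an illustration, assuming the pitch is the row width in bytes rounded up to the next multiple of the alignment; the helper names round_up and pitched_bytes are hypothetical and not part of the CUDA API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round `value` up to the next multiple of `alignment`.
inline std::size_t round_up(std::size_t value, std::size_t alignment)
{
    assert(alignment > 0);
    return ((value + alignment - 1) / alignment) * alignment;
}

// Hypothetical helper: total bytes occupied by `height` rows when each row
// of `width_bytes` is padded up to a multiple of `alignment`.
inline std::size_t pitched_bytes(std::size_t width_bytes, std::size_t height,
                                 std::size_t alignment)
{
    return round_up(width_bytes, alignment) * height;
}
```

Under this assumption, pitched_bytes(513, 100, 512) is 102400 bytes, whereas pitched_bytes(513, 100, 32) would be 54400 bytes, against 51300 bytes of actual image data.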
I get this behaviour on the following systems:
1: Asus G53JW + Windows 8 x64 + GeForce GTX 460M + CUDA 5 + Core i7 740QM + 4GB RAM
2: Dell Inspiron N5110 + Windows 7 x64 + GeForce GT525M + CUDA 4.2 + Core i7 2630QM + 6GB RAM
Recommended answer
This is a slightly speculative answer, but keep in mind that there are two alignment properties which a pitched allocation must satisfy for textures: one for the texture pointer and one for the texture rows. I suspect that cudaMallocPitch is honouring the former, defined by cudaDeviceProp::textureAlignment. For example:
#include <cstdio>
#include <cuda_runtime.h>

int main(void)
{
    const int ncases = 12;
    // Requested row widths, in bytes.
    const size_t widths[ncases] = { 5, 10, 20, 50, 70, 90, 100,
                                    200, 500, 700, 900, 1000 };
    const size_t height = 10;
    float *vals[ncases];
    size_t pitches[ncases];

    // Report the texture base-address alignment of device 0.
    struct cudaDeviceProp p;
    cudaGetDeviceProperties(&p, 0);
    fprintf(stdout, "Texture alignment = %zu bytes\n",
            p.textureAlignment);

    cudaSetDevice(0);
    cudaFree(0); // establish context

    // Allocate pitched memory for each width and print the pitch chosen.
    for (int i = 0; i < ncases; i++) {
        cudaMallocPitch((void **)&vals[i], &pitches[i],
                        widths[i], height);
        fprintf(stdout, "width = %zu <=> pitch = %zu \n",
                widths[i], pitches[i]);
    }

    return 0;
}
This gives the following on a GT320M:
Texture alignment = 256 bytes
width = 5 <=> pitch = 256
width = 10 <=> pitch = 256
width = 20 <=> pitch = 256
width = 50 <=> pitch = 256
width = 70 <=> pitch = 256
width = 90 <=> pitch = 256
width = 100 <=> pitch = 256
width = 200 <=> pitch = 256
width = 500 <=> pitch = 512
width = 700 <=> pitch = 768
width = 900 <=> pitch = 1024
width = 1000 <=> pitch = 1024
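The pitches listed above match each requested width rounded up to the next multiple of the reported 256-byte texture alignment. A minimal host-side check of that reading, assuming simple round-up behaviour (the names round_up and pitches_match are hypothetical, and the tables are copied from the output above):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: round `value` up to the next multiple of `alignment`.
inline std::size_t round_up(std::size_t value, std::size_t alignment)
{
    assert(alignment > 0);
    return ((value + alignment - 1) / alignment) * alignment;
}

// Widths and pitches copied from the GT320M output.
const std::size_t kWidths[]  = { 5, 10, 20, 50, 70, 90, 100,
                                 200, 500, 700, 900, 1000 };
const std::size_t kPitches[] = { 256, 256, 256, 256, 256, 256, 256,
                                 256, 512, 768, 1024, 1024 };

// Returns true if every observed pitch equals its width rounded up
// to a multiple of `alignment`.
inline bool pitches_match(std::size_t alignment)
{
    const std::size_t n = sizeof(kWidths) / sizeof(kWidths[0]);
    for (std::size_t i = 0; i < n; ++i) {
        if (round_up(kWidths[i], alignment) != kPitches[i]) {
            return false;
        }
    }
    return true;
}
```

With these tables, pitches_match(256) holds while pitches_match(32) does not. That is consistent with the pitch following textureAlignment rather than a smaller row alignment on that device, though it does not prove how cudaMallocPitch chooses the pitch.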
I would guess that cudaDeviceProp::texturePitchAlignment applies to arrays.