CUFFT |不能弄清楚一个简单的例子 [英] CUFFT | cannot figure out a simple example

查看:493
本文介绍了CUFFT |不能弄清楚一个简单的例子的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在努力的整天,试图使一个基本的CUFFT示例工作正常。但我遇到一个小问题,我不能确定。基本上我有一个线性二维数组vx x和y坐标。然后我只是计算一个向前然后向后CUFFT(就地),那么简单。然后我复制回数组vx,通过NX * NY 对其进行标准化,然后显示。

I've been struggling the whole day, trying to make a basic CUFFT example work properly. However i run into a little problem which I cannot identify. Basically I have a linear 2D array vx with x and y coordinates. Then I just calculate a forward then backward CUFFT (in-place), that simple. Then I copy back the array vx, normalize it by NX*NY , then display.

#define NX 32
#define NY 32
#define LX (2*M_PI)
#define LY (2*M_PI)
float *x = new float[NX*NY];
float *y = new float[NX*NY];
float *vx = new float[NX*NY];
for(int j = 0; j < NY; j++){
    for(int i = 0; i < NX; i++){
        x[j*NX + i] = i * LX/NX;
        y[j*NX + i] = j * LY/NY;
        vx[j*NX + i] = cos(x[j*NX + i]);
    }
}
float *d_vx;
CUDA_CHECK(cudaMalloc(&d_vx, NX*NY*sizeof(float)));
CUDA_CHECK(cudaMemcpy(d_vx, vx, NX*NY*sizeof(float), cudaMemcpyHostToDevice));
cufftHandle planr2c;
cufftHandle planc2r;
CUFFT_CHECK(cufftPlan2d(&planr2c, NY, NX, CUFFT_R2C));
CUFFT_CHECK(cufftPlan2d(&planc2r, NY, NX, CUFFT_C2R));
CUFFT_CHECK(cufftSetCompatibilityMode(planr2c, CUFFT_COMPATIBILITY_NATIVE));
CUFFT_CHECK(cufftSetCompatibilityMode(planc2r, CUFFT_COMPATIBILITY_NATIVE));
CUFFT_CHECK(cufftExecR2C(planr2c, (cufftReal *)d_vx, (cufftComplex *)d_vx));
CUFFT_CHECK(cufftExecC2R(planc2r, (cufftComplex *)d_vx, (cufftReal *)d_vx));
CUDA_CHECK(cudaMemcpy(vx, d_vx, NX*NY*sizeof(cufftReal), cudaMemcpyDeviceToHost));
for (int j = 0; j < NY; j++){
    for (int i = 0; i < NX; i++){
        printf("%.3f ", vx[j*NX + i]/(NX*NY));
    }
    printf("\n");
}

当vx定义为cos(x)或sin但是当使用sin(y)或cos(y)时,它给我回正确的函数(sin或cos),但是具有一半幅度(即,在0.5和-0.5之间振荡而不是1和-1)!注意,使用sin(2 * y)或cos(2 * y)(或sin(4 * y),cos(4 * y),...)任何想法?

When vx is defined as cos(x) or sin(x), it works fine, but when using sin(y) or cos(y), it gives me back the correct function (sin or cos) but with half amplitude (that is, oscillating between 0.5 and -0.5 instead of 1 and -1) ! Note that using sin(2*y) or cos(2*y) (or sin(4*y), cos(4*y), ...) works fine. Any idea?

推荐答案

这里的问题是输入和输出的实地到复杂变换是一个复杂类型大小不同于输入的实际数据(它是两倍大)。您没有分配足够的内存来保存实数到复数变换的中间复数结果。引用来自文档:

The problem here is that input and output of an in-place real to complex transform is a complex type whose size isn't the same as the input real data (it is twice as large). You haven't allocated enough memory to hold the intermediate complex results of the real to complex transform. Quoting from the documentation:


cufftExecR2C()(cufftExecD2Z())执行单精度
到复杂,隐式转发,CUFFT
转换计划。 CUFFT使用由
指向的GPU内存作为输入数据的idata参数。此函数将非冗余傅立叶
系数存储在odata数组中。指向idata和odata的指针都是
需要在单精度
变换中与cufftComplex数据类型对齐,而在双精度
变换中则为cufftDoubleComplex数据类型。

cufftExecR2C() (cufftExecD2Z()) executes a single-precision (double-precision) real-to-complex, implicitly forward, CUFFT transform plan. CUFFT uses as input data the GPU memory pointed to by the idata parameter. This function stores the nonredundant Fourier coefficients in the odata array. Pointers to idata and odata are both required to be aligned to cufftComplex data type in single-precision transforms and cufftDoubleComplex data type in double-precision transforms.

解决方案是分配第二个设备缓冲区来保存中间结果,或者放大原位分配,使其足够大以容纳复杂数据。因此,核心转换代码更改为:

The solution is either to allocate a second device buffer to hold the intermediate result or enlarge the in place allocation so it is large enough to hold the complex data. So the core transform code changes to something like:

float *d_vx;
CUDA_CHECK(cudaMalloc(&d_vx, NX*NY*sizeof(cufftComplex)));
CUDA_CHECK(cudaMemcpy(d_vx, vx, NX*NY*sizeof(cufftComplex), cudaMemcpyHostToDevice));
cufftHandle planr2c;
cufftHandle planc2r;
CUFFT_CHECK(cufftPlan2d(&planr2c, NY, NX, CUFFT_R2C));
CUFFT_CHECK(cufftPlan2d(&planc2r, NY, NX, CUFFT_C2R));
CUFFT_CHECK(cufftSetCompatibilityMode(planr2c, CUFFT_COMPATIBILITY_NATIVE));
CUFFT_CHECK(cufftSetCompatibilityMode(planc2r, CUFFT_COMPATIBILITY_NATIVE));
CUFFT_CHECK(cufftExecR2C(planr2c, (cufftReal *)d_vx, d_vx));
CUFFT_CHECK(cufftExecC2R(planc2r, d_vx, (cufftReal *)d_vx));
CUDA_CHECK(cudaMemcpy(vx, d_vx, NX*NY*sizeof(cufftComplex), cudaMemcpyDeviceToHost));

[免责声明:用浏览器编写,不得编译或测试,自行承担风险]

[disclaimer: written in browser, never compiled or tested, use at own risk]

请注意,您需要调整主机代码以匹配输入和数据的大小和类型。

Note you will need to adjust the host code to match the size and type of the input and data.

评论,难道要添加额外的8或10行,把你发布的一个可编译,可运行的示例中,有人试图帮助你可以工作吗?

As a final comment, would it have been that hard to add the additional 8 or 10 lines required to turn what you posted into a compilable, runnable example that someone trying to help you could work with?

这篇关于CUFFT |不能弄清楚一个简单的例子的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆