How to get the real and imaginary parts of a complex matrix separately in CUDA?


Problem description

I'm trying to compute the FFT of a 2D array. The input is an NxM real matrix, so the output matrix is also an NxM matrix (the complex 2xNxM output is stored in an NxM matrix by exploiting Hermitian symmetry).

So I want to know whether there is a method in CUDA to extract the real and imaginary parts into separate matrices. In OpenCV, the split function does this, so I'm looking for a similar function in CUDA, but I haven't found one yet.

Below is my complete code:

#define NRANK 2
#define BATCH 10

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <cufft.h>
#include <stdio.h> 

#include <iostream>
#include <vector>

using namespace std;

int main()
    { 

    const size_t NX = 4;
    const size_t NY = 5;

    // Input array - host side
     float b[NX][NY] ={ 
        {0.7943 ,   0.6020 ,   0.7482  ,  0.9133  ,  0.9961},
        {0.3112 ,   0.2630 ,   0.4505  ,  0.1524  ,  0.0782},
        {0.5285 ,   0.6541 ,   0.0838  ,  0.8258  ,  0.4427},
        {0.1656 ,   0.6892 ,   0.2290  ,  0.5383  ,  0.1067}
    };


    // Output array - host side
    float c[NX][NY] = { 0 };

    cufftHandle plan;
    cufftComplex *data; // Holds both the input and the output - device side
    int n[NRANK] = {NX, NY};

    // Allocated memory and copy from host to device
    cudaMalloc((void**)&data, sizeof(cufftComplex)*NX*(NY/2+1));
    for(int i=0; i<NX; ++i){
        // Uses this because my actual array is a dynamically allocated. 
        // but here I've replaced it with a static 2D array to make it simple.
        cudaMemcpy(reinterpret_cast<float*>(data) + i*NY, b[i], sizeof(float)*NY, cudaMemcpyHostToDevice);
     }

    // Perform the FFT
    cufftPlanMany(&plan, NRANK, n,NULL, 1, 0,NULL, 1, 0,CUFFT_R2C,BATCH);
    cufftSetCompatibilityMode(plan, CUFFT_COMPATIBILITY_NATIVE);
    cufftExecR2C(plan, (cufftReal*)data, data);
    cudaThreadSynchronize();
    cudaMemcpy(c, data, sizeof(float)*NX*NY, cudaMemcpyDeviceToHost);


    // Here c is an NxM matrix. I want to split it into 2 separate NxM matrices,
    // one holding the real part and one holding the imaginary part of the output

    // Here c is in 
    cufftDestroy(plan);
    cudaFree(data);

    return 0;
    }


EDIT

As suggested by JackOLanter, I modified the code as below, but the problem is still not solved.

 float  real_vec[NX][NY] = {0};       // host vector, real part
 float  imag_vec[NX][NY] = {0};       // host vector, imaginary part
cudaError  cudaStat1 = cudaMemcpy2D (real_vec, sizeof(real_vec[0]), data,  sizeof(data[0]),NY*sizeof(float2), NX, cudaMemcpyDeviceToHost);
cudaError  cudaStat2 = cudaMemcpy2D (imag_vec, sizeof(imag_vec[0]),data + 1,  sizeof(data[0]),NY*sizeof(float2), NX, cudaMemcpyDeviceToHost);

The error I get is 'invalid pitch argument', but I can't understand why. For the destination I use a pitch of size 'float', while for the source I use a pitch of size 'float2'.

Recommended answer

Your question and your code do not make much sense to me.

  1. You are performing a batched FFT, but you do not seem to allocate enough memory for either the input or the output data;
  2. The output of cufftExecR2C is an NX*(NY/2+1) float2 matrix, which can be interpreted as an NX*2*(NY/2+1) float matrix, that is, NX*(NY+2) floats when NY is even. Accordingly, for the last cudaMemcpy you are not allocating enough space for c, which holds only NX*NY floats. You still need room for the extra complex memory location taken by the continuous (DC) component of the output;
  3. Your question does not seem to be related to the cufftExecR2C command specifically, but is much more general: how can I split a complex NX*NY matrix into two NX*NY real matrices containing the real and imaginary parts, respectively?

If I interpret your question correctly, then the solution proposed by @njuffa at

Copying data to "cufftComplex" data struct?

might be a good idea for you.

EDIT

Below is a small example showing how to "disassemble" the real and imaginary parts of a complex vector when copying it from device to host, and how to "reassemble" them when copying from host to device. Please add your own CUDA error checking.

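#include <stdio.h>
#include <stdlib.h>         // malloc, free
#include <cuda_runtime.h>   // cudaMalloc, cudaMemcpy, cudaMemcpy2D, cudaFree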

#define N 16

int main() { 

    // Declaring, allocating and initializing a complex host vector
    float2* b = (float2*)malloc(N*sizeof(float2));
    printf("ORIGINAL DATA\n");
    for (int i=0; i<N; i++) {
        b[i].x = (float)i;
        b[i].y = 2.f*(float)i;
        printf("%f %f\n",b[i].x,b[i].y);
    }
    printf("\n\n");

    // Declaring and allocating a complex device vector
    float2 *data; cudaMalloc((void**)&data, sizeof(float2)*N);

    // Copying the complex host vector to device
    cudaMemcpy(data, b, N*sizeof(float2), cudaMemcpyHostToDevice);

    // Declaring and allocating space on the host for the real and imaginary parts of the complex vector
    float* cr = (float*)malloc(N*sizeof(float));       
    float* ci = (float*)malloc(N*sizeof(float));       

    /*******************************************************************/
    /* DISASSEMBLING THE COMPLEX DATA WHEN COPYING FROM DEVICE TO HOST */
    /*******************************************************************/
    float* tmp_d = (float*)data;

    cudaMemcpy2D(cr,        sizeof(float), tmp_d,    2*sizeof(float), sizeof(float), N, cudaMemcpyDeviceToHost);
    cudaMemcpy2D(ci,        sizeof(float), tmp_d+1,  2*sizeof(float), sizeof(float), N, cudaMemcpyDeviceToHost);

    printf("DISASSEMBLED REAL AND IMAGINARY PARTS\n");
    for (int i=0; i<N; i++)
        printf("cr[%i] = %f; ci[%i] = %f\n",i,cr[i],i,ci[i]);
    printf("\n\n");

    /******************************************************************************/
    /* REASSEMBLING THE REAL AND IMAGINARY PARTS WHEN COPYING FROM HOST TO DEVICE */
    /******************************************************************************/
    cudaMemcpy2D(tmp_d,     2*sizeof(float), cr, sizeof(float), sizeof(float), N, cudaMemcpyHostToDevice);
    cudaMemcpy2D(tmp_d + 1, 2*sizeof(float), ci, sizeof(float), sizeof(float), N, cudaMemcpyHostToDevice);

    // Copying the complex device vector to host
    cudaMemcpy(b, data, N*sizeof(float2), cudaMemcpyDeviceToHost);
    printf("REASSEMBLED DATA\n");
    for (int i=0; i<N; i++) 
        printf("%f %f\n",b[i].x,b[i].y);
    printf("\n\n");

    // Releasing host and device memory
    free(b); free(cr); free(ci);
    cudaFree(data);

    getchar();

    return 0;
 } 
