How to get cufftComplex magnitude and phase fast
Question
I have a cufftComplex data block that is the result of a CUDA FFT (R2C). I know the data is stored as a structure with a real part followed by an imaginary part. Now I want to get the amplitude = sqrt(R*R + I*I) and the phase = arctan(I/R) of each complex element in a fast way (not with a for loop). Is there a good way to do that, or a library that could do it?
Since cufftExecR2C operates on data that is on the GPU, the results are already on the GPU (before you copy them back to the host, if you are doing that).
It should be straightforward to write your own CUDA kernel to accomplish this. The amplitude you're describing is the value returned by cuCabs or cuCabsf in the cuComplex.h header file (cuCabsf is the single-precision version, which is the one that applies to cufftComplex). By looking at the functions in that header file, you should be able to figure out how to write your own that computes the phase angle. You'll note that cufftComplex is just a typedef of cuComplex.
Let's say your cufftExecR2C call left some results of type cufftComplex in array data of size sz. Your kernel might look like this:
#include <math.h>
#include <stdio.h>
#include <cuComplex.h>
#include <cufft.h>
#define nTPB 256   // threads per block for kernel
#define sz 100000  // or whatever your output data size is from the FFT
                   // (N/2 + 1 complex values for a 1D R2C transform of N reals)
...
// polar angle (phase) of z, in radians
__host__ __device__ float carg(const cuComplex& z) { return atan2f(cuCimagf(z), cuCrealf(z)); }

// one thread per element: writes |data[idx]| to mag and the phase of data[idx] to phase
__global__ void magphase(cufftComplex *data, float *mag, float *phase, int dsz){
    int idx = threadIdx.x + blockDim.x*blockIdx.x;
    if (idx < dsz){  // guard the threads past the end of the array
        mag[idx] = cuCabsf(data[idx]);
        phase[idx] = carg(data[idx]);
    }
}
...
int main(){
    ...
    /* Use the CUFFT plan to transform the signal in place. */
    /* Your code might be something like this already: */
    if (cufftExecR2C(plan, (cufftReal*)data, data) != CUFFT_SUCCESS){
        fprintf(stderr, "CUFFT error: ExecR2C Forward failed\n");
        return 1;
    }
    /* then you might add: */
    float *h_mag, *h_phase, *d_mag, *d_phase;
    // malloc your h_ arrays using host malloc first, then...
    cudaMalloc((void **)&d_mag, sz*sizeof(float));
    cudaMalloc((void **)&d_phase, sz*sizeof(float));
    // round the grid size up so that every element gets a thread
    magphase<<<(sz+nTPB-1)/nTPB, nTPB>>>(data, d_mag, d_phase, sz);
    cudaMemcpy(h_mag, d_mag, sz*sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_phase, d_phase, sz*sizeof(float), cudaMemcpyDeviceToHost);
    ...
}
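Since the snippet above leaves several parts elided, here is a minimal end-to-end sketch of the same idea that should compile on its own. It is only an illustration under some assumptions: the helper name fft_magphase is hypothetical (it is not from the original answer), the transform is done out of place rather than in place, and the output length n/2 + 1 is the number of complex values a 1D R2C transform of n real samples produces.

```cuda
#include <cuComplex.h>
#include <cuda_runtime.h>
#include <cufft.h>
#include <math.h>

// One thread per FFT bin: magnitude via cuCabsf, phase via atan2f.
__global__ void magphase(const cufftComplex* data, float* mag, float* phase, int n) {
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    if (idx < n) {
        mag[idx] = cuCabsf(data[idx]);
        phase[idx] = atan2f(cuCimagf(data[idx]), cuCrealf(data[idx]));
    }
}

// Hypothetical helper (an assumption for illustration): forward R2C FFT of
// n real host samples, then magnitude and phase of the n/2 + 1 output bins.
// h_mag and h_phase must each have room for n/2 + 1 floats.
// Returns true if every CUDA and cuFFT call succeeded.
bool fft_magphase(const float* h_signal, int n, float* h_mag, float* h_phase) {
    const int bins = n / 2 + 1;
    cufftReal* d_in = nullptr;
    cufftComplex* d_out = nullptr;
    float* d_mag = nullptr;
    float* d_phase = nullptr;

    // Allocate device buffers and upload the signal; stop at the first failure.
    bool ok = cudaMalloc((void**)&d_in, n * sizeof(cufftReal)) == cudaSuccess
           && cudaMalloc((void**)&d_out, bins * sizeof(cufftComplex)) == cudaSuccess
           && cudaMalloc((void**)&d_mag, bins * sizeof(float)) == cudaSuccess
           && cudaMalloc((void**)&d_phase, bins * sizeof(float)) == cudaSuccess
           && cudaMemcpy(d_in, h_signal, n * sizeof(cufftReal),
                         cudaMemcpyHostToDevice) == cudaSuccess;

    cufftHandle plan = 0;
    bool have_plan = ok && cufftPlan1d(&plan, n, CUFFT_R2C, 1) == CUFFT_SUCCESS;
    ok = have_plan && cufftExecR2C(plan, d_in, d_out) == CUFFT_SUCCESS;

    if (ok) {
        const int threads = 256;
        const int blocks = (bins + threads - 1) / threads;  // round up
        magphase<<<blocks, threads>>>(d_out, d_mag, d_phase, bins);
        // The results never leave the GPU until these two copies.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(h_mag, d_mag, bins * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(h_phase, d_phase, bins * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    if (have_plan) cufftDestroy(plan);
    cudaFree(d_in);     // cudaFree on a null pointer is a no-op
    cudaFree(d_out);
    cudaFree(d_mag);
    cudaFree(d_phase);
    return ok;
}
```

Something like this would be built with nvcc and linked against cuFFT (-lcufft). Keep in mind that the phase of a bin whose magnitude is close to zero is dominated by rounding noise, so it is usually only meaningful where the magnitude is significant.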
You can also do this using Thrust by creating functors for the magnitude and phase functions, and passing these functors along with data, mag and phase to thrust::transform.
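As a rough sketch of what the Thrust variant could look like: the functor names magnitude_op and phase_op and the helper magphase_thrust are made up for illustration, and the sketch assumes data is a raw device pointer holding n cufftComplex values (for example the output buffer of cufftExecR2C).

```cuda
#include <cuComplex.h>
#include <cufft.h>
#include <math.h>
#include <thrust/device_ptr.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>

// Functor returning the magnitude |z| of one element.
struct magnitude_op {
    __host__ __device__ float operator()(const cufftComplex& z) const {
        return cuCabsf(z);
    }
};

// Functor returning the polar angle of z in radians.
struct phase_op {
    __host__ __device__ float operator()(const cufftComplex& z) const {
        return atan2f(cuCimagf(z), cuCrealf(z));
    }
};

// Hypothetical helper (an assumption for illustration): d_data points to
// n complex values in device memory; mag and phase are resized to n
// and filled on the GPU.
void magphase_thrust(const cufftComplex* d_data, int n,
                     thrust::device_vector<float>& mag,
                     thrust::device_vector<float>& phase) {
    // Wrap the raw pointer so Thrust dispatches the transform to the device.
    thrust::device_ptr<const cufftComplex> first(d_data);
    mag.resize(n);
    phase.resize(n);
    thrust::transform(first, first + n, mag.begin(), magnitude_op());
    thrust::transform(first, first + n, phase.begin(), phase_op());
}
```

This makes two passes over data, one per output array. That keeps the sketch simple; the hand-written kernel above reads each element only once, which may matter for large transforms.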
You can probably do it with CUBLAS as well, using a combination of vector add and vector multiply operations.
This question/answer may be of interest as well. I lifted my phase function carg from there.