Accelerating FFTW pruning to avoid massive zero padding
Question
Suppose that I have a sequence x(n) which is K * N long and that only the first N elements are different from zero. I'm assuming that N << K, say, for example, N = 10 and K = 100000. I want to calculate the FFT, by FFTW, of such a sequence. This is equivalent to having a sequence of length N and zero padding it to K * N. Since N and K may be "large", I have a significant zero padding. I'm exploring whether I can save some computation time by avoiding the explicit zero padding.
The case K = 2
Let us begin by considering the case K = 2. In this case, the DFT of x(n) can be written as

X(k) = sum_{n=0}^{N-1} x(n) * exp(-j * 2 * pi * k * n / (2 * N)),   k = 0, ..., 2 * N - 1.
If k is even, i.e., k = 2 * m, then

X(2 * m) = sum_{n=0}^{N-1} x(n) * exp(-j * 2 * pi * m * n / N),
which means that such values of the DFT can be calculated through an FFT of a sequence of length N, and not K * N.
If k is odd, i.e., k = 2 * m + 1, then

X(2 * m + 1) = sum_{n=0}^{N-1} [x(n) * exp(-j * pi * n / N)] * exp(-j * 2 * pi * m * n / N),
which means that such values of the DFT can again be calculated through an FFT of a sequence of length N, and not K * N.
So, in conclusion, I can exchange a single FFT of length 2 * N with 2 FFTs of length N.
The case of arbitrary K
In this case, we have

X(k) = sum_{n=0}^{N-1} x(n) * exp(-j * 2 * pi * k * n / (K * N)),   k = 0, ..., K * N - 1.

On writing k = m * K + t, with m = 0, ..., N - 1 and t = 0, ..., K - 1, we have

X(m * K + t) = sum_{n=0}^{N-1} [x(n) * exp(-j * 2 * pi * t * n / (K * N))] * exp(-j * 2 * pi * m * n / N).
So, in conclusion, I can exchange a single FFT of length K * N with K FFTs of length N. Since FFTW has fftw_plan_many_dft, I can expect some gain over the case of a single FFT.
To verify that, I have set up the following code
#include <stdio.h>
#include <stdlib.h>     /* srand, rand */
#include <string.h>     /* memcpy */
#include <time.h>       /* time */
#include <math.h>
#include <fftw3.h>

#include "TimingCPU.h"

#define PI_d 3.141592653589793

int main() {
    const int N = 10;
    const int K = 100000;

    fftw_plan plan_zp;

    fftw_complex *h_x                = (fftw_complex *)malloc(N     * sizeof(fftw_complex));
    fftw_complex *h_xzp              = (fftw_complex *)calloc(N * K, sizeof(fftw_complex));
    fftw_complex *h_xpruning         = (fftw_complex *)malloc(N * K * sizeof(fftw_complex));
    fftw_complex *h_xhatpruning      = (fftw_complex *)malloc(N * K * sizeof(fftw_complex));
    fftw_complex *h_xhatpruning_temp = (fftw_complex *)malloc(N * K * sizeof(fftw_complex));
    fftw_complex *h_xhat             = (fftw_complex *)malloc(N * K * sizeof(fftw_complex));

    // --- Random number generation of the data sequence
    srand(time(NULL));
    for (int k = 0; k < N; k++) {
        h_x[k][0] = (double)rand() / (double)RAND_MAX;
        h_x[k][1] = (double)rand() / (double)RAND_MAX;
    }

    memcpy(h_xzp, h_x, N * sizeof(fftw_complex));

    plan_zp = fftw_plan_dft_1d(N * K, h_xzp, h_xhat, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_plan plan_pruning = fftw_plan_many_dft(1, &N, K, h_xpruning, NULL, 1, N,
                                                h_xhatpruning_temp, NULL, 1, N,
                                                FFTW_FORWARD, FFTW_ESTIMATE);

    TimingCPU timerCPU;

    timerCPU.StartCounter();
    fftw_execute(plan_zp);
    printf("Standard %f\n", timerCPU.GetCounter());

    // --- Step #1: multiply the input sequence by the twiddle factors
    timerCPU.StartCounter();
    double factor = -2. * PI_d / (K * N);
    for (int k = 0; k < K; k++) {
        double arg1 = factor * k;
        for (int n = 0; n < N; n++) {
            double arg    = arg1 * n;
            double cosarg = cos(arg);
            double sinarg = sin(arg);
            h_xpruning[k * N + n][0] = h_x[n][0] * cosarg - h_x[n][1] * sinarg;
            h_xpruning[k * N + n][1] = h_x[n][0] * sinarg + h_x[n][1] * cosarg;
        }
    }
    printf("Optimized first step %f\n", timerCPU.GetCounter());

    // --- Step #2: K FFTs of length N in one batched call
    timerCPU.StartCounter();
    fftw_execute(plan_pruning);
    printf("Optimized second step %f\n", timerCPU.GetCounter());

    // --- Step #3: reorganize the results
    timerCPU.StartCounter();
    for (int k = 0; k < K; k++) {
        for (int p = 0; p < N; p++) {
            h_xhatpruning[p * K + k][0] = h_xhatpruning_temp[p + k * N][0];
            h_xhatpruning[p * K + k][1] = h_xhatpruning_temp[p + k * N][1];
        }
    }
    printf("Optimized third step %f\n", timerCPU.GetCounter());

    double rmserror = 0., norm = 0.;
    for (int n = 0; n < N; n++) {
        rmserror += (h_xhatpruning[n][0] - h_xhat[n][0]) * (h_xhatpruning[n][0] - h_xhat[n][0])
                  + (h_xhatpruning[n][1] - h_xhat[n][1]) * (h_xhatpruning[n][1] - h_xhat[n][1]);
        norm     += h_xhat[n][0] * h_xhat[n][0] + h_xhat[n][1] * h_xhat[n][1];
    }
    printf("rmserror %f\n", 100. * sqrt(rmserror / norm));

    fftw_destroy_plan(plan_zp);
    fftw_destroy_plan(plan_pruning);
    free(h_x); free(h_xzp); free(h_xpruning);
    free(h_xhatpruning); free(h_xhatpruning_temp); free(h_xhat);

    return 0;
}
The approach I have developed consists of three steps:
- Multiplying the input sequence by "twiddle" complex exponentials;
- Performing the fftw_many;
- Reorganizing the results.
The fftw_many is faster than the single FFTW on K * N input points. However, steps #1 and #3 completely destroy that gain. I would expect steps #1 and #3 to be computationally much lighter than step #2.
My questions are:
- How is it possible that steps #1 and #3 are so much more computationally demanding than step #2?
- How can I improve steps #1 and #3 to achieve a net gain over the "standard" approach?
Thank you very much for any hint.
EDIT
I'm working with Visual Studio 2013 and compiling in Release mode.
Answer
For the third step you might want to try switching the order of the loops:
for (int p = 0; p < N; p++) {
    for (int k = 0; k < K; k++) {
        h_xhatpruning[p * K + k][0] = h_xhatpruning_temp[p + k * N][0];
        h_xhatpruning[p * K + k][1] = h_xhatpruning_temp[p + k * N][1];
    }
}
since it's generally more beneficial to have the store addresses be contiguous than the load addresses.
Either way you have a cache-unfriendly access pattern though. You could try working with blocks to improve this, e.g. assuming N is a multiple of 4:
for (int p = 0; p < N; p += 4) {
    for (int k = 0; k < K; k++) {
        for (int p0 = 0; p0 < 4; p0++) {
            h_xhatpruning[(p + p0) * K + k][0] = h_xhatpruning_temp[(p + p0) + k * N][0];
            h_xhatpruning[(p + p0) * K + k][1] = h_xhatpruning_temp[(p + p0) + k * N][1];
        }
    }
}
This should help to reduce the churn of cache lines somewhat. If it does then maybe also experiment with block sizes other than 4 to see if there is a "sweet spot".