安全快速的FFT [英] Safe and fast FFT

查看:108
本文介绍了安全快速的FFT的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

灵感来自Herb Sutter的引人注目的演讲不是你父亲的C ++ ,我决定使用Microsoft的Visual Studio 2010再次查看最新版本的C ++。我特别感兴趣的是Herb断言C ++是安全快速,因为我写了很多关键性能的代码。



作为一个基准,我决定尝试用各种语言写同样简单的FFT算法。



我想出了以下使用内置的复杂类型和向量集合的C ++ 11代码:

  #include< complex> 
#include< vector>

using namespace std;

//必须提供类型或MSVC ++ barfs对重载函数的模糊调用
double pi = 4 * atan(1.0);

void fft(int sign,vector< complex< double>& zs){
unsigned int j = 0;
//关于签名与无符号比较的警告
for(unsigned int i = 0; i if(i < {
auto t = zs.at(i);
zs.at(i)= zs.at(j);
zs.at(j)= t;
}
int m = zs.size()/ 2;
j ^ = m;
while((j& m)== 0){m / = 2; j ^ = m; }
}
for(unsigned int j = 1; j for(unsigned int m = 0; m auto t = pi * sign * m / j;
auto w = complex< double>(cos(t),sin(t));
for(unsigned int i = m; i complex< double> zi = zs.at(i),t = w * zs.at(i + j);
zs.at(i)= zi + t;
zs.at(i + j)= zi-t;
}
}
}

n - 元素向量其中 n 是二的整数幂。任何人寻找适用于任何 n 的快速FFT代码应该查看 FFTW



根据我的理解,来自C的传统 xs [i] 语法索引 vector 不进行边界检查,因此不是内存安全,可能是内存错误的来源,如非确定性损坏和内存访问冲突。所以我使用 xs.at(i)



现在,我希望这个代码是快速,但我不是一个C ++ 11的专家,所以我想要求改进这个代码,这将使它更成熟或更高效?

解决方案

我认为你在使用at()时过于安全。在大多数情况下,使用的索引是可以验证的,因为受到for循环中容器的大小的限制。



例如

  for(unsigned int i = 0; i  ... 
t = zs.at(i);

我在()的时候只剩下(i + j)s。它不是立即明显是否他们总是受约束(虽然如果我真的不确定我可能手动检查 - 但我不熟悉FFT足够在这种情况下有意见)。



对于每个循环迭代,还有一些固定的计算被重复:

  int m = zs.size ()/ 2; 
pi * sign
2 * j

而zs.at + j)被计算两次。



这可能是优化器可能捕获到这些 - 但如果你把这个看作性能关键,并且你的计时器测试它,我会把它们循环(或者,在zs.at(i + j)的情况下,只是参考第一次使用),并看看是否影响定时器。



二次猜测优化器:我确信对.size()的调用将被内联为,至少直接调用一个内部成员变量 - 但是给定多少次你调用它我也试验介绍zs.size()和zs.size()的局部变量 - 1个前面。



我不知道有多少差异(如果有的话)总运行时间 - 它的一些可能已经被优化器捕获,并且与所涉及的计算相比,差异可能较小 - 但是值得一试。



至于习惯我唯一的评论,真的,是size()返回一个std :: size_t(这通常是一个unsigned int的typedef - 但它更习惯使用该类型)。如果你确实想使用自动,但避免警告,你可以尝试添加ul后缀到0 - 不知道我会说这是惯用的,虽然。我想你在这里没有使用迭代器,但我可以看到为什么你不能这样做(很容易)。



更新



我给了我所有的建议,他们都有一个可衡量的性能改进 - 除了i + j和2 * j precalcs - 他们实际上造成轻微减速!我假定他们阻止了编译器优化或阻止它使用寄存器的一些东西。



总体上我有一个> 10%的性能。改进与这些建议。
我怀疑更多的可能是如果第二块循环被重构了一些,以避免跳转 - 并且这样启用SSE2指令集可以提供一个显着的提升(我没有尝试,看到一个轻微的减速)。



我认为重构,以及使用像MKL这样的cos和sin调用应该给出更大,更不脆弱的改进。这些东西都不会是语言依赖的(我知道这是最初与F#实现的比较)。



更新2 >

我忘了提前预先计算zs.size()确实有所作为。



更新3



也忘记说(在评论OP时由@xeo提醒) j检查可以归结到一个std :: swap。这是更惯用的,至少作为执行 - 在最坏的情况下应该内联到相同的代码。事实上,当我做到了,我看到没有改变的表现。在其他情况下,如果移动构造函数可用,它可以导致性能增益。


Inspired by Herb Sutter's compelling lecture Not your father's C++, I decided to take another look at the latest version of C++ using Microsoft's Visual Studio 2010. I was particularly interested by Herb's assertion that C++ is "safe and fast" because I write a lot of performance-critical code.

As a benchmark, I decided to try to write the same simple FFT algorithm in a variety of languages.

I came up with the following C++11 code that uses the built-in complex type and vector collection:

#include <complex>
#include <vector>

using namespace std;

// Must provide type or MSVC++ barfs with "ambiguous call to overloaded function"
double pi = 4 * atan(1.0);

void fft(int sign, vector<complex<double>> &zs) {
    unsigned int j=0;
    // Warning about signed vs unsigned comparison
    for(unsigned int i=0; i<zs.size()-1; ++i) {
        if (i < j) {
            auto t = zs.at(i);
            zs.at(i) = zs.at(j);
            zs.at(j) = t;
        }
        int m=zs.size()/2;
        j^=m;
        while ((j & m) == 0) { m/=2; j^=m; }
    }
    for(unsigned int j=1; j<zs.size(); j*=2)
        for(unsigned int m=0; m<j; ++m) {
            auto t = pi * sign * m / j;
            auto w = complex<double>(cos(t), sin(t));
            for(unsigned int i = m; i<zs.size(); i+=2*j) {
                complex<double> zi = zs.at(i), t = w * zs.at(i + j);
                zs.at(i) = zi + t;
                zs.at(i + j) = zi - t;
            }
        }
}

Note that this function only works for n-element vectors where n is an integral power of two. Anyone looking for fast FFT code that works for any n should look at FFTW.

As I understand it, the traditional xs[i] syntax from C for indexing a vector does not do bounds checking and, consequently, is not memory safe and can be a source of memory errors such as non-deterministic corruption and memory access violations. So I used xs.at(i) instead.

Now, I want this code to be "safe and fast" but I am not a C++11 expert so I'd like to ask for improvements to this code that would make it more idiomatic or efficient?

解决方案

I think you are being overly "safe" in your use of at(). In most of your cases the index used is trivially verifable as being constrained by the size of the container in the for loop.

e.g.

  for(unsigned int i=0; i<zs.size()-1; ++i) { 
    ...
    auto t = zs.at(i); 

The only ones I'd leave as at()s are the (i + j)s. It's not immediately obvious whether they would always be constrained (although if I was really unsure I'd probably manually check - but I'm not familiar with FFTs enough to have an opinion in this case).

There are also some fixed computations being repeated for each loop iteration:

int m=zs.size()/2;
pi * sign
2*j

And the zs.at(i + j) is computed twice.

It's possible that the optimiser may catch these - but if you are treating this as performance critical, and you have your timers testing it, I'd hoist them out of the loops (or, in the case of zs.at(i + j), just take a reference on first use) and see if that impacts the timer.

Talking of second-guessing the optimiser: I'm sure that the calls to .size() will be inlined as, at least, a direct call to an internal member variable - but given how many times you call it I'd also experiment with introducing local variables for zs.size() and zs.size()-1 upfront. They're more likely to be put into registers that way too.

I don't know how much of a difference (if any) all of this will have on your total runtime - some of it may already be caught by the optimiser, and the differences may be small compared to the computations involved - but worth a shot.

As for being idiomatic my only comment, really, is that size() returns a std::size_t (which is usually a typedef for an unsigned int - but it's more idiomatic to use that type instead). If you did want to use auto but avoid the warning you could try adding the ul suffix to the 0 - not sure I'd say that is idiomatic, though. I suppose you're already less than idiomatic in not using iterators here, but I can see why you can't do that (easily).

Update

I gave all my suggestions a try and they all had a measurable performance improvement - except the i+j and 2*j precalcs - they actually caused a slight slowdown! I presume they either prevented a compiler optimisation or prevented it from using registers for some things.

Overall I got a >10% perf. improvement with those suggestions. I suspect more could be had if the second block of loops was refactored a little to avoid the jumps - and having done so enabling SSE2 instruction set may give a significant boost (I did try it as is and saw a slight slowdown).

I think that refactoring, along with using something like MKL for the cos and sin calls should give greater, and less brittle, improvements. And neither of those things would be language dependent (I know this was originally being compared to an F# implementation).

Update 2

I forgot to mention that pre-calculating zs.size() did make a difference.

Update 3

Also forgot to say (until reminded by @xeo in comment to OP) that the block following the i < j check can be boiled down to a std::swap. This is more idiomatic and at least as performant - in the worst case should inline to the same code as written. Indeed when I did it I saw no change in the performance. In other cases it can lead to a performance gain if move constructors are available.

这篇关于安全快速的FFT的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆