用常数双指数优化幂的和 [英] Optimize sum of powers with constant double exponent

查看:135
本文介绍了用常数双指数优化幂的和的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经实现了一个算法,在某些时候需要计算一个向量的元素的权力的和。功率是正的双倍,在循环期间是恒定的。我想,这个计算目前是我的程序的瓶颈,并想知道是否有一种方法加快以下代码片段:

  double SumOfPowers(std :: vector< double>& aVector,double exponent)
{
double help = 0;
size_t sizeOfaVector = aVector.size();
for(size_t k = 0; k {
help + = std :: pow(aVector [k],exponent);
}
return help;
}

我有一种感觉,好像可以利用指数是常数在循环和减少昂贵的std :: pow调用。有没有人知道一个更好的方法来实现这个,或者如果有一个即将使用的库函数来完成这项工作?

解决方案

首先,检查循环是否是矢量化的。为此,使用 -O3 (这里和下面我假设gcc编译器;我不知道其他编译器,但我希望他们有类似的选择) 。还可以添加 -ftree-vectorizer-verbose = 2 选项以获取有关哪些循环被矢量化的详细输出。你可能想要玩弄选项来获得你想要的输出。



如果循环没有被矢量化,那么你可以使它矢量化。您可能需要更改循环结构(例如,首先将所有权重计算为单独的数组,然后才计算总和),或者使用某种方式向编译器告知一些更多信息,例如 restrict 声明,请参阅使用gcc 4.7进行自动矢量化以获取更多讨论。在最坏的情况下,我认为,你可以手动实现矢量化,我记得有这样的函数,参见 Intel的参考资料SSE SIMD with C ++的实用指南。 / p>

对于Visual Studio,首先添加 / Qvec-report:2 选项以获取详细报告。上面的所有其他建议也适用,你只需要找到相应的MSVC选项。






使用 -ffast-math 选项来牺牲精度。 AFAIK,标准 pow 函数使用一些高级逻辑来检查base或exponent是否真的接近1的情况,以避免精度问题。如果这不是你的情况,你可能不需要那个逻辑。我想, -ffast-math 会丢弃它,虽然你可能想检查它。



无论如何,只需将 pow 替换为 exp(log(...)* ...)即可手动避免这些检查。这不会给你太多的加速,但你可能会注意到一些收获。此外,如果您经常将相同的向量提升到不同的指数,您可以预先计算 log s。


I have implemented an algorithm which at some point needs to calculate the sum of the powers of the elements of a vector. The power is a positive double which is constant during the loop. I figured out, that this calculation is currently the bottleneck of my program and wonder if there is a way to speed up the following code snippet:

double SumOfPowers(std::vector<double>& aVector,double exponent)
{
    double help = 0;
    size_t sizeOfaVector = aVector.size();
    for (size_t k = 0; k < sizeOfaVector; k++)
    {
        help += std::pow(aVector[k], exponent);
    }
    return help;
}

I have a feeling as if one could exploit the fact that the exponent is constant during the loop and reduce the expensive std::pow calls. Does anyone know a better way of implementing this, or if there is a ready to use library function that does the job?

解决方案

Firstly, check whether the loop is vectorized. For this, build your program with -O3 (here and in what follows I assume gcc compiler; I do not know much about other compilers, but I expect they have similar options). Add also -ftree-vectorizer-verbose=2 option to get detailed output on which loops were vectorized. You might want to play around with options to get the output you want.

If the loop was not vectorized, then you can make it vectorized. You might need to change the loop structure (for example, first compute all powers into a separate array, and only then calculate the sum), or use some way to tell some more info to the compiler such as restrict declaration, see "Auto-vectorization with gcc 4.7" for more discussion. In the worst case, I think, you can implement vectorization by hand, I remember that there were such functions, see Intel's reference or "A practical guide to SSE SIMD with C++".

For Visual Studio, start with adding /Qvec-report:2 option to get verbose reporting. All the other suggestions from above do also apply, you will just need to find corresponding MSVC options.


Another way to speed the things up is to sacrifice precision by using -ffast-math option. AFAIK, the standard pow function uses some advanced logic to check for cases when base or exponent are really close to 1 to avoid precision problems there. If this is not your case, you might not need that logic. I suppose that -ffast-math drops it, although you might want to check it.

Anyway, you can just replace pow by exp(log(...)*...) to avoid these checks manually. This will not give you much speedup, but you might notice some gain. Also, in case you frequently raise the same vector to different exponents, you can pre-calculate logs.

这篇关于用常数双指数优化幂的和的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆