Is there a fast fabsf replacement for "float" in C++?


Question

I'm doing some benchmarking and found that fabsf() is often about 10x slower than fabs(). So I disassembled it, and it turns out the double version uses the fabs instruction while the float version does not. Can this be improved? The following is faster, but not by much, and I'm afraid it may not work because it's a little too low-level:

float mabs(float i)
{
    (*reinterpret_cast<MUINT32*>(&i)) &= 0x7fffffff;
    return i;
}

Sorry, I forgot to mention the compiler: I still use the good old VS2005, with no special libraries.

Recommended answer

You can easily test the different possibilities with the code below. It compares your bit fiddling against a naive template abs, std::abs, and ::fabsf. Not surprisingly, the naive template abs wins on Coliru and on MSVS2013. Well, it is kind of surprising that it wins: I'd expect std::abs to be equally fast. Note that -O3 actually makes things slower (at least on Coliru).

Coliru's host system shows these timings:

random number generation: 4240 ms
naive template abs: 190 ms
ugly bitfiddling abs: 241 ms
std::abs: 204 ms
::fabsf: 202 ms

And these are the timings for a VirtualBox VM running Arch with GCC 4.9 on a Core i7:

random number generation: 1453 ms
naive template abs: 73 ms
ugly bitfiddling abs: 97 ms
std::abs: 57 ms
::fabsf: 80 ms

And these are the timings on MSVS2013 (Windows 7 x64):

random number generation: 671 ms
naive template abs: 59 ms
ugly bitfiddling abs: 129 ms
std::abs: 109 ms
::fabsf: 109 ms

If I haven't made some blatantly obvious mistake in this benchmark code (don't shoot me over it, I wrote it in about 2 minutes), I'd say just use std::abs, or the template version if that turns out to be slightly faster for you.
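One caveat may be worth checking before adopting the template version: a comparison-based abs is not bit-for-bit equivalent to std::fabs. The sketch below assumes IEEE-754 floats and default floating-point compiler settings (no fast-math style flags); same_sign_as_fabs and check_zero_handling are hypothetical helper names introduced only for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Same shape as the naive template abs used in the benchmark.
template <typename T>
T abs_template(T t)
{
  return t > 0 ? t : -t;
}

// Hypothetical helper: true when the naive template and std::fabs
// produce a result with the same sign bit for the input x.
bool same_sign_as_fabs(float x)
{
  return std::signbit(abs_template(x)) == std::signbit(std::fabs(x));
}

// Hypothetical self-check that walks through the interesting inputs.
void check_zero_handling()
{
  assert(same_sign_as_fabs(-1.5f));  // ordinary values agree
  assert(same_sign_as_fabs(-0.0f));  // -0.0f fails t > 0 and is negated to +0.0f
  assert(!same_sign_as_fabs(0.0f));  // +0.0f also fails t > 0 and is negated to -0.0f
}
```

Whether the sign of zero matters depends on what the result feeds into; if it does, std::abs or std::fabs is probably the safer choice.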

Code:

#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <chrono>
#include <iostream>
#include <limits>
#include <random>
#include <vector>

#include <math.h>

using Clock = std::chrono::high_resolution_clock;
using milliseconds = std::chrono::milliseconds;

template<typename T>
T abs_template(T t)
{
  return t>0 ? t : -t;
}

float abs_ugly(float f)
{
  (*reinterpret_cast<std::uint32_t*>(&f)) &= 0x7fffffff;
  return f;
}

int main()
{
  std::random_device rd;
  std::mt19937 mersenne(rd());
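  // Cover the full float range; lowest() is already the most negative finite value,
  // so negating it would collapse the range to [max, max].
  std::uniform_real_distribution<> dist(std::numeric_limits<float>::lowest(), std::numeric_limits<float>::max());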

  std::vector<float> v(100000000);

  Clock::time_point t0 = Clock::now();

  std::generate(std::begin(v), std::end(v), [&dist, &mersenne]() { return dist(mersenne); });

  Clock::time_point trand = Clock::now();

  volatile float temp;
  for (float f : v)
    temp = abs_template(f);

  Clock::time_point ttemplate = Clock::now();

  for (float f : v)
    temp = abs_ugly(f);

  Clock::time_point tugly = Clock::now();

  for (float f : v)
    temp = std::abs(f);

  Clock::time_point tstd = Clock::now();

  for (float f : v)
    temp = ::fabsf(f);

  Clock::time_point tfabsf = Clock::now();

  milliseconds random_time = std::chrono::duration_cast<milliseconds>(trand - t0);
  milliseconds template_time = std::chrono::duration_cast<milliseconds>(ttemplate - trand);
  milliseconds ugly_time = std::chrono::duration_cast<milliseconds>(tugly - ttemplate);
  milliseconds std_time = std::chrono::duration_cast<milliseconds>(tstd - tugly);
  milliseconds c_time = std::chrono::duration_cast<milliseconds>(tfabsf - tstd);
  std::cout << "random number generation: " << random_time.count() << " ms\n"
    << "naive template abs: " << template_time.count() << " ms\n"
    << "ugly bitfiddling abs: " << ugly_time.count() << " ms\n"
    << "std::abs: " << std_time.count() << " ms\n"
    << "::fabsf: " << c_time.count() << " ms\n";
}
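On the worry that the bit fiddling "may not work": abs_ugly reads a float through a std::uint32_t pointer, which the C++ aliasing rules do not permit, even though it tends to behave as expected on common compilers. A minimal sketch of a variant that avoids the pointer cast by copying the bits with std::memcpy follows. It assumes a 32-bit IEEE-754 float; abs_memcpy is a hypothetical name, not something from the benchmark, and a toolchain without <cstdint> would need its own 32-bit unsigned typedef.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Assumption: float is a 32-bit IEEE-754 value whose top bit is the sign.
static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");

// Hypothetical name: clears the sign bit without aliasing a float as an integer.
float abs_memcpy(float f)
{
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);  // copy the float's bytes into an integer
  bits &= 0x7fffffffu;                  // clear the sign bit
  std::memcpy(&f, &bits, sizeof f);     // copy the masked bytes back
  return f;
}
```

Optimizing compilers typically reduce the two small memcpy calls to plain register moves, but that is worth confirming in the disassembly and in the benchmark above rather than taking on faith.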


Oh, and to answer your actual question: if the compiler can't generate more efficient code, I doubt there is a faster way, save for micro-optimized assembly, especially for elementary operations such as this.
