Where can I find the world's fastest atof implementation?

Question

I'm looking for an extremely fast atof() implementation on IA32, optimized for the en-US locale, ASCII input, and non-scientific notation. The Windows multithreaded CRT falls down miserably here because it checks for locale changes on every call to isdigit(). Our current best is derived from the best of Perl's and Tcl's atof implementations, and it outperforms msvcrt.dll's atof by an order of magnitude. I want to do better, but I am out of ideas. The BCD-related x86 instructions seemed promising, but I couldn't get them to outperform the Perl/Tcl C code. Can any SO'ers dig up a link to the best out there? Solutions that are not based on x86 assembly are also welcome.

Clarifications based on the initial answers:

Inaccuracies of ~2 ulp are fine for this application.
The numbers to be converted will arrive in ASCII messages over the network in small batches, and our application needs to convert them with the lowest possible latency.

Recommended answer

What is your accuracy requirement? If you truly need it "correct" (always returning the floating-point value nearest to the specified decimal), it will probably be hard to beat the standard library versions (other than by removing locale support, which you've already done), since this requires arbitrary-precision arithmetic. If you're willing to tolerate an ulp or two of error (and more than that for subnormals), the sort of approach proposed by cruzer can work and may be faster, but it definitely will not produce <0.5 ulp output. You will do better accuracy-wise to compute the integer and fractional parts separately and compute the fraction at the end (e.g. for 12345.6789, compute it as 12345 + 6789 / 10000.0, rather than 6*.1 + 7*.01 + 8*.001 + 9*0.0001), since 0.1 is not exactly representable in binary (it is a repeating binary fraction, not an irrational number) and error accumulates rapidly as you compute 0.1^n. This also lets you do most of the math with integers instead of floats.
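
To make the "integer part plus fraction divided once" idea concrete, here is a minimal C sketch. It is not the asker's code or any library's implementation. The name `parse_decimal_sketch`, the `kPow10` table, and the 18-digit cap on the fraction are assumptions chosen for illustration. It accepts only `[+-]digits[.digits]` in ASCII and does not handle exponents, inf/nan, locales, or overflow of the integer part.

```c
#include <stdint.h>

/* Assumption: at most 18 fractional digits are used, so the fraction
   always fits in a uint64_t. Any further digits are skipped. */
#define MAX_FRAC_DIGITS 18

/* Powers of ten up to 1e18; each is exactly representable as a double. */
static const double kPow10[MAX_FRAC_DIGITS + 1] = {
    1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,  1e8,  1e9,
    1e10, 1e11, 1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18
};

/* Hypothetical helper: parses [+-]digits[.digits] from an ASCII string.
   No locale lookups, no exponent, no overflow check on the integer part. */
static double parse_decimal_sketch(const char *s)
{
    int negative = 0;
    if (*s == '-') { negative = 1; ++s; }
    else if (*s == '+') { ++s; }

    /* Integer part, accumulated with integer arithmetic only.
       The unsigned compare is a locale-free replacement for isdigit(). */
    uint64_t int_part = 0;
    while ((unsigned)(*s - '0') < 10u) {
        int_part = int_part * 10u + (uint64_t)(*s - '0');
        ++s;
    }

    /* Fractional digits are accumulated as an integer as well. */
    uint64_t frac_part = 0;
    int frac_digits = 0;
    if (*s == '.') {
        ++s;
        while ((unsigned)(*s - '0') < 10u) {
            if (frac_digits < MAX_FRAC_DIGITS) {
                frac_part = frac_part * 10u + (uint64_t)(*s - '0');
                ++frac_digits;
            }
            ++s;
        }
    }

    /* One division and one addition at the end, e.g.
       12345 + 6789 / 10000.0 for the input "12345.6789". */
    double value = (double)int_part + (double)frac_part / kPow10[frac_digits];
    return negative ? -value : value;
}
```

Calling `parse_decimal_sketch("12345.6789")` evaluates 12345 + 6789 / 10000.0, so floating-point rounding occurs only in the final conversions, the single division, and the single addition, rather than once per digit. The result is not guaranteed to be correctly rounded; the sketch only targets the error tolerance of roughly an ulp or two that the question says is acceptable.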

The BCD instructions haven't been implemented in hardware since (IIRC) the 286, and are simply microcoded nowadays. They are unlikely to be particularly high-performance.
