Why is the `float` function slower than multiplying by 1.0?


Problem description

I understand that this could be argued as a non-issue, but I write software for HPC environments, so this 3.5x speed increase actually makes a difference.

In [1]: %timeit 10 / float(98765)            
1000000 loops, best of 3: 313 ns per loop

In [2]: %timeit 10 / (98765 * 1.0)
10000000 loops, best of 3: 80.6 ns per loop

I used dis to have a look at the code, and I assumed float() would be slower because it requires a function call (unfortunately I couldn't run dis.dis(float) to see what it's actually doing).
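
The call overhead shows up even when no constant folding is involved. As a minimal sketch (my own illustration, assuming CPython 2.7 like the rest of this post), disassembling two small wrapper functions whose argument is a variable, so the multiplication cannot be folded away:

import dis

def via_float(n):
    return 10 / float(n)      # needs LOAD_GLOBAL (float) + CALL_FUNCTION

def via_mul(n):
    return 10 / (n * 1.0)     # just LOAD_CONST (1.0) + BINARY_MULTIPLY

dis.dis(via_float)
dis.dis(via_mul)

The first version has to look up the name float and call it on every evaluation; the second only multiplies.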

I guess a second question would be: when should I use float(n) and when should I use n * 1.0?
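
For what it's worth, the two spellings are not interchangeable in general. A quick sketch of the semantic difference (my own illustration, not part of the original question):

print float(98765)        # 98765.0 -- float() also accepts numeric strings such as "3.14"
print 98765 * 1.0         # 98765.0 -- but n * 1.0 only works when n supports
                          # multiplication by a float; "3.14" * 1.0 raises TypeError

So n * 1.0 is only a shortcut when n is already a number.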

Solution

Because the peephole optimizer optimizes it away by precalculating the result of that multiplication:

import dis
dis.dis(compile("10 / float(98765)", "<string>", "eval"))

  1           0 LOAD_CONST               0 (10)
              3 LOAD_NAME                0 (float)
              6 LOAD_CONST               1 (98765)
              9 CALL_FUNCTION            1
             12 BINARY_DIVIDE       
             13 RETURN_VALUE        

dis.dis(compile("10 / (98765 * 1.0)", "<string>", "eval"))

  1           0 LOAD_CONST               0 (10)
              3 LOAD_CONST               3 (98765.0)
              6 BINARY_DIVIDE       
              7 RETURN_VALUE        

It stores the result of 98765 * 1.0 in the bytecode as a constant value. So it just has to load the constant and divide, whereas in the first case we have to call the function.

We can see this even more clearly like this:

print compile("10 / (98765 * 1.0)", "<string>", "eval").co_consts
# (10, 98765, 1.0, 98765.0)

Since the value is pre-calculated at compile time, the second one is faster.
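
For contrast, the float() version has nothing precomputed to load: only the raw operands end up in co_consts, and float itself has to be looked up by name and called at run time (again under CPython 2.7):

print compile("10 / float(98765)", "<string>", "eval").co_consts
# (10, 98765)
print compile("10 / float(98765)", "<string>", "eval").co_names
# ('float',)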

Edit: As pointed out by Davidmh in the comments, the reason the division itself is not also optimized away is that its behaviour depends on flags, such as from __future__ import division and the -Q flag. Quoting the comment from the actual peephole optimizer code for Python 2.7.9:

         /* Cannot fold this operation statically since
           the result can depend on the run-time presence
           of the -Qnew flag */
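
A small sketch of that flag dependence (my own illustration, CPython 2.7): the same source text evaluates to different results depending on whether true division is in effect, so the result of the division cannot be baked into the bytecode as a constant.

import __future__

code_classic = compile("10 / 98765", "<string>", "eval")
code_true    = compile("10 / 98765", "<string>", "eval",
                       __future__.division.compiler_flag)

print eval(code_classic)   # 0                (classic integer division)
print eval(code_true)      # 0.00010125044... (true division)

With python -Qnew the interpreter flips this behaviour at run time even for already-compiled bytecode, which is why the optimizer cannot fold the division even between two constants.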
 

