Why is the `float` function slower than multiplying by 1.0?
Problem description
I understand that this could be argued as a non-issue, but I write software for HPC environments, so this 3.5x speed increase actually makes a difference.
In [1]: %timeit 10 / float(98765)
1000000 loops, best of 3: 313 ns per loop
In [2]: %timeit 10 / (98765 * 1.0)
10000000 loops, best of 3: 80.6 ns per loop
I used `dis` to have a look at the code, and I assume `float()` will be slower because it requires a function call (unfortunately I couldn't use `dis.dis(float)` to see what it's actually doing).
I guess a second question would be: when should I use `float(n)` and when should I use `n * 1.0`?
Because the peephole optimizer precomputes the result of that multiplication at compile time:
import dis
dis.dis(compile("10 / float(98765)", "<string>", "eval"))
  1           0 LOAD_CONST               0 (10)
              3 LOAD_NAME                0 (float)
              6 LOAD_CONST               1 (98765)
              9 CALL_FUNCTION            1
             12 BINARY_DIVIDE
             13 RETURN_VALUE
dis.dis(compile("10 / (98765 * 1.0)", "<string>", "eval"))
  1           0 LOAD_CONST               0 (10)
              3 LOAD_CONST               3 (98765.0)
              6 BINARY_DIVIDE
              7 RETURN_VALUE
It stores the result of `98765 * 1.0` in the bytecode as a constant. So it only has to load that constant and divide, whereas in the first case the function has to be looked up and called at run time.
We can see that even more clearly by inspecting the constants of the compiled code:
print compile("10 / (98765 * 1.0)", "<string>", "eval").co_consts
# (10, 98765, 1.0, 98765.0)
Since the value is precomputed at compile time, the second version is faster.
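As for the second question, the folding described above only applies when both operands are literals. The following is a minimal sketch, not part of the original answer, that checks which names each expression has to resolve at run time; the helper `compiled_names` and the variable `n` are hypothetical names introduced for illustration. With a run-time value, neither form can be precomputed, so the compile-time advantage disappears and any remaining difference comes from the name lookup and call overhead of `float`.

```python
import timeit


def compiled_names(expr):
    """Return the names the compiled expression must look up at run time."""
    return compile(expr, "<string>", "eval").co_names


# Literal operands: nothing has to be looked up at run time.
literal_names = compiled_names("10 / (98765 * 1.0)")

# Variable operand: `n` (and, for the call, `float`) must be resolved on
# every evaluation, so the compiler cannot precompute either form.
multiply_names = compiled_names("10 / (n * 1.0)")
call_names = compiled_names("10 / float(n)")

print(literal_names)   # ()
print(multiply_names)  # ('n',)
print(call_names)      # ('float', 'n')

# Rough timing with a run-time value. Absolute numbers depend on the
# interpreter version and the machine, so none are quoted here.
setup = "n = 98765"
t_mul = timeit.timeit("10 / (n * 1.0)", setup=setup, number=100000)
t_call = timeit.timeit("10 / float(n)", setup=setup, number=100000)
print("n * 1.0: %.4f s, float(n): %.4f s" % (t_mul, t_call))
```

If the value is only known at run time, it is worth measuring both forms on the target interpreter rather than relying on the literal benchmark from the question.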
Edit: As pointed out by Davidmh in the comments, the reason the division is not also optimised away is that its behaviour depends on flags, such as `from __future__ import division` and the `-Q` command-line flag.
Quoting the comment from the actual peephole optimizer code for Python 2.7.9,
/* Cannot fold this operation statically since
the result can depend on the run-time presence
of the -Qnew flag */
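To illustrate why the meaning of `/` is not fixed by the source text alone, here is a small sketch, not taken from the original answer, that compiles the same expression with and without the `division` future flag. The expression `10 / 4` and the variable names are chosen only for illustration. On Python 2.7 the two results differ (integer division versus true division); on Python 3 the flag is accepted but has no effect, because `/` is always true division there.

```python
import __future__

source = "10 / 4"

# Compile without any future flags; dont_inherit=True keeps future
# statements of the surrounding module from leaking in.
plain = compile(source, "<string>", "eval", 0, True)

# Compile the same text with `from __future__ import division` semantics.
with_flag = compile(
    source, "<string>", "eval", __future__.division.compiler_flag, True
)

plain_result = eval(plain)          # 2 on Python 2.7, 2.5 on Python 3
with_flag_result = eval(with_flag)  # 2.5 on both

print(plain_result, with_flag_result)
```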