numpy float: 10x slower than builtin in arithmetic operations?
Question
numpy code>和内置数字,Python算法运行速度较慢。避免这种转换几乎可以消除所报告的所有性能下降。
详情
请注意,在我的原始代码中:
s = np.float64(1)
):
s =(s + 8)* s%2399232
类型
float
和numpy.float64
被混合在一个表达式中。也许Python不得不把它们全部转换成一种类型?
pre $ s $范围(10000000):
s =(s + np.float64(8))* s%np.float64(2399232)
如果运行时没有改变(而不是增加),那么这就是Python确实在做的事情,解释了性能拖延。
<实际上,运行时间下降了1.5倍!这怎么可能? Python可能不得不做的最糟糕的事情是这两个转换?
我不太清楚。也许Python必须动态地检查需要转换的内容,这需要花费一些时间,并被告知要执行的精确转换是否会使其更快。也许,一些完全不同的机制被用于算术(根本不涉及转换),并且在不匹配的类型上恰好是超慢的。阅读
numpy
源代码可能会有所帮助,但是这超出了我的技能范围。
无论如何,现在我们明显可以加快速度更多将转换移出循环:
q = np.float64(8)
r = np.float64 (2399232)
(1 000 000):
s =(s + q)* s%r
正如所料,运行时大幅减少了2.3倍。为了公平起见,我们现在需要改变
float
版本,通过将文字常量移出循环。这导致了一个很小的(10%)放缓。
考虑到所有这些变化,
np.float64
版本的代码现在只比等效的float
版本慢30% 5倍的可笑表现已经基本消失了。
为什么我们仍然会看到30%的延迟?numpy.float64
数字占用的空间与float
相同,所以不会是原因。对于用户定义的类型,算术运算符的分辨率可能需要更长的时间。当然不是一个主要的关注。I am getting really weird timings for the following code:
import numpy as np
s = 0
for i in range(10000000):
    s += np.float64(1)  # replace with np.float32 and built-in float

- built-in float: 4.9 s
- float64: 10.5 s
- float32: 45.0 s
Why is float64 twice as slow as float? And why is float32 five times slower than float64?

Is there any way to avoid the penalty of using np.float64, and have numpy functions return the built-in float instead of float64?

I found that using numpy.float64 is much slower than Python's float, and numpy.float32 is even slower (even though I'm on a 32-bit machine). Therefore, every time I use various numpy functions such as numpy.random.uniform, I convert the result to float32 (so that further operations are performed at 32-bit precision).

Is there any way to set a single variable somewhere in the program or on the command line that makes all numpy functions return float32 instead of float64?
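For context, this is the kind of per-result conversion I currently do by hand (a minimal sketch; as far as I know there is no global switch that makes numpy return built-in floats):

```python
import numpy as np

# Drawing an array and taking an element yields a numpy.float64 scalar,
# which then leaks into all subsequent arithmetic.
x = np.random.uniform(0.0, 1.0, size=10)[0]
assert isinstance(x, np.float64)

# The manual workaround: convert each result by hand, either to
# 32-bit precision or all the way back to a built-in float.
x32 = np.float32(x)
plain = float(x)
assert type(plain) is float
```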
EDIT #1:
numpy.float64 is 10 times slower than float in arithmetic calculations. It's so bad that even converting to float and back before the calculations makes the program run 3 times faster. Why? Is there anything I can do to fix it?
I want to emphasize that my timings are not due to any of the following:

- the function calls
- the conversion between numpy and python float
- the creation of objects
I updated my code to make it clearer where the problem lies. With the new code, it would seem I see a ten-fold performance hit from using numpy data types:
from datetime import datetime
import numpy as np

START_TIME = datetime.now()

# one of the following lines is uncommented before execution
#s = np.float64(1)
#s = np.float32(1)
#s = 1.0

for i in range(10000000):
    s = (s + 8) * s % 2399232

print(s)
print('Runtime:', datetime.now() - START_TIME)
The timings are:

- float64: 34.56 s
- float32: 35.11 s
- float: 3.53 s
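The comparison above can be reproduced in one self-contained script using timeit (a sketch; the iteration count is cut to 10^5 so it finishes quickly, so absolute times will be smaller than those above):

```python
import timeit
import numpy as np

N = 100000  # reduced from 10_000_000 so the demo runs quickly

def loop(start, n=N):
    """Run the benchmark loop starting from the given scalar type."""
    s = start
    for _ in range(n):
        s = (s + 8) * s % 2399232
    return s

for label, start in [('float', 1.0),
                     ('np.float64', np.float64(1)),
                     ('np.float32', np.float32(1))]:
    seconds = timeit.timeit(lambda: loop(start), number=1)
    print('%-10s %.3f s' % (label, seconds))
```

Because `start` fixes the scalar type, the whole loop runs in that type: mixing a built-in int literal like 8 into the expression does not promote a numpy scalar back to a built-in float.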
Just for the hell of it, I also tried:
from datetime import datetime
import numpy as np

START_TIME = datetime.now()
s = np.float64(1)
for i in range(10000000):
    s = float(s)
    s = (s + 8) * s % 2399232
    s = np.float64(s)
print(s)
print('Runtime:', datetime.now() - START_TIME)
The execution time is 13.28 s; it's actually 3 times faster to convert the float64 to float and back than to use it as is. Still, the conversion takes its toll, so overall it's more than 3 times slower compared to the pure-Python float.

My machine is:

- Intel Core 2 Duo T9300 (2.5 GHz)
- WinXP Professional (32-bit)
- ActiveState Python 3.1.3.5
- Numpy 1.5.1
EDIT #2:
Thank you for the answers, they help me understand how to deal with this problem.
But I still would like to know the precise reason (based on the source code, perhaps) why the code below runs 10 times slower with float64 than with float.
.EDIT #3:
I reran the code under Windows 7 x64 (Intel Core i7 930 @ 3.8 GHz).
Again, the code is:
from datetime import datetime
import numpy as np

START_TIME = datetime.now()

# one of the following lines is uncommented before execution
#s = np.float64(1)
#s = np.float32(1)
#s = 1.0

for i in range(10000000):
    s = (s + 8) * s % 2399232

print(s)
print('Runtime:', datetime.now() - START_TIME)
The timings are:

- float64: 16.1 s
- float32: 16.1 s
- float: 3.2 s
Now both np floats (either 64- or 32-bit) are 5 times slower than the built-in float. Still a significant difference; I'm trying to figure out where it comes from.

END OF EDITS
Solution

Summary
If an arithmetic expression contains both numpy and built-in numbers, Python arithmetic works more slowly. Avoiding this conversion removes almost all of the performance degradation I reported.

Details
Note that in my original code:
s = np.float64(1)
for i in range(10000000):
    s = (s + 8) * s % 2399232
the types float and numpy.float64 are mixed in one expression. Perhaps Python has to convert them all to one type? Let's make the conversions explicit:

s = np.float64(1)
for i in range(10000000):
    s = (s + np.float64(8)) * s % np.float64(2399232)
If the runtime is unchanged (rather than increased), it would suggest that's what Python indeed was doing under the hood, explaining the performance drag.
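A quick sanity check, orthogonal to the timings: whether or not the constants are wrapped explicitly, numpy's scalar type ends up handling the whole expression (a small sketch verifying result types, not an explanation of the slowdown):

```python
import numpy as np

s = np.float64(1)
mixed = (s + 8) * s % 2399232                             # built-in int literals
explicit = (s + np.float64(8)) * s % np.float64(2399232)  # pre-wrapped constants

# Both spellings produce the same value and the same numpy scalar type:
# the built-in ints are absorbed by float64, not the other way around.
assert mixed == explicit
assert type(mixed) is np.float64 and type(explicit) is np.float64
```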
Actually, the runtime fell by 1.5 times! How is it possible? Isn't the worst thing that Python could possibly have to do was these two conversions?
I don't really know. Perhaps Python has to dynamically check what needs to be converted into what, which takes time, and being told which precise conversions to perform makes it faster. Perhaps some entirely different mechanism is used for the arithmetic (one that doesn't involve conversions at all), and it happens to be super-slow on mismatched types. Reading the numpy source code might help, but that's beyond my skill.

Anyway, now we can obviously speed things up further by moving the conversions out of the loop:
q = np.float64(8)
r = np.float64(2399232)
for i in range(10000000):
    s = (s + q) * s % r
As expected, the runtime is reduced substantially: by another 2.3 times.
To be fair, we now need to change the float version slightly as well, by moving the literal constants out of the loop. This results in a tiny (10%) slowdown.

Accounting for all these changes, the
np.float64 version of the code is now only 30% slower than the equivalent float version; the ridiculous 5-fold performance hit is largely gone.

Why do we still see the 30% delay?
numpy.float64 numbers take the same amount of space as float, so that won't be the reason. Perhaps the resolution of the arithmetic operators takes longer for user-defined types. Certainly not a major concern.