Speed comparison: numpy vs python standard


Question

I made a few experiments and found a number of cases where Python's standard random and math libraries are faster than their numpy counterparts.

My impression is that Python's standard library is roughly 10x faster for small-scale (scalar) operations, while numpy is much faster for large-scale (vector) operations. My guess is that numpy has some per-call overhead which becomes dominant in the small cases.

My question is: is my intuition correct? And is it in general advisable to use the standard library rather than numpy for small (typically scalar) operations?

Examples are below.

import math
import random
import numpy as np

Logarithm and exponential

%timeit math.log(10)
# 158 ns ± 6.16 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit np.log(10)
# 1.64 µs ± 93.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit math.exp(3)
# 146 ns ± 8.57 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit np.exp(3)
# 1.72 µs ± 78.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
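For comparison, the picture flips as soon as the input is an array rather than a scalar: one vectorized `np.log` call over many values processes the whole array in compiled code, while `math.log` must be called once per element from Python. A minimal sketch (variable names are mine; exact timings will vary by machine):

```python
import math
import numpy as np

xs = np.linspace(1.0, 100.0, 100_000)

# Vectorized: a single call, the loop runs in C.
vec = np.log(xs)

# Scalar path: one Python-level math.log call per element.
loop = [math.log(x) for x in xs]

# Both approaches agree numerically.
assert np.allclose(vec, loop)
```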

Generating a normal distribution

%timeit random.gauss(0, 1)
# 809 ns ± 12.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit np.random.normal()
# 2.57 µs ± 14.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
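The `np.random.normal` overhead shown above is per call, so it amortizes if you ask for many draws at once instead of one at a time. A sketch of the two patterns (sizes and names are mine):

```python
import numpy as np

n = 100_000

# One numpy call per draw: pays the per-call overhead n times.
scalars = [np.random.normal() for _ in range(n)]

# One call for all n draws: the overhead is paid once.
batch = np.random.normal(size=n)

assert len(scalars) == n
assert batch.shape == (n,)
```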

Choosing a random element

%timeit random.choices([1,2,3], k=1)
# 1.56 µs ± 55.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit np.random.choice([1,2,3], size=1)
# 23.1 µs ± 1.04 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Same with a numpy array

arr = np.array([1,2,3])

%timeit random.choices(arr, k=1)
# 1.72 µs ± 33.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%timeit np.random.choice(arr, size=1)
# 18.4 µs ± 502 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Big array

arr = np.arange(10000)

%timeit random.choices(arr, k=1000)
# 401 µs ± 6.16 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.random.choice(arr, size=1000)
# 41.7 µs ± 1.39 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Solution

numpy is only really a performance improvement for large blocks of data. The overhead of making sure the memory blocks line up correctly before handing an ndarray to a C-compiled numpy function will generally overwhelm any time benefit if the array isn't relatively large. This is why so many numpy questions are basically "how do I take this loopy code and make it fast", and why it is considered a valid question in this tag, where nearly any other tag would toss you to Code Review before getting past the title.

So, yes, your observation generalizes. Vectorizing is the whole point of numpy. numpy code that isn't vectorized is always slower than bare Python code, and is arguably just as "wrong" as cracking a single walnut with a jackhammer. Either find the right tool or get more nuts.
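To make "vectorized vs not" concrete, here is a small sketch of the same computation written both ways (the data and names are mine, not from the answer); the scalar version pays numpy's per-call overhead on every element, the vectorized one pays it once:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=100_000)

# Not vectorized: numpy invoked once per scalar element.
slow = sum(np.exp(x) for x in data)

# Vectorized: one call; the element loop runs in compiled code.
fast = np.exp(data).sum()

# Same result either way -- only the speed differs.
assert np.isclose(slow, fast)
```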
