Why do speed tests of functions not add up when nested? (Python)

Question

I am trying to optimize some code, so I thought I would look into exactly where my bottlenecks were. I have four functions that wrap each other, like:

return f1(f2(f3(f4())))

So I tested each one individually as well as the whole chain. When testing individually, I essentially precomputed the output of the previous function. I assumed the individual times would add up to the total time, but they didn't: the time grew significantly as I combined the functions. So I decided to look at it on a smaller scale, and wrote this test:

def f1():
    return 2

def f2(num):
    return num*num

def test():
    for i in range(1000000):
        f1()
def test2():
    for i in range(1000000):
        f2(2)
def test3():
    for i in range(1000000):
        f2(f1())

I got .085 seconds for test, .125 seconds for test2, and .171 seconds for test3. This confounded me in two ways: 1) why isn't test3 .21 seconds (the sum of the other two), and 2) why was it shorter, when in my real problem the combined version took much longer?

Answer

Since you haven't given us code that reproduces your original problem, it's hard to do anything but guess… but I can make some guesses here.

When you compose two very small functions, the more often you run the composition, the more likely it is that the bytecode for both functions, the globals and locals dictionaries, and so on all stay in your CPU cache.

On the other hand, when you compose two very large functions, you're very likely to push part of the outer function out of cache each time the inner function runs, so you end up spending more time in cache refills than actually interpreting your code.

On top of that, you're forgetting about the cost of calling a function. In Python, that's not just a function call: you normally call functions by their global name, and a LOAD_GLOBAL can be very slow. If you write a toy composition like this:

def test3():
    for i in range(1000000):
        f2(f1())

… you don't pay for that lookup as often as if you do this:

def f2():
    return 2 * f1()
def test3():
    for i in range(1000000):
        f2()

… but you can pay almost nothing for it by copying f1 into a local name (a local variable or a default argument). For the two examples above:

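# First example: bind f1 to a local name once, outside the loop.
def test3():
    _f1 = f1
    for i in range(1000000):
        f2(_f1())

# Second example: bind f1 as a default argument when f2 is defined.
def f2(_f1=f1):
    return 2 * _f1()
def test3():
    for i in range(1000000):
        f2()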
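To check whether a given function actually pays for a global lookup, you can inspect its bytecode with the standard dis module and compare timings with timeit. Here's a minimal sketch, assuming Python 3; f2_global and f2_default are hypothetical names for the two versions of f2 shown above, and the timings you get will depend on your interpreter and machine:

```python
import dis
import timeit


def f1():
    return 2


# Hypothetical name: f1 is looked up by its global name on every call.
def f2_global():
    return 2 * f1()


# Hypothetical name: f1 is bound once, when the function is defined, as a
# default argument, so the body reads it as a local instead of a global.
def f2_default(_f1=f1):
    return 2 * _f1()


def uses_global(func, name):
    """Return True if func's bytecode loads `name` with LOAD_GLOBAL."""
    return any(
        ins.opname == "LOAD_GLOBAL" and ins.argval == name
        for ins in dis.get_instructions(func)
    )


if __name__ == "__main__":
    for fn in (f2_global, f2_default):
        # Best of several runs; the minimum is the least noisy estimate.
        best = min(timeit.repeat(fn, number=1_000_000, repeat=3))
        print(f"{fn.__name__}: LOAD_GLOBAL f1={uses_global(fn, 'f1')}, "
              f"{best:.3f} s per 1M calls")
```

On CPython 3.11 and later the interpreter specializes LOAD_GLOBAL, so the gap between the two versions may be smaller than on older versions.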

Your test functions include setup costs in what you're timing.

For example, if you're using Python 2.x, range(1000000) builds a full million-item list, which could take a significant fraction of the total time. But test + test2 does that twice, while test3 does it only once. So it's perfectly reasonable that the savings in test3 were enough to be noticeable in the toy test. But in your real-life test, where each loop iteration takes, say, 100x longer, the cost of the range call is insignificant.
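One way to see this double-counting is to time an empty loop and subtract it once from the sum. This is a sketch assuming Python 3, where range is lazy, but each loop iteration still has a fixed cost that test + test2 pays twice; the empty function and the best_of helper are hypothetical additions, and the rest are the question's functions:

```python
import timeit

N = 1_000_000


def f1():
    return 2


def f2(num):
    return num * num


def empty():
    # Hypothetical: the loop skeleton alone (range iteration plus the loop).
    for i in range(N):
        pass


def test():
    for i in range(N):
        f1()


def test2():
    for i in range(N):
        f2(2)


def test3():
    for i in range(N):
        f2(f1())


def best_of(func, repeat=5):
    """Minimum wall time (seconds) of a single func() call over `repeat` runs."""
    return min(timeit.repeat(func, number=1, repeat=repeat))


if __name__ == "__main__":
    t = {f.__name__: best_of(f) for f in (empty, test, test2, test3)}
    naive = t["test"] + t["test2"]   # includes the loop overhead twice
    fair = naive - t["empty"]        # includes it once, as test3 does
    print(f"empty loop:         {t['empty']:.3f} s")
    print(f"test + test2:       {naive:.3f} s")
    print(f"minus one loop:     {fair:.3f} s")
    print(f"test3:              {t['test3']:.3f} s")
```

If test3 lands close to the adjusted sum, the discrepancy in the toy test was mostly loop overhead rather than anything specific to nesting.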

It's also worth noting that if you allocate enough memory, you can end up triggering malloc calls or even VM swapping, which are, respectively, slow and mind-numbingly slow, and which are also both much more variable and unpredictable than the usual costs of running code in a loop. That probably isn't an issue when you're just creating and destroying a few 1M-item lists (which should be on the order of 20-80MB), but it could be.
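If you want a rough sense of how much memory such a list takes on your own build, tracemalloc can report the peak allocation. A minimal sketch, assuming Python 3, where list(range(...)) builds the same kind of list that range returned in Python 2; peak_bytes is a hypothetical helper name:

```python
import tracemalloc


def peak_bytes(build):
    """Peak memory (bytes) traced by tracemalloc while build() runs.

    The peak is tracked from tracemalloc.start(), so it still includes the
    built object even though we drop it before reading the counters.
    """
    tracemalloc.start()
    try:
        obj = build()
        del obj
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    peak = peak_bytes(lambda: list(range(1_000_000)))
    print(f"peak while building a 1M-item list: {peak / 2**20:.1f} MiB")
```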

Finally, you haven't shown us how you're doing the timing, how you're repeating the tests, how you're aggregating the results, etc., so it's quite possible that your tests just aren't valid.
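For comparison, here's one way to make the timing, repetition, and aggregation explicit with the standard timeit module. It's a sketch, not the only valid method; measure is a hypothetical helper, and it times statement strings rather than wrapping them in lambdas, so no extra call overhead sneaks into the measurement:

```python
import timeit


def f1():
    return 2


def f2(num):
    return num * num


def measure(stmt, globals=None, repeat=5):
    """Time `stmt` with timeit and summarize the repeated runs.

    Timer.autorange() picks a loop count large enough for a stable reading;
    each of the `repeat` runs then executes stmt that many times.
    Returns per-execution times in seconds.
    """
    timer = timeit.Timer(stmt, globals=globals)
    number, _ = timer.autorange()
    runs = [t / number for t in timer.repeat(repeat=repeat, number=number)]
    return {
        "number": number,
        "best": min(runs),    # the figure to compare between variants
        "worst": max(runs),   # a large gap from best suggests outside noise
    }


if __name__ == "__main__":
    env = {"f1": f1, "f2": f2}
    for stmt in ("f1()", "f2(2)", "f2(f1())"):
        r = measure(stmt, globals=env)
        print(f"{stmt:10} best {r['best'] * 1e9:.1f} ns, "
              f"worst {r['worst'] * 1e9:.1f} ns ({r['number']} loops)")
```

Comparing the best figures is the usual choice, since higher values tend to come from other processes interfering rather than from the code being timed.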
