Why is "any()" running slower than using loops?

Problem description

I've been working on a project that manages big lists of words and passes them through a lot of tests to decide whether each word in the list is valid. The funny thing is that every time I've used "faster" tools like the itertools module, they seem to be slower.

Finally I decided to ask the question, because it is possible that I am doing something wrong. The following code tests the performance of the any() function versus the use of loops.

#!/usr/bin/python3
#

import time
from unicodedata import normalize


file_path='./tests'


start=time.time()
with open(file_path, encoding='utf-8', mode='rt') as f:
    tests_list=f.read()
print('File reading done in {} seconds'.format(time.time() - start))

start=time.time()
tests_list=[line.strip() for line in normalize('NFC',tests_list).splitlines()]
print('String formalization, and list strip done in {} seconds'.format(time.time()-start))
print('{} strings'.format(len(tests_list)))


unallowed_combinations=['ab','ac','ad','ae','af','ag','ah','ai','af','ax',
                        'ae','rt','rz','bt','du','iz','ip','uy','io','ik',
                        'il','iw','ww','wp']


def combination_is_valid(string):
    if any(combination in string for combination in unallowed_combinations):
        return False

    return True


def combination_is_valid2(string):
    for combination in unallowed_combinations:
        if combination in string:
            return False

    return True


print('Testing the performance of any()')

start=time.time()
for string in tests_list:
    combination_is_valid(string)
print('combination_is_valid ended in {} seconds'.format(time.time()-start))


start=time.time()
for string in tests_list:
    combination_is_valid2(string)
print('combination_is_valid2 ended in {} seconds'.format(time.time()-start))  

The previous code is pretty representative of the kind of tests I do. Here are the results of two runs:

File reading done in 0.22988605499267578 seconds
String formalization, and list strip done in 6.803032875061035 seconds
38709922 strings
Testing the performance of any()
combination_is_valid ended in 80.74802565574646 seconds
combination_is_valid2 ended in 41.69514226913452 seconds


File reading done in 0.24268722534179688 seconds
String formalization, and list strip done in 6.720442771911621 seconds
38709922 strings
Testing the performance of any()
combination_is_valid ended in 79.05265760421753 seconds
combination_is_valid2 ended in 42.24800777435303 seconds

I find it kind of amazing that using a loop takes about half the time of using any(). What would be the explanation for that? Am I doing something wrong?

(I used Python 3.4 under GNU/Linux.)

Recommended answer

Actually, the any() function is equivalent to the following function:

def any(iterable):
    for element in iterable:
        if element:
            return True
    return False

This is like your second function. But since any() returns a boolean value by itself, you don't need to check its result and then return a new value. So part of the performance difference is that you are using a redundant if condition and return, and also calling any() inside another function.

So the advantage of any() here is that you don't need to wrap it in another function, because it does all of that for you.
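
A minimal sketch of that simplification, assuming a shortened stand-in for the question's list (the entries below are only illustrative): the result of any() is negated and returned directly, with no extra if.

```python
# Shortened stand-in for the question's list; the real one has more entries.
unallowed_combinations = ['ab', 'ac', 'rt', 'ww']


def combination_is_valid(string):
    # any() already returns a bool, so negate it and return it directly
    return not any(combination in string
                   for combination in unallowed_combinations)


print(combination_is_valid('hello'))   # True: no listed pair occurs in 'hello'
print(combination_is_valid('cart'))    # False: 'rt' occurs in 'cart'
```

This only removes the redundant branch. The generator expression is still created on every call.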

Also, as @interjay mentioned in a comment, the most important reason, which I missed, seems to be that you are passing a generator expression to any(). A generator expression does not provide all its results at once; it produces each result on demand, and that is extra work.
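
One way to check this on your own machine is a small timeit comparison. The sketch below is an assumption-laden micro-benchmark, not the question's setup: the list is shortened, and sample is a hypothetical word chosen so that nothing matches and every entry gets checked. Absolute timings will vary by machine and Python version.

```python
import timeit

unallowed_combinations = ['ab', 'ac', 'rt', 'ww']   # shortened, illustrative list
sample = 'hello'   # hypothetical word that matches nothing


def with_any(string):
    # generator expression consumed by any()
    return not any(c in string for c in unallowed_combinations)


def with_loop(string):
    # plain for loop with an early return
    for c in unallowed_combinations:
        if c in string:
            return False
    return True


# Both versions must agree before the timings mean anything
assert with_any(sample) == with_loop(sample)

t_any = timeit.timeit(lambda: with_any(sample), number=100000)
t_loop = timeit.timeit(lambda: with_loop(sample), number=100000)
print('any():  {:.3f} s'.format(t_any))
print('loop:   {:.3f} s'.format(t_loop))
```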

From PEP 0289 -- Generator Expressions:

The semantics of a generator expression are equivalent to creating an anonymous generator function and calling it. For example:

g = (x**2 for x in range(10))
print(next(g))

is equivalent to:

def __gen(exp):
    for x in exp:
        yield x**2
g = __gen(iter(range(10)))
print(next(g))

So as you can see, each time Python wants the next item it has to call the generator's next method, after the iter function has set the generator up. The end result is that using any() is overkill in such cases.
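
To make the on-demand behaviour concrete, the sketch below drives a generator by hand with next() and then hands a fresh one to any(). lazy_checks is a hypothetical helper standing in for the generator expression; it is not part of the question's code.

```python
def lazy_checks(string, combinations):
    # Generator function standing in for the generator expression:
    # each next() call resumes it and produces one membership test.
    for combination in combinations:
        yield combination in string


gen = lazy_checks('cart', ['ab', 'rt', 'ww'])
print(next(gen))   # False: 'ab' is not in 'cart'
print(next(gen))   # True: 'rt' is in 'cart'

# any() pulls items the same way and stops at the first true one,
# so the check for 'ww' is left unconsumed in the generator.
gen = lazy_checks('cart', ['ab', 'rt', 'ww'])
result = any(gen)
remaining = list(gen)
print(result, remaining)   # True [False]
```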
