为什么 `for` 循环计算 True 值的速度要快得多? [英] Why is a `for` loop so much faster to count True values?

查看:38
本文介绍了为什么 `for` 循环计算 True 值的速度要快得多?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我最近在姊妹网站上回答了一个

为什么手动 for 循环要快得多?它几乎比使用 sum 快两倍.并且由于内置的​​ sum 应该比手动对列表求和快五倍(根据

解决方案

sum 相当快,但 sum 不是减速的原因.导致放缓的三个主要因素:

  • 使用生成器表达式会导致不断暂停和恢复生成器的开销.
  • 您的生成器版本无条件添加,而不是仅在数字为偶数时添加.当数字为奇数时,这会更昂贵.
  • 添加布尔值而不是整数会阻止 sum 使用其整数快速路径.
<小时>

与列表推导式相比,生成器有两个主要优点:它们占用的内存少得多,如果不需要所有元素,它们可以提前终止.它们的不是旨在在需要所有元素的情况下提供时间优势.为每个元素暂停和恢复生成器的成本非常高.

如果我们用列表推导式替换 genexp:

在 [66]: def f1(x):....: return sum(c in '02468' for c in str(x))....:在 [67] 中:def f2(x):....: return sum([c in '02468' for c in str(x)])....:在 [68] 中:x = int('1234567890'*50)在 [69] 中:%timeit f1(x)10000 个循环,最好的 5 个:每个循环 52.2 µs在 [70] 中:%timeit f2(x)10000 个循环,最好的 5 个:每个循环 40.5 µs

我们看到了直接的加速,代价是在列表上浪费了大量内存.

<小时>

如果您查看您的 genexp 版本:

def count_even_digits_spyr03_sum(n):return sum(c in "02468" for c in str(n))

你会看到它没有if.它只是将布尔值放入 sum.相比之下,你的循环:

def count_even_digits_spyr03_for(n):计数 = 0对于 str(n) 中的 c:如果 c 在02468"中:计数 += 1返回计数

仅当数字为偶数时才添加任何内容.

如果我们改变之前定义的 f2 以包含一个 if,我们会看到另一个加速:

在 [71]: def f3(x):....: return sum([True for c in str(x) if c in '02468'])....:在 [72] 中:%timeit f3(x)10000 个循环,最好的 5 个:每个循环 34.9 µs

f1 与您的原始代码相同,耗时 52.2 µs,而 f2,仅更改列表理解,耗时 40.5 µs.

<小时>

f3 中使用 True 而不是 1 可能看起来很尴尬.这是因为将其更改为 1 会激活一个最终加速.sum 有一个快速路径 用于整数,但快速路径仅对类型为 int 的对象有效.bool 不算.这是检查项目是否属于 int 类型的行:

if (PyLong_CheckExact(item)) {

一旦我们进行了最后的更改,将 True 更改为 1:

在 [73]: def f4(x):....: return sum([1 for c in str(x) if c in '02468'])....:在 [74] 中:%timeit f4(x)10000 个循环,最好的 5 个:每个循环 33.3 µs

我们看到了最后一个小小的加速.

<小时>

那么毕竟,我们能打败显式循环吗?

在 [75]: def explicit_loop(x):....: 计数 = 0....:对于 str(x) 中的 c:....: 如果 c 在 '02468':....: 计数 += 1....:返回计数....:在 [76]: %timeit explicit_loop(x)10000 个循环,最好的 5 个:每个循环 32.7 µs

没有.我们大致收支平衡,但我们没有击败它.剩下的最大问题是列表.构建它很昂贵,并且 sum 必须通过列表迭代器来检索元素,这有其自身的成本(尽管我认为那部分非常便宜).不幸的是,只要我们通过 test-digits-and-call-sum 方法,我们就没有任何摆脱列表的好方法.显式循环获胜.

我们还能走得更远吗?好吧,到目前为止,我们一直在尝试使 sum 更接近显式循环,但是如果我们坚持使用这个愚蠢的列表,我们可以脱离显式循环而只调用 len 而不是 sum:

def f5(x):return len([1 for c in str(x) if c in '02468'])

单独测试数字也不是我们可以尝试击败循环的唯一方法.进一步偏离显式循环,我们还可以尝试 str.count.str.count 直接在 C 中迭代字符串的缓冲区,避免了很多包装对象和间接.我们需要调用它 5 次,对字符串进行 5 次传递,但它仍然有回报:

def f6(x):s = str(x)返回 sum(s.count(c) for c in '02468')

不幸的是,这正是我用于计时的站点因使用过多资源而陷入缓坑"的时刻,因此我不得不切换站点.以下时间与上述时间不能直接比较:

<预><代码>>>>导入时间>>>定义 f(x):... return sum([1 for c in str(x) if c in '02468'])...>>>定义 g(x):... return len([1 for c in str(x) if c in '02468'])...>>>定义 h(x):... s = str(x)... return sum(s.count(c) for c in '02468')...>>>x = int('1234567890'*50)>>>timeit.timeit(lambda: f(x), number=10000)0.331528635986615>>>timeit.timeit(lambda: g(x), number=10000)0.30292080697836354>>>timeit.timeit(lambda: h(x), number=10000)0.15950968803372234>>>defexplicit_loop(x):...计数= 0... 对于 str(x) 中的 c:...如果 c 在 '02468' 中:...计数+= 1...返回计数...>>>timeit.timeit(lambda:explicit_loop(x), number=10000)0.3305045129964128

I recently answered a question on a sister site which asked for a function that counts all even digits of a number. One of the other answers contained two functions (which turned out to be the fastest, so far):

def count_even_digits_spyr03_for(n):
    count = 0
    for c in str(n):
        if c in "02468":
            count += 1
    return count

def count_even_digits_spyr03_sum(n):
    return sum(c in "02468" for c in str(n))

In addition I looked at using a list comprehension and list.count:

def count_even_digits_spyr03_list(n):
    return [c in "02468" for c in str(n)].count(True)

The first two functions are essentially the same, except that the first one uses an explicit counting loop, while the second one uses the built-in sum. I would have expected the second one to be faster (based on e.g. this answer), and it is what I would have recommended turning the former into if asked for a review. But, it turns out it is the other way around. Testing it with some random numbers with increasing number of digits (so the chance that any single digit is even is about 50%) I get the following timings:

Why is the manual for loop so much faster? It's almost a factor two faster than using sum. And since the built-in sum should be about five times faster than manually summing a list (as per the linked answer), it means that it is actually ten times faster! Is the saving from only having to add one to the counter for half the values, because the other half gets discarded, enough to explain this difference?


Using an if as a filter like so:

def count_even_digits_spyr03_sum2(n):
    return sum(1 for c in str(n) if c in "02468")

Improves the timing only to the same level as the list comprehension.


When extending the timings to larger numbers, and normalizing to the for loop timing, they asymptotically converge for very large numbers (>10k digits), probably due to the time str(n) takes:

解决方案

sum is quite fast, but sum isn't the cause of the slowdown. Three primary factors contribute to the slowdown:

  • The use of a generator expression causes overhead for constantly pausing and resuming the generator.
  • Your generator version adds unconditionally instead of only when the digit is even. This is more expensive when the digit is odd.
  • Adding booleans instead of ints prevents sum from using its integer fast path.

Generators offer two primary advantages over list comprehensions: they take a lot less memory, and they can terminate early if not all elements are needed. They are not designed to offer a time advantage in the case where all elements are needed. Suspending and resuming a generator once per element is pretty expensive.

If we replace the genexp with a list comprehension:

In [66]: def f1(x):
   ....:     return sum(c in '02468' for c in str(x))
   ....: 
In [67]: def f2(x):
   ....:     return sum([c in '02468' for c in str(x)])
   ....: 
In [68]: x = int('1234567890'*50)
In [69]: %timeit f1(x)
10000 loops, best of 5: 52.2 µs per loop
In [70]: %timeit f2(x)
10000 loops, best of 5: 40.5 µs per loop

we see an immediate speedup, at the cost of wasting a bunch of memory on a list.


If you look at your genexp version:

def count_even_digits_spyr03_sum(n):
    return sum(c in "02468" for c in str(n))

you'll see it has no if. It just throws booleans into sum. In constrast, your loop:

def count_even_digits_spyr03_for(n):
    count = 0
    for c in str(n):
        if c in "02468":
            count += 1
    return count

only adds anything if the digit is even.

If we change the f2 defined earlier to also incorporate an if, we see another speedup:

In [71]: def f3(x):
   ....:     return sum([True for c in str(x) if c in '02468'])
   ....: 
In [72]: %timeit f3(x)
10000 loops, best of 5: 34.9 µs per loop

f1, identical to your original code, took 52.2 µs, and f2, with just the list comprehension change, took 40.5 µs.


It probably looked pretty awkward using True instead of 1 in f3. That's because changing it to 1 activates one final speedup. sum has a fast path for integers, but the fast path only activates for objects whose type is exactly int. bool doesn't count. This is the line that checks that items are of type int:

if (PyLong_CheckExact(item)) {

Once we make the final change, changing True to 1:

In [73]: def f4(x):
   ....:     return sum([1 for c in str(x) if c in '02468'])
   ....: 
In [74]: %timeit f4(x)
10000 loops, best of 5: 33.3 µs per loop

we see one last small speedup.


So after all that, do we beat the explicit loop?

In [75]: def explicit_loop(x):
   ....:     count = 0
   ....:     for c in str(x):
   ....:         if c in '02468':
   ....:             count += 1
   ....:     return count
   ....: 
In [76]: %timeit explicit_loop(x)
10000 loops, best of 5: 32.7 µs per loop

Nope. We've roughly broken even, but we're not beating it. The big remaining problem is the list. Building it is expensive, and sum has to go through the list iterator to retrieve elements, which has its own cost (though I think that part is pretty cheap). Unfortunately, as long as we're going through the test-digits-and-call-sum approach, we don't have any good way to get rid of the list. The explicit loop wins.

Can we go further anyway? Well, we've been trying to bring the sum closer to the explicit loop so far, but if we're stuck with this dumb list, we could diverge from the explicit loop and just call len instead of sum:

def f5(x):
    return len([1 for c in str(x) if c in '02468'])

Testing digits individually isn't the only way we can try to beat the loop, too. Diverging even further from the explicit loop, we can also try str.count. str.count iterates over a string's buffer directly in C, avoiding a lot of wrapper objects and indirection. We need to call it 5 times, making 5 passes over the string, but it still pays off:

def f6(x):
    s = str(x)
    return sum(s.count(c) for c in '02468')

Unfortunately, this is the point when the site I was using for timing stuck me in the "tarpit" for using too many resources, so I had to switch sites. The following timings are not directly comparable with the timings above:

>>> import timeit
>>> def f(x):
...     return sum([1 for c in str(x) if c in '02468'])
... 
>>> def g(x):
...     return len([1 for c in str(x) if c in '02468'])
... 
>>> def h(x):
...     s = str(x)
...     return sum(s.count(c) for c in '02468')
... 
>>> x = int('1234567890'*50)
>>> timeit.timeit(lambda: f(x), number=10000)
0.331528635986615
>>> timeit.timeit(lambda: g(x), number=10000)
0.30292080697836354
>>> timeit.timeit(lambda: h(x), number=10000)
0.15950968803372234
>>> def explicit_loop(x):
...     count = 0
...     for c in str(x):
...         if c in '02468':
...             count += 1
...     return count
... 
>>> timeit.timeit(lambda: explicit_loop(x), number=10000)
0.3305045129964128

这篇关于为什么 `for` 循环计算 True 值的速度要快得多?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆