为什么`for`循环这么快才能计算True值? [英] Why is a `for` loop so much faster to count True values?

查看:78
本文介绍了为什么`for`循环这么快才能计算True值?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我最近在一个姐妹网站上回答了一个问题,该问题要求一个功能来计算a的所有偶数数字. 其他答案之一包含两个功能(到目前为止是最快的):

I recently answered a question on a sister site which asked for a function that counts all even digits of a number. One of the other answers contained two functions (which turned out to be the fastest, so far):

def count_even_digits_spyr03_for(n):
    count = 0
    for c in str(n):
        if c in "02468":
            count += 1
    return count

def count_even_digits_spyr03_sum(n):
    return sum(c in "02468" for c in str(n))

此外,我还使用列表理解和list.count:

In addition I looked at using a list comprehension and list.count:

def count_even_digits_spyr03_list(n):
    return [c in "02468" for c in str(n)].count(True)

前两个功能基本相同,除了第一个功能使用显式计数循环,而第二个功能使用内置的sum.我本来希望第二个更快(基于例如此答案),这是我所建议的将前者变成是否需要审查.但是,事实证明是另一回事了.用一些数字递增的随机数进行测试(因此,任何一位数字偶数的几率约为50%),我得到以下计时:

The first two functions are essentially the same, except that the first one uses an explicit counting loop, while the second one uses the built-in sum. I would have expected the second one to be faster (based on e.g. this answer), and it is what I would have recommended turning the former into if asked for a review. But, it turns out it is the other way around. Testing it with some random numbers with increasing number of digits (so the chance that any single digit is even is about 50%) I get the following timings:

为什么手动for循环如此之快??这几乎比使用sum快两倍.而且由于内置的​​sum应该比手动添加列表大约快五倍(根据链接的答案 ),这实际上意味着它快十倍!是不是因为只需要将一半的​​值添加到计数器中而节省了成本,因为另一半被丢弃了,足以说明这种差异?

Why is the manual for loop so much faster? It's almost a factor two faster than using sum. And since the built-in sum should be about five times faster than manually summing a list (as per the linked answer), it means that it is actually ten times faster! Is the saving from only having to add one to the counter for half the values, because the other half gets discarded, enough to explain this difference?

使用if作为过滤器,如下所示:

Using an if as a filter like so:

def count_even_digits_spyr03_sum2(n):
    return sum(1 for c in str(n) if c in "02468")

仅将计时提高到与列表理解相同的水平.

Improves the timing only to the same level as the list comprehension.

将时序扩展到更大的数字并归一化为for循环时序时,它们渐近收敛于非常大的数字(> 10k位),这可能是由于str(n)花费的时间:

When extending the timings to larger numbers, and normalizing to the for loop timing, they asymptotically converge for very large numbers (>10k digits), probably due to the time str(n) takes:

推荐答案

sum相当快,但sum并不是造成速度变慢的原因.造成减速的三个主要因素:

sum is quite fast, but sum isn't the cause of the slowdown. Three primary factors contribute to the slowdown:

  • 使用生成器表达式会导致不断暂停和恢复生成器的开销.
  • 您的生成器版本无条件添加,而不是仅在数字为偶数时添加.当数字为奇数时,这会更昂贵.
  • 添加布尔值而不是整数会阻止sum使用其整数快速路径.
  • The use of a generator expression causes overhead for constantly pausing and resuming the generator.
  • Your generator version adds unconditionally instead of only when the digit is even. This is more expensive when the digit is odd.
  • Adding booleans instead of ints prevents sum from using its integer fast path.

与列表理解相比,生成器具有两个主要优点:生成器占用的内存少得多,并且如果不需要所有元素,它们可以提前终止.在没有所有元素的情况下,它们不是旨在提供时间优势.每个元素一次暂停和恢复生成器非常昂贵.

Generators offer two primary advantages over list comprehensions: they take a lot less memory, and they can terminate early if not all elements are needed. They are not designed to offer a time advantage in the case where all elements are needed. Suspending and resuming a generator once per element is pretty expensive.

如果我们用列表理解替换genexp:

If we replace the genexp with a list comprehension:

In [66]: def f1(x):
   ....:     return sum(c in '02468' for c in str(x))
   ....: 
In [67]: def f2(x):
   ....:     return sum([c in '02468' for c in str(x)])
   ....: 
In [68]: x = int('1234567890'*50)
In [69]: %timeit f1(x)
10000 loops, best of 5: 52.2 µs per loop
In [70]: %timeit f2(x)
10000 loops, best of 5: 40.5 µs per loop

我们看到立竿见影的加速,但浪费了列表上的一堆内存.

we see an immediate speedup, at the cost of wasting a bunch of memory on a list.

如果您查看genexp版本:

If you look at your genexp version:

def count_even_digits_spyr03_sum(n):
    return sum(c in "02468" for c in str(n))

您会看到它没有if.它只是将布尔值扔到sum中.相反,您的循环:

you'll see it has no if. It just throws booleans into sum. In constrast, your loop:

def count_even_digits_spyr03_for(n):
    count = 0
    for c in str(n):
        if c in "02468":
            count += 1
    return count

仅在数字为偶数时添加任何内容.

only adds anything if the digit is even.

如果我们将前面定义的f2更改为还包含if,则会看到另一个加速:

If we change the f2 defined earlier to also incorporate an if, we see another speedup:

In [71]: def f3(x):
   ....:     return sum([True for c in str(x) if c in '02468'])
   ....: 
In [72]: %timeit f3(x)
10000 loops, best of 5: 34.9 µs per loop

f1与原始代码相同,花费了52.2 µs,而f2仅对列表理解进行了更改,花费了40.5 µs.

f1, identical to your original code, took 52.2 µs, and f2, with just the list comprehension change, took 40.5 µs.

使用True而不是f3中的1可能看起来很尴尬.那是因为将其更改为1会激活一个最终的加速. sum具有快速路径用于整数,但快速路径仅对类型恰好为int的对象激活. bool不计算在内.这是检查项目是否为int类型的行:

It probably looked pretty awkward using True instead of 1 in f3. That's because changing it to 1 activates one final speedup. sum has a fast path for integers, but the fast path only activates for objects whose type is exactly int. bool doesn't count. This is the line that checks that items are of type int:

if (PyLong_CheckExact(item)) {

完成最终更改后,将True更改为1:

Once we make the final change, changing True to 1:

In [73]: def f4(x):
   ....:     return sum([1 for c in str(x) if c in '02468'])
   ....: 
In [74]: %timeit f4(x)
10000 loops, best of 5: 33.3 µs per loop

我们看到了最后一个小加速.

we see one last small speedup.

那么,毕竟,我们打败了显式循环吗?

So after all that, do we beat the explicit loop?

In [75]: def explicit_loop(x):
   ....:     count = 0
   ....:     for c in str(x):
   ....:         if c in '02468':
   ....:             count += 1
   ....:     return count
   ....: 
In [76]: %timeit explicit_loop(x)
10000 loops, best of 5: 32.7 µs per loop

不.我们已经收支相抵了,但我们没有击败它.剩下的最大问题是列表.构建它很昂贵,并且sum必须通过列表迭代器来检索元素,而元素有其自身的成本(尽管我认为那部分很便宜).不幸的是,只要我们通过测试数字和呼叫sum方法,我们就没有摆脱该列表的好方法.显式循环获胜.

Nope. We've roughly broken even, but we're not beating it. The big remaining problem is the list. Building it is expensive, and sum has to go through the list iterator to retrieve elements, which has its own cost (though I think that part is pretty cheap). Unfortunately, as long as we're going through the test-digits-and-call-sum approach, we don't have any good way to get rid of the list. The explicit loop wins.

反正我们可以走得更远吗?好吧,到目前为止,我们一直在尝试使sum更接近显式循环,但是如果我们坚持使用此哑表,则可以脱离显式循环,而只需调用len而不是sum :

Can we go further anyway? Well, we've been trying to bring the sum closer to the explicit loop so far, but if we're stuck with this dumb list, we could diverge from the explicit loop and just call len instead of sum:

def f5(x):
    return len([1 for c in str(x) if c in '02468'])

单独测试数字并不是我们尝试打破循环的唯一方法.与显式循环相比,我们还可以尝试str.count. str.count直接在C中循环访问字符串缓冲区,避免了很多包装对象和间接调用.我们需要调用它5次,对字符串进行5次传递,但是仍然可以得到回报:

Testing digits individually isn't the only way we can try to beat the loop, too. Diverging even further from the explicit loop, we can also try str.count. str.count iterates over a string's buffer directly in C, avoiding a lot of wrapper objects and indirection. We need to call it 5 times, making 5 passes over the string, but it still pays off:

def f6(x):
    s = str(x)
    return sum(s.count(c) for c in '02468')

不幸的是,这就是我用于计时的站点将我困在"tarpit"中以占用过多资源的原因,因此我不得不切换站点.以下时间不能与上述时间直接比较:

Unfortunately, this is the point when the site I was using for timing stuck me in the "tarpit" for using too many resources, so I had to switch sites. The following timings are not directly comparable with the timings above:

>>> import timeit
>>> def f(x):
...     return sum([1 for c in str(x) if c in '02468'])
... 
>>> def g(x):
...     return len([1 for c in str(x) if c in '02468'])
... 
>>> def h(x):
...     s = str(x)
...     return sum(s.count(c) for c in '02468')
... 
>>> x = int('1234567890'*50)
>>> timeit.timeit(lambda: f(x), number=10000)
0.331528635986615
>>> timeit.timeit(lambda: g(x), number=10000)
0.30292080697836354
>>> timeit.timeit(lambda: h(x), number=10000)
0.15950968803372234
>>> def explicit_loop(x):
...     count = 0
...     for c in str(x):
...         if c in '02468':
...             count += 1
...     return count
... 
>>> timeit.timeit(lambda: explicit_loop(x), number=10000)
0.3305045129964128

这篇关于为什么`for`循环这么快才能计算True值?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆