为什么是“地图"?ThreeSum版本这么慢? [英] Why is the "map" version of ThreeSum so slow?

查看:52
本文介绍了为什么是“地图"?ThreeSum版本这么慢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我预计 ThreeSum 的 Python 实现会很慢:

def count(a):"""ThreeSum: 给定 N 个不同的整数,有多少个三元组和正好为零?"""N = len(a)cnt = 0对于范围(N)中的我:对于范围内的 j (i+1, N):对于范围内的 k (j+1, N):如果 sum([a[i], a[j], a[k]]) == 0:cnt += 1返回cnt

但我很震惊,这个版本看起来也很慢:

def count_python(a):"""ThreeSum 使用 itertools"""return sum(map(lambda X: sum(X)==0, itertools.combinations(a, r=3)))

谁能推荐一个更快的 Python 实现?这两种实现看起来都很慢......谢谢

...

答案摘要:以下是 O(N^3) 版本(出于教育目的,未在现实生活中使用)版本的此线程中提供的所有各种版本的运行在我的机器上的运行情况:

56 秒 RUNNING count_slow...
28 秒运行 count_itertools,由 Ashwini Chaudhary 编写...
14 秒 RUNNING count_fixed,由 roippi 编写...
11 秒运行 count_itertools(更快),由 Veedrak 编写...
08 秒 RUNNING count_enumerate,由 roippi 编写...

*注意:需要修改 Veedrak 的解决方案以获得正确的计数输出:
sum(1 for x, y, z in itertools.combinations(a, r=3) if x+y==-z)

解决方案

提供第二个答案.从各种评论来看,您似乎主要关心为什么这个特定的 O(n**3) 算法在从 java 移植时很慢.让我们潜入.

def count(a):"""ThreeSum: 给定 N 个不同的整数,有多少个三元组和正好为零?"""N = len(a)cnt = 0对于范围(N)中的我:对于范围内的 j (i+1, N):对于范围内的 k (j+1, N):如果 sum([a[i], a[j], a[k]]) == 0:cnt += 1返回cnt

立即出现的一个主要问题是,您正在做一些 Java 代码几乎肯定不会做的事情:具体化一个 3 元素列表只是为了将三个数字相加!

if sum([a[i], a[j], a[k]]) == 0:

呸!把它写成

如果 a[i] + a[j] + a[k] == 0:

一些基准测试表明,您这样做会增加 50% 以上的开销只是.哎呀.

<小时>

这里的另一个问题是您在应该使用迭代的地方使用索引.在 python 中尽量避免编写这样的代码:

for i in range(len(some_list)):do_something(some_list[i])

而只是写:

 for x in some_list:做某事(x)

如果您明确需要您所在的索引(正如您在代码中实际所做的那样),请使用 enumerate:

 for i,x in enumerate(some_list):#等等

这通常是一种风格(尽管它比鸭子类型和迭代器协议更深入) - 但它也是一种性能.为了查找a[i]的值,该调用被转换为a.__getitem__(i),然后python必须动态解析一个__getitem__ 方法查找,调用它,并返回值.每次.这不是一个疯狂的开销 - 至少在内置类型上 - 但如果你在循环中做很多事情,它就会加起来.另一方面,将 a 视为可迭代对象,可以避免很多开销.

考虑到这一变化,您可以再次重写您的函数:

def count_enumerate(a):cnt = 0对于 i, x 在 enumerate(a) 中:对于 j, y 在 enumerate(a[i+1:], i+1):对于 a[j+1:] 中的 z:如果 x + y + z == 0:cnt += 1返回cnt

让我们看看一些时间:

%timeit count(range(-100,100))1 个循环,最好的 3 个:每个循环 394 毫秒%timeit count_fixed(range(-100,100)) #just fix your sum() line10 个循环,最好的 3 个:每个循环 158 毫秒%timeit count_enumerate(range(-100,100))10 个循环,最好的 3 个:每个循环 88.9 毫秒

这与它要进行的速度一样快.您可以通过将所有内容都包含在一个推导式中而不是执行 cnt += 1 来减少百分之一左右,但这非常小.

我尝试了一些 itertools 实现,但实际上我无法让它们比这个显式循环版本更快.如果您考虑一下,这是有道理的 - 对于每次迭代,itertools.combinations 版本必须重新绑定所有三个变量所引用的内容,而显式循环会作弊"" 并且很少重新绑定外部循环中的变量.

然而,现实检查时间:在一切都说完并完成之后,您仍然可以期望 cPython 运行此算法的速度比现代 JVM 慢一个数量级.python 中内置了太多的抽象,阻碍了快速循环.如果您关心速度(并且您无法修复您的算法 - 请参阅我的其他答案),请使用 numpy 之类的东西将所有时间都花在 C 中,或者使用不同的 python 实现.

<小时>

后记:pypy

为了好玩,我在包含 1000 个元素的列表上运行 count_fixed,在 cPython 和 pypy 上都运行.

cPython:

在 [81]: timeit.timeit('count_fixed(range(-500,500))', setup='from __main__ import count_fixed', number = 1)出[81]:19.230753898620605

pypy:

<预><代码>>>>>timeit.timeit('count_fixed(range(-500,500))', setup='from __main__ import count_fixed', number = 1)0.6961538791656494

快!

我可能会在稍后添加一些 Java 测试以进行比较:-)

I expected this Python implementation of ThreeSum to be slow:

def count(a):
       """ThreeSum: Given N distinct integers, how many triples sum to exactly zero?"""
       N = len(a)
       cnt = 0
       for i in range(N):
         for j in range(i+1, N):
           for k in range(j+1, N):
             if sum([a[i], a[j], a[k]]) == 0:
               cnt += 1
       return cnt 

But I was shocked that this version looks pretty slow too:

def count_python(a):
  """ThreeSum using itertools"""
  return sum(map(lambda X: sum(X)==0, itertools.combinations(a, r=3))) 

Can anyone recommend a faster Python implementation? Both implementations just seem so slow... Thanks

...

ANSWER SUMMARY: Here is how the runs of all the various versions provided in this thread of the O(N^3) (for educational purposes, not used in real life) version worked out on my machine:

56 sec RUNNING count_slow...
28 sec RUNNING count_itertools, written by Ashwini Chaudhary...
14 sec RUNNING count_fixed, written by roippi...
11 sec RUNNING count_itertools (faster), written by Veedrak...
08 sec RUNNING count_enumerate, written by roippi...

*Note: Needed to modify Veedrak's solution to this to get the correct count output:
sum(1 for x, y, z in itertools.combinations(a, r=3) if x+y==-z)

解决方案

Supplying a second answer. From various comments, it looks like you're primarily concerned about why this particular O(n**3) algorithm is slow when being ported over from java. Let's dive in.

def count(a):
       """ThreeSum: Given N distinct integers, how many triples sum to exactly zero?"""
       N = len(a)
       cnt = 0
       for i in range(N):
         for j in range(i+1, N):
           for k in range(j+1, N):
             if sum([a[i], a[j], a[k]]) == 0:
               cnt += 1
       return cnt

One major problem that immediately pops out is that you're doing something your java code almost certainly isn't doing: materializing a 3-element list just to add three numbers together!

if sum([a[i], a[j], a[k]]) == 0:

Yuck! Just write that as

if a[i] + a[j] + a[k] == 0:

Some benchmarking shows that you're adding 50%+ overhead just by doing that. Yikes.


The other issue here is that you're using indexing where you should be using iteration. In python try to avoid writing code like this:

for i in range(len(some_list)):
    do_something(some_list[i])

And instead just write:

for x in some_list:
    do_something(x)

And if you explicitly need the index that you're on (as you actually do in your code), use enumerate:

for i,x in enumerate(some_list):
    #etc

This is, in general, a style thing (though it goes deeper than that, with duck typing and the iterator protocol) - but it is also a performance thing. In order to look up the value of a[i], that call is converted to a.__getitem__(i), then python has to dynamically resolve a __getitem__ method lookup, call it, and return the value. Every time. It's not a crazy amount of overhead - at least on builtin types - but it adds up if you're doing it a lot in a loop. Treating a as an iterable, on the other hand, sidesteps a lot of that overhead.

So taking that change in mind, you can rewrite your function once again:

def count_enumerate(a):
    cnt = 0
    for i, x in enumerate(a):
        for j, y in enumerate(a[i+1:], i+1):
            for z in a[j+1:]:
                if x + y + z == 0:
                    cnt += 1
    return cnt

Let's look at some timings:

%timeit count(range(-100,100))
1 loops, best of 3: 394 ms per loop

%timeit count_fixed(range(-100,100)) #just fixing your sum() line
10 loops, best of 3: 158 ms per loop

%timeit count_enumerate(range(-100,100))
10 loops, best of 3: 88.9 ms per loop

And that's about as fast as it's going to go. You can shave off a percent or so by wrapping everything in a comprehension instead of doing cnt += 1 but that's pretty minor.

I've toyed around with a few itertools implementations but I actually can't get them to go faster than this explicit loop version. This makes sense if you think about it - for every iteration, the itertools.combinations version has to rebind what all three variables refer to, whereas the explicit loops get to "cheat" and rebind the variables in the outer loops far less often.

Reality check time, though: after everything is said and done, you can still expect cPython to run this algorithm an order of magnitude slower than a modern JVM would. There is simply too much abstraction built in to python that gets in the way of looping quickly. If you care about speed (and you can't fix your algorithm - see my other answer), either use something like numpy to spend all of your time looping in C, or use a different implementation of python.


postscript: pypy

For fun, I ran count_fixed on a 1000-element list, on both cPython and pypy.

cPython:

In [81]: timeit.timeit('count_fixed(range(-500,500))', setup='from __main__ import count_fixed', number = 1)
Out[81]: 19.230753898620605

pypy:

>>>> timeit.timeit('count_fixed(range(-500,500))', setup='from __main__ import count_fixed', number = 1)
0.6961538791656494

Speedy!

I might add some java testing in later to compare :-)

这篇关于为什么是“地图"?ThreeSum版本这么慢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆