Pypy vs Python中的计数算法性能优化(Numpy vs List) [英] Counting Algorithm Performance Optimization in Pypy vs Python (Numpy vs List)

查看:98
本文介绍了Pypy vs Python中的计数算法性能优化(Numpy vs List)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的期望是pypy可能比python快一个数量级,但结果表明pypy实际上比预期的要慢.

My expectation was that pypy could be as much as an order of magnitude faster than python, but the results indicate that pypy is in fact slower than expected.

我有两个问题:

  1. 为什么pypy与pypy相比会明显变慢?
  2. 我可以做些什么来优化算法以使pypy(或python)更快?

产生的时间:

Python 2.7.5

Python 2.7.5

  • #点:16,777,216(8 ** 3 * 32 ** 3)
  • Xrange时间:1487.15 ms
  • Xrange numpy时间:2553.98 ms
  • 点生成时间:6162.23毫秒
  • Numpy Gen时间:13894.73毫秒

Pypy 2.2.1

Pypy 2.2.1

  • #点:16,777,216(8 ** 3 * 32 ** 3)
  • Xrange时间:129.48毫秒
  • Xrange numpy时间:4644.12毫秒
  • 点生成时间:4643.82毫秒
  • Numpy Gen时间:44168.98毫秒

算法:

我正在使用一种简单的算法生成一个空间点列表,并且正在尝试优化该算法.

I am generating a list of points in space using a simple algorithm and I'm trying to optimize the algorithm.

def generate(size=32, point=(0, 0, 0), width=32):
    """
    generate points in space around a center point with a specific width and
      number of divisions (size)
    """
    X, Y, Z = point
    half = width * 0.5
    delta = width
    scale = width / size
    offset = scale * 0.5
    X = X + offset - half
    Y = Y + offset - half
    Z = Z + offset - half
    for x in xrange(size):
        x = (x * scale) + X
        for y in xrange(size):
            y = (y * scale) + Y
            for z in xrange(size):
                z = (z * scale) + Z
                yield (x, y, z)

在优化中,我开始考虑使用pypy而不是python.在比较两者时,我提出了几种不同的方案:

In optimizing, I started looking at using pypy rather than python. And in comparing the two, I came up with a few different scenarios:

  • 使用xrange计数

  • counting using xrange

rsize = 8    # size of region
csize = 32   # size of chunk
number_of_points = rsize ** 3 * csize ** 3
[x for x in xrange(number_of_points)]

  • 使用numpy和xrange进行计数

  • counting using xrange with numpy

    rsize = 8    # size of region
    csize = 32   # size of chunk
    number_of_points = rsize ** 3 * csize ** 3
    np.array([x for x in xrange(number_of_points)])
    

  • 使用上述算法运行

  • running with the algorithm above

    rsize = 8    # size of region
    csize = 32   # size of chunk
    [p
     for rp in generate(size=rsize, width=rsize*csize)
     for p in generate(size=csize, width=csize, point=rp)]
    

  • 使用numpy运行上述算法

  • running the algorithm above with numpy

    rsize = 8    # size of region
    csize = 32   # size of chunk
    np.array([p
     for rp in generate(size=rsize, width=rsize*csize)
     for p in generate(size=csize, width=csize, point=rp)])
    

  • 背景:

    我正在尝试创建体素引擎,并且我想优化算法以将生成时间降低到可管理的水平.虽然我显然无法实现接近Java/C ++的任何功能,但我想尽可能地将python(或pypy)推入.

    I am attempting to create a voxel engine, and I want to optimize my algorithm to bring down generation times to a manageable level. While I clearly won't achieve anything close to Java/C++, I'd like to push python (or pypy) as far as I can.

    我注意到,从某些早期的时间来看,列表查找要比字典快得多.并且列表也比元组快(出乎意料),尽管元组生成得更快. Numpy的读取时间比非numpy的读取时间还要快.但是,创建numpy的时间可能要慢几个数量级.

    I noticed that list lookups are significantly faster than dictionaries from some earlier timings. And lists are also faster than tuples (unexpected), though tuples generate faster. Numpy has even faster read times than non-numpy. However, numpy creation time can be orders of magnitude slower.

    因此,如果阅读是最重要的,那么使用numpy有明显的优势.但是,如果阅读和创作同等重要,那么最好选择一个简单的清单.就是说,我没有一个干净的方法来查看内存使用情况,但是我怀疑列表的存储效率远不及元组或numpy. 另外,虽然差异很小,但我发现字典上的.get比使用__ getitem __调用要快一些(即dictionary [lookup] vs. dicitonary.get(lookup))

    So if reading is most important, there is a clear advantage for using a numpy. If reading and creation are of equal importance, however, then a straight list is likely best. That said, I don't have a clean way of looking at memory usage, but I suspect that lists are far less memory efficient than tuples or numpy. Also, while it's a small difference, I found .get on a dictionary to be slightly faster than using the __ getitem __ call (i.e. dictionary[lookup] vs. dicitonary.get(lookup) )

    时间...

    Python 2.7.5

    Python 2.7.5

    阅读

     - Option 1: tuple access... 2045.51 ms
     - Option 2: tuple access (again)... 2081.97 ms    # sampling effect of cache
     - Option 3: list access... 2072.09 ms
     - Option 4: dict access... 3436.53 ms
     - Option 5: iterable creation... N/A
     - Option 6: numpy array... 1752.44 ms
    

    创造

     - Option 1: tuple creation... 690.36 ms
     - Option 2: tuple creation (again)... 716.49 ms    # sampling effect of cache
     - Option 3: list creation... 684.28 ms
     - Option 4: dict creation... 1498.94 ms
     - Option 5: iterable creation...  0.01 ms
     - Option 6: numpy creation... 3514.25 ms
    

    Pypy 2.2.1

    Pypy 2.2.1

    阅读

     - Option 1: tuple access... 243.34 ms
     - Option 2: tuple access (again)... 246.51 ms    # sampling effect of cache
     - Option 3: list access... 139.65 ms
     - Option 4: dict access... 454.65 ms
     - Option 5: iterable creation... N/A
     - Option 6: numpy array... 21.60 ms
    

    创造

     - Option 1: tuple creation... 1016.27 ms
     - Option 2: tuple creation (again)... 1063.50 ms    # sampling effect of cache
     - Option 3: list creation... 365.98 ms
     - Option 4: dict creation... 2258.44 ms
     - Option 5: iterable creation...  0.00 ms
     - Option 6: numpy creation... 12514.20 ms
    

    在所有示例中,均针对随机数据生成了随机查找.

    In all the examples, a random lookup was generated for random data.

    dsize = 10 ** 7   # or 10 million data points
    data = [(i, random.random()*dsize)
             for i in range(dsize)]
    lookup = tuple(int(random.random()*dsize) for i in range(dsize))
    

    循环非常简单:

    for x in lookup:
        data_of_specific_type[x]
    

    特定类型的数据 是将数据转换为该类型(例如元组(数据),列表(数据)等)

    And the data_of_specific_type was a conversion of data into that type (e.g. tuple(data), list(data), etc.)

    推荐答案

    部分问题在于:

    np.array([p
        for rp in generate(size=rsize, width=rsize*csize)
        for p in generate(size=csize, width=csize, point=rp)])
    

    完成创建list 并将其转换为np.array的所有工作.

    Does all the work of creating the list and converting it to an np.array.

    一种更快的方法是:

    arr = np.empty(size)
    i = 0
    for rp in generate(size=rsize, width=rsize*csize):
        for p in generate(size=csize, width=csize, point=rp):
            arr[i] = p
            i += 1
    

    这篇关于Pypy vs Python中的计数算法性能优化(Numpy vs List)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

    查看全文
    登录 关闭
    扫码关注1秒登录
    发送“验证码”获取 | 15天全站免登陆