Optimizing simple CPU-bound loops using Cython and replacing a list


Question

I am trying to evaluate some approaches, and I'm hitting a stumbling block with performance.

Why is my Cython code so slow? I expected it to run quite a bit faster than milliseconds (maybe nanoseconds for a 2D loop with only 256 ** 2 entries).

Here are my test results:

$ python setup.py build_ext --inplace; python test.py
running build_ext
        counter: 0.00236220359802 sec
       pycounter: 0.00323309898376 sec
      percentage: 73.1 %

My initial code looks like this:

#!/usr/bin/env python
# encoding: utf-8
# filename: loop_testing.py

def generate_coords(dim, length):
    """Generates a list of coordinates from dimensions and size
    provided.

    Parameters:
        dim -- dimension
        length -- size of each dimension

    Returns:
        A list of coordinates based on dim and length
    """
    values = []
    if dim == 2:
        for x in xrange(length):
            for y in xrange(length):
                values.append((x, y))

    if dim == 3:
        for x in xrange(length):
            for y in xrange(length):
                for z in xrange(length):
                    values.append((x, y, z))

    return values

This works for what I need, but it is slow. For dim, length = (2, 256), IPython reports a timing of approximately 2.3 ms.
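For comparison, the same coordinate list can be built without hand-written nested loops. The sketch below is not part of the original question: it uses the standard library's itertools.product, the name generate_coords_product is hypothetical, and it is written for Python 3 (range instead of xrange). Unlike the original, it accepts any dim rather than only 2 or 3. No timing is claimed for it here.

```python
from itertools import product


def generate_coords_product(dim, length):
    """Return every coordinate of a dim-dimensional grid, length points per axis."""
    # product(range(length), repeat=dim) varies the last axis fastest,
    # which is the same order the nested for-loops produce.
    return list(product(range(length), repeat=dim))


coords = generate_coords_product(2, 3)
print(coords[:4])  # [(0, 0), (0, 1), (0, 2), (1, 0)]
```

The iteration itself runs inside itertools, which CPython implements in C, so the Python-level work reduces to a single call.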

In an attempt to speed this up, I developed what I think is a Cython equivalent.

#!/usr/bin/env python
# encoding: utf-8
# filename: loop_testing.pyx
# cython: boundscheck=False
# cython: wraparound=False

cimport cython
from cython.parallel cimport prange

import numpy as np
cimport numpy as np


ctypedef int DTYPE

# 2D point updater
cpdef inline void _counter_2d(DTYPE[:, :] narr, int val) nogil:
    cdef:
        DTYPE count = 0
        DTYPE index = 0
        DTYPE x, y

    for x in range(val):
        for y in range(val):
            narr[index][0] = x
            narr[index][1] = y
            index += 1

cpdef DTYPE[:, :] counter(dim=2, val=256):
    narr = np.zeros((val**dim, dim), dtype=np.dtype('i4'))
    _counter_2d(narr, val)
    return narr

def pycounter(dim=2, val=256):
    vals = []
    for x in xrange(val):
        for y in xrange(val):
            vals.append((x, y))
    return vals

The script used for timing:

#!/usr/bin/env python
# filename: test.py
"""
Usage:
    test.py [options]
    test.py [options] <val>
    test.py [options] <dim> <val>

Options:
    -h --help       This Message
    -n              Number of loops [default: 10]
"""

if __name__ == "__main__":
    from docopt import docopt
    from timeit import Timer

    args = docopt(__doc__)
    dim = args.get("<dim>") or 2
    val = args.get("<val>") or 256
    n = args.get("-n") or 10
    dim = int(dim)
    val = int(val)
    n = int(n)

    tests = ['counter', 'pycounter']
    timing = {}
    for test in tests:
        code = "{}(dim=dim, val=val)".format(test)
        variables = "dim, val = ({}, {})".format(dim, val)
        setup = "from loop_testing import {}; {}".format(test, variables)
        t = Timer(code, setup=setup)
        timing[test] = t.timeit(n) / n

    for test, val in timing.iteritems():
        print "{:>20}: {} sec".format(test, val)
    print "{:>20}: {:>.3} %".format("percentage", timing['counter'] / timing['pycounter'] * 100)

And for reference, the setup.py used to build the Cython code:

from distutils.core import setup
from Cython.Build import cythonize
import numpy

include_path = [numpy.get_include()]

setup(
    name="looping",
    ext_modules=cythonize('loop_testing.pyx'),  # accepts a glob pattern
    include_dirs=include_path,
)

Link to working version: https://github.com/brianbruggeman/cython_experimentation

Recommended answer

This Cython code is slow because of the narr[index][0] = x assignment, which relies heavily on the Python C-API. Using narr[index, 0] = x instead translates to pure C and solves the issue.
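The difference between the two spellings can be seen in plain Python as well. The sketch below is only an analogy and says nothing about the C code Cython actually generates: RecordingGrid is a hypothetical class that logs each indexing call it receives. The chained form first fetches an intermediate row object and then assigns into it, while the tuple form is a single call that carries both indices.

```python
class RecordingGrid:
    """Hypothetical 2-D container that logs every indexing call it receives."""

    def __init__(self, rows, cols):
        self.data = [[0] * cols for _ in range(rows)]
        self.calls = []

    def __getitem__(self, key):
        self.calls.append(("getitem", key))
        return self.data[key]  # hands back an intermediate row object

    def __setitem__(self, key, value):
        self.calls.append(("setitem", key))
        row, col = key
        self.data[row][col] = value


grid = RecordingGrid(4, 2)
grid[1][0] = 7  # chained: fetch row 1 first, then assign into that row
grid[2, 0] = 9  # single call with the key (2, 0)
print(grid.calls)  # [('getitem', 1), ('setitem', (2, 0))]
```

In the question's loop, the intermediate step is paid twice per iteration, once for each column written.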

As pointed out by @perimosocordiae, using cythonize with annotations is definitely the way to go to debug such issues.

In some cases it can also be worth explicitly specifying compilation flags for gcc in setup.py:

setup(
   [...]
   extra_compile_args=['-O2', '-march=native'],
   extra_link_args=['-O2', '-march=native'])

This should not be necessary, assuming reasonable default compilation flags. However, on my Linux system, for instance, the default appears to be no optimization at all, and adding the above flags results in a significant performance improvement.
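As a rough sketch of how the annotation and the compiler flags might be combined in one build script: the version below attaches the flags to an Extension object, which is one place these keywords are accepted, and turns on annotation through cythonize. The module and file names follow the question. Importing from setuptools rather than distutils is an assumption on my part, and the flags assume a gcc-compatible compiler.

```python
# setup.py: a sketch only, not the author's actual build script
from setuptools import Extension, setup
from Cython.Build import cythonize
import numpy

extensions = [
    Extension(
        "loop_testing",
        sources=["loop_testing.pyx"],
        include_dirs=[numpy.get_include()],
        # Flags taken from the answer; they assume a gcc-compatible compiler.
        extra_compile_args=["-O2", "-march=native"],
        extra_link_args=["-O2", "-march=native"],
    )
]

setup(
    name="looping",
    # annotate=True also writes loop_testing.html next to the .pyx file.
    ext_modules=cythonize(extensions, annotate=True),
)
```

After running python setup.py build_ext --inplace, the generated HTML report highlights the lines that interact with the Python runtime, which is where an assignment such as narr[index][0] = x would be expected to stand out.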
