用cython加速python代码 [英] Speeding up python code with cython

查看:99
本文介绍了用cython加速python代码的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个函数,它基本上只是对一个简单定义的哈希函数进行大量调用,并进行测试以查看何时找到重复项.我需要使用它进行很多模拟,因此希望它尽可能快.我正在尝试使用cython执行此操作. cython代码当前是使用普通的python整数列表调用的,这些列表的值在0到m ^ 2的范围内.

import math, random
cdef int a,b,c,d,m,pos,value, cyclelimit, nohashcalls   
def h3(int a,int b,int c,int d, int m,int x):
    return (a*x**2 + b*x+c) %m    
def floyd(inputx):
    dupefound, nohashcalls = (0,0)
    m = len(inputx)
    loops = int(m*math.log(m))
    for loopno in xrange(loops):
        if (dupefound == 1):
            break
        a = random.randrange(m)
        b = random.randrange(m)
        c = random.randrange(m)
        d = random.randrange(m)
        pos = random.randrange(m)
        value = inputx[pos]
        listofpos = [0] * m
        listofpos[pos] = 1
        setofvalues = set([value])
        cyclelimit = int(math.sqrt(m))
        for j in xrange(cyclelimit):
            pos = h3(a,b, c,d, m, inputx[pos])
            nohashcalls += 1    
            if (inputx[pos] in setofvalues):
                if (listofpos[pos]==1):
                    dupefound = 0
                else:
                    dupefound = 1
                    print "Duplicate found at position", pos, " and value", inputx[pos]
                break
            listofpos[pos] = 1
            setofvalues.add(inputx[pos])
    return dupefound, nohashcalls 

如何将inputx和listofpos转换为使用C类型的数组并以C速度访问这些数组?我还能使用其他提速方法吗?可以加快setofvalues吗?

为了比较,在我的计算机上,50次调用m = 5000的floyd()大约需要30秒.

更新:示例代码片段显示了如何调用floyd.

m = 5000
inputx = random.sample(xrange(m**2), m)
(dupefound, nohashcalls) = edcython.floyd(inputx)

解决方案

首先,似乎您必须在函数内部 中键入变量. 一个很好的例子在这里.

第二个cython -a表示注释",它为cython编译器生成的代码提供了非常出色的分解,并以彩色编码指示了它的肮脏程度(阅读:python api繁重).尝试优化任何内容时,此输出确实必不可少.

第三页,与Numpy一起使用上的著名页面解释了如何获得快速的C风格访问Numpy数组数据.不幸的是,它冗长而烦人.不过,我们很幸运,因为最新的Cython提供了类型化的内存视图,它们都易于使用和太棒了.在尝试执行其他任何操作之前,请先阅读整个页面.

大约十分钟后,我想到了这一点:

# cython: infer_types=True

# Use the C math library to avoid Python overhead.
from libc cimport math
# For boundscheck below.
import cython
# We're lazy so we'll let Numpy handle our array memory management.
import numpy as np
# You would normally also import the Numpy pxd to get faster access to the Numpy
# API, but it requires some fancier compilation options so I'll leave it out for
# this demo.
# cimport numpy as np

import random

# This is a small function that doesn't need to be exposed to Python at all. Use
# `cdef` instead of `def` and inline it.
cdef inline int h3(int a,int b,int c,int d, int m,int x):
    return (a*x**2 + b*x+c) % m

# If we want to live fast and dangerously, we tell cython not to check our array
# indices for IndexErrors. This means we CAN overrun our array and crash the
# program or screw up our stack. Use with caution. Profiling suggests that we
# aren't gaining anything in this case so I leave it on for safety.
# @cython.boundscheck(False)
# `cpdef` so that calling this function from another Cython (or C) function can
# skip the Python function call overhead, while still allowing us to use it from
# Python.
cpdef floyd(int[:] inputx):
    # Type the variables in the scope of the function.
    cdef int a,b,c,d, value, cyclelimit
    cdef unsigned int dupefound = 0
    cdef unsigned int nohashcalls = 0
    cdef unsigned int loopno, pos, j

    # `m` has type int because inputx is already a Cython memory view and
    # `infer-types` is on.
    m = inputx.shape[0]

    cdef unsigned int loops = int(m*math.log(m))

    # Again using the memory view, but letting Numpy allocate an array of zeros.
    cdef int[:] listofpos = np.zeros(m, dtype=np.int32)

    # Keep this random sampling out of the loop
    cdef int[:, :] randoms = np.random.randint(0, m, (loops, 5)).astype(np.int32)

    for loopno in range(loops):
        if (dupefound == 1):
            break

        # From our precomputed array
        a = randoms[loopno, 0]
        b = randoms[loopno, 1]
        c = randoms[loopno, 2]
        d = randoms[loopno, 3]
        pos = randoms[loopno, 4]

        value = inputx[pos]

        # Unforunately, Memory View does not support "vectorized" operations
        # like standard Numpy arrays. Otherwise we'd use listofpos *= 0 here.
        for j in range(m):
            listofpos[j] = 0

        listofpos[pos] = 1
        setofvalues = set((value,))
        cyclelimit = int(math.sqrt(m))
        for j in range(cyclelimit):
            pos = h3(a, b, c, d, m, inputx[pos])
            nohashcalls += 1
            if (inputx[pos] in setofvalues):
                if (listofpos[pos]==1):
                    dupefound = 0
                else:
                    dupefound = 1
                    print "Duplicate found at position", pos, " and value", inputx[pos]
                break
            listofpos[pos] = 1
            setofvalues.add(inputx[pos])
    return dupefound, nohashcalls

此处没有 docs.cython.org 上没有解释的技巧,在这里我自己学习了这些技巧,但有帮助看到它们全部融合在一起.

原始代码中最重要的更改是在注释中,但是它们都相当于为Cython提供了有关如何生成不使用Python API的代码的提示.

顺便说一句:我真的不知道为什么默认情况下infer_types没有打开.它让编译器 在可能的情况下隐式使用C类型而不是Python类型,这意味着您的工作量会减少.

如果对此执行cython -a,则会看到唯​​一调用Python的行是对random.sample的调用,以及对Python set()的构建或添加.

在我的计算机上,原始代码将在2.1秒内运行.我的版本运行时间为0.6秒.

下一步是从该循环中获取random.sample,但我将留给您.

我已经编辑了答案,以演示如何预先计算rand样本.这将时间缩短至 0.4秒.

I have a function which just basically makes lots of calls to a simple defined hash function and tests to see when it finds a duplicate. I need to do lots of simulations with it so would like it to be as fast as possible. I am attempting to use cython to do this. The cython code is currently called with a normal python list of integers with values in the range 0 to m^2.

import math, random
cdef int a,b,c,d,m,pos,value, cyclelimit, nohashcalls   
def h3(int a,int b,int c,int d, int m,int x):
    return (a*x**2 + b*x+c) %m    
def floyd(inputx):
    dupefound, nohashcalls = (0,0)
    m = len(inputx)
    loops = int(m*math.log(m))
    for loopno in xrange(loops):
        if (dupefound == 1):
            break
        a = random.randrange(m)
        b = random.randrange(m)
        c = random.randrange(m)
        d = random.randrange(m)
        pos = random.randrange(m)
        value = inputx[pos]
        listofpos = [0] * m
        listofpos[pos] = 1
        setofvalues = set([value])
        cyclelimit = int(math.sqrt(m))
        for j in xrange(cyclelimit):
            pos = h3(a,b, c,d, m, inputx[pos])
            nohashcalls += 1    
            if (inputx[pos] in setofvalues):
                if (listofpos[pos]==1):
                    dupefound = 0
                else:
                    dupefound = 1
                    print "Duplicate found at position", pos, " and value", inputx[pos]
                break
            listofpos[pos] = 1
            setofvalues.add(inputx[pos])
    return dupefound, nohashcalls 

How can I convert inputx and listofpos to use C type arrays and to access the arrays at C speed? Are there any other speed ups I can use? Can setofvalues be sped up?

So that there is something to compare against, 50 calls to floyd() with m = 5000 currently takes around 30 seconds on my computer.

Update: Example code snippet to show how floyd is called.

m = 5000
inputx = random.sample(xrange(m**2), m)
(dupefound, nohashcalls) = edcython.floyd(inputx)

解决方案

First of all, it seems that you must type the variables inside the function. A good example of it is here.

Second, cython -a, for "annotate", gives you a really excellent break down of the code generated by the cython compiler and a color-coded indication of how dirty (read: python api heavy) it is. This output is really essential when trying to optimize anything.

Third, the now famous page on working with Numpy explains how to get fast, C-style access to the Numpy array data. Unforunately it's verbose and annoying. We're in luck however, because more recent Cython provides Typed Memory Views, which are both easy to use and awesome. Read that entire page before you try to do anything else.

After ten minutes or so I came up with this:

# cython: infer_types=True

# Use the C math library to avoid Python overhead.
from libc cimport math
# For boundscheck below.
import cython
# We're lazy so we'll let Numpy handle our array memory management.
import numpy as np
# You would normally also import the Numpy pxd to get faster access to the Numpy
# API, but it requires some fancier compilation options so I'll leave it out for
# this demo.
# cimport numpy as np

import random

# This is a small function that doesn't need to be exposed to Python at all. Use
# `cdef` instead of `def` and inline it.
cdef inline int h3(int a,int b,int c,int d, int m,int x):
    return (a*x**2 + b*x+c) % m

# If we want to live fast and dangerously, we tell cython not to check our array
# indices for IndexErrors. This means we CAN overrun our array and crash the
# program or screw up our stack. Use with caution. Profiling suggests that we
# aren't gaining anything in this case so I leave it on for safety.
# @cython.boundscheck(False)
# `cpdef` so that calling this function from another Cython (or C) function can
# skip the Python function call overhead, while still allowing us to use it from
# Python.
cpdef floyd(int[:] inputx):
    # Type the variables in the scope of the function.
    cdef int a,b,c,d, value, cyclelimit
    cdef unsigned int dupefound = 0
    cdef unsigned int nohashcalls = 0
    cdef unsigned int loopno, pos, j

    # `m` has type int because inputx is already a Cython memory view and
    # `infer-types` is on.
    m = inputx.shape[0]

    cdef unsigned int loops = int(m*math.log(m))

    # Again using the memory view, but letting Numpy allocate an array of zeros.
    cdef int[:] listofpos = np.zeros(m, dtype=np.int32)

    # Keep this random sampling out of the loop
    cdef int[:, :] randoms = np.random.randint(0, m, (loops, 5)).astype(np.int32)

    for loopno in range(loops):
        if (dupefound == 1):
            break

        # From our precomputed array
        a = randoms[loopno, 0]
        b = randoms[loopno, 1]
        c = randoms[loopno, 2]
        d = randoms[loopno, 3]
        pos = randoms[loopno, 4]

        value = inputx[pos]

        # Unforunately, Memory View does not support "vectorized" operations
        # like standard Numpy arrays. Otherwise we'd use listofpos *= 0 here.
        for j in range(m):
            listofpos[j] = 0

        listofpos[pos] = 1
        setofvalues = set((value,))
        cyclelimit = int(math.sqrt(m))
        for j in range(cyclelimit):
            pos = h3(a, b, c, d, m, inputx[pos])
            nohashcalls += 1
            if (inputx[pos] in setofvalues):
                if (listofpos[pos]==1):
                    dupefound = 0
                else:
                    dupefound = 1
                    print "Duplicate found at position", pos, " and value", inputx[pos]
                break
            listofpos[pos] = 1
            setofvalues.add(inputx[pos])
    return dupefound, nohashcalls

There are no tricks here that aren't explained on docs.cython.org, which is where I learned them myself, but helps to see it all come together.

The most important changes to your original code are in the comments, but they all amount to giving Cython hints about how to generate code that doesn't use the Python API.

As an aside: I really don't know why infer_types is not on by default. It lets the compiler implicitly use C types instead of Python types where possible, meaning less work for you.

If you run cython -a on this, you'll see that the only lines that call into Python are your calls to random.sample, and building or adding to a Python set().

On my machine, your original code runs in 2.1 seconds. My version runs in 0.6 seconds.

The next step is to get random.sample out of that loop, but I'll leave that to you.

I have edited my answer to demonstrate how to precompute the rand samples. This brings the time down to 0.4 seconds.

这篇关于用cython加速python代码的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆