Optimizing strings in Cython

Question

I'm trying to demonstrate to our group the virtues of Cython for improving Python performance. I have shown several benchmarks, all of which achieve a speed-up just by:

  1. Compiling the existing Python code.
  2. Using cdef to statically type variables, especially in inner loops.

However, much of our code does string manipulation, and I have not been able to come up with good examples of optimizing code by typing Python strings.

The example I tried is:

cdef str a
cdef int i,j
for j in range(1000000):
   a = str([chr(i) for i in range(127)])

but typing 'a' as a str actually makes the code run slower. I've read the documentation on "Unicode and passing strings", but I'm confused about how it applies to the case I've shown. We don't use Unicode; everything is pure ASCII. We're using Python 2.7.2.
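A side note on the example above: calling str() on a list returns the list's printed representation (brackets, quotes and escape sequences included), not the characters joined together; the answer's cfuncA uses ''.join(...) instead. A quick plain-Python check of the difference (it runs under Python 2.7 or 3):

```python
chars = [chr(i) for i in range(127)]

# str() on a list gives its repr, e.g. "['\x00', '\x01', ...]"
as_repr = str(chars)

# ''.join concatenates the 1-character strings into one 127-character string
joined = ''.join(chars)

print(len(joined))   # 127
print(as_repr[:2])   # ['
```

So the loop in the question spends its time formatting a much longer repr string, which is worth keeping in mind when comparing it with the timings in the answer.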

Any suggestions are appreciated.

Answer

I suggest doing your string operations on cpython.array.array objects. The best documentation is the C API and the Cython source (see here).

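from cpython cimport array
import array

def cfuncA():
    cdef str a
    cdef int i, j
    for j in range(1000):
        # Build a 127-character string from a list of 1-character strings
        a = ''.join([chr(i) for i in range(127)])

def cfuncB():
    cdef:
        str a
        # Typed array of C chars; 'c' is the Python 2 char typecode
        array.array[char] arr, template = array.array('c')
        int i, j

    for j in range(1000):
        # Allocate a 127-element array like template, without zeroing it
        arr = array.clone(template, 127, False)

        # With arr typed, this indexing compiles to C-level buffer access
        for i in range(127):
            arr[i] = i

        # Copy the buffer out as a Python 2 (byte) string
        a = arr.tostring()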

Note that the operations required vary a lot depending on what you do with your strings.

$ python2 -m timeit -s "import pyximport; pyximport.install(); import cyytn" "cyytn.cfuncA()"
100 loops, best of 3: 14.3 msec per loop

$ python2 -m timeit -s "import pyximport; pyximport.install(); import cyytn" "cyytn.cfuncB()"
1000 loops, best of 3: 512 usec per loop

So that's roughly a 28x speed-up (14.3 ms vs. 512 µs per call) in this case.
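To make the difference between the two Cython functions concrete, here is a plain-Python sketch of the same two strategies, written for Python 3 (the function names are hypothetical). It will not reproduce the speed-up, which comes from Cython compiling the typed arr[i] = i loop into C-level buffer access; it only shows the data flow: cfuncA creates 127 one-character string objects and joins them, while cfuncB fills a single preallocated byte buffer by index and converts it once at the end.

```python
def build_by_join():
    # cfuncA's strategy: one str object per character, then a join
    return ''.join([chr(i) for i in range(127)])


def build_by_buffer():
    # cfuncB's strategy: allocate a 127-byte buffer once...
    buf = bytearray(127)
    # ...fill it in place by index...
    for i in range(127):
        buf[i] = i
    # ...and copy it out as an immutable byte string at the end
    return bytes(buf)


# Both hold the same ASCII characters 0..126 (one as str, one as bytes)
assert build_by_buffer().decode('ascii') == build_by_join()
```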

Also, FWIW, you can shave off a few more µs by replacing arr.tostring() with arr.data.as_chars[:len(arr)] and typing a as bytes.
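The answer targets Python 2.7, as the question does. In Python 3 the 'c' array typecode no longer exists, and array.tostring() was removed in 3.9 in favor of tobytes(); the result is also bytes rather than str, so a would need to be typed as bytes. A plain-Python illustration of those substitutions using the standard array module follows; it is a sketch, not a tested port of the Cython code, and the assumption is that a Cython version would use an array.array('b') template in the same way.

```python
from array import array

# 'b' (signed char) stands in for Python 2's 'c' typecode, which Python 3 dropped
arr = array('b', range(127))

# tobytes() replaces tostring(), which was removed in Python 3.9
a = arr.tobytes()

print(len(a))     # 127
print(a[65:68])   # b'ABC'
```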
