How to loop over a list in Cython pure mode

Question

In an attempt to speed up struct.pack(), I have the following to pack an int to bytes:

import cython as c
from cython import nogil, compile, returns, locals, cfunc, pointer, address

int_bytes_buffer = c.declare(c.char[400], [0] * 400)


@locals(i = c.int, num = c.int)
@returns(c.int)
@cfunc
@nogil
@compile
def int_to_bytes(num):
    i = 0
    while num >0:
        int_bytes_buffer[i] = num%256
        num//=256
        i+=1

    return int_bytes_buffer[0]


int_to_bytes(259)

I'm trying to get this to work on a list of ints, with the following bad code:

@locals(i = c.int, ints_p = pointer(c.int[100]), num = c.int)
@returns(c.int)
@cfunc
@nogil
@compile
def int_to_bytes(num):
    i = 0
    for num in ints_p:
        while num >0:
            int_bytes_buffer[i] = num%256
            num//=256
            i+=1

    return int_bytes_buffer[0]

ints = c.declare(c.int[100],  [259]*100)
int_to_bytes(address(ints))

This gives me:

    for num in ints_p:
              ^
----------------------------------------------------------

 Accessing Python global or builtin not allowed without gil

Evidently I shouldn't be using in, or looping over a pointer.

How can I loop over the list-made-array inside the function?

EDIT

I'm trying to pass a pointer to an array of ints to the function, and have it work without the gil so it can be parallelized.

The parameter to the function should've been ints_p:

@locals(ints_p = pointer(c.int[100]), i = c.int, num = c.int)
@returns(c.int)
@cfunc
@nogil
@compile
def int_to_bytes(ints_p):
    i = 0
    for num in (*ints_p):
        while num >0:
            int_bytes_buffer[i] = num%256
            num//=256
            i+=1

    return int_bytes_buffer[0]

ints = c.declare(c.int[100],  [259]*100)
int_to_bytes(address(ints))

I want to loop over the actual ints and pack them (without the GIL).

EDIT 2

I am aware of struct.pack. I wish to make a parallelizable variant with Cython and nogil.

Answer

This is pointless:


  1. A Python int can be arbitrarily big. The actual computational work in "packing" it is working out if it fits in a given size and then copying it to a space of that size. However, you're using an array of C ints. These have a fixed size. There is basically no work to be done in extracting them into an array of bytes. All you have done is written a very inefficient version of memcpy. They are literally already in memory as a contiguous set of bytes - all you have to do is view them as such:

# using Numpy (no Cython)
import numpy as np

# np.intc is NumPy's C int type (the np.int alias was removed in NumPy 1.24)
ints = np.array([1,2,3,4,5,6,7], dtype=np.intc) # some numpy array already initialized
as_bytes = ints.view(dtype=np.byte) # no data is copied - wonderfully efficient

You could make a similar approach work with another array library or with C arrays too:

# slightly pointless use of pure-Python mode since this won't
# be valid in Python.
import cython

@cython.cfunc
@cython.returns(cython.p_char)
@cython.locals(x = cython.p_int)
def cast_ptr(x):
    # reinterpret the int pointer as a char pointer; no data is copied
    return cython.cast(cython.p_char,x)


  2. You say you want nogil so it can be parallelized. Parallelization works well when there's actual computational work to be done. It doesn't work well when the task is limited by memory access, since the threads tend to end up waiting for each other for access to the memory. This task will not parallelize well.

  3. Memory management is a problem. You're only capable of writing into fixed-size buffers. To allocate variable-sized arrays you have a number of choices: you can use numpy or the Python array module (or similar) to let Python take care of the memory management, or you can use malloc and free to allocate arrays on a C level. Since you claim to need nogil you have to use the C approach. However, you cannot do this from Cython's pure-Python mode, since everything also has to work in Python and there is no Python equivalent of malloc and free. If you insist on trying to make this work then you need to abandon Cython's pure-Python mode and use the standard Cython syntax, since what you are trying to do cannot be made compatible with both.

    Note that currently int_bytes_buffer is a global array. This means that multiple threads will share it - a disaster for your supposed parallelization.






  4. You need to think clearly about what your inputs are going to be. If it's a list of Python ints then you cannot make this work with nogil (since you are manipulating Python objects and this requires the GIL). If it's some C-level array (be it Numpy, the array module, or a Cython declared C array) then your data is already in the format you want and you just have to view it as such.
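The "view it as such" point can be sketched with only the standard library. This is a minimal illustration, not part of the original answer: the helper name `as_byte_view` is hypothetical, and it assumes the data already lives in a typed `array.array` rather than a Python list.

```python
import sys
from array import array


def as_byte_view(ints):
    """Return a byte-level memoryview over a typed array without copying it."""
    # cast('B') reinterprets the same memory as unsigned bytes
    return memoryview(ints).cast('B')


ints = array('i', [1, 2, 259])   # C ints stored contiguously
view = as_byte_view(ints)

# The first itemsize bytes are the first int, in native byte order
first = int.from_bytes(bytes(view[:ints.itemsize]), sys.byteorder)

# Writing through the array is visible through the view: nothing was copied
ints[0] = 7
changed = int.from_bytes(bytes(view[:ints.itemsize]), sys.byteorder)
```

Because the view shares memory with the array, there is no per-element packing work left to parallelize.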

    Edit: From the comments this is clearly an X-Y problem (you're asking about fixing this Cython syntax because you want to pack a list of ints), so I've added a fast way of packing a list of Python ints using Cython. This is 7x faster than struct.pack and 5x faster than passing a list to array.array. It's mostly faster because it's specialized to only do one thing.

    I've used bytearray as a convenient writeable data store and the Python memoryview class (not quite the same as the Cython memoryview syntax...) as a way to cast the data types. No real effort has been spent optimising it, so you may be able to improve it. Note that the copy into bytes at the end does not measurably change the time, illustrating just how irrelevant copying the memory is to the overall speed.

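    cimport cython

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def packlist(a):
        # writable store with room for one 4-byte C int per list item
        out = bytearray(4*len(a))
        # Python memoryview reinterprets the bytes as C ints; Cython then types the view
        cdef int[::1] outview = memoryview(out).cast('i')
        cdef int i
        for i in range(len(a)):
            outview[i] = a[i]  # each Python int is converted to a C int here
        return bytes(out)  # final copy into an immutable bytes object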
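For checking results without compiling anything, here is a rough pure-Python stand-in for packlist next to the two baselines the answer compares against. The names `packlist_py`, `pack_with_struct` and `pack_with_array` are hypothetical, and the sketch keeps the answer's assumption that a C int is 4 bytes. It shows the expected output only; it makes no claim about speed.

```python
import struct
from array import array


def packlist_py(a):
    """Pure-Python stand-in for packlist (assumes a 4-byte C int)."""
    if not a:
        return b""                       # nothing to pack
    out = bytearray(4 * len(a))          # writable store, 4 bytes per int
    outview = memoryview(out).cast('i')  # reinterpret the bytes as C ints
    for i, value in enumerate(a):
        outview[i] = value
    return bytes(out)


def pack_with_struct(a):
    """Baseline: struct.pack with one native 'i' per item."""
    return struct.pack(f"{len(a)}i", *a)


def pack_with_array(a):
    """Baseline: build an array of C ints and dump its bytes."""
    return array('i', a).tobytes()


values = [1, 2, 259, -1]
packed = packlist_py(values)
```

All three functions should return identical bytes for values that fit in a C int, which makes the stand-in usable as a reference when testing the compiled version.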
    
