Numpy 单个元素访问比列表慢 [英] Numpy individual element access slower than for lists

查看:14
本文介绍了Numpy 单个元素访问比列表慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我刚开始使用 Numpy,并注意到遍历 Numpy 数组中的每个元素比使用列表执行相同操作慢约 4 倍.我现在知道这违背了 Numpy 的目的,如果可能,我应该对函数进行矢量化.我的问题是为什么它慢了 4 倍.这似乎是一个相当大的数目.

我使用 %timeit

运行了以下测试

将 numpy 导入为 npb = np.eye(1000)a = b.tolist()%timeit b[100][100] #1000000 个循环,最好的 3 个:每个循环 692 ns%timeit a[100][100] #10000000 个循环,最好的 3 个:每个循环 70.7 ns%timeit b[100,100] #1000000 个循环,最好的 3 个:每个循环 343 ns%timeit b.item(100,100) #1000000 个循环,最好的 3 个:每个循环 297 ns

我尝试使用 dis.dis 来查看幕后发生了什么,但得到了:

TypeError: 不知道如何反汇编方法包装器对象

然后我尝试查看Numpy源代码但无法弄清楚哪个文件对应于数组元素访问.我很好奇是什么导致了额外的开销,更重要的是如何在未来为自己解决这个问题.似乎python无法轻松编译为C代码,因此我可以看到差异.但是有没有办法查看为每一行生成什么字节码,以了解差异?

解决方案

总而言之:从 NumPy 数组中获取项目需要创建新的 Python 对象,而列表则不是这种情况.此外,NumPy 数组的索引比列表稍微复杂一些,这可能会增加一些额外的开销.

<小时>

总结一下,您列出的 NumPy 操作执行以下操作:

  1. b[100][100] 以数组形式返回b 的第 100 行,然后获取该行索引 100 处的值,将该值作为数组返回对象(例如 np.int64 类型).
  2. b[100,100]直接返回第 100 行和第 100 列的值(不首先返回中间数组).
  3. b.item(100,100) 与上述 b[100,100] 的作用相同,只是将值转换为原生 Python 类型并返回.

在这些操作中,(1) 是最慢的,因为它需要两个连续的 NumPy 索引操作(我将在下面解释为什么这比列表索引慢).(2) 是最快的,因为只执行单个索引操作.操作 (3) 可能更慢,因为它是一个方法调用(这些在 Python 中通常很慢).

为什么 list 访问仍然比 b[100,100] 更快?

对象创建

Python 列表是指向内存中对象的指针数组.例如,列表 [1, 2, 3] 不直接包含这些整数,而是指向每个整数对象已经存在的内存地址的指针.要从列表中获取项目,Python 只需返回对该对象的引用.

NumPy 数组不是对象的集合.数组 np.array([1, 2, 3]) 只是一个连续的内存块,其位设置为表示这些整数值.要从此数组中获取整数,必须在与数组分开的内存中构造一个新的 Python 对象.例如,索引操作可能会返回一个 np.int64 对象:这个对象以前不存在,必须创建.

索引复杂度

a[100][100](从列表中获取)比 b[100,100](从数组中获取)更快的另外两个原因是:

  • 字节码操作码 BINARY_SUBSCR 在索引列表和数组时执行,但它针对 Python 列表进行了优化.

  • 处理 Python 列表整数索引的内部 C 函数非常简短.另一方面,NumPy 索引要复杂得多,并且需要执行大量代码来确定所使用的索引类型,以便返回正确的值.

下面更详细地描述了使用a[100][100]b[100,100]访问列表和数组中元素的步骤.>

字节码

为列表和数组触发相同的四个字节码操作码:

 0 LOAD_NAME 0 (a) # 列表或数组3 LOAD_CONST 0 (100) # 索引号(b[100,100] 的元组)6 BINARY_SUBSCR # 找到正确的getitem"函数7 RETURN_VALUE # 从列表或数组返回的值

注意:如果您开始为多维列表建立链式索引,例如a[100][100][100],你开始重复这些字节码指令.使用元组索引的 NumPy 数组不会发生这种情况:b[100,100,100] 仅使用四个指令.这就是为什么随着维度数量的增加,时间上的差距开始缩小.

找到正确的getitem"函数

访问列表和数组的函数是不同的,需要在每种情况下找到正确的函数.此任务由 BINARY_SUBSCR<处理/a> 操作码:

w = POP();//我们的索引v = TOP();//我们的列表或 NumPy 数组if (PyList_CheckExact(v) && PyInt_CheckExact(w)) {//我们有一个列表和一个整数吗?/* 内联:列表[int] */Py_ssize_t i = PyInt_AsSsize_t(w);如果 (i <0)i += PyList_GET_SIZE(v);如果 (i >= 0 && i < PyList_GET_SIZE(v)) {x = PyList_GET_ITEM(v, i);//为列表调用getitem"Py_INCREF(x);}别的转到slow_get;}别的慢_获取:x = PyObject_GetItem(v, w);//否则,调用另一个函数//找出需要的东西Py_DECREF(v);Py_DECREF(w);SET_TOP(x);如果 (x != NULL) 继续;休息;

此代码针对 Python 列表进行了优化.如果函数看到一个列表,它会快速调用函数PyList_GET_ITEM.现在可以在所需的索引处访问此列表(请参阅下面的下一部分).

但是,如果它没有看到列表(例如,我们有一个 NumPy 数组),它将采用slow_get"路径.这反过来调用另一个函数 PyObject_GetItem 检查对象映射到哪个getitem"函数:

PyObject_GetItem(PyObject *o, PyObject *key){PyMappingMethods *m;if (o == NULL || 键 == NULL)返回 null_error();m = o->ob_type->tp_as_mapping;if (m && m->mp_subscript)返回 m->mp_subscript(o, key);...

对于 NumPy 数组,正确的函数位于 mp_subscript.

注意在调用这个正确的get"函数之前附加的函数调用.这些调用会增加 b[100] 的开销,不过多少取决于 Python/NumPy 的编译方式、系统架构等.

从 Python 列表中获取

在上面可以看到函数 PyList_GET_ITEM 被调用.这是一个简短的函数,本质上是这样的*:

PyList_GetItem(PyObject *op, Py_ssize_t i){if (!PyList_Check(op)) {//检查是否列表PyErr_BadInternalCall();返回空;}if (i < 0 || i >= Py_SIZE(op)) {//检查 i 在范围内如果(索引错误 == NULL){indexerr = PyUnicode_FromString("列表索引超出范围");如果(索引错误 == NULL)返回空;}PyErr_SetObject(PyExc_IndexError, indexerr);返回空;}return ((PyListObject *)op) ->ob_item[i];//返回对象的引用}

* PyList_GET_ITEM 实际上是这个函数的宏形式,它做同样的事情,减去错误检查.

这意味着获取 Python 列表索引 i 处的项目相对简单.在内部,Python 检查项的类型是否为列表,i 是否在列表的正确范围内,然后返回对列表中对象的引用.

从 NumPy 数组中获取

相比之下,NumPy 必须做更多的工作才能返回请求索引处的值.

数组可以通过多种不同的方式进行索引,而 NumPy 必须决定需要哪种索引例程.各种索引例程主要由 mapping.c.

用于索引 NumPy 数组的任何内容都通过函数 prepare_index 开始解析索引并存储广播、维数等信息.这是函数的调用签名:

NPY_NO_EXPORT intprepare_index(PyArrayObject *self, PyObject *index,npy_index_info *索引,int *num, int *ndim, int *out_fancy_ndim, int allow_boolean)/* @param 被索引的数组* @param 索引对象* @param index info struct 被填充(NPY_MAXDIMS 的大小 * 2 + 1)* @param 找到的索引数* @param 索引结果的维度* 花式/高级索引部分的@param 维度* @param 是否允许布尔特殊情况*/

函数要做很多检查.即使对于b[100,100]这样相对简单的索引,也必须推断出很多信息,以便NumPy可以返回对正确值的引用(视图).

总而言之,找到 NumPy 的getitem"函数需要更长的时间,并且处理数组索引的函数必然比 Python 列表的单个函数更复杂.

I just started using Numpy and noticed that iterating through each element in a Numpy array is ~4x slower than doing the same but with a list of lists. I know now that this defeats the purpose of Numpy and I should vectorize the function if possible. My question is though why is it 4x slower. That seems like quite a large amount.

I ran the tests below using %timeit

import numpy as np
b = np.eye(1000)
a = b.tolist()

%timeit b[100][100] #1000000 loops, best of 3: 692 ns per loop
%timeit a[100][100] #10000000 loops, best of 3: 70.7 ns per loop
%timeit b[100,100] #1000000 loops, best of 3: 343 ns per loop
%timeit b.item(100,100) #1000000 loops, best of 3: 297 ns per loop

I tried to use dis.dis to see what was going on under the hood but got:

TypeError: don't know how to disassemble method-wrapper objects

Then I tried to look at the Numpy source code but couldn't figure out which file corresponded to array element access. I'm curious what accounts for the extra overhead, and more importantly how to figure this out for myself in the future. It seems like python can't be easily compiled to C code so that I can see the difference. But is there a way to see what byte code is generated for each line, to get a sense of the differences?

解决方案

In summary: getting an item from a NumPy array requires new Python objects to be created, whereas this is not the case for lists. Also, indexing is more slightly more complicated for NumPy arrays than lists which may add some additional overhead.


To recap, the NumPy operations you have listed do the following:

  1. b[100][100] returns row 100 of b as an array, and then gets the value at index 100 of this row, returning the value as an object (e.g. a np.int64 type).
  2. b[100,100] returns the value at row 100 and column 100 directly (no intermediate array is returned first).
  3. b.item(100,100) does the same as above b[100,100] except that the value is converted to a native Python type and returned.

Now of these operation, (1) is slowest because it requires two sequential NumPy indexing operations (I'll explain why this is slower than list indexing below). (2) is quickest because only a single indexing operation is performed. Operation (3) is possibly slower as it is a method call (these are generally slow in Python).

Why is list access still faster than b[100,100]?

Object creation

Python lists are arrays of pointers to objects in memory. For example, the list [1, 2, 3] does not contain those integers directly, but rather pointers to the memory addresses were each integer object already exists. To get an item from the list, Python just returns a reference to the object.

NumPy arrays are not collections of objects. The array np.array([1, 2, 3]) is just a contiguous block of memory with bits set to represent those integer values. To get an integer from this array, a new Python object must be constructed in memory separate to the array. For instance, an object of np.int64 may be returned by the indexing operation: this object did not exist previously and had to be created.

Indexing complexity

Two other reasons why a[100][100] (getting from the list) is quicker than b[100,100] (getting from the array) are that:

  • The bytecode opcode BINARY_SUBSCR is executed when indexing both lists and arrays, but it is optimised for the case of Python lists.

  • The internal C function handling integer indexing for Python lists is very short and simple. On the other hand, NumPy indexing is much more complicated and a significant amount of code is executed to determine the type of indexing being used so that the correct value can be returned.

Below, the steps for accessing elements in a list and array with a[100][100] and b[100,100] are described in more detail.

Bytecode

The same four bytecode opcodes are triggered for both lists and arrays:

  0 LOAD_NAME                0 (a)           # the list or array
  3 LOAD_CONST               0 (100)         # index number (tuple for b[100,100])
  6 BINARY_SUBSCR                            # find correct "getitem" function
  7 RETURN_VALUE                             # value returned from list or array

Note: if you start chain indexing for multi-dimensional lists, e.g. a[100][100][100], you start to repeat these bytecode instructions. This does not happen for NumPy arrays using the tuple indexing: b[100,100,100] uses just the four instructions. This is why the gap in the timings begins to close as the number of dimensions increases.

Finding the correct "getitem" function

The functions for accessing lists and arrays are different and the correct one needs to be found in each case. This task is handled by the BINARY_SUBSCR opcode:

w = POP();                                            // our index
v = TOP();                                            // our list or NumPy array
if (PyList_CheckExact(v) && PyInt_CheckExact(w)) {    // do we have a list and an int?
    /* INLINE: list[int] */
    Py_ssize_t i = PyInt_AsSsize_t(w);
        if (i < 0)
             i += PyList_GET_SIZE(v);
        if (i >= 0 && i < PyList_GET_SIZE(v)) {
             x = PyList_GET_ITEM(v, i);               // call "getitem" for lists
             Py_INCREF(x);
        }
        else
            goto slow_get;
     }
     else
       slow_get:
         x = PyObject_GetItem(v, w);                  // else, call another function
                                                      // to work out what is needed
     Py_DECREF(v);
     Py_DECREF(w);
     SET_TOP(x);
     if (x != NULL) continue;
     break;

This code is optimised for Python lists. If the function sees a list, it will quickly call the function PyList_GET_ITEM. This list can now be accessed at the required index (see next section below).

However, if it doesn't see a list (e.g. we have a NumPy array), it takes the "slow_get" path. This in turn calls another function PyObject_GetItem to check which "getitem" function the object is mapped to:

PyObject_GetItem(PyObject *o, PyObject *key)
{
    PyMappingMethods *m;

    if (o == NULL || key == NULL)
        return null_error();

    m = o->ob_type->tp_as_mapping;
    if (m && m->mp_subscript)
        return m->mp_subscript(o, key);
    ...

In the case of NumPy arrays, the correct function is located in mp_subscript in the PyMappingMethods structure.

Notice the additional function calls before this correct "get" function can be called. These calls add to the overhead for b[100], although how much will depend on how Python/NumPy was compiled, the system architecture, and so on.

Getting from a Python list

Above it was seen that the function PyList_GET_ITEM is called. This is a short function that essentially looks like this*:

PyList_GetItem(PyObject *op, Py_ssize_t i)
{
    if (!PyList_Check(op)) {                            // check if list
        PyErr_BadInternalCall();
        return NULL;
    }
    if (i < 0 || i >= Py_SIZE(op)) {                    // check i is in range
        if (indexerr == NULL) {
            indexerr = PyUnicode_FromString(
                "list index out of range");
            if (indexerr == NULL)
                return NULL;
        }
        PyErr_SetObject(PyExc_IndexError, indexerr);
        return NULL;
    }
    return ((PyListObject *)op) -> ob_item[i];           // return reference to object
}

* PyList_GET_ITEM is actually the macro form of this function which does the same thing, minus error checking.

This means that getting the item at index i of a Python list is relatively simple. Internally, Python checks whether the type of the item being is a list, whether i is in the correct range for the list, and then returns the reference to the object in the list.

Getting from a NumPy array

In contrast, NumPy has to do much more work before the value at the requested index can be returned.

Arrays can be indexed in a variety of different ways and NumPy has to decide which index routine is needed. The various indexing routines are handled largely by code in mapping.c.

Anything used to index NumPy arrays passes through the function prepare_index which begins the parsing of the index and stores the information about broadcasting, number of dimensions, and so on. Here is the call signature for the function:

NPY_NO_EXPORT int
prepare_index(PyArrayObject *self, PyObject *index,
              npy_index_info *indices,
              int *num, int *ndim, int *out_fancy_ndim, int allow_boolean)

 /* @param the array being indexed
  * @param the index object
  * @param index info struct being filled (size of NPY_MAXDIMS * 2 + 1)
  * @param number of indices found
  * @param dimension of the indexing result
  * @param dimension of the fancy/advanced indices part
  * @param whether to allow the boolean special case 
  */

The function has to do a lot of checks. Even for a relatively simple index such as b[100,100], a lot of information has to be inferred so that NumPy can return a reference (view) to the correct value.

In conclusion, it takes longer for the "getitem" function for NumPy to be found and the functions handling the indexing of arrays are necessarily more complex than the single function for Python lists.

这篇关于Numpy 单个元素访问比列表慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆