Slicing a Python list with a NumPy array of indices -- any fast way?
Question
I have a regular list called a, and a NumPy array of indices b.
(No, it is not possible for me to convert a to a NumPy array.)
Is there any way for me to get the same effect as "a[b]" efficiently? To be clear, this means I don't want to extract every individual int in b, because of the performance cost.
(Yes, this is a bottleneck in my code. That's why I'm using NumPy arrays to begin with.)
Recommended answer
Write a Cython function:
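import cython
from cpython cimport PyList_New, PyList_SET_ITEM, Py_INCREF

# Bounds and negative-index checks are disabled for speed, so every
# index in arr must already satisfy 0 <= idx < len(alist).
# arr must have dtype np.intp to match the Py_ssize_t[:] memoryview.
@cython.wraparound(False)
@cython.boundscheck(False)
def take(list alist, Py_ssize_t[:] arr):
    cdef:
        Py_ssize_t i, idx, n = arr.shape[0]
        list res = PyList_New(n)
        object obj

    for i in range(n):
        idx = arr[i]
        obj = alist[idx]
        # PyList_SET_ITEM steals a reference, so take one first.
        Py_INCREF(obj)
        PyList_SET_ITEM(res, i, obj)
    return res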
The result of %timeit:
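import numpy as np

al = list(range(10000))                    # plain Python list
aa = np.array(al)                          # same data as an ndarray
ba = np.random.randint(0, len(al), 10000)  # random indices as an ndarray
bl = ba.tolist()                           # same indices as a Python list

%timeit [al[i] for i in bl]   # list comprehension
%timeit np.take(aa, ba)       # NumPy, both arguments are ndarrays
%timeit take(al, ba)          # Cython function defined above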
1000 loops, best of 3: 1.68 ms per loop
10000 loops, best of 3: 51.4 µs per loop
1000 loops, best of 3: 254 µs per loop
numpy.take() is the fastest if both arguments are ndarray objects. For a plain list, the Cython version is roughly 6.6x faster than the list comprehension in the timings above (254 µs versus 1.68 ms).
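Where compiling Cython is not an option, a pure-Python fallback may be worth timing. The sketch below is not part of the answer above, and the name take_py is hypothetical. It assumes the indices can be converted in bulk with tolist() (as the benchmark does for bl) and then fetches all items with a single operator.itemgetter call. No speedup is claimed here; measure it against the list comprehension on your own data.

```python
from operator import itemgetter


def take_py(alist, indices):
    """Return [alist[i] for i in indices] without a Python-level loop.

    `indices` may be a NumPy array (converted in bulk via .tolist())
    or any iterable of ints. Hypothetical helper, not from the answer.
    """
    # ndarray.tolist() yields plain Python ints in one C-level pass.
    idx = indices.tolist() if hasattr(indices, "tolist") else list(indices)
    if not idx:
        return []
    if len(idx) == 1:
        # itemgetter with a single key returns the bare item, not a tuple.
        return [alist[idx[0]]]
    return list(itemgetter(*idx)(alist))
```

With the benchmark's variables this would be called as take_py(al, ba). Unlike the Cython version, out-of-range indices raise IndexError and negative indices wrap around as usual.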