Efficient algorithm for evaluating a 1-d array of functions on a same-length 1d numpy array


Problem description

I have a (large) length-N array of k distinct functions, and a length-N array of abscissas. I want to evaluate the functions at the abscissas to return a length-N array of ordinates, and critically, I need to do it very fast.

I have tried the following loop over a call to np.where, which is too slow:

Create some fake data to illustrate the problem:

import numpy as np

def trivial_functional(i): return lambda x : i*x
k = 250
func_table = [trivial_functional(j) for j in range(k)]
func_table = np.array(func_table) # possibly unnecessary

We have a table of 250 distinct functions. Now I create a large array with many repeated entries of those functions, and a set of points of the same length at which these functions should be evaluated.

Npts = int(1e6)
abcissa_array = np.random.random(Npts)
# np.random.random_integers is deprecated; randint(0, k, n) draws from [0, k-1]
function_indices = np.random.randint(0, len(func_table), Npts)
func_array = func_table[function_indices]

Finally, loop over every function used by the data and evaluate it on the set of relevant points:

desired_output = np.zeros(Npts)
for func_index in set(function_indices):
    idx = np.where(function_indices==func_index)[0]
    desired_output[idx] = func_table[func_index](abcissa_array[idx])

This loop takes ~0.35 seconds on my laptop, the biggest bottleneck in my code by an order of magnitude.

Does anyone see how to avoid the blind lookup call to np.where? Is there a clever use of numba that can speed this loop up?

Recommended answer

This does almost the same thing as your (excellent!) self-answer, but with a bit less rigamarole. It seems marginally faster on my machine as well -- about 30ms based on a cursory test.

def apply_indexed_fast(array, func_indices, func_table):
    # Sort positions so that entries sharing the same function become contiguous
    func_argsort = func_indices.argsort()
    # Locate where each function's block of entries begins in the sorted order
    func_ranges = list(np.searchsorted(func_indices[func_argsort], range(len(func_table))))
    func_ranges.append(None)   # None lets the final slice run to the end
    out = np.zeros_like(array)
    for f, start, end in zip(func_table, func_ranges, func_ranges[1:]):
        ix = func_argsort[start:end]   # original positions handled by this function
        out[ix] = f(array[ix])
    return out
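
For reference, a hypothetical call using the variables defined in the question would be:

desired_output = apply_indexed_fast(abcissa_array, function_indices, func_table)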

Like yours, this splits a sequence of argsort indices into chunks, each of which corresponds to a function in func_table. It then uses each chunk to select input and output indices for its corresponding function. To determine the chunk boundaries, it uses np.searchsorted instead of np.unique -- where searchsorted(a, b) could be thought of as a binary search algorithm that returns the index of the first value in a equal to or greater than the given value or values in b.
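
To make the boundary computation concrete, here is a tiny sketch with made-up values (not taken from the code above):

import numpy as np

sorted_indices = np.array([0, 0, 1, 2, 2, 2])           # stands in for func_indices[func_argsort]
boundaries = np.searchsorted(sorted_indices, range(3))  # one query per function index 0, 1, 2
print(boundaries)                                       # [0 2 3] -- where each function's block starts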

Then the zip function simply iterates over its arguments in parallel, returning a single item from each one, collected together in a tuple, and stringing those together into a list. (So zip([1, 2, 3], ['a', 'b', 'c'], ['b', 'c', 'd']) returns [(1, 'a', 'b'), (2, 'b', 'c'), (3, 'c', 'd')].) This, along with the for statement's built-in ability to "unpack" those tuples, allows for a terse but expressive way to iterate over multiple sequences in parallel.
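
As a minimal illustration of that unpacking (toy values only):

for number, left, right in zip([1, 2, 3], ['a', 'b', 'c'], ['b', 'c', 'd']):
    # each iteration unpacks one tuple: (1, 'a', 'b'), then (2, 'b', 'c'), then (3, 'c', 'd')
    print(number, left, right)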

In this case, I've used it to iterate over the functions in func_table alongside two out-of-sync copies of func_ranges. This ensures that the item from func_ranges in the end variable is always one step ahead of the item in the start variable. By appending None to func_ranges, I ensure that the final chunk is handled gracefully -- zip stops when any one of its arguments runs out of items, which cuts off the final value in the sequence. Conveniently, the None value also serves as an open-ended slice index!
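
Continuing the made-up values from the sketch above, appending None and zipping the offset copies produces the (start, end) pairs, with None acting as an open-ended slice bound:

func_ranges = [0, 2, 3]
func_ranges.append(None)                        # [0, 2, 3, None]
print(list(zip(func_ranges, func_ranges[1:])))  # [(0, 2), (2, 3), (3, None)]
toy_argsort = [1, 3, 2, 0, 4, 5]                # a made-up func_argsort
print(toy_argsort[3:None])                      # same as toy_argsort[3:] -> [0, 4, 5]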

Another trick that does the same thing requires a few more lines, but has lower memory overhead, especially when used with the itertools equivalent of zip, izip:

import itertools

range_iter_a = iter(func_ranges)   # create iterators that step through the
range_iter_b = iter(func_ranges)   # values in `func_ranges` without making copies
next(range_iter_b, None)           # advance the second iterator by one
for f, start, end in itertools.izip(func_table, range_iter_a, range_iter_b):
    ...

However, these low-overhead generator-based approaches can sometimes be a bit slower than vanilla lists. Also, note that in Python 3, zip behaves more like izip.
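
For completeness, a sketch of how the same generator-based variant might look under Python 3, where izip is gone and the built-in zip is already lazy (this is my adaptation, not part of the original answer):

range_iter_a = iter(func_ranges)
range_iter_b = iter(func_ranges)
next(range_iter_b, None)            # advance the second iterator by one
for f, start, end in zip(func_table, range_iter_a, range_iter_b):
    ix = func_argsort[start:end]
    out[ix] = f(array[ix])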
