Fastest way to convert a list of indices to 2D numpy array of ones

Question

I have a list of indices:

a = [
  [1,2,4],
  [0,2,3],
  [1,3,4],
  [0,2]]

What's the fastest way to convert this to a numpy array of ones, where each index gives a position at which a 1 should occur?

I.e., what I want is:

output = array([
  [0,1,1,0,1],
  [1,0,1,1,0],
  [0,1,0,1,1],
  [1,0,1,0,0]])

I know the max size of the array beforehand. I know I could loop through each list and insert a 1 at each index position, but is there a faster/vectorized way to do this?

My use case could have thousands of rows/cols and I need to do this thousands of times, so the faster the better.

Recommended answer

How about this:

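import numpy as np

ncol = 5        # number of columns, known in advance
nrow = len(a)
out = np.zeros((nrow, ncol), int)
# pair each row number (repeated once per entry) with the flattened column indices
out[np.arange(nrow).repeat([*map(len,a)]), np.concatenate(a)] = 1
out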
# array([[0, 1, 1, 0, 1],
#        [1, 0, 1, 1, 0],
#        [0, 1, 0, 1, 1],
#        [1, 0, 1, 0, 0]])
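
To make the one-line assignment above easier to follow, here is a minimal sketch that builds the two index arrays separately for the example from the question. The names lengths, rows and cols are introduced here for illustration only and do not appear in the original answer.

```python
import numpy as np

a = [[1, 2, 4], [0, 2, 3], [1, 3, 4], [0, 2]]
nrow, ncol = len(a), 5

# number of entries in each row of a
lengths = [len(row) for row in a]

# row number of every entry: row i is repeated lengths[i] times
rows = np.repeat(np.arange(nrow), lengths)

# column number of every entry: the sublists laid end to end
cols = np.concatenate(a)

out = np.zeros((nrow, ncol), dtype=int)
out[rows, cols] = 1   # one fancy-indexing assignment sets all the ones

print(rows.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3]
print(cols.tolist())  # [1, 2, 4, 0, 2, 3, 1, 3, 4, 0, 2]
print(out)
```

rows and cols have the same length, so out[rows, cols] addresses exactly one cell per entry of a, and the Python-level loop over rows is replaced by a single indexed assignment.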

Here are timings for a 1000x1000 binary array. Note that I use an optimized version of the above; see function pp below:

pp 21.717635259992676 ms
ts 37.10938713003998 ms
u9 37.32933565042913 ms

Code to produce timings:

import itertools as it
import numpy as np

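def make_data(n,m):
    # random n-by-m mask in which each row gets its own random density
    I,J = np.where(np.random.random((n,m))<np.random.random((n,1)))
    # I is sorted by row, so split the column indices J into one list per row
    return [*map(np.ndarray.tolist, np.split(J, I.searchsorted(np.arange(1,n))))]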

def pp():
    sz = np.fromiter(map(len,a),int,nrow)
    out = np.zeros((nrow,ncol),int)
    out[np.arange(nrow).repeat(sz),np.fromiter(it.chain.from_iterable(a),int,sz.sum())] = 1
    return out

def ts():
    out = np.zeros((nrow,ncol),int)
    for i, ix in enumerate(a):
        out[i][ix] = 1
    return out

def u9():
    out = np.zeros((nrow,ncol),int)
    for i, (x, y) in enumerate(zip(a, out)):
        y[x] = 1
        out[i] = y
    return out

nrow,ncol = 1000,1000
a = make_data(nrow,ncol)

from timeit import timeit
assert (pp()==ts()).all()
assert (pp()==u9()).all()

print("pp", timeit(pp,number=100)*10, "ms")
print("ts", timeit(ts,number=100)*10, "ms")
print("u9", timeit(u9,number=100)*10, "ms")
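
The pp function above flattens the nested list with np.fromiter and itertools.chain rather than np.concatenate. The sketch below isolates that step on a small, hypothetical input that includes an empty row; the names sizes, flat and rows are illustrative and not part of the original answer.

```python
import itertools as it
import numpy as np

# hypothetical input; the middle row has no indices at all
a = [[1, 2, 4], [], [0, 2]]
nrow, ncol = len(a), 5

# length of each row, read straight into an integer array
sizes = np.fromiter(map(len, a), dtype=int, count=nrow)

# all column indices laid end to end; the dtype is fixed to int up front
flat = np.fromiter(it.chain.from_iterable(a), dtype=int, count=sizes.sum())

# row i repeated sizes[i] times (zero times for the empty row)
rows = np.arange(nrow).repeat(sizes)

out = np.zeros((nrow, ncol), dtype=int)
out[rows, flat] = 1
print(out)
```

Besides speed, one practical difference is that np.fromiter always yields an integer array here, whereas np.concatenate on a list containing an empty sublist would promote the result to float, which cannot be used as an index.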
