Efficiently convert uneven list of lists to minimal containing array padded with nan


Question

Consider the list l

l = [[1, 2, 3], [1, 2]]

If I convert this to a np.array, I get a one-dimensional object array with [1, 2, 3] in the first position and [1, 2] in the second position. (NumPy 1.24 and later raise a ValueError for ragged input like this unless dtype=object is passed.)

print(np.array(l))

[[1, 2, 3] [1, 2]]

I want this instead:

print(np.array([[1, 2, 3], [1, 2, np.nan]]))

[[  1.   2.   3.]
 [  1.   2.  nan]]


I can do this with a loop, but we all know how unpopular loops are.

def box_pir(l):
    lengths = [i for i in map(len, l)]
    shape = (len(l), max(lengths))
    a = np.full(shape, np.nan)
    for i, r in enumerate(l):
        a[i, :lengths[i]] = r
    return a

print(box_pir(l))

[[  1.   2.   3.]
 [  1.   2.  nan]]


How do I do this in a fast, vectorized way?

Timing

Setup functions

%%cython
import numpy as np

def box_pir_cython(l):
    lengths = [len(item) for item in l]
    shape = (len(l), max(lengths))
    a = np.full(shape, np.nan)
    for i, r in enumerate(l):
        a[i, :lengths[i]] = r
    return a


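# Imports needed by the plain-Python functions below
import numpy as np
import pandas as pd
from itertools import zip_longest

def box_divikar(v):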
    lens = np.array([len(item) for item in v])
    mask = lens[:,None] > np.arange(lens.max())
    out = np.full(mask.shape, np.nan)
    out[mask] = np.concatenate(v)
    return out

def box_hpaulj(LoL):
    return np.array(list(zip_longest(*LoL, fillvalue=np.nan))).T

def box_simon(LoL):
    max_len = len(max(LoL, key=len))
    return np.array([x + [np.nan]*(max_len-len(x)) for x in LoL])

def box_dawg(LoL):
    cols=len(max(LoL, key=len))
    rows=len(LoL)
    AoA=np.empty((rows,cols, ))
    AoA.fill(np.nan)
    for idx in range(rows):
        AoA[idx,0:len(LoL[idx])]=LoL[idx]
    return AoA

def box_pir(l):
    lengths = [len(item) for item in l]
    shape = (len(l), max(lengths))
    a = np.full(shape, np.nan)
    for i, r in enumerate(l):
        a[i, :lengths[i]] = r
    return a

def box_pandas(l):
    return pd.DataFrame(l).values
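
The original timing results are not included here. As a rough sketch of how the candidates could be checked for agreement and then timed, the snippet below repeats three of the functions above so that it runs on its own. The helper make_ragged, the test sizes, and the repeat count are assumptions made for illustration, not part of the original benchmark.

```python
import timeit
from itertools import zip_longest

import numpy as np


def box_pir(l):
    # Loop-based fill: one slice assignment per row.
    lengths = [len(item) for item in l]
    a = np.full((len(l), max(lengths)), np.nan)
    for i, r in enumerate(l):
        a[i, :lengths[i]] = r
    return a


def box_divikar(v):
    # Broadcasting + boolean indexing.
    lens = np.array([len(item) for item in v])
    mask = lens[:, None] > np.arange(lens.max())
    out = np.full(mask.shape, np.nan)
    out[mask] = np.concatenate(v)
    return out


def box_hpaulj(LoL):
    # Transpose with zip_longest, pad with NaN, transpose back.
    return np.array(list(zip_longest(*LoL, fillvalue=np.nan))).T


def make_ragged(n_rows, max_len, seed=0):
    """Hypothetical helper: random ragged list of lists of ints."""
    rng = np.random.default_rng(seed)
    sizes = rng.integers(1, max_len + 1, size=n_rows)
    return [list(range(int(k))) for k in sizes]


data = make_ragged(n_rows=1000, max_len=50)
candidates = [box_pir, box_divikar, box_hpaulj]

# All candidates should produce the same padded array (NaN == NaN here).
reference = box_pir(data)
for f in candidates:
    assert np.array_equal(f(data), reference, equal_nan=True)

# Time each candidate; absolute numbers depend on machine and input shape.
n_calls = 20
for f in candidates:
    seconds = timeit.timeit(lambda: f(data), number=n_calls)
    print(f"{f.__name__}: {seconds / n_calls * 1e3:.3f} ms per call")
```

Relative rankings may change with the number of rows and the spread of row lengths, so it is worth timing with data shaped like your own.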

Recommended answer

This seems to be a close duplicate of this question, where the padding was with zeros instead of NaNs. Interesting approaches were posted there, along with mine based on broadcasting and boolean-indexing. So, I would just modify one line from my post there to solve this case like so -

def boolean_indexing(v, fillval=np.nan):
    lens = np.array([len(item) for item in v])
    mask = lens[:,None] > np.arange(lens.max())
    out = np.full(mask.shape,fillval)
    out[mask] = np.concatenate(v)
    return out
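
To see why this works, here is a step-by-step sketch of the same broadcasting and boolean-indexing steps on a small input. The variable names mirror the function above; the example data is only illustrative.

```python
import numpy as np

# Example ragged input.
v = [[1, 2, 3], [1, 2], [3, 8, 9, 7, 3]]

# Length of each row, here [3, 2, 5].
lens = np.array([len(item) for item in v])

# lens[:, None] has shape (3, 1); np.arange(lens.max()) has shape (5,).
# Broadcasting the comparison gives a (3, 5) boolean mask that is True
# exactly where a row has a real value at that column index.
mask = lens[:, None] > np.arange(lens.max())

# Start from an all-NaN array of the mask's shape.
out = np.full(mask.shape, np.nan)

# Boolean-mask assignment fills the True slots in row-major order, which
# matches the order of the flattened values from np.concatenate(v).
out[mask] = np.concatenate(v)

print(mask)
print(out)
```

The only Python-level loop left is the list comprehension that collects the row lengths; the fill itself is a single vectorized assignment.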

Sample run of boolean_indexing -

In [32]: l
Out[32]: [[1, 2, 3], [1, 2], [3, 8, 9, 7, 3]]

In [33]: boolean_indexing(l)
Out[33]: 
array([[  1.,   2.,   3.,  nan,  nan],
       [  1.,   2.,  nan,  nan,  nan],
       [  3.,   8.,   9.,   7.,   3.]])

In [34]: boolean_indexing(l,-1)
Out[34]: 
array([[ 1,  2,  3, -1, -1],
       [ 1,  2, -1, -1, -1],
       [ 3,  8,  9,  7,  3]])
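
Note that the second call returns an integer array: np.full infers its dtype from the fill value, so an integer fill value combined with float data would silently truncate the data. One possible way to guard against that is to pass the dtype explicitly. The function boolean_indexing_dtype and its dtype parameter below are a hypothetical variant for illustration, not part of the original answer.

```python
import numpy as np


def boolean_indexing_dtype(v, fillval=np.nan, dtype=float):
    """Hypothetical variant of boolean_indexing with an explicit output dtype."""
    lens = np.array([len(item) for item in v])
    mask = lens[:, None] > np.arange(lens.max())
    # Fix the dtype up front instead of letting np.full infer it from fillval.
    out = np.full(mask.shape, fillval, dtype=dtype)
    out[mask] = np.concatenate(v)
    return out


rows = [[1.5, 2.5], [3.5]]

# Without an explicit dtype, an integer fill value gives an integer array,
# and assigning float data into it drops the fractional part.
truncated = np.full((2, 2), -1)
truncated[0, :] = rows[0]

# With dtype=float the data survives and -1 is stored as -1.0.
padded = boolean_indexing_dtype(rows, fillval=-1, dtype=float)

print(truncated)
print(padded)
```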

I have posted a few runtime results there for all the approaches posted on that Q&A, which could be useful.
