numpy: split 1D array of chunks separated by nans into a list of the chunks

Question

I have a numpy array in which only some values are valid and the rest are nan. Example:

[nan,nan, 1 , 2 , 3 , nan, nan, 10, 11 , nan, nan, nan, 23, 1, nan, 7, 8]

I would like to split it into a list of chunks, each holding one run of consecutive valid values. The result would be:

[[1,2,3], [10,11], [23,1], [7,8]]

I managed to get it done by iterating over the array, checking isfinite() and producing (start, stop) indexes.

However, it is painfully slow.

Do you perhaps have a better idea?

Recommended answer

Here is another possibility:

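import numpy as np
nan = np.nan

def using_clump(a):
    # Convert to an ndarray so list input works and every chunk comes back as an array
    a = np.asarray(a)
    # masked_invalid masks nan/inf; clump_unmasked returns one slice per run of valid values
    return [a[s] for s in np.ma.clump_unmasked(np.ma.masked_invalid(a))]

x = [nan, nan, 1, 2, 3, nan, nan, 10, 11, nan, nan, nan, 23, 1, nan, 7, 8]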

In [56]: using_clump(x)
Out[56]: 
[array([ 1.,  2.,  3.]),
 array([ 10.,  11.]),
 array([ 23.,   1.]),
 array([ 7.,  8.])]
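To see why this works, it can help to look at the intermediate values. The following is a minimal sketch (the variable names `masked`, `slices` and `spans` are only illustrative): `np.ma.masked_invalid` masks the nan entries, and `np.ma.clump_unmasked` returns one `slice` per contiguous run of unmasked values. Those slices are the same (start, stop) pairs the question computed by hand.

```python
import numpy as np

nan = np.nan
x = np.array([nan, nan, 1, 2, 3, nan, nan, 10, 11, nan, nan, nan, 23, 1, nan, 7, 8])

# Mask every nan (and inf) entry
masked = np.ma.masked_invalid(x)

# One slice object per contiguous run of unmasked (valid) values
slices = np.ma.clump_unmasked(masked)

# Convert to plain (start, stop) tuples for display
spans = [(int(s.start), int(s.stop)) for s in slices]
print(spans)  # [(2, 5), (7, 9), (12, 14), (15, 17)]
```

Indexing the original array with each slice returns a view rather than a copy, which is part of why this approach stays cheap on large inputs.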


Some benchmarks comparing using_clump and using_groupby:

from itertools import groupby

def using_groupby(a):
    # Group consecutive elements by validity and keep only the valid runs
    return [list(v) for k, v in groupby(a, np.isfinite) if k]


In [58]: %timeit using_clump(x)
10000 loops, best of 3: 37.3 us per loop

In [59]: %timeit using_groupby(x)
10000 loops, best of 3: 53.1 us per loop
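The `%timeit` lines above only work inside IPython. As a rough sketch, a similar comparison could be run as a plain script with the standard `timeit` module. The helper `time_per_call` is a hypothetical name introduced here, and `using_clump` is assumed to convert its input with `np.asarray`. Absolute numbers will differ by machine and NumPy version.

```python
import timeit
from itertools import groupby

import numpy as np

nan = np.nan
x = [nan, nan, 1, 2, 3, nan, nan, 10, 11, nan, nan, nan, 23, 1, nan, 7, 8]

def using_clump(a):
    # Assumption: convert to ndarray so that list input is accepted
    a = np.asarray(a)
    return [a[s] for s in np.ma.clump_unmasked(np.ma.masked_invalid(a))]

def using_groupby(a):
    return [list(v) for k, v in groupby(a, np.isfinite) if k]

def time_per_call(func, data, number=200):
    # Hypothetical helper: average wall-clock seconds per call over `number` runs
    return timeit.timeit(lambda: func(data), number=number) / number

t_clump = time_per_call(using_clump, x)
t_groupby = time_per_call(using_groupby, x)
print(f"using_clump:   {t_clump * 1e6:.1f} us per call")
print(f"using_groupby: {t_groupby * 1e6:.1f} us per call")
```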

The advantage of using_clump is even larger for larger arrays (roughly 10x in this run):

In [9]: x = x*1000
In [12]: %timeit using_clump(x)
100 loops, best of 3: 5.69 ms per loop

In [13]: %timeit using_groupby(x)
10 loops, best of 3: 60 ms per loop
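For comparison, the (start, stop) approach from the question can also be vectorized with plain NumPy, without the `np.ma` module. This is an alternative sketch, not part of the answer above. The function name `split_on_nan` is made up for illustration, and no timing claims are made for it.

```python
import numpy as np

def split_on_nan(a):
    """Split a 1D array into chunks of consecutive finite values."""
    a = np.asarray(a, dtype=float)
    valid = np.isfinite(a)
    # Pad with False on both sides so every valid run has both a start and a stop edge
    padded = np.concatenate(([False], valid, [False]))
    # Positions where validity flips; these alternate start, stop, start, stop, ...
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, stops = edges[::2], edges[1::2]
    return [a[i:j] for i, j in zip(starts, stops)]

nan = np.nan
x = [nan, nan, 1, 2, 3, nan, nan, 10, 11, nan, nan, nan, 23, 1, nan, 7, 8]
chunks = split_on_nan(x)
print([c.tolist() for c in chunks])  # [[1.0, 2.0, 3.0], [10.0, 11.0], [23.0, 1.0], [7.0, 8.0]]
```

Like `masked_invalid`, this treats inf as invalid because it relies on `np.isfinite`. Swapping in `~np.isnan(a)` would split on nan only.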
