Remove part of an array when a NaN sequence is more than 20 in a row


Problem description

I can remove all NaN values with a mask:

y = y[~np.isnan(x)]
x = x[~np.isnan(x)]

Now I only need to remove sections where many NaNs occur in a row (let's say 20 NaNs in a row). Does anyone know how to handle this?

Recommended answer

There's a bit of ambiguity in the question, but it'll be nice to answer both versions regardless. I'm not sure whether you need to remove sections of 1D data that contain more than 20 consecutive NaNs, or rows of 2D data that contain more than 20 NaNs anywhere in the row. Tai has already answered the latter, so I'll answer the former.
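For completeness, the 2D reading of the question (drop any row with more than 20 NaNs anywhere in it) can be sketched as follows. This is only an illustration and not necessarily the code from Tai's answer, which is not reproduced here; the array `a` and its shape are made-up test data.

```python
import numpy as np

# Hypothetical 2D test data: 3 rows, 30 columns
a = np.arange(90, dtype=float).reshape(3, 30)
a[1, 2:27] = np.nan   # 25 NaNs in row 1 -> this row is dropped
a[2, 0:5] = np.nan    # 5 NaNs in row 2 -> this row is kept

nan_counts = np.isnan(a).sum(axis=1)   # number of NaNs in each row
filtered = a[nan_counts <= 20]         # keep rows with at most 20 NaNs
print(filtered.shape)
```

Here the NaNs do not have to be consecutive; only the per-row count matters.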

The idea is to find the indices of the NaNs, group those indices into streaks of consecutive positions, discard the streaks that aren't long enough, and finally construct a mask from the remaining streaks (whew).

import numpy as np

# Construct some test data
x = np.arange(150, dtype=float)  # np.float was removed in NumPy 1.24; use float
x[20:50] = np.nan    # remove this streak (30 NaNs); np.NaN was removed in NumPy 2.0
x[70:80] = np.nan    # keep this streak (10 NaNs)
x[105:140] = np.nan  # remove this streak (35 NaNs)
x[149] = np.nan      # keep this lone soldier
print("Original (with long streaks): ", x)

# Calculate streaks, filter out streaks that are too short, apply global mask
nan_spots = np.where(np.isnan(x))[0]  # indices of all NaNs
# Split the indices wherever two neighbouring NaN indices are not adjacent
streaks = np.split(nan_spots, np.where(np.diff(nan_spots) != 1)[0] + 1)
long_streaks = [streak for streak in streaks if len(streak) > 20]
mask = np.ones(len(x), dtype=bool)  # True = keep
if long_streaks:  # np.hstack raises on an empty list
    mask[np.hstack(long_streaks)] = False
print("Filtered (without long streaks): ", x[mask])

assert len(x[mask]) == len(x) - (50 - 20) - (140 - 105)

Output:

Original (with long streaks):  [  0.   1.   2.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.
  14.  15.  16.  17.  18.  19.  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan  nan  nan  nan  nan  50.  51.  52.  53.  54.  55.
  56.  57.  58.  59.  60.  61.  62.  63.  64.  65.  66.  67.  68.  69.
  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  80.  81.  82.  83.
  84.  85.  86.  87.  88.  89.  90.  91.  92.  93.  94.  95.  96.  97.
  98.  99. 100. 101. 102. 103. 104.  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan  nan
 140. 141. 142. 143. 144. 145. 146. 147. 148.  nan]

Filtered (without long streaks):  [  0.   1.   2.   3.   4.   5.   6.   7.   8.   9.  10.  11.  12.  13.
  14.  15.  16.  17.  18.  19.  50.  51.  52.  53.  54.  55.  56.  57.
  58.  59.  60.  61.  62.  63.  64.  65.  66.  67.  68.  69.  nan  nan
  nan  nan  nan  nan  nan  nan  nan  nan  80.  81.  82.  83.  84.  85.
  86.  87.  88.  89.  90.  91.  92.  93.  94.  95.  96.  97.  98.  99.
 100. 101. 102. 103. 104. 140. 141. 142. 143. 144. 145. 146. 147. 148.
  nan]

And if need be, just apply the same mask to y (i.e. y = y[mask]). You can generalize this to multidimensional data, but you'll have to pick the axis along which you want to find the consecutive NaNs.
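If the mask is needed for several arrays, it may help to wrap the logic in a function. The sketch below is one possible way to do that, not the method used above: it detects the start and end of each NaN run from the edges of a padded boolean array. The function name `long_nan_streak_mask`, the `min_len` parameter, and the companion array `y` are all made up for illustration; `min_len=21` corresponds to "more than 20 in a row".

```python
import numpy as np

def long_nan_streak_mask(values, min_len=21):
    """Return a boolean mask that is False inside NaN runs of at least min_len."""
    is_nan = np.isnan(values)
    # Pad with False so every run has both a rising and a falling edge
    padded = np.concatenate(([False], is_nan, [False]))
    edges = np.diff(padded.astype(np.int8))
    starts = np.where(edges == 1)[0]    # first index of each NaN run
    stops = np.where(edges == -1)[0]    # one past the last index of each run
    mask = np.ones(len(values), dtype=bool)
    for start, stop in zip(starts, stops):
        if stop - start >= min_len:
            mask[start:stop] = False
    return mask

# Same test data as above, plus a hypothetical companion array y
x = np.arange(150, dtype=float)
x[20:50] = np.nan
x[70:80] = np.nan
x[105:140] = np.nan
x[149] = np.nan
y = np.arange(150) * 2.0

mask = long_nan_streak_mask(x)
x_kept, y_kept = x[mask], y[mask]
print(len(x_kept), len(y_kept))
```

Because the mask is computed once from x and then applied to both arrays, x and y stay aligned element by element.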
