How to select subsequent numpy arrays handling potential np.nan values


Problem description

I have a Series like this:

import numpy as np
import pandas as pd

s = pd.Series({10: np.array([[0.72260683, 0.27739317, 0.        ],
                         [0.7187053 , 0.2812947 , 0.        ],
                         [0.71435467, 0.28564533, 1.        ],
                         [0.3268072 , 0.6731928 , 0.        ],
                         [0.31941951, 0.68058049, 1.        ],
                         [0.31260015, 0.68739985, 0.        ]]), 
           20: np.array([[0.7022099 , 0.2977901 , 0.        ],
                         [0.6983866 , 0.3016134 , 0.        ],
                         [0.69411673, 0.30588327, 1.        ],
                         [0.33857735, 0.66142265, 0.        ],
                         [0.33244109, 0.66755891, 1.        ],
                         [0.32675582, 0.67324418, 0.        ]]), 
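           # duplicate key: a dict keeps only the last value for 20,
           # so this array replaces the one above and s has 4 elements
           20: np.array([[0.68811957, 0.34188043, 0.        ],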
                         [0.68425783, 0.31574217, 0.        ],
                         [0.67994496, 0.32005504, 1.        ],
                         [0.34872593, 0.66127407, 1.        ],
                         [0.34276171, 0.65723829, 1.        ],
                         [0.33722803, 0.66277197, 0.        ]]),
           38: np.array([[0.68811957, 0.31188043, 0.        ],
                         [0.68425783, 0.31574217, 0.        ],
                         [0.67994496, 0.32005504, 1.        ],
                         [0.34872593, 0.65127407, 0.        ],
                         [0.34276171, 0.65723829, 1.        ],
                         [0.33722803, 0.66277197, 0.        ]]),
           np.nan: np.nan}
)

I want to subset it with np.array([1, 4, 1, 5]) or np.array([1, 4, 1, np.nan]), getting np.nan for the last entry no matter what value the last element of the indices array holds. How can I accomplish that?

Please note that I can't simply remove the last element of the Series.

Recommended answer

You can adapt the previous answer: filter out the missing values of the Series, do the selection on the remaining arrays, and finally add the missing rows back with Series.reindex (this requires the index of the Series to be unique):

#a = np.array([1, 4, 1, 5])
a = np.array([1, 4, 1, np.nan])

mask = s.notna()
b = np.array(s[mask].tolist())[np.arange(mask.sum()), a[mask].astype(int), 2]
print (b)
[0. 1. 0.]

c = pd.Series(b, index=s[mask].index).reindex(s.index)
print (c)
10.0    0.0
20.0    1.0
38.0    0.0
NaN     NaN
dtype: float64
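
The mask, select and reindex steps above can be wrapped in a small function. The following is only a minimal sketch on made-up 2x3 arrays; the name `pick_column` and its `rows` and `col` parameters are hypothetical and not part of the original answer. It assumes the Series index is unique, as `reindex` requires.

```python
import numpy as np
import pandas as pd

def pick_column(s, rows, col=2):
    """Hypothetical helper: for each non-missing array in s, take the
    value at [rows[i], col]; rows where s is missing come back as NaN."""
    mask = s.notna().to_numpy()                   # True where s holds an array
    stacked = np.array(s[mask].tolist())          # shape (n_valid, n_rows, n_cols)
    idx = np.asarray(rows)[mask].astype(int)      # row numbers for valid entries only
    picked = stacked[np.arange(mask.sum()), idx, col]
    # reindex restores the dropped labels and fills them with NaN
    return pd.Series(picked, index=s.index[mask]).reindex(s.index)

# Made-up sample data, much smaller than the arrays in the question
arr_a = np.array([[0.9, 0.1, 0.0], [0.2, 0.8, 1.0]])
arr_b = np.array([[0.6, 0.4, 1.0], [0.3, 0.7, 0.0]])
s_small = pd.Series({10: arr_a, 20: arr_b, np.nan: np.nan})

res_nan = pick_column(s_small, np.array([1, 1, np.nan]))
res_num = pick_column(s_small, np.array([1, 1, 5]))
print(res_nan)
```

Because the row numbers are masked before the cast to int, the last element of `rows` is never used, so it makes no difference whether it is 5 or np.nan.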

If the index values are not unique, first create a unique MultiIndex with GroupBy.cumcount:

s = pd.Series({10: np.array([[0.72260683, 0.27739317, 0.        ],
                         [0.7187053 , 0.2812947 , 0.        ],
                         [0.71435467, 0.28564533, 1.        ],
                         [0.3268072 , 0.6731928 , 0.        ],
                         [0.31941951, 0.68058049, 1.        ],
                         [0.31260015, 0.68739985, 0.        ]]), 
           20: np.array([[0.7022099 , 0.2977901 , 0.        ],
                         [0.6983866 , 0.3016134 , 0.        ],
                         [0.69411673, 0.30588327, 1.        ],
                         [0.33857735, 0.66142265, 0.        ],
                         [0.33244109, 0.66755891, 1.        ],
                         [0.32675582, 0.67324418, 0.        ]]), 
           23: np.array([[0.68811957, 0.34188043, 0.        ],
                         [0.68425783, 0.31574217, 0.        ],
                         [0.67994496, 0.32005504, 1.        ],
                         [0.34872593, 0.66127407, 1.        ],
                         [0.34276171, 0.65723829, 1.        ],
                         [0.33722803, 0.66277197, 0.        ]]),
           38: np.array([[0.68811957, 0.31188043, 0.        ],
                         [0.68425783, 0.31574217, 0.        ],
                         [0.67994496, 0.32005504, 1.        ],
                         [0.34872593, 0.65127407, 0.        ],
                         [0.34276171, 0.65723829, 1.        ],
                         [0.33722803, 0.66277197, 0.        ]]),
           np.nan: np.nan}
).rename({23:20})

print (s)
10.0    [[0.72260683, 0.27739317, 0.0], [0.7187053, 0....
20.0    [[0.7022099, 0.2977901, 0.0], [0.6983866, 0.30...
20.0    [[0.68811957, 0.34188043, 0.0], [0.68425783, 0...
38.0    [[0.68811957, 0.31188043, 0.0], [0.68425783, 0...
NaN                                                   NaN
dtype: object


a = np.array([1, 4, 1, 2, np.nan])

s = s.to_frame('a').set_index(s.groupby(s.index).cumcount(), append=True)['a']
print (s)
10.0  0    [[0.72260683, 0.27739317, 0.0], [0.7187053, 0....
20.0  0    [[0.7022099, 0.2977901, 0.0], [0.6983866, 0.30...
      1    [[0.68811957, 0.34188043, 0.0], [0.68425783, 0...
38.0  0    [[0.68811957, 0.31188043, 0.0], [0.68425783, 0...
NaN   0                                                  NaN
Name: a, dtype: object
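
To see what the cumcount step does on its own, here is a small sketch with made-up scalar values (the names `demo`, `helper` and `unique` are illustrative only). cumcount numbers the rows within each group of equal index labels, so appending it as a second index level makes every (label, counter) pair unique.

```python
import numpy as np
import pandas as pd

# Made-up Series with a duplicated index label (20 appears twice)
demo = pd.Series([1.0, 2.0, 3.0, 4.0], index=[10, 20, 20, 38])

# Running counter within each group of equal labels
helper = demo.groupby(demo.index).cumcount()

# Append the counter as a second index level
unique = demo.to_frame('a').set_index(helper, append=True)['a']

print(demo.index.is_unique)
print(unique.index.is_unique)
```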


mask = s.notna()
b = np.array(s[mask].tolist())[np.arange(mask.sum()), a[mask].astype(int), 2]
print (b)
[0. 1. 0. 1.]

c = pd.Series(b, index=s[mask].index).reindex(s.index)
print (c)
10.0  0    0.0
20.0  0    1.0
      1    0.0
38.0  0    1.0
NaN   0    NaN
dtype: float64

In the last step, remove the helper level of the MultiIndex:

c = c.reset_index(level=-1, drop=True)
print (c)
10.0    0.0
20.0    1.0
20.0    0.0
38.0    1.0
NaN     NaN
dtype: float64
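
As a possible alternative to the MultiIndex round trip, the result can be written by position into a NaN-filled array, which does not depend on the index being unique. This is a different technique from the reindex approach in the answer, shown here as a sketch on made-up 2x3 arrays; `pick_by_position` and its parameters are hypothetical names.

```python
import numpy as np
import pandas as pd

def pick_by_position(s, rows, col=2):
    """Hypothetical helper: same selection as above, but the result is
    filled by position, so duplicated index labels are not a problem."""
    mask = s.notna().to_numpy()                   # True where s holds an array
    out = np.full(len(s), np.nan)                 # missing rows stay NaN
    stacked = np.array(s[mask].tolist())          # shape (n_valid, n_rows, n_cols)
    idx = np.asarray(rows)[mask].astype(int)      # row numbers for valid entries only
    out[mask] = stacked[np.arange(mask.sum()), idx, col]
    return pd.Series(out, index=s.index)

# Made-up sample data; rename creates the duplicated label 20,
# the same trick used in the answer above
arr_a = np.array([[0.9, 0.1, 0.0], [0.2, 0.8, 1.0]])
arr_b = np.array([[0.6, 0.4, 1.0], [0.3, 0.7, 0.0]])
arr_c = np.array([[0.5, 0.5, 1.0], [0.4, 0.6, 0.0]])
s_dup = pd.Series({10: arr_a, 20: arr_b, 23: arr_c,
                   np.nan: np.nan}).rename({23: 20})

res = pick_by_position(s_dup, np.array([1, 1, 0, np.nan]))
print(res)
```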
