How to output the last element of each column of a NumPy 2D array, ignoring nan, in Python?

Question

I have a NumPy 2D array as shown below:

data.dat

X1  X2  X3  X4
1   1   1   1
2   2   4   2
3   3   9   3
4   4   16  4
5   5   25  5
6   6   36  6
7   nan 49  7
8   nan 64  8
9   nan 81  nan
10  nan nan nan

Now, how do I output the last element of each column while ignoring the nan values in the array? I tried the following code without success:

A[~np.isnan(A)][-1]
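Why that attempt cannot work: indexing with a boolean mask does not keep the 2D shape. A[~np.isnan(A)] returns a 1D array of all non-nan values in row-major order, so [-1] picks one number, not one value per column. A minimal sketch with a made-up 3x3 plain float array:

```python
import numpy as np

nan = np.nan
# Made-up plain float array with trailing nan values in some columns
A = np.array([[1.0, 1.0, 1.0],
              [2.0, nan, 4.0],
              [3.0, nan, nan]])

mask = ~np.isnan(A)   # boolean array with the same shape as A
flat = A[mask]        # boolean indexing flattens: 1-D array, row-major order
print(flat)           # [1. 1. 1. 2. 4. 3.]
print(flat[-1])       # 3.0 -> a single scalar, not the last value of each column
```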

Code used:

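import numpy as np
with open('data.dat', "r") as data:
    # Skip comment lines starting with '#'; the first other line is the header
    while True:
        line = data.readline()
        if not line.startswith('#'):
            break
    # Column names from the tab-separated header line, e.g. ['X1', 'X2', 'X3', 'X4']
    data_header = [i for i in line.strip().split('\t') if i]
# names= builds a structured array (one field per column), not a plain 2D array
A = np.genfromtxt('data.dat', names = data_header, dtype = float, delimiter = '\t')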

Answer

If A were a plain NumPy array of dtype 'float' (instead of a structured array) then you could use

import numpy as np
nan = np.nan
A = np.array([[  1.,   1.,   1.,   1.],
              [  2.,   2.,   4.,   2.],
              [  3.,   3.,   9.,   3.],
              [  4.,   4.,  16.,   4.],
              [  5.,   5.,  25.,   5.],
              [  6.,   6.,  36.,   6.],
              [  7.,  nan,  49.,   7.],
              [  8.,  nan,  64.,   8.],
              [  9.,  nan,  81.,  nan],
              [ 10.,  nan,  nan,  nan]])

print(A[(~np.isnan(A)).cumsum(axis=0).argmax(axis=0), np.arange(A.shape[1])])

yields

array([ 10.,   6.,  81.,   8.])

Given a structured array, such as

import numpy as np

# per Padraic Cunningham's suggestion: names=True reads the field names
# from the header line of the file
A = np.genfromtxt("data.dat", names=True, delimiter='\t')

I think the easiest way to obtain the desired result is to view the structured array as a plain NumPy array of dtype 'float':

B = A.view('float').reshape(A.shape[0], -1)
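In recent NumPy versions, numpy.lib.recfunctions.structured_to_unstructured is an alternative to the view/reshape step. The sketch below assumes a tab-delimited file laid out like data.dat, shortened to three data rows and read from an io.StringIO stand-in instead of the real file:

```python
import io
import numpy as np
from numpy.lib import recfunctions as rfn

# Stand-in for a shortened, tab-delimited data.dat (header line + 3 rows)
text = "X1\tX2\tX3\tX4\n1\t1\t1\t1\n7\tnan\t49\t7\n10\tnan\tnan\tnan\n"

# names=True takes the field names from the header line -> structured array
A = np.genfromtxt(io.StringIO(text), names=True, delimiter='\t')

# Convert the four float fields into an ordinary (3, 4) float array
B = rfn.structured_to_unstructured(A)

# Same indexing expression as in the answer
last = B[(~np.isnan(B)).cumsum(axis=0).argmax(axis=0), np.arange(B.shape[1])]
print(last)
```

For these three rows the last non-nan values per column are 10, 1, 49 and 7.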

and then proceed as before:

print(B[(~np.isnan(B)).cumsum(axis=0).argmax(axis=0), np.arange(B.shape[1])])


How it works:

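Given a plain NumPy array of dtype 'float', such as the following (this B has an extra all-nan first row, for example from the header line being parsed as data, so its row indices are shifted by one compared with A above)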

In [357]: B
Out[357]: 
array([[ nan,  nan,  nan,  nan],
       [  1.,   1.,   1.,   1.],
       [  2.,   2.,   4.,   2.],
       [  3.,   3.,   9.,   3.],
       [  4.,   4.,  16.,   4.],
       [  5.,   5.,  25.,   5.],
       [  6.,   6.,  36.,   6.],
       [  7.,  nan,  49.,   7.],
       [  8.,  nan,  64.,   8.],
       [  9.,  nan,  81.,  nan],
       [ 10.,  nan,  nan,  nan]])

we can use np.isnan to find where the non-nan values are:

In [358]: ~np.isnan(B)
Out[358]: 
array([[False, False, False, False],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True, False,  True,  True],
       [ True, False,  True,  True],
       [ True, False,  True, False],
       [ True, False, False, False]], dtype=bool)

Now we can use cumsum to compute a cumulative sum for each column (False is treated as 0, True as 1):

In [359]: (~np.isnan(B)).cumsum(axis=0)
Out[359]: 
array([[ 0,  0,  0,  0],
       [ 1,  1,  1,  1],
       [ 2,  2,  2,  2],
       [ 3,  3,  3,  3],
       [ 4,  4,  4,  4],
       [ 5,  5,  5,  5],
       [ 6,  6,  6,  6],
       [ 7,  6,  7,  7],
       [ 8,  6,  8,  8],
       [ 9,  6,  9,  8],
       [10,  6,  9,  8]])

Notice that the maximum value in each column is first reached at the row of the last True in that column; below that row the sum stays constant.

Therefore, we can find the row index of the first occurrence of the maximum value in each column by using np.argmax, which returns the first index when the maximum appears more than once:

In [360]: (~np.isnan(B)).cumsum(axis=0).argmax(axis=0)
Out[360]: array([10,  6,  9,  8])
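A case worth checking at this step: if a column contains no non-nan value at all, its cumulative sum is all zeros, argmax returns 0 for it, and the lookup returns that column's first element, which is nan. A sketch with a made-up 3x2 array:

```python
import numpy as np

nan = np.nan
# Made-up array: column 1 contains no valid values at all
B = np.array([[1.0, nan],
              [2.0, nan],
              [nan, nan]])

counts = (~np.isnan(B)).cumsum(axis=0)  # column 1 is all zeros
rows = counts.argmax(axis=0)            # argmax of an all-zero column is 0
print(rows)                             # [1 0]
last = B[rows, np.arange(B.shape[1])]   # column 1 comes out as nan
print(last)
```

If such columns should be reported differently, (~np.isnan(B)).any(axis=0) flags the columns that have at least one valid value.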

This gives the row index number for each column. To find the corresponding value in the array we could then use:

In [361]: B[(~np.isnan(B)).cumsum(axis=0).argmax(axis=0), np.arange(B.shape[1])]
Out[361]: array([ 10.,   6.,  81.,   8.])
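A variant that skips the cumulative sum (an alternative sketch, not the answer's method): reverse the mask along the rows, let argmax find the first True counted from the bottom, and convert that back to a row index. The helper name last_non_nan is hypothetical:

```python
import numpy as np

def last_non_nan(B):
    """Return the last non-nan value in each column of a 2-D float array."""
    valid = ~np.isnan(B)
    # argmax on the row-reversed mask finds the first True from the bottom
    rows = B.shape[0] - 1 - valid[::-1].argmax(axis=0)
    # For an all-nan column argmax returns 0, so rows points at the last row,
    # whose value is nan: such columns come out as nan.
    return B[rows, np.arange(B.shape[1])]

nan = np.nan
A = np.array([[  1.,   1.,   1.,   1.],
              [  2.,   2.,   4.,   2.],
              [  3.,   3.,   9.,   3.],
              [  4.,   4.,  16.,   4.],
              [  5.,   5.,  25.,   5.],
              [  6.,   6.,  36.,   6.],
              [  7.,  nan,  49.,   7.],
              [  8.,  nan,  64.,   8.],
              [  9.,  nan,  81.,  nan],
              [ 10.,  nan,  nan,  nan]])

print(last_non_nan(A))
```

On the question's data this gives the same values as the cumsum/argmax expression: 10, 6, 81 and 8.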
