How to output the last element of each column of a NumPy 2D array, ignoring nan, in Python?
Question
I have a NumPy 2D array as shown below:
data.dat
X1 X2 X3 X4
1 1 1 1
2 2 4 2
3 3 9 3
4 4 16 4
5 5 25 5
6 6 36 6
7 nan 49 7
8 nan 64 8
9 nan 81 nan
10 nan nan nan
Now, how do I output the last element of each column, ignoring nan in the array? I tried the following code without success:
A[~np.isnan(A)][-1]
Code used:
import numpy as np
with open('data.dat', "r") as data:
    # skip leading comment lines; the first non-comment line is the header
    while True:
        line = data.readline()
        if not line.startswith('#'):
            break
    data_header = [i for i in line.strip().split('\t') if i]
A = np.genfromtxt('data.dat', names = data_header, dtype = float, delimiter = '\t')
Recommended answer
If A were a plain NumPy array of dtype 'float' (instead of a structured array), then you could use
import numpy as np
nan = np.nan
A = np.array([[  1.,   1.,   1.,   1.],
              [  2.,   2.,   4.,   2.],
              [  3.,   3.,   9.,   3.],
              [  4.,   4.,  16.,   4.],
              [  5.,   5.,  25.,   5.],
              [  6.,   6.,  36.,   6.],
              [  7.,  nan,  49.,   7.],
              [  8.,  nan,  64.,   8.],
              [  9.,  nan,  81.,  nan],
              [ 10.,  nan,  nan,  nan]])
print(A[(~np.isnan(A)).cumsum(axis=0).argmax(axis=0), np.arange(A.shape[1])])
which yields
[10.  6. 81.  8.]
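As a side note on why the expression tried in the question, A[~np.isnan(A)][-1], does not give one value per column even on a plain float array: indexing a 2D array with a 2D boolean mask returns a flattened 1D array, so [-1] selects a single element. The following is a minimal sketch on a small made-up array (the name small and its values are assumptions for illustration, not the data from the question):

```python
import numpy as np

# Hypothetical 3x2 array, used only to illustrate the flattening behaviour.
small = np.array([[1.0, 1.0],
                  [2.0, np.nan],
                  [np.nan, np.nan]])

# Boolean-mask indexing keeps the non-nan values in row-major order
# and drops the 2D shape, so column information is lost.
flat = small[~np.isnan(small)]
print(flat.ndim)   # number of dimensions of the masked result
print(flat[-1])    # a single scalar, not one value per column
```

Because the column structure is gone after masking, the per-column indices have to be computed before any values are selected, which is what the cumsum/argmax expression does.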
Given a structured array such as
import numpy as np
with open('data.dat', "r") as data:
    # per Padraic Cunningham's suggestion
    A = np.genfromtxt("data.dat", names=True, delimiter = '\t')
I think the easiest way to obtain the desired result is to view the structured array as a plain NumPy array of dtype 'float':
B = A.view('float').reshape(A.shape[0], -1)
and then proceed as before:
print(B[(~np.isnan(B)).cumsum(axis=0).argmax(axis=0), np.arange(B.shape[1])])
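As a possible alternative to view, NumPy 1.16 and later ship numpy.lib.recfunctions.structured_to_unstructured, which converts a structured array to a plain 2D array without relying on the memory layout. This is a sketch under the assumption that every field is a float; the small structured array below is a hypothetical stand-in for the genfromtxt result, not the data from the question:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical structured array with two float fields, standing in for
# what np.genfromtxt(..., names=True) would return.
A_struct = np.array([(1.0, 1.0), (2.0, np.nan), (3.0, np.nan)],
                    dtype=[('X1', float), ('X2', float)])

# Convert to a plain (rows, fields) float array.
B_plain = rfn.structured_to_unstructured(A_struct)

# Same cumsum/argmax expression as in the answer.
rows = (~np.isnan(B_plain)).cumsum(axis=0).argmax(axis=0)
result = B_plain[rows, np.arange(B_plain.shape[1])]
print(result)
```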
How it works:
Given a plain NumPy array of dtype 'float', such as
In [357]: B
Out[357]:
array([[ nan, nan, nan, nan],
[ 1., 1., 1., 1.],
[ 2., 2., 4., 2.],
[ 3., 3., 9., 3.],
[ 4., 4., 16., 4.],
[ 5., 5., 25., 5.],
[ 6., 6., 36., 6.],
[ 7., nan, 49., 7.],
[ 8., nan, 64., 8.],
[ 9., nan, 81., nan],
[ 10., nan, nan, nan]])
we can use np.isnan to find where the non-nan values are:
In [358]: ~np.isnan(B)
Out[358]:
array([[False, False, False, False],
[ True, True, True, True],
[ True, True, True, True],
[ True, True, True, True],
[ True, True, True, True],
[ True, True, True, True],
[ True, True, True, True],
[ True, False, True, True],
[ True, False, True, True],
[ True, False, True, False],
[ True, False, False, False]], dtype=bool)
Now we can use cumsum to compute a cumulative sum down each column (False is treated as 0, True as 1):
In [359]: (~np.isnan(B)).cumsum(axis=0)
Out[359]:
array([[ 0, 0, 0, 0],
[ 1, 1, 1, 1],
[ 2, 2, 2, 2],
[ 3, 3, 3, 3],
[ 4, 4, 4, 4],
[ 5, 5, 5, 5],
[ 6, 6, 6, 6],
[ 7, 6, 7, 7],
[ 8, 6, 8, 8],
[ 9, 6, 9, 8],
[10, 6, 9, 8]])
Notice that, in each column, the maximum value is first reached at the row of the last True; every later row is False and leaves the sum unchanged.
Therefore, we can find the index of the first occurrence of the maximum value in each column by using np.argmax:
In [360]: (~np.isnan(B)).cumsum(axis=0).argmax(axis=0)
Out[360]: array([10, 6, 9, 8])
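The same steps can be traced on a single column. This is a small sketch with made-up values (col is a hypothetical example that also contains a nan in the middle, to show that gaps before the last valid entry do not affect the result):

```python
import numpy as np

# Hypothetical column with a gap in the middle and trailing nans.
col = np.array([1.0, 2.0, np.nan, 3.0, np.nan, np.nan])

valid = ~np.isnan(col)      # True where the value is not nan
counts = valid.cumsum()     # running count of valid values
idx = counts.argmax()       # first index where the count reaches its maximum

print(counts)
print(idx, col[idx])
```

The running count stops growing after the last valid entry, and argmax returns the first position of the maximum, so idx points at the last non-nan value.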
This gives the row index for each column. To look up the corresponding values, we pair each row index with its column index:
In [361]: B[(~np.isnan(B)).cumsum(axis=0).argmax(axis=0), np.arange(B.shape[1])]
Out[361]: array([ 10., 6., 81., 8.])
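For reuse, the expression can be wrapped in a small helper. This is only a sketch: the function name last_valid_per_column and the demo array are assumptions introduced here, not part of the original answer. It also shows what happens for a column that is entirely nan: the cumulative count stays at 0, argmax returns row 0, and the value returned for that column is nan.

```python
import numpy as np

def last_valid_per_column(arr):
    """Return the last non-nan value of each column of a 2D float array.

    Hypothetical helper wrapping the cumsum/argmax expression above.
    Columns that contain only nan yield nan.
    """
    arr = np.asarray(arr, dtype=float)
    valid = ~np.isnan(arr)
    rows = valid.cumsum(axis=0).argmax(axis=0)   # row of last valid value
    cols = np.arange(arr.shape[1])               # one column index per column
    return arr[rows, cols]                       # paired (row, col) lookup

nan = np.nan
demo = np.array([[1.0, 1.0, nan],
                 [2.0, nan, nan],
                 [nan, 4.0, nan]])
result = last_valid_per_column(demo)
print(result)
```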