NumPy: calculate averages with NaNs removed
Question
How can I calculate mean values along an axis of a matrix while excluding nan values from the calculation? (For R people, think na.rm = TRUE.)
Here is my [non-]working example:
import numpy as np

dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan],
                [np.nan, np.nan, np.nan]])
print(dat)
print(dat.mean(1))  # [ 2. nan nan nan]
With NaNs removed, my expected output would be:
array([ 2., 4.5, 6., nan])
Recommended answer
I think what you want is a masked array:
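import numpy as np

dat = np.array([[1, 2, 3], [4, 5, np.nan], [np.nan, 6, np.nan], [np.nan, np.nan, np.nan]])
mdat = np.ma.masked_array(dat, np.isnan(dat))  # mask every NaN entry
mm = np.mean(mdat, axis=1)                     # row means over unmasked values only
print(mm.filled(np.nan))  # the desired answer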
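To make it clearer what the masked mean is doing, here is a minimal sketch that computes the same row means by hand: sum the non-NaN entries of each row and divide by how many there are. The names `valid`, `totals`, `counts`, and `row_means` are illustrative and not part of the original answer.

```python
import numpy as np

dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan],
                [np.nan, np.nan, np.nan]])

valid = ~np.isnan(dat)                          # True where a real value is present
totals = np.where(valid, dat, 0.0).sum(axis=1)  # NaNs contribute 0 to each row sum
counts = valid.sum(axis=1)                      # number of non-NaN entries per row

# The all-NaN row gives 0.0 / 0, which evaluates to nan; silence that warning.
with np.errstate(invalid="ignore", divide="ignore"):
    row_means = totals / counts

print(row_means)  # row means are 2, 4.5, 6 and nan
```

The masked-array version is usually preferable in practice, since it keeps the mask attached to the data for later operations; the manual form is shown only to illustrate the arithmetic.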
Combining all of the timing data:
from timeit import Timer

setupstr = """
import numpy as np
# The original run imported nanmean from scipy.stats.stats, which is no longer
# available in current SciPy releases; numpy.nanmean is substituted here, so
# the last timing may differ from the figure reported below.
from numpy import nanmean
dat = np.random.normal(size=(1000, 1000))
ii = np.ix_(np.random.randint(0, 99, size=50), np.random.randint(0, 99, size=50))
dat[ii] = np.nan
"""

method1 = """
mdat = np.ma.masked_array(dat, np.isnan(dat))
mm = np.mean(mdat, axis=1)
mm.filled(np.nan)
"""

N = 2
t1 = Timer(method1, setupstr).timeit(N)  # masked array with an explicit NaN mask
t2 = Timer("[np.mean([l for l in d if not np.isnan(l)]) for d in dat]", setupstr).timeit(N)  # pure-Python filtering
t3 = Timer("np.array([r[np.isfinite(r)].mean() for r in dat])", setupstr).timeit(N)  # boolean indexing per row
t4 = Timer("np.ma.masked_invalid(dat).mean(axis=1)", setupstr).timeit(N)  # masked_invalid shortcut
t5 = Timer("nanmean(dat, axis=1)", setupstr).timeit(N)  # nanmean

# Report each timing and its ratio relative to the first method.
for t in (t1, t2, t3, t4, t5):
    print('Time: %f Ratio: %f' % (t, t / t1))
Returns:
Time: 0.045454 Ratio: 1.000000
Time: 8.179479 Ratio: 179.950595
Time: 0.060988 Ratio: 1.341755
Time: 0.070955 Ratio: 1.561029
Time: 0.065152 Ratio: 1.433364
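For the original example, NumPy's own `np.nanmean` (available in NumPy 1.8 and later) gives the expected result without building a masked array. The sketch below assumes such a NumPy version and was not part of the timings above. An all-NaN row triggers a `RuntimeWarning` and yields nan, so the warning is suppressed here.

```python
import warnings

import numpy as np

dat = np.array([[1, 2, 3],
                [4, 5, np.nan],
                [np.nan, 6, np.nan],
                [np.nan, np.nan, np.nan]])

with warnings.catch_warnings():
    # The all-NaN row emits a RuntimeWarning ("Mean of empty slice").
    warnings.simplefilter("ignore", category=RuntimeWarning)
    row_means = np.nanmean(dat, axis=1)

print(row_means)  # row means are 2, 4.5, 6 and nan
```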