numpy.ma (masked) array mean method has inconsistent return type
Question
I noticed that the numpy masked-array mean method returns different types when it probably should not:
import numpy as np
A = np.ma.masked_equal([1,1,0], value=0)
B = np.ma.masked_equal([1,1,1], value=0) # no masked values
type(A.mean())
#numpy.float64
type(B.mean())
#numpy.ma.core.MaskedArray
Other numpy.ma.core.MaskedArray methods seem to be consistent:
type( A.sum()) == type(B.sum())
# True
type( A.prod()) == type(B.prod())
# True
type( A.std()) == type(B.std())
# True
type( A.mean()) == type(B.mean())
# False
Can anyone explain this?
UPDATE: As pointed out in the comments, giving the array an explicit all-False mask makes it behave like A:
C = np.ma.masked_array([1, 1, 1], mask=[False, False, False])
type(C.mean()) == type(A.mean())
# True
Answer
B.mean (that is, MaskedArray.mean) starts with:
if self._mask is nomask:
    result = super(MaskedArray, self).mean(axis=axis, dtype=dtype)
np.ma.nomask is False (a NumPy boolean scalar).
That is the case for your B:
masked_array(data = [1 1 1],
mask = False,
fill_value = 0)
For A, the mask is an array that matches the data in size. In B, it is a scalar, False, and mean handles that as a special case.
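A quick way to see the two mask representations side by side is to inspect the stored mask directly. This is a small sketch rather than anything from the answer: it uses np.ma.getmask, which returns the stored mask (possibly the np.ma.nomask sentinel itself), and the exact behaviour of masked_equal may vary between NumPy versions.

```python
import numpy as np

A = np.ma.masked_equal([1, 1, 0], value=0)
B = np.ma.masked_equal([1, 1, 1], value=0)  # nothing equals 0
C = np.ma.masked_array([1, 1, 1], mask=[False, False, False])

# np.ma.nomask is a 0-d NumPy boolean equal to False, used as a sentinel.
print(repr(np.ma.nomask), np.shape(np.ma.nomask))

for name, arr in [("A", A), ("B", B), ("C", C)]:
    m = np.ma.getmask(arr)  # stored mask, possibly the nomask sentinel
    print(name, m is np.ma.nomask, np.shape(m))
```

Consistent with the reprs in the answer, B should report the nomask sentinel with shape (), while A and C carry full (3,) boolean mask arrays, which is why only B takes the special-case branch in mean.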
I need to dig a bit more to see what this implies.
In [127]: np.mean(B)
Out[127]:
masked_array(data = 1.0,
mask = False,
fill_value = 0)
In [141]: super(np.ma.MaskedArray,B).mean()
Out[141]:
masked_array(data = 1.0,
mask = False,
fill_value = 0)
I'm not sure that helps; there's some circular referencing between the np.ndarray methods, the np function, and the np.ma methods that makes it hard to identify exactly what code is being used. It looks like it is using the compiled mean method, but it isn't obvious how that handles the masking.
I wonder whether the intended use is
np.mean(B.data) # or
B.data.mean()
and the super method fetch isn't the right approach.
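If the nomask branch worked on the plain data, as wondered above, both branches would return the same kind of object. The following is a minimal sketch of that idea, not NumPy's actual code: sketch_mean is a hypothetical name, it only handles axis=None, and the assumption is that the fast path calls a.data.mean() instead of going through super(MaskedArray, self).mean().

```python
import numpy as np


def sketch_mean(a):
    """Hypothetical stand-in for MaskedArray.mean (axis=None only).

    Assumption: the nomask fast path uses the plain ndarray (a.data)
    rather than super(MaskedArray, self).mean().
    """
    if np.ma.getmask(a) is np.ma.nomask:
        # Fast path: nothing is masked, so the plain ndarray mean is exact
        # and returns a NumPy scalar rather than a 0-d MaskedArray.
        return a.data.mean()
    # Masked path: average only the unmasked values.
    cnt = a.count()
    if cnt == 0:
        return np.ma.masked
    return a.sum() * 1.0 / cnt


A = np.ma.masked_equal([1, 1, 0], value=0)
B = np.ma.masked_equal([1, 1, 1], value=0)  # no masked values
print(type(sketch_mean(A)), type(sketch_mean(B)))
```

With integer input, both calls should give NumPy float64 scalars, so the type mismatch from the question does not appear in this sketch.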
In any case, the same array but with a vector mask returns a scalar:
In [132]: C
Out[132]:
masked_array(data = [1 1 1],
mask = [False False False],
fill_value = 0)
In [133]: C.mean()
Out[133]: 1.0
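Building on that observation, one possible workaround is to give an unmasked array an explicit full mask, so mean never takes the nomask shortcut. This is a sketch under that assumption; B_full is a hypothetical name, and np.ma.getmaskarray is used because it always returns a full boolean array (all False when the stored mask is nomask).

```python
import numpy as np

A = np.ma.masked_equal([1, 1, 0], value=0)
B = np.ma.masked_equal([1, 1, 1], value=0)  # stored mask is the nomask sentinel

# getmaskarray expands nomask to an all-False array of B's shape, so the
# rebuilt array has a vector mask like C and uses the regular mean code path.
B_full = np.ma.masked_array(B.data, mask=np.ma.getmaskarray(B))

print(type(A.mean()), type(B_full.mean()))
```

The trade-off is an extra boolean array per masked array, which is exactly the allocation the nomask sentinel exists to avoid.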
====================
Trying this method without the nomask shortcut raises an error after:
dsum = self.sum(axis=axis, dtype=dtype)
cnt = self.count(axis=axis)
if cnt.shape == () and (cnt == 0):
    result = masked
else:
    result = dsum * 1. / cnt
self.count returns a plain Python scalar in the nomask case, but a np.int32 with regular masking. So the cnt.shape check chokes, because a plain Python scalar has no shape attribute.
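One way to make that branch tolerate both kinds of count result is to test the dimensionality with np.ndim, which accepts Python ints as well as NumPy scalars and arrays. This is a hedged sketch: masked_branch_mean is a hypothetical name, and replacing cnt.shape with np.ndim(cnt) is my substitution, not the code NumPy ships.

```python
import numpy as np


def masked_branch_mean(a, axis=None):
    """Hypothetical rewrite of the non-nomask branch of MaskedArray.mean.

    np.ndim() works on Python ints, NumPy scalars and arrays alike, so it
    does not fail when count() returns a plain Python int, unlike cnt.shape.
    """
    dsum = a.sum(axis=axis)
    cnt = a.count(axis=axis)
    if np.ndim(cnt) == 0 and cnt == 0:
        # Every value in the reduction is masked.
        return np.ma.masked
    return dsum * 1.0 / cnt


A = np.ma.masked_equal([1, 1, 0], value=0)
B = np.ma.masked_equal([1, 1, 1], value=0)  # nomask: count() may be a plain int
print(type(B.count()), type(A.count()))
print(masked_branch_mean(A), masked_branch_mean(B))
```

Both calls should print 1.0, and an all-masked input falls through to np.ma.masked as in the original branch.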
trace is the only other masked method that tries this super(MaskedArray...) 'shortcut'. There's clearly something kludgy about the mean code.
====================
Related bug issue: https://github.com/numpy/numpy/issues/5769
According to that issue, the same question was raised here last year: Testing equivalence of means of Numpy MaskedArray instances raises attribute error
It looks like there are a lot of masking issues, not just with mean. There may be fixes in the development master now, or in the near future.