np.isnan on arrays of dtype "object"
Question
I'm working with numpy arrays of different data types. I would like to know, for any particular array, which elements are NaN. Normally, this is what np.isnan is for.
However, np.isnan isn't friendly to arrays of data type object (or any string data type):
>>> import numpy as np
>>> str_arr = np.array(["A", "B", "C"])
>>> np.isnan(str_arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Not implemented for this type
>>> obj_arr = np.array([1, 2, "A"], dtype=object)
>>> np.isnan(obj_arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
What I would like to get out of these two calls is simply np.array([False, False, False]). I can't just put try and except TypeError around my call to np.isnan and assume that any array that generates a TypeError does not contain NaNs: after all, I'd like np.isnan(np.array([1, np.NaN, "A"])) to return np.array([False, True, False]).
My current solution is to make a new array of type np.float64, loop through the elements of the original array, trying to put each element into the new array (and leaving it as zero if that fails), and then call np.isnan on the new array. However, this is of course rather slow. (At least, for large object arrays.)
import numpy as np

def isnan(arr):
    if isinstance(arr, np.ndarray) and (arr.dtype == object):
        # Create a new array of dtype float64, fill it with the same values as the input array (where possible), and
        # then call np.isnan on the new array. This way, np.isnan is only called once. (Much faster than calling it on
        # every element in the input array.)
        new_arr = np.zeros((len(arr),), dtype=np.float64)
        for idx in xrange(len(arr)):  # xrange is Python 2 only; use range on Python 3
            try:
                new_arr[idx] = arr[idx]
            except Exception:
                # Element cannot be stored as a float: leave the zero in place.
                pass
        return np.isnan(new_arr)
    else:
        try:
            return np.isnan(arr)
        except TypeError:
            return False
This particular implementation also only works for one-dimensional arrays, and I can't think of a decent way to make the for loop run over an arbitrary number of dimensions.
Is there a more efficient way to figure out which elements in an object-type array are NaN?
I'm running Python 2.7.10.
Note that [x is np.nan for x in np.array([np.nan])] evaluates to [False]: a NaN is not always the same object in memory as np.nan.
I do not want the string "nan" to be considered equivalent to np.nan: I want isnan(np.array(["nan"], dtype=object)) to return np.array([False]).
The multi-dimensionality isn't a big issue. (It's nothing that a little ravel-and-reshaping won't fix. :p)
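The ravel-and-reshape idea can be sketched roughly as follows. This is only an illustration, not code from the thread: the names _is_nan_scalar and isnan_any_shape are hypothetical, and the sketch assumes that only real float objects (Python float or a NumPy floating scalar) should ever count as NaN, so the string "nan" is rejected.

```python
import math

import numpy as np


def _is_nan_scalar(x):
    # Hypothetical helper: only float objects can be NaN, so strings such as
    # "nan" are rejected by the isinstance check before math.isnan is called.
    return isinstance(x, (float, np.floating)) and math.isnan(x)


def isnan_any_shape(arr):
    """Hypothetical helper: element-wise NaN mask for any dtype and shape."""
    arr = np.asarray(arr)
    if arr.dtype != object:
        try:
            return np.isnan(arr)
        except TypeError:
            # e.g. string dtypes, which cannot hold a float NaN
            return np.zeros(arr.shape, dtype=bool)
    # Flatten, test each element once, then restore the original shape.
    flat = arr.ravel()
    mask = np.fromiter((_is_nan_scalar(x) for x in flat), dtype=bool)
    return mask.reshape(arr.shape)
```

The per-element test still runs in Python, so this is not expected to be fast on large object arrays; it only shows that the shape handling itself is simple.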
Any function that relies on the is operator to test equivalence of two NaNs isn't always going to work. (If you think it should, ask yourself what the is operator actually does!)
Recommended answer
If you are willing to use the pandas library, a handy function that covers this case is pd.isnull:
pandas.isnull(obj)
Detect missing values (NaN in numeric arrays, None/NaN in object arrays)
Here is an example:
$ python
>>> import numpy
>>> import pandas
>>> array = numpy.asarray(['a', float('nan')], dtype=object)
>>> pandas.isnull(array)
array([False, True])
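As a further check, the sketch below runs pd.isnull on the cases raised in the question. The variable names are made up for illustration, and it assumes a reasonably recent pandas and NumPy are installed. One thing to be aware of: as the docstring above says, None in an object array is also reported as missing, which may or may not be what you want.

```python
import numpy as np
import pandas as pd

# Arrays modelled on the cases in the question (names are illustrative).
obj_arr = np.array([1, np.nan, "A"], dtype=object)
str_arr = np.array(["A", "B", "C"])
nan_string = np.array(["nan"], dtype=object)
grid = np.array([[np.nan, "A"], [None, 2.5]], dtype=object)

mask_obj = pd.isnull(obj_arr)       # mixed object array containing a real NaN
mask_str = pd.isnull(str_arr)       # plain string dtype, no TypeError raised
mask_text = pd.isnull(nan_string)   # the string "nan" is not a missing value
mask_grid = pd.isnull(grid)         # 2-D input keeps its shape; None counts as missing

for name, mask in [("obj", mask_obj), ("str", mask_str),
                   ("text", mask_text), ("grid", mask_grid)]:
    print(name, mask.tolist())
```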