np.isnan on arrays of dtype "object"

Question

I'm working with numpy arrays of different data types. I would like to know, of any particular array, which elements are NaN. Normally, this is what np.isnan is for.

However, np.isnan isn't friendly to arrays of data type object (or any string data type):

>>> str_arr = np.array(["A", "B", "C"])
>>> np.isnan(str_arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Not implemented for this type

>>> obj_arr = np.array([1, 2, "A"], dtype=object)
>>> np.isnan(obj_arr)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

What I would like to get out of these two calls is simply np.array([False, False, False]). I can't just wrap my call to np.isnan in try/except TypeError and assume that any array that raises a TypeError contains no NaNs: after all, I'd like np.isnan(np.array([1, np.nan, "A"], dtype=object)) to return np.array([False, True, False]).

My current solution is to make a new array, of type np.float64, loop through the elements of the original array, trying to put that element in the new array (and if it fails, leave it as zero) and then calling np.isnan on the new array. However, this is of course rather slow. (At least, for large object arrays.)

import numpy as np

def isnan(arr):
    if isinstance(arr, np.ndarray) and (arr.dtype == object):
        # Create a new array of dtype float64, fill it with the same values as the input array (where possible), and
        # then call np.isnan on the new array. This way, np.isnan is only called once. (Much faster than calling it on
        # every element in the input array.)
        new_arr = np.zeros((len(arr),), dtype=np.float64)
        # range works on both Python 2 and Python 3 (the original used xrange, which is Python 2 only)
        for idx in range(len(arr)):
            try:
                new_arr[idx] = arr[idx]
            except Exception:
                # Elements that cannot be stored as float64 stay at zero, i.e. "not NaN"
                pass
        return np.isnan(new_arr)
    else:
        try:
            return np.isnan(arr)
        except TypeError:
            return False

This particular implementation also only works for one-dimensional arrays, and I can't think of a decent way to make the for loop run over an arbitrary number of dimensions.

Is there a more efficient way to figure out which elements in an object-type array are NaN?

I'm running Python 2.7.10.

Note that [x is np.nan for x in np.array([np.nan])] returns [False]: a NaN taken from an array is not necessarily the same object in memory as np.nan.

I do not want the string "nan" to be considered equivalent to np.nan: I want isnan(np.array(["nan"], dtype=object)) to return np.array([False]).

The multi-dimensionality isn't a big issue. (It's nothing that a little ravel-and-reshaping won't fix. :p)
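As a rough sketch of the ravel-and-reshape idea, and not the approach from the answer further down, the function below flattens an object array, tests each element, and restores the original shape. The name isnan_object is hypothetical, and the per-element test (a float that is unequal to itself) is an assumption about what should count as NaN; it deliberately leaves the string "nan" and None as False.

```python
import numpy as np

def isnan_object(arr):
    """Hypothetical helper: elementwise NaN test for arrays of any shape."""
    arr = np.asarray(arr, dtype=object)
    # ravel() flattens to 1-D so a single loop covers any number of dimensions.
    # Only real floating-point values can be NaN, and NaN is the only float
    # that compares unequal to itself, so strings like "nan" stay False.
    flat = [isinstance(x, (float, np.floating)) and x != x for x in arr.ravel()]
    # Rebuild a boolean array and give it back the original shape.
    return np.array(flat, dtype=bool).reshape(arr.shape)

example = np.array([[1, float("nan")], ["nan", None]], dtype=object)
mask = isnan_object(example)
```

This still loops in Python, so it is not expected to be faster than the float64-copy approach above; it only shows that the dimensionality problem goes away.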

Any function that relies on the is operator to test equivalence of two NaNs isn't always going to work. (If you think they should, ask yourself what the is operator actually does!)
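To make the point about is concrete, here is a small illustration (the variable names are only for the example). The is operator compares object identity, so it depends on which NaN object you happen to hold, whereas comparing a value with itself depends only on the value.

```python
import numpy as np

nan_a = float("nan")        # a plain Python float NaN
nan_b = np.float64("nan")   # a NumPy scalar NaN, a different object from np.nan

# Identity checks depend on which object you hold, not on the value.
same_object = nan_a is nan_a          # True: literally the same object
different_object = nan_b is np.nan    # False: equal "NaN-ness", different objects

# A NaN never compares equal to itself, regardless of object identity.
self_unequal = [x != x for x in (nan_a, nan_b, np.nan)]

# The string "nan" is an ordinary string and compares equal to itself.
string_self_unequal = "nan" != "nan"
```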

Recommended answer

If you are willing to use the pandas library, a handy function that covers this case is pd.isnull:

pandas.isnull(obj)

Detect missing values (NaN in numeric arrays, None/NaN in object arrays)

Here is an example:

$ python
>>> import numpy   
>>> import pandas
>>> array = numpy.asarray(['a', float('nan')], dtype=object)
>>> pandas.isnull(array)
array([False,  True])
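One thing to keep in mind: as the docstring says, pd.isnull reports None as missing too, which goes beyond what the question asked for (NaN only). The sketch below shows that behaviour and one possible way to mask None back out. The second step is an assumption about what the asker wants, not part of the answer, and it does not exclude other values that pandas treats as missing, such as pd.NaT.

```python
import numpy as np
import pandas as pd

arr = np.array(["a", float("nan"), None, "nan"], dtype=object)

# pd.isnull flags both NaN and None; the string "nan" is left alone.
missing = pd.isnull(arr)

# Optional: keep only genuine NaN values by removing the None entries.
is_none = np.array([x is None for x in arr], dtype=bool)
nan_only = missing & ~is_none
```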
