Truth value of numpy array with one falsey element seems to depend on dtype

Question

import numpy as np
a = np.array([0])
b = np.array([None])
c = np.array([''])
d = np.array([' '])

Why should we have this inconsistency:

>>> bool(a)
False
>>> bool(b)
False
>>> bool(c)
True
>>> bool(d)
False

Recommended answer

For arrays with one element, the array's truth value is determined by the truth value of that element.

The main point to make is that np.array(['']) is not an array containing one empty Python string. This array is created to hold strings of exactly one byte each and NumPy pads strings that are too short with the null character. This means that the array is equal to np.array(['\0']).
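The padding can be inspected directly. The following is a minimal sketch, assuming a bytes literal so that the "one byte per element" description holds literally (on Python 3 a plain '' literal would give a unicode dtype with a wider element size):

```python
import numpy as np

# Bytes dtype is used so that each element occupies exactly one byte.
c = np.array([b''])

print(c.dtype)           # fixed-width bytes dtype
print(c.dtype.itemsize)  # size of each element in bytes
print(c.tobytes())       # raw buffer: the "empty" string is stored as a null byte

# Element-wise comparison against an explicit null byte
print(c == np.array([b'\0']))
```

The raw buffer is the part to look at: the array has nowhere to store "no characters", so the single slot holds a null byte.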

In this regard, NumPy is being consistent with Python which evaluates bool('\0') as True.

In fact, the only strings which are False in NumPy arrays are strings which do not contain any non-whitespace characters ('\0' is not a whitespace character).

Details of this Boolean evaluation are presented below.

Navigating NumPy's labyrinthine source code is not always easy, but we can find the code governing how values in different datatypes are mapped to Boolean values in the arraytypes.c.src file. This will explain how bool(a), bool(b), bool(c) and bool(d) are determined.

Before we get to the code in that file, we can see that calling bool() on a NumPy array invokes the internal _array_nonzero() function. If the array is empty, we get False. If there are two or more elements we get an error. But if the array has exactly one element, we hit the line:

return PyArray_DESCR(mp)->f->nonzero(PyArray_DATA(mp), mp);

Now, PyArray_DESCR is a struct holding various properties for the array. f is a pointer to another struct PyArray_ArrFuncs that holds the array's nonzero function. In other words, NumPy is going to call upon the array's own special nonzero function to check the Boolean value of that one element.
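The size-based dispatch described above can be modelled in a few lines of Python. This is only a rough sketch, not NumPy's implementation: `array_truth` and its `element_nonzero` parameter are hypothetical names, and plain `bool` is a reasonable stand-in for the dtype-specific nonzero function only for numeric and object dtypes.

```python
import numpy as np

def array_truth(arr, element_nonzero=bool):
    """Rough model of the dispatch: empty, one element, or many."""
    n = arr.size
    if n == 0:
        return False  # empty array: False, as described above
    if n == 1:
        # Defer to the (dtype-specific) nonzero check for the single element.
        return element_nonzero(arr.item())
    raise ValueError("truth value of an array with more than one element is ambiguous")

print(array_truth(np.array([0])))     # False
print(array_truth(np.array([None])))  # False
print(array_truth(np.array([3.5])))   # True
```

Passing a different `element_nonzero` callable plays the role of the per-dtype function pointer stored in PyArray_ArrFuncs.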

Determining whether an element is nonzero or not is obviously going to depend on the datatype of the element. The code implementing the type-specific nonzero functions can be found in the "nonzero" section of the arraytypes.c.src file.

As we'd expect, floats, integers and complex numbers are False if they're equal to zero. This explains bool(a). In the case of object arrays, None is similarly going to be evaluated as False because NumPy just calls the PyObject_IsTrue function. This explains bool(b).
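A quick check of the numeric and object cases, as a small sketch. The object-dtype string examples are an addition for contrast: with `dtype=object` the element is a real Python object, so Python's own truth rules apply.

```python
import numpy as np

# Numeric dtypes: False exactly when the single element equals zero.
print(bool(np.array([0])))     # False
print(bool(np.array([0.0])))   # False
print(bool(np.array([0j])))    # False
print(bool(np.array([2])))     # True

# Object dtype: the element's own Python truth value is used.
print(bool(np.array([None], dtype=object)))  # False
print(bool(np.array([''], dtype=object)))    # False
print(bool(np.array(['x'], dtype=object)))   # True
```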

To understand the results of bool(c) and bool(d), we see that the nonzero function for string type arrays is mapped to the STRING_nonzero function:

static npy_bool
STRING_nonzero (char *ip, PyArrayObject *ap)
{
    int len = PyArray_DESCR(ap)->elsize; // size of dtype (not string length)
    int i;
    npy_bool nonz = NPY_FALSE;

    for (i = 0; i < len; i++) {
        if (!Py_STRING_ISSPACE(*ip)) {   // if it isn't whitespace, it's True
            nonz = NPY_TRUE;
            break;
        }
        ip++;
    }
    return nonz;
}

(The unicode case is more or less the same idea.)
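For readers who prefer Python to C, the loop above can be mirrored roughly as follows. This is an illustrative model only: `string_nonzero` is a hypothetical name, it works on the raw bytes of one fixed-width element, and the exact whitespace set accepted by Py_STRING_ISSPACE is an assumption here.

```python
# Assumed whitespace set; the real macro may differ in detail.
WHITESPACE = b" \t\n\r\x0b\x0c"

def string_nonzero(buf: bytes) -> bool:
    """Model of the C loop: True as soon as any byte is not whitespace."""
    for byte in buf:          # iterating over bytes yields ints
        if byte not in WHITESPACE:
            return True
    return False

print(string_nonzero(b" "))    # False: only whitespace
print(string_nonzero(b"\0"))   # True: a null byte is not whitespace
print(string_nonzero(b"a"))    # True
```

Note that the loop runs over the full element size, not the logical string length, which is why padding bytes take part in the decision.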

So in arrays with a string or unicode datatype, a string is only False if it contains only whitespace characters:

>>> bool(np.array([' ']))
False

In the case of array c in the question, there really is a null character \0 padding the seemingly-empty string:

>>> np.array(['']) == np.array(['\0'])
array([ True], dtype=bool)

The STRING_nonzero function sees this non-whitespace character and so bool(c) is True.

As noted at the start of this answer, this is consistent with Python's evaluation of strings containing a single null character: bool('\0') is also True.

Update: Wim has fixed the behaviour detailed above in NumPy's master branch by making strings which contain only null characters, or a mix of only whitespace and null characters, evaluate to False. This means that NumPy 1.10+ will see that bool(np.array([''])) is False, which is much more in line with Python's treatment of "empty" strings.
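Since the result depends on the NumPy version, it can be worth checking what the installed release does. A small sketch, assuming NumPy 1.10 or later for the expected False on the empty string:

```python
import numpy as np

print(np.__version__)

# Per the update above, the empty string is False on NumPy 1.10+.
print(bool(np.array([''])))

# A string with a visible character is True in any version.
print(bool(np.array(['a'])))
```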
