Why does numpy silently convert my int array to strings when calling searchsorted?
Question
I found a nasty bug in my code where I forgot to convert an integer from str to int before looking it up in a sorted array of integers. Having fixed it, I am still surprised that this didn't cause an explicit exception.
Here is a demonstration:
In [1]: import numpy as np
In [2]: a = np.arange(1000, dtype=int)
In [3]: a.searchsorted('15')
Out[3]: 150
In [4]: a.searchsorted('150')
Out[4]: 150
In [5]: a.searchsorted('1500')
Out[5]: 151
In [6]: a.searchsorted('foo')
Out[6]: 1000
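With a float array this does not work; it raises TypeError: Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'.
My main question is: why does this not raise an exception for an integer array?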
This is especially surprising since you can do both np.arange(1000, dtype=int).astype(str) and np.arange(1000, dtype=np.float64).astype(str, casting='safe').
Side questions:
- why is it converting the whole array and not the argument?
- why is the search string converted to '<U32'?
Recommended answer
This behavior happens because searchsorted requires the needle and haystack to have the same dtype. This is achieved using np.promote_types, which has the (perhaps unfortunate) behavior:
>>> np.promote_types(int, str)
dtype('S11')
This means that to get matching dtypes for an integer haystack and a string needle, the only valid transformation is to convert the haystack to a string type.
Once we have a common dtype, we use np.can_cast to check whether the haystack can be cast to it. This explains why floats aren't turned into strings, but ints are:
In [1]: np.can_cast(float, np.promote_types(float, str))
Out[1]: False
In [2]: np.can_cast(int, np.promote_types(int, str))
Out[2]: True
So to summarize, the strange behavior is a combination of promotion rules where numeric + string => string, and casting rules where int => string is allowable.
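Given that behavior, one way to turn the original bug into an explicit error is to check the needle's dtype before searching. The following is a minimal sketch, not part of the original answer: checked_searchsorted is a hypothetical helper name, and the rule it enforces (a numeric haystack only accepts numeric needles) is an assumption about what the caller wants.

```python
import numpy as np


def checked_searchsorted(haystack, needle, side="left"):
    """Hypothetical wrapper around ndarray.searchsorted.

    Raises TypeError when the haystack is numeric but the needle is not,
    instead of letting NumPy promote both to a string dtype.
    """
    haystack = np.asarray(haystack)
    needle_arr = np.asarray(needle)
    numeric_kinds = "biuf"  # bool, signed int, unsigned int, float
    if (haystack.dtype.kind in numeric_kinds
            and needle_arr.dtype.kind not in numeric_kinds):
        raise TypeError(
            "needle of dtype %s cannot be searched in haystack of dtype %s"
            % (needle_arr.dtype, haystack.dtype)
        )
    return haystack.searchsorted(needle_arr, side=side)


a = np.arange(1000, dtype=int)
print(checked_searchsorted(a, 15))  # → 15
try:
    checked_searchsorted(a, "15")   # the forgotten str -> int conversion
except TypeError as exc:
    print("rejected:", exc)
```

The check relies only on dtype.kind, so it does not depend on how a particular NumPy version promotes or casts between integers and strings.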