Arrays of strings into numpy.amax
Question
With Python's standard max function I can also pass in a key parameter:
s = numpy.array(['one','two','three'])
max(s) # 'two' (lexicographically last)
max(s, key=len) # 'three' (longest string)
With a larger (multi-dimensional) array I can no longer use max, so I tried numpy.amax. However, I can't seem to get amax to work with strings:
t = numpy.array([['one','two','three'],['four','five','six']])
t.dtype # dtype('|S5')
numpy.amax(t, axis=1) # Error! Hoping for: ['two', 'six']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 1833, in amax
return amax(axis, out)
TypeError: cannot perform reduce with flexible type
Is it possible to use amax (am I using it incorrectly?), or is there some other numpy tool to do this?
Recommended answer
Instead of storing your strings in the numpy array as NumPy's flexible string type (the '|S5' dtype above), you could try storing them as Python objects. Numpy will treat these as references to the original Python string objects, and you can then work with them as you would expect:
t = np.array([['one','two','three'],['four','five','six']], dtype=object)
np.min(t)
# gives 'five'
np.max(t)
# gives 'two'
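The same object-dtype array should also accept the axis argument from the question. The following is a minimal sketch, assuming a NumPy version in which reductions on object arrays fall back to the ordinary Python comparison of the stored strings:

```python
import numpy as np

# Same data as in the question, but stored as Python objects
t = np.array([['one', 'two', 'three'],
              ['four', 'five', 'six']], dtype=object)

row_max = np.amax(t, axis=1)  # lexicographic maximum of each row
col_max = np.amax(t, axis=0)  # lexicographic maximum of each column

print(row_max.tolist())  # ['two', 'six']
print(col_max.tolist())  # ['one', 'two', 'three']
```

Note that axis=1 is the reduction that yields the ['two', 'six'] result hoped for in the question; axis=0 compares the two rows column by column instead.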
Keep in mind that here the np.min and np.max calls order the strings lexicographically, so "two" does indeed come after "five". To compare by length instead, you could create a new numpy array of the same shape that holds each string's length rather than the string itself. You could then call numpy.argmin on that array (which returns the index of the minimum) and look up the corresponding string in the original array.
Example code:
# Vectorize takes a Python function and converts it into a Numpy
# vector function that operates on arrays
np_len = np.vectorize(len)
np_len(t)
# gives array([[3, 3, 5], [4, 4, 3]])
idx = np_len(t).argmin(0) # row index of the shortest string in each column
# gives array([0, 0, 1])
# Pair each column number with its row index to pick one string per column
result = t[idx, np.arange(t.shape[1])]
print(result)
# gives ['one' 'two' 'six'], the shortest string in each column
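To mirror max(s, key=len) from the question, the same idea can be applied with argmax instead of argmin. This is only a sketch; the variable names (lengths, longest_col, longest_per_row) are made up for illustration, and ties are resolved in favour of the first occurrence, which is how argmax behaves:

```python
import numpy as np

t = np.array([['one', 'two', 'three'],
              ['four', 'five', 'six']], dtype=object)

# Element-wise string lengths, same shape as t: [[3, 3, 5], [4, 4, 3]]
lengths = np.vectorize(len)(t)

# Column position of the longest string in each row
longest_col = lengths.argmax(axis=1)

# Pair each row number with its chosen column to pull out one string per row
longest_per_row = t[np.arange(t.shape[0]), longest_col]
print(longest_per_row.tolist())  # ['three', 'four']

# Longest string in the whole array: convert the flat argmax back to (row, col)
i, j = np.unravel_index(lengths.argmax(), lengths.shape)
print(t[i, j])  # three
```

Indexing with a pair of integer arrays selects one element per (row, column) pair, which is why np.arange is used to supply the row numbers.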