Arrays of strings into numpy.amax

Problem description

With Python's standard max function I can find the largest string in an array, and I can also pass in a key parameter:

import numpy as np

s = np.array(['one', 'two', 'three'])
max(s)           # 'two' (lexicographically last)
max(s, key=len)  # 'three' (longest string)

With a larger (multi-dimensional) array I can no longer use max, so I tried numpy.amax instead. However, I can't seem to get amax to work with strings:

t = np.array([['one','two','three'],['four','five','six']])
t.dtype  # dtype('|S5')
np.amax(t, axis=1)  # Error! Hoping for: ['two', 'six']

Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 1833, in amax
        return amax(axis, out)
TypeError: cannot perform reduce with flexible type

Is it possible to use amax here (perhaps I am using it incorrectly), or is there some other numpy tool for this?

Recommended answer

Instead of storing your strings in numpy's fixed-width string dtype (the "flexible type" named in the error message), you could try storing them as Python objects. Numpy will then hold references to the original Python string objects, and reductions such as np.min and np.max compare them the way Python itself would:

t = np.array([['one','two','three'],['four','five','six']], dtype=object)
np.min(t)
# gives 'five'
np.max(t)
# gives 'two'
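
The same object-dtype array also accepts an axis argument, which is what the question was after. A minimal sketch, assuming a NumPy version in which reductions on object arrays fall back to Python comparisons; the names row_max and col_max are illustrative only:

```python
import numpy as np

# dtype=object stores references to ordinary Python str objects
t = np.array([["one", "two", "three"], ["four", "five", "six"]], dtype=object)

# Lexicographic maximum within each row (reduce across the columns)
row_max = np.max(t, axis=1)

# Lexicographic maximum within each column (reduce across the rows)
col_max = np.max(t, axis=0)

print(row_max)
print(col_max)
```

Note that axis=1 produces one value per row, while axis=0 produces one value per column.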

Keep in mind that here the np.min and np.max calls order the strings lexicographically, so "two" does indeed come after "five". To compare by the length of each string instead, you could create a new numpy array of the same shape that contains each string's length rather than a reference to the string. You could then call numpy.argmin on that array (which returns the index of the minimum) and look up the corresponding string in the original array.

Example code:

# np.vectorize wraps a Python function so that it is applied
# element by element to a NumPy array
np_len = np.vectorize(len)

np_len(t)
# gives array([[3, 3, 5], [4, 4, 3]])

idx = np_len(t).argmin(0)  # row index of the shortest string in each column
# gives array([0, 0, 1])

# pair each column number with its row index to look up the strings
result = t[idx, np.arange(t.shape[1])]
print(result)
# gives ['one' 'two' 'six'], the shortest string in each column
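
To mirror max(s, key=len) from the question on a 2-D array, the same idea can be applied with argmax instead of argmin. This is only a sketch under the assumption that ties should go to the first occurrence (which is how argmax behaves); the names lengths, longest_per_row and longest_overall are made up for illustration:

```python
import numpy as np

t = np.array([["one", "two", "three"], ["four", "five", "six"]], dtype=object)

# Array of the same shape holding the length of each string
lengths = np.vectorize(len)(t)

# Longest string in each row: column index of the maximum length per row,
# paired with the row numbers for the lookup in the original array
col_idx = lengths.argmax(axis=1)
longest_per_row = t[np.arange(t.shape[0]), col_idx]

# Longest string in the whole array: argmax over the flattened lengths,
# converted back into a (row, column) position
pos = np.unravel_index(lengths.argmax(), lengths.shape)
longest_overall = t[pos]

print(longest_per_row)
print(longest_overall)
```

When several strings share the maximum length, as "four" and "five" do in the second row, argmax returns the first one, so the result depends on the order of the elements.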
