Prevent strings being truncated when replacing values in a numpy array
Question
Let's say I have the arrays a and b:
a = np.array([1,2,3])
b = np.array(['red','red','red'])
If I apply fancy indexing like this to these arrays
b[a<3]="blue"
the output I get is
array(['blu', 'blu', 'red'], dtype='<U3')
I understand that this happens because numpy initially allocates space for only 3 characters per element, so the whole word "blue" cannot fit into the array. What workaround can I use?
What I am doing now is
b = np.array([" "*100 for i in range(3)])
b[a>2] = "red"
b[a<3] = "blue"
but that is just a workaround. Is this a fault in my code, or an issue with numpy? How can I fix it?
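The truncation described in the question can be confirmed by inspecting the array's dtype. The snippet below is a minimal sketch that reuses the question's own arrays; it assumes numpy's usual 4 bytes per character for unicode string dtypes.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array(['red', 'red', 'red'])

# numpy infers a fixed-width unicode dtype from the longest initial string.
# itemsize is in bytes; unicode dtypes use 4 bytes per character.
chars_per_item = b.dtype.itemsize // 4
print(b.dtype, chars_per_item)

# Assigning a longer string does not raise; the value is cut to fit the width.
b[a < 3] = "blue"
print(b)
```

The assignment raises no error or warning, which is why the truncation is easy to miss.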
Recommended answer
You can handle variable-length strings by setting the dtype of b to "object":
import numpy as np
a = np.array([1,2,3])
b = np.array(['red','red','red'], dtype="object")
b[a<3] = "blue"
print(b)
This outputs:
['blue' 'blue' 'red']
This dtype will handle strings or any other general Python objects. It also necessarily means that under the hood you will have a numpy array of pointers, so don't expect the performance you get when using a primitive datatype.
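If the object dtype's overhead is a concern, one alternative (not part of the answer above, just a sketch) is to keep a fixed-width string dtype but make it wide enough before assigning. The variable names wide and picked are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array(['red', 'red', 'red'])

# Option 1: widen the fixed-width dtype first, then assign in place.
# 'U4' means unicode strings of up to 4 characters.
wide = b.astype('U4')
wide[a < 3] = 'blue'
print(wide)

# Option 2: build a new array with np.where; the result dtype is
# promoted so that it can hold both 'blue' and the values in b.
picked = np.where(a < 3, 'blue', b)
print(picked, picked.dtype)
```

Both options still use a fixed width, so any later assignment of a string longer than 4 characters would be truncated again; the object dtype avoids that at the cost of speed.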