Vectorized relabeling of NumPy array to consecutive numbers and retrieving back
Question
I have a huge training dataset with 4 classes. These classes are labeled non-consecutively. To be able to apply a sequential neural network the classes have to be relabeled so that the unique values in the classes are consecutive. In addition, at the end of the script I have to relabel them back to their old values.
I know how to relabel them with loops:
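import numpy as np

def relabel(old_classes, new_classes):
    # Locate every unique old label first, then overwrite the array in place.
    uniques = np.unique(old_classes)
    indexes = [np.where(old_classes == uniques[i]) for i in range(len(new_classes))]
    for i in range(len(new_classes)):
        old_classes[indexes[i]] = new_classes[i]
    return old_classes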
>>> old_classes = np.array([0,1,2,6,6,2,6,1,1,0])
>>> new_classes = np.arange(len(np.unique(old_classes)))
>>> relabel(old_classes,new_classes)
array([0, 1, 2, 3, 3, 2, 3, 1, 1, 0])
But this isn't nice coding and it takes quite a lot of time.
Any idea how to vectorize this relabeling?
To be clear, I also want to be able to relabel them back to their old values:
>>> relabeled_classes=np.array([0, 1, 2, 3, 3, 2, 3, 1, 1, 0])
>>> old_classes = np.array([0,1,2,6])
>>> relabel(relabeled_classes,old_classes )
array([0, 1, 2, 6, 6, 2, 6, 1, 1, 0])
Recommended answer
We can use the optional argument return_inverse with np.unique to get those unique sequential IDs/tags, like so:
unq_arr, unq_tags = np.unique(old_classes, return_inverse=True)
Index into unq_arr with unq_tags to retrieve the original values:
old_classes_retrieved = unq_arr[unq_tags]
Sample run:
In [69]: old_classes = np.array([0,1,2,6,6,2,6,1,1,0])
In [70]: unq_arr, unq_tags = np.unique(old_classes,return_inverse=1)
In [71]: unq_arr
Out[71]: array([0, 1, 2, 6])
In [72]: unq_tags
Out[72]: array([0, 1, 2, 3, 3, 2, 3, 1, 1, 0])
In [73]: old_classes_retrieved = unq_arr[unq_tags]
In [74]: old_classes_retrieved
Out[74]: array([0, 1, 2, 6, 6, 2, 6, 1, 1, 0])
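Since the question also asks about mapping labels back at the end of the script, here is a minimal sketch of the same np.unique approach wrapped in two helpers. The function names to_consecutive and restore, and the predicted_ids array standing in for a network's output, are made up for illustration and are not part of the original answer.

```python
import numpy as np

def to_consecutive(labels):
    """Return (unique_labels, consecutive_ids) for a label array."""
    labels = np.asarray(labels)
    unique_labels, ids = np.unique(labels, return_inverse=True)
    # Reshape so the ids always match the input shape, whatever shape
    # np.unique returns for the inverse on multi-dimensional input.
    return unique_labels, ids.reshape(labels.shape)

def restore(unique_labels, ids):
    """Map consecutive ids back to the original label values."""
    return unique_labels[np.asarray(ids)]

old_classes = np.array([0, 1, 2, 6, 6, 2, 6, 1, 1, 0])
unique_labels, ids = to_consecutive(old_classes)

# Hypothetical model output, expressed in consecutive ids.
predicted_ids = np.array([3, 0, 2, 1])
predicted_labels = restore(unique_labels, predicted_ids)
print(predicted_labels)  # [6 0 2 1]
```

Keeping unique_labels around is what makes the reverse mapping possible: any array of consecutive ids, not only the one returned by np.unique, can be translated back by indexing into it.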