Generate sequence by indices / one-hot encoding

Question

I have a sequence s = [4,3,1,0,5] and num_classes = 6, and I want to generate a NumPy matrix m of shape (len(s), num_classes) where m[i,j] = 1 if s[i] == j else 0.

Is there a NumPy function that does this when I pass it s and num_classes?

This is also called 1-of-k or one-hot encoding.

timeit results:

import timeit
import numpy as np

def b():
    # fancy-indexing approach: one assignment sets a single 1 per row
    m = np.zeros((len(s), num_classes))
    m[np.arange(len(s)), s] = 1
    return m

In [57]: timeit.timeit(lambda: b(), number=1000)
Out[57]: 0.012787103652954102

In [61]: timeit.timeit(lambda: (np.array(s)[:,None]==np.arange(num_classes))+0, number=1000)
Out[61]: 0.018411874771118164
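
The second expression timed above builds the same matrix by comparison instead of assignment. A minimal sketch of what it does step by step, assuming the s and num_classes from the question (the intermediate names col, classes, mask, m_cmp and m_idx are only illustrative):

```python
import numpy as np

s = [4, 3, 1, 0, 5]
num_classes = 6

col = np.array(s)[:, None]        # column vector, shape (5, 1)
classes = np.arange(num_classes)  # row of class ids, shape (6,)
mask = col == classes             # broadcasts to a (5, 6) boolean array
m_cmp = mask + 0                  # adding 0 turns True/False into 1/0

# Same result via the fancy-indexing version b() from the question
m_idx = np.zeros((len(s), num_classes))
m_idx[np.arange(len(s)), s] = 1
```

Both give the same 0/1 pattern. The comparison version yields an integer array and evaluates all len(s) * num_classes comparisons, while the indexing version writes only len(s) entries into a float array of zeros.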

Recommended answer

Since you want a single 1 per row, you can fancy-index using arange(len(s)) along the first axis and s along the second:

import numpy as np

s = [4,3,1,0,5]
n = len(s)   # number of rows
k = 6        # number of classes
m = np.zeros((n, k))
m[np.arange(n), s] = 1
m
=> 
array([[ 0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.]])

m.nonzero()
=> (array([0, 1, 2, 3, 4]), array([4, 3, 1, 0, 5]))

This can be thought of as assigning at index (0,4), then (1,3), then (2,1), (3,0), and (4,5).
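
The assignment above can be wrapped in a small reusable function. The following is a sketch only: the name one_hot and its dtype parameter are hypothetical and not part of NumPy. It also shows one common alternative, indexing the rows of an identity matrix with np.eye.

```python
import numpy as np

def one_hot(s, num_classes, dtype=np.float64):
    """Hypothetical helper: one-hot encode a 1-D sequence of class indices."""
    s = np.asarray(s, dtype=np.intp)
    m = np.zeros((s.size, num_classes), dtype=dtype)
    # Pair row i with column s[i]: sets m[0, s[0]], m[1, s[1]], ...
    m[np.arange(s.size), s] = 1
    return m

m = one_hot([4, 3, 1, 0, 5], 6)

# Alternative: row j of the identity matrix is the one-hot vector for class j
m_eye = np.eye(6)[[4, 3, 1, 0, 5]]
```

Both m and m_eye have shape (5, 6) and hold the same values. The np.eye version is shorter but first builds a full num_classes x num_classes identity matrix, which may matter when the number of classes is large.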
