Why are Embeddings in PyTorch implemented as Sparse Layers?
Question
Embedding layers in PyTorch are listed under "Sparse Layers" with the limitation:
Keep in mind that only a limited number of optimizers support sparse gradients: currently it's optim.SGD (cuda and cpu), and optim.Adagrad (cpu)
What is the reason for this? For example, in Keras I can train an architecture with an Embedding layer using any optimizer.
Answer
Upon closer inspection, sparse gradients on Embeddings are optional and can be turned on or off with the `sparse` parameter:
torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False)
where:
sparse (boolean, optional) – if True, the gradient w.r.t. the weight matrix will be a sparse tensor. See Notes for more details regarding sparse gradients.
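A minimal sketch (not part of the original answer) of what this flag changes in practice: with `sparse=True`, the gradient accumulated on the weight matrix after backward is a sparse tensor containing entries only for the rows that were actually looked up.

```python
import torch

# With sparse=True, emb.weight.grad is a sparse tensor holding
# gradient rows only for the indices used in the forward pass.
emb = torch.nn.Embedding(num_embeddings=10, embedding_dim=3, sparse=True)
out = emb(torch.tensor([1, 4, 4]))
out.sum().backward()

print(emb.weight.grad.is_sparse)  # True
```

With `sparse=False` (the default), the same gradient would be a dense 10×3 tensor that is mostly zeros.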
And the "Notes" mentioned are what I quoted in the question about only a limited number of optimizers supporting sparse gradients.
Update:
It is theoretically possible but technically difficult to implement some optimization methods on sparse gradients. There is an open issue in the PyTorch repo to add support for all optimizers.
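To illustrate the supported path, here is a sketch (my own example, not from the linked issue) of training a sparse Embedding with plain SGD, one of the optimizers the docs list as sparse-capable. Note this relies on SGD's defaults; adding momentum or weight decay would break sparse-gradient support.

```python
import torch

# Plain SGD (no momentum, no weight decay) accepts sparse gradients,
# so a sparse Embedding can be optimized directly.
emb = torch.nn.Embedding(num_embeddings=10, embedding_dim=3, sparse=True)
opt = torch.optim.SGD(emb.parameters(), lr=0.1)

loss = emb(torch.tensor([2, 5])).pow(2).sum()
loss.backward()
opt.step()  # only the looked-up rows (2 and 5) of the weight matrix change
```

An optimizer that keeps dense per-parameter state, such as Adam with its running moment estimates, cannot apply a sparse update this way, which is why sparse support has to be implemented per optimizer.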
Regarding the original question, I believe Embeddings can be treated as sparse because it is possible to operate on the input indices directly rather than converting them to one-hot encodings for input into a dense layer. This is explained in @Maxim's answer to my related question.
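This equivalence can be sketched directly (my own illustration): an embedding lookup returns the same result as multiplying a one-hot vector by the weight matrix, while skipping the dense matrix multiply entirely.

```python
import torch

# An embedding lookup is row selection: one_hot(i) @ W == W[i].
W = torch.randn(10, 3)          # stands in for the embedding weight matrix
idx = torch.tensor([4])

one_hot = torch.nn.functional.one_hot(idx, num_classes=10).float()
assert torch.allclose(one_hot @ W, W[idx])
```

Because only the selected rows participate, the gradient w.r.t. `W` is naturally sparse, which is what the `sparse=True` option exposes.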