Why are Embeddings in PyTorch implemented as Sparse Layers?


Question

Embedding Layers in PyTorch are listed under "Sparse Layers" with the limitation:

Keep in mind that only a limited number of optimizers support sparse gradients: currently it’s optim.SGD (cuda and cpu), and optim.Adagrad (cpu)

What is the reason for this? For example, in Keras I can train an architecture with an Embedding Layer using any optimizer.
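For reference, a minimal sketch of the Keras behaviour described above (layer sizes and data are arbitrary placeholders, not from any real model):

import numpy as np
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Embedding(input_dim=10, output_dim=3),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(1),
])
# Keras places no restriction on the optimizer when the model
# contains an Embedding layer; any built-in optimizer works here.
model.compile(optimizer="adam", loss="mse")
model.fit(np.random.randint(0, 10, (8, 4)), np.random.rand(8, 1), verbose=0)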

Answer

Upon closer inspection, sparse gradients on Embeddings are optional and can be turned on or off with the sparse parameter:

class torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, sparse=False)

Where:

sparse (boolean, optional) – if True, gradient w.r.t. weight matrix will be a sparse tensor. See Notes for more details regarding sparse gradients.
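A minimal sketch of the effect (the sizes are arbitrary): with sparse=True, the gradient that backward() leaves on the weight matrix is a sparse tensor holding only the rows that were actually looked up.

import torch
import torch.nn as nn

emb = nn.Embedding(num_embeddings=10, embedding_dim=3, sparse=True)
emb(torch.tensor([1, 4, 4])).sum().backward()
print(emb.weight.grad.is_sparse)  # True: only the looked-up rows are stored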

And the "Notes" mentioned are what I quoted in the question about a limited number of optimizers being supported for sparse gradients.

更新:

It is theoretically possible but technically difficult to implement some optimization methods on sparse gradients. There is an open issue in the PyTorch repo to add support for all optimizers.
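One way to see the difficulty: a sparse gradient carries only the rows that were touched, while stateful optimizers such as Adam decay their running moment estimates for every row on every step. A sketch of the data layout (shapes are illustrative):

import torch

# COO layout of a sparse gradient for a 10x3 weight matrix:
grad = torch.sparse_coo_tensor(
    indices=torch.tensor([[1, 4]]),  # only rows 1 and 4 received gradient
    values=torch.randn(2, 3),        # their gradient rows
    size=(10, 3),
)
# A momentum/Adam-style update must decide what happens to the eight
# absent rows, whose running statistics should arguably still decay;
# that is the technical hurdle the open issue refers to.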

Regarding the original question, I believe Embeddings can be treated as sparse because it is possible to operate on the input indices directly rather than converting them to one-hot encodings for input into a dense layer. This is explained in @Maxim's answer to my related question.
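A short sketch of that equivalence (with a hypothetical 10x3 weight matrix): indexing the weight rows directly gives the same result as pushing a one-hot encoding through a dense layer, without ever materializing the mostly-zero input.

import torch

weight = torch.randn(10, 3)  # embedding / dense weight matrix
idx = torch.tensor([1, 4])

one_hot = torch.nn.functional.one_hot(idx, num_classes=10).float()
dense_out = one_hot @ weight   # dense-layer view: one-hot matmul
lookup_out = weight[idx]       # embedding view: direct row lookup

assert torch.allclose(dense_out, lookup_out)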
