Derive the gradient through torch.topk

Published: 2022/4/12 14:17:48 | Tags: math, pytorch, gradient


Problem description

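I want to derive the gradient through the torch.topk function.

Suppose the input is a vector $\mathbf{x} \in \mathbb{R}^n$. It is transformed by a parameter matrix $W \in \mathbb{R}^{m \times n}$, giving $\mathbf{y} = W\mathbf{x}$, and then the top k values of that vector are selected, $\mathbf{z} = \operatorname{topk}(\mathbf{y}) \in \mathbb{R}^k$.

The resulting vector is further transformed by element-wise multiplication, $\mathbf{u} = \mathbf{z} \odot \mathbf{z}$.

The final loss is computed as

$$L = \sum_{i=1}^{k} u_i.$$

I would like to know: is the loss differentiable with respect to W? Formally, can we compute the following gradient?

$$\frac{\partial L}{\partial W}$$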

Recommended answer
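For a fixed set of selected indices, the topk() operation is just a linear transformation that picks out the top k elements of a tensor. Since it can be written as a matrix-vector product of the form W @ x (with a 0/1 selection matrix), it is differentiable as well. The selection only changes at points where entries tie, so the gradient is well defined everywhere except possibly at such ties.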
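The same argument can be written out as a short derivation in the question's notation. Here $S \in \{0,1\}^{k \times m}$ is a name introduced only for this sketch: the selection matrix whose r-th row is the one-hot vector of the r-th index chosen by topk, held fixed.

$$
\begin{aligned}
\mathbf{z} &= S\mathbf{y} = SW\mathbf{x}, \qquad L = \sum_{i=1}^{k} z_i^2 = \mathbf{z}^\top \mathbf{z},\\
\frac{\partial L}{\partial \mathbf{z}} &= 2\mathbf{z}, \qquad
\frac{\partial L}{\partial \mathbf{y}} = S^\top (2\mathbf{z}), \qquad
\frac{\partial L}{\partial W} = \frac{\partial L}{\partial \mathbf{y}}\,\mathbf{x}^\top = 2\,S^\top \mathbf{z}\,\mathbf{x}^\top.
\end{aligned}
$$

So row i of $\partial L / \partial W$ equals $2 y_i \mathbf{x}^\top$ when i is one of the selected indices and is zero otherwise, which is the pattern of zero and nonzero rows in the W1.grad output of the example.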

Example: below, I compute the pipeline topk(Wx) in two ways and show that both produce the same gradients.

```python
In [1]: import torch

In [2]: x1 = torch.rand(6, requires_grad = True)

In [3]: W1 = torch.rand(6, 6, requires_grad = True)

In [4]: x1
Out[4]: tensor([0.1511, 0.5990, 0.6338, 0.5137, 0.5203, 0.0560], requires_grad=True)

In [5]: W1
Out[5]:
tensor([[0.2541, 0.6699, 0.5311, 0.7801, 0.5042, 0.5475],
        [0.7523, 0.1331, 0.7670, 0.8132, 0.0524, 0.0269],
        [0.3974, 0.2880, 0.9142, 0.9906, 0.4401, 0.3984],
        [0.7956, 0.2071, 0.2209, 0.6192, 0.2054, 0.7693],
        [0.8587, 0.8415, 0.6033, 0.3812, 0.2498, 0.9813],
        [0.9033, 0.0417, 0.2272, 0.1576, 0.9087, 0.3284]], requires_grad=True)

In [6]: y1 = W1 @ x1

In [7]: y1
Out[7]: tensor([1.4699, 1.1260, 1.5721, 0.8523, 1.3969, 0.8776], grad_fn=<MvBackward>)

In [8]: yk, _ = torch.topk(y1, 3)

In [9]: yk
Out[9]: tensor([1.5721, 1.4699, 1.3969], grad_fn=<TopkBackward>)

In [10]: loss1 = (yk ** 2).sum()

In [11]: loss1.backward()

In [12]: W1.grad
Out[12]:
tensor([[0.4442, 1.7609, 1.8633, 1.5102, 1.5296, 0.1646],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.4751, 1.8833, 1.9928, 1.6152, 1.6359, 0.1760],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.4222, 1.6734, 1.7706, 1.4352, 1.4535, 0.1564],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]])
```
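To see where the nonzero rows of W1.grad come from, the following sketch recomputes the gradients by hand and compares them with autograd. It uses fresh random inputs under a hypothetical fixed seed (not the values from the session above): the gradient of the loss with respect to y is 2*y_i at the indices chosen by topk and zero elsewhere, and the matrix-vector product turns that into an outer product for W.

```python
import torch

torch.manual_seed(0)  # hypothetical seed, only for reproducibility

m, n, k = 6, 6, 3
x = torch.rand(n, requires_grad=True)
W = torch.rand(m, n, requires_grad=True)

# Forward pass: y = W x, z = topk(y), loss = sum(z ** 2)
y = W @ x
z, idx = torch.topk(y, k)
loss = (z ** 2).sum()
loss.backward()

# Hand-derived gradients (no autograd involved)
with torch.no_grad():
    dL_dy = torch.zeros(m)
    dL_dy[idx] = 2 * y[idx]          # 2*y_i at selected indices, 0 elsewhere
    grad_W = torch.outer(dL_dy, x)   # dL/dW = (dL/dy) x^T
    grad_x = W.t() @ dL_dy           # dL/dx = W^T (dL/dy)

print(torch.allclose(W.grad, grad_W))  # True
print(torch.allclose(x.grad, grad_x))  # True
```

Rows of W.grad whose index was not selected by topk come out exactly zero, because no gradient flows back through the discarded entries.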
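Now let's evaluate the same set of operations, but use topk() explicitly as a linear transformation. Note that the constructed Wk matrix, when multiplied with the 6-element tensor, selectively picks out its top k (here 3) elements. The session continues from the one above, so `_` still holds the indices returned by torch.topk in In [8].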

```python
In [13]: x2 = torch.tensor([0.1511, 0.5990, 0.6338, 0.5137, 0.5203, 0.0560], requires_grad=True)

In [14]: W2 = torch.tensor([[0.2541, 0.6699, 0.5311, 0.7801, 0.5042, 0.5475],
    ...:                    [0.7523, 0.1331, 0.7670, 0.8132, 0.0524, 0.0269],
    ...:                    [0.3974, 0.2880, 0.9142, 0.9906, 0.4401, 0.3984],
    ...:                    [0.7956, 0.2071, 0.2209, 0.6192, 0.2054, 0.7693],
    ...:                    [0.8587, 0.8415, 0.6033, 0.3812, 0.2498, 0.9813],
    ...:                    [0.9033, 0.0417, 0.2272, 0.1576, 0.9087, 0.3284]], requires_grad=True)

In [15]: y2 = W2 @ x2

In [16]: y2
Out[16]: tensor([1.4700, 1.1260, 1.5721, 0.8523, 1.3969, 0.8776], grad_fn=<MvBackward>)

# Use the indices obtained earlier to construct the matrix
In [19]: _
Out[19]: tensor([2, 0, 4])

In [20]: k = 3

In [21]: Wk = torch.zeros(k, y2.shape[0])

In [22]: Wk[torch.arange(k), _] = 1

In [23]: Wk.requires_grad = True

In [24]: Wk
Out[24]:
tensor([[0., 0., 1., 0., 0., 0.],
        [1., 0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 1., 0.]], requires_grad=True)

In [25]: yk2 = Wk @ y2

In [26]: yk2
Out[26]: tensor([1.5721, 1.4700, 1.3969], grad_fn=<MvBackward>)

In [27]: loss2 = (yk2 ** 2).sum()

In [28]: loss2.backward()
```
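The selection matrix built in In [20] to In [24] can also be produced with torch.nn.functional.one_hot, and multiplying by it is equivalent to plain indexing or torch.gather. A small sketch with random values under a hypothetical seed (not the author's tensors):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)  # hypothetical seed, only for reproducibility

k = 3
y = torch.rand(6, requires_grad=True)
_, idx = torch.topk(y, k)

# Manual construction, as in the session: row r has a 1 at column idx[r]
Wk_manual = torch.zeros(k, y.shape[0])
Wk_manual[torch.arange(k), idx] = 1

# Same matrix from one-hot encoding of the indices (one_hot returns int64)
Wk_onehot = F.one_hot(idx, num_classes=y.shape[0]).to(y.dtype)

print(torch.equal(Wk_manual, Wk_onehot))                 # True
# Multiplying by the selection matrix just picks out y[idx]
print(torch.allclose(Wk_onehot @ y, y[idx]))             # True
print(torch.allclose(torch.gather(y, 0, idx), y[idx]))   # True
```

Because the selection matrix is a constant, setting Wk.requires_grad = True is not needed for W2.grad to be filled in; autograd only needs W2 (and x2) to require gradients.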

Now compare the gradients obtained in the two cases:

```python
In [29]: W2.grad
Out[29]:
tensor([[0.4442, 1.7611, 1.8634, 1.5103, 1.5297, 0.1646],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.4751, 1.8834, 1.9929, 1.6152, 1.6360, 0.1761],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.4222, 1.6735, 1.7707, 1.4352, 1.4536, 0.1565],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]])

In [30]: W1.grad
Out[30]:
tensor([[0.4442, 1.7609, 1.8633, 1.5102, 1.5296, 0.1646],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.4751, 1.8833, 1.9928, 1.6152, 1.6359, 0.1760],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000],
        [0.4222, 1.6734, 1.7706, 1.4352, 1.4535, 0.1564],
        [0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]])

In [31]: x1.grad
Out[31]: tensor([4.3955, 5.2256, 6.1213, 6.4732, 3.5637, 5.6037])

In [32]: x2.grad
Out[32]: tensor([4.3957, 5.2261, 6.1215, 6.4733, 3.5641, 5.6040])
```

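As you can see, the results are identical up to small differences introduced when I copied the printed (rounded to 4 decimal places) values of x1 and W1.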
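Instead of comparing printed values by eye, the gradient through topk can be checked numerically with torch.autograd.gradcheck, which compares autograd's result against finite differences. A minimal sketch, assuming double-precision random inputs (gradcheck is designed for double precision) and a hypothetical helper named topk_loss that mirrors the question's loss:

```python
import torch

def topk_loss(W, x, k=3):
    """Hypothetical helper: sum of squared top-k entries of W @ x."""
    z, _ = torch.topk(W @ x, k)
    return (z ** 2).sum()

torch.manual_seed(0)  # hypothetical seed, only for reproducibility
W = torch.rand(6, 6, dtype=torch.double, requires_grad=True)
x = torch.rand(6, dtype=torch.double, requires_grad=True)

# Returns True if analytic and numerical gradients agree (raises otherwise)
ok = torch.autograd.gradcheck(topk_loss, (W, x))
print(ok)  # True
```

At an exact tie between the k-th and (k+1)-th largest entries of W @ x the selection is ambiguous and a finite-difference check can fail there; with continuous random inputs such ties essentially never occur.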
