Why does PyTorch nn.Module.cuda() move only parameters and buffers to the GPU, and not the module's member tensors?


Question

nn.Module.cuda() moves all model parameters and buffers to the GPU.

But why doesn't it also move tensors that are plain members of the model?

import torch

class ToyModule(torch.nn.Module):
    def __init__(self) -> None:
        super(ToyModule, self).__init__()
        self.layer = torch.nn.Linear(2, 2)
        self.expected_moved_cuda_tensor = torch.tensor([0, 2, 3])

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return self.layer(input)

toy_module = ToyModule()
toy_module.cuda()

>>> next(toy_module.layer.parameters()).device
device(type='cuda', index=0)

For the model member tensor, the device stays unchanged.

>>> toy_module.expected_moved_cuda_tensor.device
device(type='cpu')

Solution

If you define a tensor inside the module it needs to be registered as either a parameter or a buffer so that the module is aware of it.
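
As a rough illustration of what "aware of it" means, the sketch below lists what a module tracks when a tensor is assigned as a plain attribute. The class name `PlainAttrModule` and the attribute name `plain` are made up for this example. So that it also runs on a machine without a GPU, it uses `.double()` as a stand-in for `.cuda()`; both go through the same traversal of registered parameters and buffers.

```python
import torch


class PlainAttrModule(torch.nn.Module):  # hypothetical name, for illustration only
    def __init__(self) -> None:
        super().__init__()
        self.layer = torch.nn.Linear(2, 2)
        # Plain tensor attribute: stored on the object, but not registered
        self.plain = torch.tensor([0.0, 2.0, 3.0])


m = PlainAttrModule()

# Only the Linear layer's tensors are known to the module
param_names = [name for name, _ in m.named_parameters()]
buffer_names = [name for name, _ in m.named_buffers()]
state_keys = list(m.state_dict().keys())

# .double() (like .cuda()) only visits registered parameters and buffers
m.double()
weight_dtype = m.layer.weight.dtype
plain_dtype = m.plain.dtype

print(param_names, buffer_names, state_keys)
print(weight_dtype, plain_dtype)
```

The plain attribute is absent from all three listings and keeps its original dtype, which mirrors how it keeps its original device after `.cuda()`.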


Parameters are tensors that are to be trained and are returned by model.parameters(). They are easy to register: wrap the tensor in nn.Parameter and assign it as an attribute, and it is registered automatically. Note that only floating-point (and complex) tensors can require gradients, so a trainable parameter needs one of those dtypes.

class ToyModule(torch.nn.Module):
    def __init__(self) -> None:
        super(ToyModule, self).__init__()
        self.layer = torch.nn.Linear(2, 2)
        # registering expected_moved_cuda_tensor as a trainable parameter
        self.expected_moved_cuda_tensor = torch.nn.Parameter(torch.tensor([0., 2., 3.]))

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return self.layer(input)
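
The dtype restriction on parameters comes from gradient tracking. A minimal sketch, assuming a recent PyTorch release: wrapping an integer tensor in `nn.Parameter` with the default `requires_grad=True` raises a `RuntimeError`, while passing `requires_grad=False` is accepted (for integer data that should not be trained, a buffer is usually the more natural choice). The variable names here are illustrative only.

```python
import torch

int_values = torch.tensor([0, 2, 3])  # integer (int64) tensor

# Default requires_grad=True: integer tensors cannot require gradients
try:
    torch.nn.Parameter(int_values)
    trainable_int_ok = True
except RuntimeError:
    trainable_int_ok = False

# Turning gradient tracking off lets the integer tensor be wrapped
frozen = torch.nn.Parameter(int_values, requires_grad=False)

print(trainable_int_ok, frozen.dtype, frozen.requires_grad)
```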


Buffers are tensors that are registered in the module, so methods like .cuda() affect them, but they are not returned by model.parameters() and are therefore not updated by an optimizer. Buffers are not restricted to a particular data type.

class ToyModule(torch.nn.Module):
    def __init__(self) -> None:
        super(ToyModule, self).__init__()
        self.layer = torch.nn.Linear(2, 2)
        # registering expected_moved_cuda_tensor as a buffer
        # Note: this creates a new member variable named expected_moved_cuda_tensor
        self.register_buffer('expected_moved_cuda_tensor', torch.tensor([0, 2, 3]))

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        return self.layer(input)


In both of the above cases, the following code behaves the same:

>>> toy_module = ToyModule()
>>> toy_module.cuda()
>>> next(toy_module.layer.parameters()).device
device(type='cuda', index=0)
>>> toy_module.expected_moved_cuda_tensor.device
device(type='cuda', index=0)
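
To make the difference between the two registration styles concrete, here is a small sketch that puts a parameter and two buffers in one module and inspects how each is tracked. The names `RegisteredModule`, `scale`, `offsets`, and `scratch` are invented for the example, and `persistent=False` is assumed to be available (it was added to `register_buffer` in PyTorch 1.6, as far as I know). As before, `.double()` stands in for `.cuda()` so that the snippet runs without a GPU.

```python
import torch


class RegisteredModule(torch.nn.Module):  # hypothetical name, for illustration only
    def __init__(self) -> None:
        super().__init__()
        self.layer = torch.nn.Linear(2, 2)
        # Trainable: returned by parameters() and saved in state_dict()
        self.scale = torch.nn.Parameter(torch.tensor([0.0, 2.0, 3.0]))
        # Not trainable, but moved with the module and saved in state_dict()
        self.register_buffer("offsets", torch.tensor([0.0, 2.0, 3.0]))
        # Moved with the module, but left out of state_dict()
        self.register_buffer("scratch", torch.zeros(3), persistent=False)


m = RegisteredModule()

param_names = sorted(name for name, _ in m.named_parameters())
buffer_names = sorted(name for name, _ in m.named_buffers())
state_keys = sorted(m.state_dict().keys())

# All registered floating-point tensors follow the module-level conversion
m.double()
dtypes = {m.scale.dtype, m.offsets.dtype, m.scratch.dtype}

print(param_names)
print(buffer_names)
print(state_keys)
print(dtypes)
```

The choice then comes down to whether the tensor should be trained (parameter) or only carried along with the module (buffer), and, for buffers, whether it should be saved in checkpoints.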
