Why does a PyTorch model take multiple image sizes inside the model?


Problem Description

I am using a simple object detection model in PyTorch and running inference with it.

When I loop over the images one at a time:

for k, image_path in enumerate(image_list):
    image = imgproc.loadImage(image_path)  # load one image as a tensor
    print(image.shape)
    x = image.cuda()                       # move the input to the GPU
    with torch.no_grad():
        y, feature = net(x)                # forward pass on a single image

it prints variable-sized image tensors, such as:

torch.Size([1, 3, 384, 320])

torch.Size([1, 3, 704, 1024])

torch.Size([1, 3, 1280, 1280])

So when I run batch inference through a DataLoader that applies the same transformation, the code fails. However, when I resize all the images to 600×600, the batch inference runs successfully.
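
For reference, here is a minimal sketch of the batched setup (the dataset class and image paths are illustrative, not my actual code; the Resize transform is what lets the default DataLoader collation succeed):

import torch
from torch.utils.data import Dataset, DataLoader
from torchvision import transforms
from PIL import Image

class ImageListDataset(Dataset):
    """Loads images from a list of paths; names here are illustrative."""
    def __init__(self, image_list, size=(600, 600)):
        self.image_list = image_list
        # Without the Resize, images keep their native sizes and the
        # default collate function cannot stack them into one batch.
        self.transform = transforms.Compose([
            transforms.Resize(size),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.image_list)

    def __getitem__(self, idx):
        image = Image.open(self.image_list[idx]).convert("RGB")
        return self.transform(image)

image_list = ["img_0.jpg", "img_1.jpg"]  # hypothetical paths
loader = DataLoader(ImageListDataset(image_list), batch_size=2)
for batch in loader:
    print(batch.shape)  # torch.Size([2, 3, 600, 600])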

I have two questions:

First, why is PyTorch able to accept dynamically sized inputs to a deep learning model, and second, why does dynamically sized input fail in batch processing?

Answer

PyTorch has what is called a Dynamic Computational Graph.

It allows the graph of the neural network to dynamically adapt to its input size, from one input to the next, during training or inference. This is what you observe in your first example: providing an image as a Tensor of size [1, 3, 384, 320] to your model, then another one as a Tensor of size [1, 3, 384, 1024], and so forth, is completely fine, as, for each input, your model will dynamically adapt.
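
As a small illustration of this flexibility, consider a toy fully convolutional net (illustrative only, not your detection model): each forward pass builds its graph around whatever spatial size it is given.

import torch
import torch.nn as nn

# Purely convolutional layers impose no fixed spatial size.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=3, padding=1),
)

with torch.no_grad():
    for h, w in [(384, 320), (704, 1024), (1280, 1280)]:
        x = torch.randn(1, 3, h, w)
        y = net(x)
        print(y.shape)  # spatial size follows the input: [1, 1, h, w]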

However, if your input is actually a collection of inputs (a batch), it is another story. A batch, for PyTorch, will be transformed into a single Tensor input with one extra dimension. For example, if you provide a list of n images, each of size [1, 3, 384, 320], PyTorch will stack them, so that your model has a single Tensor input of shape [n, 1, 3, 384, 320].
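
This is essentially what the default collate step does, sketched here with torch.stack:

import torch

# Four same-shape images stack cleanly into one batch tensor.
images = [torch.randn(1, 3, 384, 320) for _ in range(4)]
batch = torch.stack(images)
print(batch.shape)  # torch.Size([4, 1, 3, 384, 320])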

This "stacking" can only happen between images of the same shape. To provide a more "intuitive" explanation than previous answers, this stacking operation cannot be done between images of different shapes, because the network cannot "guess" how the different images should "align" with one another in a batch, if they are not all the same size.

Whether during training or testing, PyTorch will reject your input if you create batches from images of different sizes.
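
You can reproduce the rejection directly with torch.stack (the exact error text may vary by PyTorch version):

import torch

a = torch.randn(3, 384, 320)
b = torch.randn(3, 704, 1024)
try:
    torch.stack([a, b])  # mismatched shapes cannot be stacked
except RuntimeError as e:
    print(e)  # e.g. "stack expects each tensor to be equal size, ..."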

Several solutions are usually in use: reshaping as you did, adding padding (often small or null values on the border of your images) to extend your smaller images to the size of the biggest one, and so forth.
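
As one possible illustration of the padding approach (pad_collate is a hypothetical helper, not a PyTorch API), a custom collate function can zero-pad each image up to the largest height and width in the batch before stacking; such a function can be passed to DataLoader via its collate_fn argument.

import torch
import torch.nn.functional as F

def pad_collate(images):
    """Zero-pad each image (bottom/right) to the batch's max H and W, then stack."""
    max_h = max(img.shape[-2] for img in images)
    max_w = max(img.shape[-1] for img in images)
    padded = [
        # F.pad order for the last two dims: (left, right, top, bottom)
        F.pad(img, (0, max_w - img.shape[-1], 0, max_h - img.shape[-2]))
        for img in images
    ]
    return torch.stack(padded)

batch = pad_collate([torch.randn(3, 384, 320), torch.randn(3, 704, 1024)])
print(batch.shape)  # torch.Size([2, 3, 704, 1024])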

