Why does a PyTorch model take multiple image sizes inside the model?


Question

I am using a simple object detection model in PyTorch and using a PyTorch model for inference.

When I use a simple iterator over the code:

for k, image_path in enumerate(image_list):
    image = imgproc.loadImage(image_path)  # load one image as a tensor
    print(image.shape)                     # e.g. torch.Size([1, 3, 384, 320])
    x = image.cuda()                       # move the input to the GPU
    with torch.no_grad():                  # no gradients needed for inference
        y, feature = net(x)                # forward pass, one image at a time

It prints out variable-sized images such as:

torch.Size([1, 3, 384, 320])

torch.Size([1, 3, 704, 1024])

torch.Size([1, 3, 1280, 1280])

So when I use batch inference via a DataLoader that applies the same transformation, the code does not run. However, when I resize all the images to 600×600, batch processing runs successfully.

I have two questions:

First, why is PyTorch able to accept dynamically sized inputs in a deep learning model, and second, why does a dynamically sized input fail in batch processing?

Answer

PyTorch has what is called a Dynamic Computational Graph.

It allows the graph of the neural network to dynamically adapt to its input size, from one input to the next, during training or inference. This is what you observe in your first example: providing an image as a tensor of size [1, 3, 384, 320] to your model, then another one as a tensor of size [1, 3, 704, 1024], and so forth, is completely fine, because the model dynamically adapts to each input.
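
As a minimal sketch of this behaviour (a toy convolution-only model for illustration, not the asker's detection network), the same weights can process inputs of several different spatial sizes:

import torch
import torch.nn as nn

# A toy fully-convolutional model: convolutions impose no fixed spatial
# size, so the same weights handle any height/width.
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 1, kernel_size=3, padding=1),
)

with torch.no_grad():
    for h, w in [(384, 320), (704, 1024), (1280, 1280)]:
        x = torch.randn(1, 3, h, w)      # one image per forward pass
        y = net(x)
        print(x.shape, "->", y.shape)    # output size follows the input size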

However, if your input is actually a collection of inputs (a batch), it is another story. A batch, for PyTorch, will be transformed into a single tensor input with one extra dimension. For example, if you provide a list of n images, each of size [1, 3, 384, 320], PyTorch will stack them so that your model has a single tensor input, of shape [n, 1, 3, 384, 320].

此堆叠"是指只能在相同形状的图像之间发生.为了提供更直观"的解决方案解释比以前的答案,该堆叠操作不能在不同形状的图像之间完成,因为网络不能猜测"图像.不同的图像应该如何对齐"?如果它们的大小不尽相同,则可以彼此批处理.

This "stacking" can only happen between images of the same shape. To provide a more "intuitive" explanation than previous answers, this stacking operation cannot be done between images of different shapes, because the network cannot "guess" how the different images should "align" with one another in a batch, if they are not all the same size.

Whether it happens during training or testing, PyTorch will reject your input if you create a batch out of images of different sizes.
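
A quick sketch of the failure, using the shapes from the question (torch.stack is what the DataLoader's default collate function applies to tensors under the hood):

import torch

# Same-shaped tensors stack fine into a single batch tensor:
same = [torch.randn(3, 384, 320) for _ in range(4)]
print(torch.stack(same).shape)   # torch.Size([4, 3, 384, 320])

# Mixed shapes cannot be stacked into one tensor:
mixed = [torch.randn(3, 384, 320), torch.randn(3, 704, 1024)]
torch.stack(mixed)               # RuntimeError: stack expects each tensor to be equal size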

Several solutions are commonly used: reshaping, as you did; adding padding (often small or null values on the border of your images) to extend the smaller images to the size of the biggest one; and so on.
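
As one illustration of the padding approach, here is a minimal sketch of a custom collate function for a DataLoader (the name pad_collate and the choice of zero padding are assumptions for illustration, not part of the original answer) that pads every [C, H, W] image in a batch to the largest height and width before stacking:

import torch
import torch.nn.functional as F

def pad_collate(batch):
    # Hypothetical collate_fn: zero-pad each image on the right and
    # bottom up to the largest H and W in the batch, then stack into
    # a single [n, C, H, W] tensor.
    max_h = max(img.shape[1] for img in batch)
    max_w = max(img.shape[2] for img in batch)
    padded = [
        F.pad(img, (0, max_w - img.shape[2],   # (left, right,
                    0, max_h - img.shape[1]))  #  top, bottom)
        for img in batch
    ]
    return torch.stack(padded)

# loader = torch.utils.data.DataLoader(dataset, batch_size=8, collate_fn=pad_collate)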
